Difference between revisions of "General troubleshooting"

From ArchWiki
(Session permissions: polkit)
(See also: Added urls that were in the "See also" 3rd level subsection above.)
 
(123 intermediate revisions by 24 users not shown)
Line 1: Line 1:
 
[[Category:System administration]]
 
 
[[Category:System recovery]]
 
{{stub}}
+
[[Category:Getting and installing Arch]]
 +
[[es:General troubleshooting]]
 +
[[ja:一般的なトラブルシューティング]]
 +
[[ru:General troubleshooting]]
 +
{{Related articles start}}
 +
{{Related|Reporting bug guidelines}}
 +
{{Related|Step-by-step debugging guide}}
 +
{{Related|Debug - Getting Traces}}
 +
{{Related|IRC Collaborative Debugging}}
 +
{{Related articles end}}
  
 
This article explains some methods for general troubleshooting. For application specific issues, please reference the particular wiki page for that program.
 
  
== Attention To Detail ==
+
== General procedures ==
In order to resolve an issue that you're having with [[Main_Page|Arch Linux]], it is ''absolutely crucial'' to have a firm understanding of how that specific system functions. How does it work, and what does it need to run without error? If you cannot comfortably answer these questions then it is strongly advised that you review the [[Main_Page|ArchWiki]] article for the application/service that you are having trouble with. Once you feel like you've understood the specific system, it will be easier for you to pinpoint the problem. Saying, ''"Program X doesn't work"'' is unacceptable. Precision is key.
 
  
The following gives a number of questions for you to ask yourself whenever dealing with a malfunctioning system. Under each question there are notes explaining how you should be answering each question, followed by some light examples on how to easily gather data output and what tools can be used to review logs and the journal.
+
=== Attention to detail ===
  
== Questions / Checklist ==
+
In order to resolve an issue that you are having, it is ''absolutely crucial'' to have a firm basic understanding of how that specific subsystem functions. How does it work, and what does it need to run without error? If you cannot comfortably answer these questions, you should review the [[Table of contents|ArchWiki]] article for the subsystem that you are having trouble with. Once you feel you have understood it, it will be easier to pinpoint the cause of the problem.
;1. What <u>is</u> the issue(s)?:Be ''<u>as precise as possible</u>''. This will help you not get confused and/or side-tracked when looking up specific information.
 
;2. Are there error messages? (if any):Copy and paste <u>''full outputs''</u> that contain '''error messages''' related to your issue into a separate file, such as {{ic|$HOME/issue.log}}. For example, to forward the output of the following [[mkinitcpio]] command to {{ic|$HOME/issue.log}}:
 
$ mkinitcpio -p linux >> $HOME/issue.log
 
;3. Can you reproduce the issue?:If so, give ''exact'' '''step-by-step''' instructions/commands needed to do so.
 
;4. When did you first encounter these issues and what was changed between then and when the system was operating without error?:If it occurred right after an update then, list '''<u>all packages that were updated</u>'''. Include ''version numbers'', also, paste the entire update from [[pacman]].log ({{ic|/var/log/pacman.log}}). Also take note of the statuses of ''any'' service(s) needed to support the malfunctioning application(s) using [[systemd]]'s systemctl tools. For example, to forward the output of the following [[Systemd#Basic_systemctl_usage|systemd]] command to {{ic|$HOME/issue.log}}:
 
$ systemctl status dhcpcd@eth0.service >> $HOME/issue.log
 
{{Note|Using {{ic|'''>>'''}} will ensure any previous text in {{ic|$HOME/issue.log}} will not be overwritten.}}
 
  
== Remember ==
+
=== Questions/checklist ===
; When attempting to resolve an issue, '''never''' approach it as:Application '''X''' does not work.
 
; Instead, look at it in its entirety:Application '''X''' produces '''Y''' error(s) when performing '''Z''' tasks under conditions '''A''' and '''B'''.
 
  
== Additional Support ==
+
The following is a list of questions to ask yourself whenever you deal with a malfunctioning system. Under each question there are notes explaining how to answer it, followed by brief examples of how to gather output and which tools can be used to review logs and the journal.
With all the information in front of you, you should have a good idea as to what is going on with the system.
 
And you can now start working on a proper fix.
 
  
If you require any additional support, it can be found at irc.freenode.net #archlinux
+
# What is the issue(s)?
 +
#: Be ''as precise as possible''. This will help you not get confused and/or side-tracked when looking up specific information.
 +
# Are there error messages? (if any)
 +
#: Copy and paste ''full outputs'' that contain '''error messages''' related to your issue into a separate file, such as {{ic|$HOME/issue.log}}. For example, to forward the output of the following [[mkinitcpio]] command to {{ic|$HOME/issue.log}}:
 +
#: {{bc|$ mkinitcpio -p linux >> $HOME/issue.log}}
 +
# Can you reproduce the issue?
 +
#: If so, give ''exact'' '''step-by-step''' instructions/commands needed to do so.
 +
# When did you first encounter these issues and what was changed between then and when the system was operating without error?
 +
#: If it occurred right after an update, list '''all packages that were updated'''. Include ''version numbers'' and paste the entire update from the [[pacman]] log ({{ic|/var/log/pacman.log}}). Also take note of the status of ''any'' services needed to support the malfunctioning applications using [[systemd]]'s ''systemctl'' tool. For example, to append the output of the following [[Systemd#Basic_systemctl_usage|systemd]] command to {{ic|$HOME/issue.log}}:
 +
#: {{bc|$ systemctl status dhcpcd@eth0.service >> $HOME/issue.log}}
 +
#: {{Note|Using {{ic|'''>>'''}} will ensure any previous text in {{ic|$HOME/issue.log}} will not be overwritten.}}
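Keep in mind that {{ic|>>}} on its own only redirects standard output, while many programs print their error messages to standard error. The following is a minimal sketch of a helper that appends both streams to the log, assuming a POSIX shell. The function name {{ic|log_cmd}} and the {{ic|ISSUE_LOG}} variable are illustrative and not part of any package:

```shell
# Hypothetical helper: append a command line and all of its output
# (stdout and stderr) to a log file. ISSUE_LOG defaults to $HOME/issue.log.
log_cmd() {
    log="${ISSUE_LOG:-$HOME/issue.log}"
    {
        # Record the command itself so the log is self-explanatory
        printf '$ %s\n' "$*"
        # Run the command, merging stderr into stdout
        "$@" 2>&1
    } >> "$log"
}
```

For example, {{ic|log_cmd systemctl status dhcpcd@eth0.service}} would record the command followed by everything it printed.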
  
==Session permissions==
+
=== Approach ===
{{Note|You must be using [[systemd]] as your init system for local sessions to work - which is required for polkit permissions and ACLs for various devices (see {{ic|/usr/lib/udev/rules.d/70-uaccess.rules}})}}
+
 
 +
Rather than approaching an issue by stating,
 +
 
 +
''Application X does not work.''
 +
 
 +
you will find it more helpful to formulate your issue in the context of the system as a whole, as:
 +
 
 +
''Application X produces Y error(s) when performing Z tasks under conditions A and B.''
 +
 
 +
=== Additional support ===
 +
 
 +
With all the information in front of you, you should have a good idea as to what is going on with the system, and you can now start working on a proper fix.
 +
 
 +
If you require any additional support, it can be found on [https://bbs.archlinux.org the forums] or IRC at irc.freenode.net #archlinux. See [[IRC channels]] for other options.
 +
 
 +
When asking for support, post the '''complete''' output/logs, not just what you think are the significant sections. Sources of information include:
 +
 
 +
* Full output of any command involved - don't just select what you think is relevant.
 +
* Output from systemd's {{ic|journalctl}}. For more extensive output, use the {{ic|1=systemd.log_level=debug}} boot parameter.
 +
* Log files (have a look in {{ic|/var/log}})
 +
* Relevant configuration files
 +
* Drivers involved
 +
* Versions of packages involved
 +
* Kernel: {{ic|dmesg}}. For a boot problem, at least the last 10 lines displayed, preferably more
 +
* Networking: Exact output of commands involved, and any configuration files
 +
* Xorg: {{ic|/var/log/Xorg.0.log}}, and prior logs if you have overwritten the problematic one
 +
* Pacman: If a recent upgrade broke something, look in {{ic|/var/log/pacman.log}}
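Several of the sources listed above can be gathered in one pass. The following is a rough sketch only, assuming a POSIX shell. The function name {{ic|collect_info}} is hypothetical, and the choice to include only the last 200 lines of {{ic|/var/log/pacman.log}} is an assumption made to keep the file manageable; adjust it so that the relevant upgrade is fully included:

```shell
# Hypothetical helper: gather common diagnostic sources into one file.
# Every source is optional; missing tools or unreadable files are skipped.
collect_info() {
    out="${1:-$HOME/issue.log}"
    {
        echo "== uname -a =="
        uname -a
        if command -v journalctl >/dev/null 2>&1; then
            echo "== journalctl -b =="
            journalctl -b --no-pager 2>&1
        fi
        if command -v dmesg >/dev/null 2>&1; then
            echo "== dmesg =="
            dmesg 2>&1
        fi
        if [ -r /var/log/pacman.log ]; then
            # Recent entries only (assumption); raise the count if needed
            echo "== pacman.log (last 200 lines) =="
            tail -n 200 /var/log/pacman.log
        fi
    } >> "$out"
}
```

Running {{ic|collect_info}} without arguments appends to {{ic|$HOME/issue.log}}; a different path can be given as the first argument. Review the resulting file before posting it, since logs may contain information you do not want to publish.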
 +
 
 +
One of the better ways to post this information is to use an online pastebin. You can [[install]] the {{pkg|pbpst}} or {{pkg|gist}} package to automatically upload information. For example, to upload the content of your systemd journal from this boot you would do:
 +
 
 +
# journalctl -xb | pbpst -S
 +
 
 +
A link will then be output that you can paste to the forum or IRC.
 +
 
 +
Additionally, before posting your question, you may wish to review [http://www.catb.org/esr/faqs/smart-questions.html how to ask smart questions]. See also [[Code of conduct]].
 +
 
 +
== Boot problems ==
 +
 
 +
Diagnosing errors during the [[boot process]] involves changing the [[kernel parameters]], and rebooting the system.
 +
 
 +
If booting the system is not possible, boot from a [https://www.archlinux.org/download/ live image] and [[change root]] to the existing system.
 +
 
 +
=== Console messages ===
 +
 
 +
After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.
 +
 
 +
Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using {{ic|dmesg}} or all logs from the current boot with {{ic|journalctl -b}}.
 +
 
 +
==== Flow control ====
 +
 
 +
This is basic management that applies to most terminal emulators, including virtual consoles (vc):
 +
 
 +
* Press {{ic|Ctrl+S}} to pause the output
 +
* And {{ic|Ctrl+Q}} to resume it
 +
 
 +
This pauses not only the output, but also programs which try to print to the terminal, as they will block on the {{ic|write()}} calls for as long as the output is paused. If your ''init'' appears frozen, make sure the system console is not paused.
 +
 
 +
To see error messages which are already displayed, see [[Getty#Have boot messages stay on tty1]].
 +
 
 +
==== Scrollback ====
 +
 
 +
Scrollback allows the user to go back and view text which has scrolled off the screen of a text console. This is made possible by a buffer created between the video adapter and the display device called the scrollback buffer.  By default, the key combinations of {{ic|Shift+PageUp}} and {{ic|Shift+PageDown}} scroll the buffer up and down.
 +
 
 +
If scrolling up all the way does not show you enough information, you need to expand your scrollback buffer to hold more output. This is done by tweaking the kernel's framebuffer console (fbcon) with the [[kernel parameter]] {{ic|1=fbcon=scrollback:Nk}} where {{ic|N}} is the desired buffer size in kilobytes. The default size is 32k.
 +
 
 +
If this does not work, your framebuffer console may not be properly enabled. Check the [https://www.kernel.org/doc/Documentation/fb/fbcon.txt Framebuffer Console documentation] for other parameters, e.g. for changing the framebuffer driver.
 +
 
 +
==== Debug output ====
 +
 
 +
Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:
 +
 
 +
* {{ic|debug}} enables debug messages for both the kernel and [[systemd]]
 +
* {{ic|ignore_loglevel}} forces ''all'' kernel messages to be printed
 +
 
 +
Other parameters you can add that might be useful in certain situations are:
 +
* {{ic|1=earlyprintk=vga,keep}} prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change {{ic|vga}} to {{ic|efi}} for [[EFI]] systems
 +
* {{ic|1=log_buf_len=16M}} allocates a larger (16MB) kernel message buffer, to ensure that debug output is not overwritten
 +
 
 +
There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. {{ic|bootmem_debug}}, {{ic|sched_debug}}. Check the [https://www.kernel.org/doc/Documentation/kernel-parameters.txt kernel parameter documentation] for specific information.
 +
 
 +
{{Note|If you cannot scroll back far enough to view the desired boot output, you should increase the size of the [[#Scrollback|scrollback buffer]].}}
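After adding parameters and rebooting, it can be worth confirming that the running kernel actually received them; the active command line is exposed in {{ic|/proc/cmdline}}. The following is a small sketch, assuming a POSIX shell, with {{ic|has_kernel_param}} being a hypothetical helper name:

```shell
# Hypothetical helper: succeed if a kernel parameter is present on a
# command line string; matches both "name" and "name=value" forms.
# The second argument is optional and defaults to the running kernel's
# command line from /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Rely on word splitting to walk the space-separated parameters
    for word in $cmdline; do
        case "$word" in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, {{ic|has_kernel_param log_buf_len && echo "parameter is set"}} checks the running system.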
 +
 
 +
=== Recovery shells ===
 +
 
 +
Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can {{ic|exit}} to let the kernel resume what it was doing:
 +
* {{ic|rescue}} launches a shell shortly after the root filesystem is remounted read/write
 +
* {{ic|emergency}} launches a shell even earlier, before most filesystems are mounted
 +
* {{ic|1=init=/bin/sh}} (as a last resort) changes the init program to a root shell. {{ic|rescue}} and {{ic|emergency}} both rely on [[systemd]], but this should work even if ''systemd'' is broken
 +
 
 +
Another option is systemd's debug-shell which adds a root shell on {{ic|tty9}} (accessible with Ctrl+Alt+F9). It can be enabled by either adding {{ic|systemd.debug-shell}} to the [[kernel parameters]], or by [[enabling]] {{ic|debug-shell.service}}. Take care to disable the service when done to avoid the security risk of leaving a root shell open on every boot.
 +
 
 +
=== Blank screen with Intel video ===
 +
 
 +
This is most likely due to a problem with [[kernel mode setting]]. Try [[Kernel mode setting#Disabling modesetting|disabling modesetting]] or changing the [[Intel#KMS Issue: console is limited to small area|video port]].
 +
 
 +
=== Stuck while loading the kernel ===
 +
 
 +
Try disabling ACPI by adding the {{ic|1=acpi=off}} kernel parameter.
 +
 
 +
=== Debugging kernel modules ===
 +
 
 +
See [[Kernel modules#Obtaining information]].
 +
 
 +
=== Debugging hardware ===
 +
 
 +
* You can display extra debugging information about your hardware by following [[udev#Debug output]].
 +
* Ensure that [[Microcode]] updates are applied on your system.
 +
* Test your device's RAM with [http://www.memtest.org/ Memtest86+]. Unstable RAM may lead to some extremely odd issues, ranging from random crashes to data corruption.
 +
 
 +
== Kernel panics ==
 +
 
 +
A ''kernel panic'' occurs when the Linux kernel enters an unrecoverable failure state. The state typically originates from buggy hardware drivers resulting in the machine being deadlocked, non-responsive, and requiring a reboot. Just prior to deadlock, a diagnostic message is generated, consisting of: the ''machine state'' when the failure occurred, a ''call trace'' leading to the kernel function that recognized the failure, and a listing of currently loaded modules. Thankfully, kernel panics do not happen very often with ''mainline'' versions of the kernel, such as those supplied by the official repositories, but when they do happen, you need to know how to deal with them.
 +
 +
{{Note|Kernel panics are sometimes referred to as ''oops'' or ''kernel oops''. While both panics and oops occur as the result of a failure state, an ''oops'' is more general in that it does not ''necessarily'' result in a deadlocked machine: sometimes the kernel can recover from an oops by killing the offending task and carrying on.}}
 +
 
 +
{{Tip|Pass the kernel parameter {{ic|1=oops=panic}} at boot or write {{ic|1}} to {{ic|/proc/sys/kernel/panic_on_oops}} to force a recoverable oops to issue a panic instead. This is advisable if you are concerned about the small chance of system instability resulting from an oops recovery, which may make future errors difficult to diagnose.}}
 +
 
 +
=== Examine panic message ===
 +
 
 +
If a kernel panic occurs very early in the boot process, you may see a message on the console containing "Kernel panic - not syncing:", but once [[Systemd]] is running, kernel messages will typically be captured and written to the system log. However, when a panic occurs, the diagnostic message output by the kernel is ''almost never'' written to the log file on disk because the machine deadlocks before {{ic|systemd-journald}} gets the chance. Therefore, the only way to examine the panic message is to view it on the console as it happens (without resorting to setting up a ''kdump crashkernel''). You can do this by booting with the following kernel parameters and attempting to reproduce the panic on tty1:
 +
 
 +
{{bc|1=systemd.journald.forward_to_console=1 console=tty1}}
 +
 
 +
{{Tip|In the event that the panic message scrolls away too quickly to examine, try passing the kernel parameter {{ic|1=pause_on_oops=''seconds''}} at boot.}}
 +
 
 +
==== Example scenario: bad module ====
 +
 
 +
It is possible to make a best guess as to what subsystem or module is causing the panic using the information in the diagnostic message. In this scenario, we have a panic on some imaginary machine during boot. Pay attention to the lines highlighted in '''bold''':
 +
 
 +
{{bc|'''kernel: BUG: unable to handle kernel NULL pointer dereference at (null)''' [1]
 +
'''kernel: IP: fw_core_init+0x18/0x1000 [firewire_core]''' [2]
 +
kernel: PGD 718d00067
 +
kernel: P4D 718d00067
 +
kernel: PUD 7b3611067
 +
kernel: PMD 0
 +
kernel:
 +
kernel: Oops: 0002 [#1] PREEMPT SMP
 +
'''kernel: Modules linked in: firewire_core(+) crc_itu_t cfg80211 rfkill ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 ...''' [3]
 +
kernel: CPU: 6 PID: 1438 Comm: modprobe Tainted: P          O    4.13.3-1-ARCH #1
 +
kernel: Hardware name: Gigabyte Technology Co., Ltd. H97-D3H/H97-D3H-CF, BIOS F5 06/26/2014
 +
kernel: task: ffff9c667abd9e00 task.stack: ffffb53b8db34000
 +
kernel: RIP: 0010:fw_core_init+0x18/0x1000 [firewire_core]
 +
kernel: RSP: 0018:ffffb53b8db37c68 EFLAGS: 00010246
 +
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 +
kernel: RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffc16d3af4
 +
kernel: RBP: ffffb53b8db37c70 R08: 0000000000000000 R09: ffffffffae113e95
 +
kernel: R10: ffffe93edfdb9680 R11: 0000000000000000 R12: ffffffffc16d9000
 +
kernel: R13: ffff9c6729bf8f60 R14: ffffffffc16d5710 R15: ffff9c6736e55840
 +
kernel: FS:  00007f301fc80b80(0000) GS:ffff9c675dd80000(0000) knlGS:0000000000000000
 +
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 +
kernel: CR2: 0000000000000000 CR3: 00000007c6456000 CR4: 00000000001406e0
 +
kernel: Call Trace:
 +
'''kernel:  do_one_initcall+0x50/0x190''' [4]
 +
kernel:  ? do_init_module+0x27/0x1f2
 +
kernel:  do_init_module+0x5f/0x1f2
 +
kernel:  load_module+0x23f3/0x2be0
 +
kernel:  SYSC_init_module+0x16b/0x1a0
 +
kernel:  ? SYSC_init_module+0x16b/0x1a0
 +
kernel:  SyS_init_module+0xe/0x10
 +
kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
 +
kernel: RIP: 0033:0x7f301f3a2a0a
 +
kernel: RSP: 002b:00007ffcabbd1998 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
 +
kernel: RAX: ffffffffffffffda RBX: 0000000000c85a48 RCX: 00007f301f3a2a0a
 +
kernel: RDX: 000000000041aada RSI: 000000000001a738 RDI: 00007f301e7eb010
 +
kernel: RBP: 0000000000c8a520 R08: 0000000000000001 R09: 0000000000000085
 +
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c79208
 +
kernel: R13: 0000000000c8b4d8 R14: 00007f301e7fffff R15: 0000000000000030
 +
kernel: Code: <c7> 04 25 00 00 00 00 01 00 00 00 bb f4 ff ff ff e8 73 43 9c ec 48
 +
kernel: RIP: fw_core_init+0x18/0x1000 [firewire_core] RSP: ffffb53b8db37c68
 +
kernel: CR2: 0000000000000000
 +
kernel: ---[ end trace 71f4306ea1238f17 ]---
 +
'''kernel: Kernel panic - not syncing: Fatal exception''' [5]
 +
kernel: Kernel Offset: 0x80000000 from 0xffffffff810000000 (relocation range: 0xffffffff800000000-0xfffffffffbffffffff
 +
kernel: ---[ end Kernel panic - not syncing: Fatal exception}}
 +
 
 +
* [1] Indicates the type of error that caused the panic. In this case it was a programmer bug.
 +
* [2] Indicates that the panic happened in a function called ''fw_core_init'' in module ''firewire_core''.
 +
* [3] Indicates that ''firewire_core'' was the latest module to be loaded.
 +
* [4] Indicates that the function that called function ''fw_core_init'' was ''do_one_initcall''.
 +
* [5] Indicates that this ''oops'' message is, in fact, a kernel panic and the system is now deadlocked.
 +
 
 +
We can surmise then, that the panic occurred during the initialization routine of module ''firewire_core'' as it was loaded.  (We might assume then, that the machine's firewire hardware is incompatible with this version of the firewire driver module due to a programmer error, and will have to wait for a new release.) In the meantime, the easiest way to get the machine running again is to prevent the module from being loaded.  We can do this in one of two ways:
 +
 
 +
* If the module is being loaded during the execution of the ''initramfs'', reboot with the kernel parameter {{ic|1=rd.blacklist=firewire_core}}.
 +
* Otherwise reboot with the kernel parameter {{ic|1=module_blacklist=firewire_core}}.
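If the panic message has been saved to a text file (for instance typed up or transcribed from a photo of the screen), the module named in square brackets on the {{ic|IP:}} or {{ic|RIP:}} line can be extracted mechanically. The following is a minimal sketch, assuming a message laid out like the example above; {{ic|faulting_module}} is a hypothetical helper name:

```shell
# Hypothetical helper: print the module named in square brackets on the
# first "IP:" or "RIP:" line of a saved oops/panic message read from stdin.
# Lines without a bracketed module name (e.g. user-space RIP values)
# are ignored.
faulting_module() {
    sed -n 's/.*IP: [^[]*\[\([^]]*\)].*/\1/p' | head -n 1
}
```

With the example message saved as {{ic|panic.txt}} (a placeholder file name), {{ic|faulting_module < panic.txt}} would print {{ic|firewire_core}}. A panic inside the core kernel rather than a module has no bracketed name, in which case nothing is printed.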
 +
 
 +
=== Reboot into root shell and fix problem ===
 +
 
 +
You'll need a root shell to make changes to the system so the panic no longer occurs. If the panic occurs on boot, there are several strategies to obtain a root shell before the machine deadlocks:
 +
 
 +
* Reboot with the kernel parameter {{ic|emergency}}, {{ic|rd.emergency}}, or {{ic|-b}} to receive a prompt to log in just after the root filesystem is mounted and {{ic|systemd}} is started.
 +
: {{Note|At this point, the root filesystem will be mounted '''read-only'''. Execute {{ic|# mount -o remount,rw /}} to make changes.}}
 +
* Reboot with the kernel parameter {{ic|rescue}}, {{ic|rd.rescue}}, {{ic|single}}, {{ic|s}}, {{ic|S}}, or {{ic|1}} to receive a prompt to log in just after local filesystems are mounted.
 +
* Reboot with the kernel parameter {{ic|1=systemd.debug-shell=1}} to obtain a very early root shell on tty9. Switch to it by pressing {{ic|Ctrl-Alt-F9}}.
 +
* Experiment by rebooting with different sets of kernel parameters to possibly disable the kernel feature that is causing the panic.  Try the "old standbys" {{ic|1=acpi=off}} and {{ic|nolapic}}.
 +
: {{Tip|See {{ic|Documentation/admin-guide/kernel-parameters.txt}} in the Linux kernel source tree for all parameters.}}
 +
* As a last resort, boot with the '''Arch Linux Installation CD''' and mount the root filesystem on {{ic|/mnt}} then execute {{ic|# arch-chroot /mnt}}.
 +
 
 +
Disable the service or program that is causing the panic, roll back a faulty update, or fix a configuration problem.
 +
 
 +
== Package management ==
 +
 
 +
See [[Pacman#Troubleshooting]] for general topics, and [[pacman/Package signing#Troubleshooting]] for issues with PGP keys.
 +
 
 +
== fuser ==
 +
 
 +
{{Expansion|Write an example how to use it.}}
 +
 
 +
''fuser'' is a command-line utility for identifying processes using resources such as files, filesystems and TCP/UDP ports.
 +
 +
''fuser'' is provided by the {{Pkg|psmisc}} package, which should be already installed as part of the {{Grp|base}} group.
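As a brief illustration, the following sketch finds the process that holds a file open. It assumes a POSIX shell; the temporary file and the background {{ic|sleep}} are stand-ins for the real file and the real process you are looking for:

```shell
# Sketch: show which process holds a file open. A background "sleep" is
# used as a stand-in for the process you are trying to find.
demo_file=$(mktemp)
sleep 30 < "$demo_file" &
holder_pid=$!

if command -v fuser >/dev/null 2>&1; then
    # -v prints a table with user, PID, access type and command name
    fuser -v "$demo_file" || echo "no process is using $demo_file"
fi

# Clean up the stand-in process and the temporary file
kill "$holder_pid"
rm -f "$demo_file"
```

Other common invocations are {{ic|fuser -vm /mnt}}, which lists processes using any file on the filesystem mounted at {{ic|/mnt}} (useful when ''umount'' reports that the target is busy), and {{ic|fuser -v 22/tcp}}, which lists processes bound to TCP port 22.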
 +
 
 +
== Session permissions ==
 +
 
 +
{{Note|You must be using [[systemd]] as your init system for local sessions to work.[https://www.archlinux.org/news/d-bus-now-launches-user-buses/] It is required for polkit permissions and ACLs for various devices (see {{ic|/usr/lib/udev/rules.d/70-uaccess.rules}} and  [http://enotty.pipebreaker.pl/2012/05/23/linux-automatic-user-acl-management/])}}
  
 
First, make sure you have a valid local session within X:
 
Line 36: Line 257:
 
  $ loginctl show-session $XDG_SESSION_ID
 
  
This should contain {{ic|1=Remote=no}} and {{ic|1=Active=yes}} in the output. See [[xinitrc#Preserving the session]] for troubleshooting if it does not.
+
This should contain {{ic|1=Remote=no}} and {{ic|1=Active=yes}} in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
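The check described above can be scripted. The following is a minimal sketch, assuming a POSIX shell and the {{ic|1=key=value}} output format of {{ic|loginctl show-session}}; {{ic|is_local_active_session}} is a hypothetical helper name:

```shell
# Hypothetical helper: read "loginctl show-session" output on stdin and
# succeed only if it reports a local (Remote=no), active (Active=yes)
# session.
is_local_active_session() {
    props=$(cat)
    # -x matches whole lines only, -q suppresses output
    printf '%s\n' "$props" | grep -qx 'Remote=no' &&
        printf '%s\n' "$props" | grep -qx 'Active=yes'
}
```

It could be used as {{ic|loginctl show-session "$XDG_SESSION_ID" &#124; is_local_active_session && echo "session OK"}}.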
 +
 
 +
A D-Bus session should also be started along with X. See [[D-Bus#Starting the user session]] for more information on this.
 +
 
 +
Basic [[polkit]] actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See [[polkit#Authentication agents]] for more information on this.
 +
 
 +
== Message: "error while loading shared libraries" ==
 +
 
 +
{{Accuracy|Or the program needs to be rebuilt after a [[System_maintenance#Partial_upgrades_are_unsupported|soname bump]].}}
 +
 
 +
If, while using a program, you get an error similar to:
 +
 
 +
error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory
 +
 
 +
Use [[pacman]] or [[pkgfile]] to search for the package that owns the missing library:
 +
 
 +
{{hc|$ pacman -Fs libusb-0.1.so.4|
 +
extra/libusb-compat 0.1.5-1
 +
    usr/lib/libusb-0.1.so.4
 +
}}
 +
 
 +
In this case, the {{Pkg|libusb-compat}} package needs to be [[installed]].
 +
 
 +
The error could also mean that the package that you used to install your program does not list the library as a dependency in its [[PKGBUILD]]: if it is an official package, [[report a bug]]; if it is an [[AUR]] package, report it to the maintainer using its page in the AUR website.
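To find every unresolved library of a program at once, rather than discovering them one error at a time, the output of ''ldd'' can be filtered. The following is a small sketch, assuming the usual {{ic|1=name => not found}} format printed by ''ldd''; {{ic|missing_libs}} is a hypothetical helper name:

```shell
# Hypothetical helper: read "ldd" output on stdin and print the names of
# libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, {{ic|ldd /usr/bin/''program'' &#124; missing_libs}} prints one library name per line, and each name can then be searched for as shown above. Only run ''ldd'' on binaries you trust.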
 +
 
 +
== Message: "file: could not find any magic files!" ==
  
A dbus session should also be started along with X, in a way that exports a single {{ic|DBUS_SESSION_BUS_ADDRESS}} for every application in your session. If you use a desktop environment this will be handled for you, otherwise you can copy the code from {{ic|/etc/skel/.xinitrc}} that runs files in {{ic|/etc/X11/xinit/xinitrc.d}} to your {{ic|~/.xinitrc}}, and avoid using {{ic|dbus-launch}} or {{ic|ck-launch-session}}.
+
If you see this message, it likely indicates that a package update has corrupted the dynamic linker run-time bindings file and your system is now essentially crippled. You will not be able to recompile or reinstall the package responsible or rebuild the [[initramfs]] until you fix it.
  
Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. There are two alternatives in the repositories:
+
=== Problem ===
  
* {{pkg|polkit-gnome}}, which provides {{ic|/usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1}}
+
A package update likely added an invalid {{ic|''filename''.conf}} to the directory {{ic|/etc/ld.so.conf.d}} or edited {{ic|/etc/ld.so.conf}} incorrectly. The result is that the dynamic linker run-time bindings file {{ic|/etc/ld.so.cache}} is being re-generated with invalid data. This can potentially cause all programs on the system that depend on shared libraries to fail (i.e. almost all of them).
* {{pkg|polkit-kde}}, which provides {{ic|/usr/lib/kde4/libexec/polkit-kde-authentication-agent-1}}
 
  
==Single user mode==
+
=== Solution ===
If you cannot boot due to errors caused by a daemon, display manager or Xorg, you may be able to use the single user [[Runlevels|runlevel]]:
 
{{poor writing}}
 
#Boot to single-user mode by appending {{ic|1}} or {{ic|s}} to the kernel line in GRUB.
 
#Then disable the [[systemd]] service that is causing the problem.
 
#Change to the multi-user mode systemd [[Systemd#Targets|target]].
 
#Then try to track down the issue by running the service manually.
 
  
==file: could not find any magic files!==
+
# Boot with the '''''Arch Linux Installation CD'''''.
{{Poor writing}}
+
# Mount your root {{ic|/}} filesystem on {{ic|/mnt}} and your {{ic|/boot}} filesystem on {{ic|/mnt/boot}} and chroot into the broken system by executing {{ic|# arch-chroot /mnt}}.
''Example:'' After an every-day routine update or following the installation of a package you are given the following error:
+
# Examine the file {{ic|/etc/ld.so.conf}} and remove any invalid lines found.
# file: could not find any magic files!
+
# Examine the files located in the directory {{ic|/etc/ld.so.conf.d/}} and remove any invalid files.
This will most likely leave your system crippled. And, any attempts made to recompile/reinstall the package(s) responsible for the breakage will fail. Also, any attempts made to try to rebuild the [[mkinitcpio|initramfs]] will result in the following:
+
# Rebuild the dynamic linker run-time bindings file {{ic|/etc/ld.so.cache}} by executing {{ic|# ldconfig}}.
# mkinitcpio -p linux
+
# Rebuild the [[Initramfs]] by executing {{ic|# mkinitcpio -p linux}}.
==> Building image from preset: 'default'
+
# Exit the chroot, unmount filesystems, and reboot back into your installed system.
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
 
file: could not find any magic files!
 
==> ERROR: invalid kernel specifier: `/boot/vmlinuz-linux'
 
==> Building image from preset: 'fallback'
 
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux-fallback.img -S autodetect
 
file: could not find any magic files!
 
==> ERROR: invalid kernel specifier: `/boot/vmlinuz-linux'
 
  
===Solution===
+
== See also ==
Typically a previously installed application had placed a configuration file within {{ic|/etc/ld.so.conf.d/}} or it had made changes to {{ic|/etc/ld.so.conf}} which are now invalid.
 
#Boot into the Arch Linux Live CD / Installation Media.
 
#Mount your root ({{ic|'''/'''}}) partition to {{ic|/mnt}} and using [[Change_Root#Change_root|arch-chroot]], [[Change_Root|chroot]] into your system.
 
{{Note|[[Change_Root#Change_root|arch-chroot]] leaves mounting the {{ic|/boot}} partition up to the user.}}
 
#Examine {{ic|/etc/ld.so.conf}} and remove any invalid lines found.
 
#Examine the files located inside the directory {{ic|/etc/ld.so.conf.d/}} and remove all invalid files.
 
#Rebuild the [[initramfs]].
 
# mkinitcpio -p linux
 
#Reboot back to your installed system.
 
#Once booted, reinstall the package that was responsible for leaving your system inoperable using:
 
# pacman -S <package>
 
  
==See also==
+
* [http://www.tuxradar.com/content/how-fix-most-common-linux-problems Fix the Most Common Problems]
*[[IRC Collaborative Debugging]]
+
* [https://www.reddit.com/r/archlinux/comments/tjjwr/archlinux_a_howto_in_troubleshooting_for_newcomers/ A how-to in troubleshooting for newcomers]
*[[Step By Step Debugging Guide]]
+
* [http://wiki.ultimatebootcd.com/index.php?title=Tools List of Tools for UBCD] - Memtest-like tools to add to grub.cfg on UltimateBootCD.com
*[http://www.maximumpc.com/article/features/linux_troubleshooting_guide_fix_most_common_problems Fix the Most Common Problems]
+
* [[Wikipedia:BIOS Boot partition|BIOS Boot partition]] on Wikipedia
 +
* [https://fedoraproject.org/wiki/QA/Sysrq Using sysrq] on Fedoraproject.org
 +
* [http://freedesktop.org/wiki/Software/systemd/Debugging#Debug_Logging_to_a_Serial_Console Debug Logging to a Serial Console] on Freedesktop.org
 +
* [https://web.archive.org/web/20120217124742/http://www.lesswatts.org/projects/acpi/debug.php How to Isolate Linux ACPI Issues] on Archive.org

Latest revision as of 23:54, 17 October 2017

This article explains some methods for general troubleshooting. For application specific issues, please reference the particular wiki page for that program.

General procedures

Attention to detail

In order to resolve an issue that you are having, it is absolutely crucial to have a firm basic understanding of how that specific subsystem functions. How does it work, and what does it need to run without error? If you cannot comfortably answer these questions, you should review the ArchWiki article for the subsystem that you are having trouble with. Once you feel you have understood it, it will be easier to pinpoint the cause of the problem.

Questions/checklist

The following is a list of questions to ask yourself whenever you deal with a malfunctioning system. Under each question there are notes explaining how to answer it, followed by brief examples of how to gather output and which tools can be used to review logs and the journal.

  1. What is the issue?
    Be as precise as possible. This will help you avoid getting confused or side-tracked when looking up specific information.
  2. Are there any error messages?
    Copy and paste full outputs that contain error messages related to your issue into a separate file, such as $HOME/issue.log. For example, to append the output of the following mkinitcpio command, including anything it writes to standard error, to $HOME/issue.log:
    $ mkinitcpio -p linux >> $HOME/issue.log 2>&1
  3. Can you reproduce the issue?
    If so, give exact step-by-step instructions/commands needed to do so.
  4. When did you first encounter these issues and what was changed between then and when the system was operating without error?
    If it occurred right after an update, list all packages that were updated, including their version numbers, and paste the entire update transaction from /var/log/pacman.log. Also take note of the status of any services needed to support the malfunctioning application, using systemd's systemctl tool. For example, to append the output of the following systemctl command to $HOME/issue.log:
    $ systemctl status dhcpcd@eth0.service >> $HOME/issue.log
    Note: Using >> will ensure any previous text in $HOME/issue.log will not be overwritten.
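The redirections above can be wrapped in a small helper so that every command is recorded together with its output. The following is only a sketch: the names ISSUE_LOG and log_cmd are invented for this example and are not part of any Arch tool.

```shell
# Hypothetical helper: append a command line and its complete output to a log.
# ISSUE_LOG and log_cmd are illustrative names chosen for this sketch.
ISSUE_LOG=${ISSUE_LOG:-$HOME/issue.log}

log_cmd() {
    {
        printf '### %s\n' "$*"                  # record which command was run
        "$@" 2>&1                               # run it, merging stderr into stdout
        printf '### exit status: %s\n\n' "$?"   # record how the command ended
    } >> "$ISSUE_LOG"                           # >> appends, it never overwrites
}
```

For example, log_cmd systemctl status dhcpcd@eth0.service appends the service status, including any error text, to the log file.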

Approach

Rather than approaching an issue by stating,

Application X does not work.

you will find it more helpful to formulate your issue in the context of the system as a whole, as:

Application X produces Y error(s) when performing Z tasks under conditions A and B.

Additional support

With all the information in front of you, you should have a good idea as to what is going on with the system, and you can now start working on a proper fix.

If you require any additional support, it can be found on the forums or IRC at irc.freenode.net #archlinux. See IRC channels for other options.

When asking for support post the complete output/logs, not just what you think are the significant sections. Sources of information include:

  • Full output of any command involved - don't just select what you think is relevant.
  • Output from systemd's journalctl. For more extensive output, use the systemd.log_level=debug boot parameter.
  • Log files (have a look in /var/log)
  • Relevant configuration files
  • Drivers involved
  • Versions of packages involved
  • Kernel: dmesg. For a boot problem, at least the last 10 lines displayed, preferably more
  • Networking: Exact output of commands involved, and any configuration files
  • Xorg: /var/log/Xorg.0.log, and prior logs if you have overwritten the problematic one
  • Pacman: If a recent upgrade broke something, look in /var/log/pacman.log
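To pull the upgraded packages and their versions out of the pacman log, a filter along the following lines may help. It is a sketch that assumes log lines of the form "[timestamp] [ALPM] upgraded NAME (OLD -> NEW)" with a timestamp starting with the date as YYYY-MM-DD; the function name upgrades_on is made up for this example.

```shell
# List the packages upgraded on a given day, with old and new versions.
#   $1 = date as YYYY-MM-DD
#   $2 = log file (defaults to /var/log/pacman.log)
# Assumes lines like: [timestamp] [ALPM] upgraded NAME (OLD -> NEW)
upgrades_on() {
    grep -F '[ALPM] upgraded ' "${2:-/var/log/pacman.log}" | grep "^\[$1"
}
```

Running upgrades_on 2017-10-17 would then print only the upgrade entries logged on that day, which can be pasted alongside the rest of the report.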

One of the better ways to post this information is to use an online pastebin. You can install the pbpst or gist package to automatically upload information. For example, to upload the content of your systemd journal from this boot you would do:

# journalctl -xb | pbpst -S

A link will then be output that you can paste to the forum or IRC.

Additionally, before posting your question, you may wish to review how to ask smart questions. See also Code of conduct.

Boot problems

Diagnosing errors during the boot process involves changing the kernel parameters, and rebooting the system.

If booting the system is not possible, boot from a live image and change root to the existing system.

Console messages

After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.

Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using dmesg or all logs from the current boot with journalctl -b.

Flow control

This is basic management that applies to most terminal emulators, including virtual consoles (vc):

  • Press Ctrl+S to pause the output
  • Press Ctrl+Q to resume it

This pauses not only the output, but also programs which try to print to the terminal, as they will block on the write() calls for as long as the output is paused. If your init appears frozen, make sure the system console is not paused.

To see error messages which are already displayed, see Getty#Have boot messages stay on tty1.

Scrollback

Scrollback allows the user to go back and view text which has scrolled off the screen of a text console. This is made possible by a buffer created between the video adapter and the display device called the scrollback buffer. By default, the key combinations of Shift+PageUp and Shift+PageDown scroll the buffer up and down.

If scrolling up all the way does not show you enough information, you need to expand your scrollback buffer to hold more output. This is done by tweaking the kernel's framebuffer console (fbcon) with the kernel parameter fbcon=scrollback:Nk where N is the desired buffer size in kilobytes. The default size is 32k.

If this does not work, your framebuffer console may not be properly enabled. Check the Framebuffer Console documentation for other parameters, e.g. for changing the framebuffer driver.

Debug output

Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:

  • debug enables debug messages for both the kernel and systemd
  • ignore_loglevel forces all kernel messages to be printed

Other parameters you can add that might be useful in certain situations are:

  • earlyprintk=vga,keep prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change vga to efi for EFI systems
  • log_buf_len=16M allocates a larger (16MB) kernel message buffer, to ensure that debug output is not overwritten

There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. bootmem_debug, sched_debug. Check the kernel parameter documentation for specific information.

Note: If you cannot scroll back far enough to view the desired boot output, you should increase the size of the scrollback buffer.
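After rebooting, it can be worth confirming that a parameter actually reached the kernel, since a typo in the boot loader configuration is easy to miss. The running kernel's command line is exposed in /proc/cmdline; the helper below is a minimal sketch, and the name has_kernel_param is an assumption made for this example.

```shell
# Succeed if kernel parameter $1 is present on the kernel command line, either
# as a bare flag (debug) or with a value (log_buf_len=16M).
#   $2 = file to read (defaults to /proc/cmdline, the running kernel's line)
has_kernel_param() {
    awk -v p="$1" '
        { for (i = 1; i <= NF; i++) if ($i == p || index($i, p "=") == 1) found = 1 }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}
```

For example, has_kernel_param ignore_loglevel && echo "ignore_loglevel is active" reports whether the current boot used that parameter.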

Recovery shells

Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can exit to let the kernel resume what it was doing:

  • rescue launches a shell shortly after the root filesystem is remounted read/write
  • emergency launches a shell even earlier, before most filesystems are mounted
  • init=/bin/sh (as a last resort) changes the init program to a root shell. rescue and emergency both rely on systemd, but this should work even if systemd is broken

Another option is systemd's debug-shell which adds a root shell on tty9 (accessible with Ctrl+Alt+F9). It can be enabled by either adding systemd.debug-shell to the kernel parameters, or by enabling debug-shell.service. Take care to disable the service when done to avoid the security risk of leaving a root shell open on every boot.

Blank screen with Intel video

This is most likely due to a problem with kernel mode setting. Try disabling modesetting or changing the video port.

Stuck while loading the kernel

Try disabling ACPI by adding the acpi=off kernel parameter.

Debugging kernel modules

See Kernel modules#Obtaining information.

Debugging hardware

  • You can display extra debugging information about your hardware by following udev#Debug output.
  • Ensure that Microcode updates are applied on your system.
  • Test your device's RAM with Memtest86+. Unstable RAM may lead to some extremely odd issues, ranging from random crashes to data corruption.

Kernel panics

A kernel panic occurs when the Linux kernel enters an unrecoverable failure state. The state typically originates from buggy hardware drivers resulting in the machine being deadlocked, non-responsive, and requiring a reboot. Just prior to deadlock, a diagnostic message is generated, consisting of: the machine state when the failure occurred, a call trace leading to the kernel function that recognized the failure, and a listing of currently loaded modules. Thankfully, kernel panics do not happen very often when using mainline versions of the kernel, such as those supplied by the official repositories, but when they do happen, you need to know how to deal with them.

Note: Kernel panics are sometimes referred to as oops or kernel oops. While both panics and oops occur as the result of a failure state, an oops is more general in that it does not necessarily result in a deadlocked machine--sometimes the kernel can recover from an oops by killing the offending task and carrying on.
Tip: Pass the kernel parameter oops=panic at boot or write 1 to /proc/sys/kernel/panic_on_oops to force a recoverable oops to issue a panic instead. This is advisable if you are concerned about the small chance of system instability resulting from an oops recovery, which may make future errors difficult to diagnose.

Examine panic message

If a kernel panic occurs very early in the boot process, you may see a message on the console containing "Kernel panic - not syncing:", but once systemd is running, kernel messages will typically be captured and written to the system log. However, when a panic occurs, the diagnostic message output by the kernel is almost never written to the log file on disk because the machine deadlocks before systemd-journald gets the chance. Therefore, the only way to examine the panic message is to view it on the console as it happens (without resorting to setting up a kdump crashkernel). You can do this by booting with the following kernel parameters and attempting to reproduce the panic on tty1:

systemd.journald.forward_to_console=1 console=tty1
Tip: In the event that the panic message scrolls away too quickly to examine, try passing the kernel parameter pause_on_oops=seconds at boot.

Example scenario: bad module

It is possible to make a best guess as to what subsystem or module is causing the panic using the information in the diagnostic message. In this scenario, we have a panic on some imaginary machine during boot. Pay attention to the lines marked [1] through [5]:

kernel: BUG: unable to handle kernel NULL pointer dereference at (null) [1]
kernel: IP: fw_core_init+0x18/0x1000 [firewire_core] [2]
kernel: PGD 718d00067 
kernel: P4D 718d00067 
kernel: PUD 7b3611067 
kernel: PMD 0 
kernel: 
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: Modules linked in: firewire_core(+) crc_itu_t cfg80211 rfkill ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 ... [3] 
kernel: CPU: 6 PID: 1438 Comm: modprobe Tainted: P           O    4.13.3-1-ARCH #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. H97-D3H/H97-D3H-CF, BIOS F5 06/26/2014
kernel: task: ffff9c667abd9e00 task.stack: ffffb53b8db34000
kernel: RIP: 0010:fw_core_init+0x18/0x1000 [firewire_core]
kernel: RSP: 0018:ffffb53b8db37c68 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffc16d3af4
kernel: RBP: ffffb53b8db37c70 R08: 0000000000000000 R09: ffffffffae113e95
kernel: R10: ffffe93edfdb9680 R11: 0000000000000000 R12: ffffffffc16d9000
kernel: R13: ffff9c6729bf8f60 R14: ffffffffc16d5710 R15: ffff9c6736e55840
kernel: FS:  00007f301fc80b80(0000) GS:ffff9c675dd80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000007c6456000 CR4: 00000000001406e0
kernel: Call Trace:
kernel:  do_one_initcall+0x50/0x190 [4]
kernel:  ? do_init_module+0x27/0x1f2
kernel:  do_init_module+0x5f/0x1f2
kernel:  load_module+0x23f3/0x2be0
kernel:  SYSC_init_module+0x16b/0x1a0
kernel:  ? SYSC_init_module+0x16b/0x1a0
kernel:  SyS_init_module+0xe/0x10
kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
kernel: RIP: 0033:0x7f301f3a2a0a
kernel: RSP: 002b:00007ffcabbd1998 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kernel: RAX: ffffffffffffffda RBX: 0000000000c85a48 RCX: 00007f301f3a2a0a
kernel: RDX: 000000000041aada RSI: 000000000001a738 RDI: 00007f301e7eb010
kernel: RBP: 0000000000c8a520 R08: 0000000000000001 R09: 0000000000000085
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c79208
kernel: R13: 0000000000c8b4d8 R14: 00007f301e7fffff R15: 0000000000000030
kernel: Code: <c7> 04 25 00 00 00 00 01 00 00 00 bb f4 ff ff ff e8 73 43 9c ec 48 
kernel: RIP: fw_core_init+0x18/0x1000 [firewire_core] RSP: ffffb53b8db37c68
kernel: CR2: 0000000000000000
kernel: ---[ end trace 71f4306ea1238f17 ]---
kernel: Kernel panic - not syncing: Fatal exception [5]
kernel: Kernel Offset: 0x80000000 from 0xffffffff810000000 (relocation range: 0xffffffff800000000-0xfffffffffbffffffff
kernel: ---[ end Kernel panic - not syncing: Fatal exception
  • [1] Indicates the type of error that caused the panic. In this case it was a programmer bug.
  • [2] Indicates that the panic happened in a function called fw_core_init in module firewire_core.
  • [3] Indicates that firewire_core was the latest module to be loaded.
  • [4] Indicates that the function that called function fw_core_init was do_one_initcall.
  • [5] Indicates that this oops message is, in fact, a kernel panic and the system is now deadlocked.
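If the panic message has been saved to a file (typed up, photographed and transcribed, or captured over a serial console), the faulting function and module can be extracted from the IP/RIP lines mechanically. The following is a sketch only: the function name panic_culprit is invented here, and the pattern assumes lines shaped like the ones in the example above.

```shell
# Print "function module" for every IP:/RIP: line that names a module in
# square brackets. Reads the named files, or standard input if none are given.
# Lines without a module (e.g. user-space RIP addresses) are ignored.
panic_culprit() {
    sed -n 's|^.*IP: \([0-9a-f]\{4\}:\)\{0,1\}\([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)\].*$|\2 \3|p' "$@" |
        sort -u
}
```

Applied to the message above, this would report fw_core_init and firewire_core, matching the reading of line [2].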

We can surmise then, that the panic occurred during the initialization routine of module firewire_core as it was loaded. (We might assume then, that the machine's firewire hardware is incompatible with this version of the firewire driver module due to a programmer error, and will have to wait for a new release.) In the meantime, the easiest way to get the machine running again is to prevent the module from being loaded. We can do this in one of two ways:

  • If the module is being loaded during the execution of the initramfs, reboot with the kernel parameter rd.blacklist=firewire_core.
  • Otherwise reboot with the kernel parameter module_blacklist=firewire_core.

Reboot into root shell and fix problem

You'll need a root shell to make changes to the system so the panic no longer occurs. If the panic occurs on boot, there are several strategies to obtain a root shell before the machine deadlocks:

  • Reboot with the kernel parameter emergency, rd.emergency, or -b to receive a prompt to login just after the root filesystem is mounted and systemd is started.
Note: At this point, the root filesystem will be mounted read-only. Execute # mount -o remount,rw / to make changes.
  • Reboot with the kernel parameter rescue, rd.rescue, single, s, S, or 1 to receive a prompt to login just after local filesystems are mounted.
  • Reboot with the kernel parameter systemd.debug-shell=1 to obtain a very early root shell on tty9. Switch to it by pressing Ctrl+Alt+F9.
  • Experiment by rebooting with different sets of kernel parameters to possibly disable the kernel feature that is causing the panic. Try the "old standbys" acpi=off and nolapic.
Tip: See Documentation/admin-guide/kernel-parameters.txt in the Linux kernel source tree for all parameters.
  • As a last resort, boot with the Arch Linux Installation CD and mount the root filesystem on /mnt then execute # arch-chroot /mnt.

Disable the service or program that is causing the panic, roll back a faulty update, or fix a configuration problem.

Package management

See Pacman#Troubleshooting for general topics, and pacman/Package signing#Troubleshooting for issues with PGP keys.

fuser

This article or section needs expansion.

Reason: Write an example how to use it. (Discuss in Talk:General troubleshooting#)

fuser is a command-line utility for identifying processes using resources such as files, filesystems and TCP/UDP ports.

fuser is provided by the psmisc package, which should already be installed as part of the base group.
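As a starting point, the sketch below wraps the simplest invocation and lists a few other common ones as comments. The function name pids_using is made up for this example; the fuser options shown (-v, -m, -n) are those documented in the psmisc manual page, and the paths and port are placeholders.

```shell
# Print the PIDs of processes that have the given file open.
# fuser writes the PIDs to stdout and the file name plus access codes to
# stderr, so discarding stderr leaves only the PIDs.
pids_using() {
    fuser "$1" 2>/dev/null
}

# Other common invocations (placeholders, shown as comments only):
#   fuser -v /var/log/pacman.log   # verbose table: user, PID, access, command
#   fuser -vm /mnt                 # every process using the filesystem at /mnt
#   fuser -v -n tcp 80             # processes bound to TCP port 80
```

This is typically useful when a filesystem refuses to unmount or a port is reported as already in use: the PIDs returned identify the processes to inspect or stop.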

Session permissions

Note: You must be using systemd as your init system for local sessions to work.[1] It is required for polkit permissions and ACLs for various devices (see /usr/lib/udev/rules.d/70-uaccess.rules and [2]).

First, make sure you have a valid local session within X:

$ loginctl show-session $XDG_SESSION_ID

This should contain Remote=no and Active=yes in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
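The two properties can also be checked in a script rather than by eye. The helper below is a minimal sketch that reads the Key=Value lines printed by loginctl show-session on standard input; the name session_is_local_and_active is an assumption made for this example.

```shell
# Succeed only if the Key=Value lines on stdin include both Remote=no and
# Active=yes, i.e. the session is local and currently active.
session_is_local_and_active() {
    awk -F= '
        $1 == "Remote" && $2 == "no"  { remote_ok = 1 }
        $1 == "Active" && $2 == "yes" { active_ok = 1 }
        END { exit !(remote_ok && active_ok) }
    '
}
```

A typical call would be loginctl show-session "$XDG_SESSION_ID" | session_is_local_and_active && echo "local session OK".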

A D-Bus session should also be started along with X. See D-Bus#Starting the user session for more information on this.

Basic polkit actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See polkit#Authentication agents for more information on this.

Message: "error while loading shared libraries"

The factual accuracy of this article or section is disputed.

Reason: Or the program needs to be rebuilt after a soname bump. (Discuss in Talk:General troubleshooting#)

If, while using a program, you get an error similar to:

error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory

Use pacman or pkgfile to search for the package that owns the missing library:

$ pacman -Fs libusb-0.1.so.4
extra/libusb-compat 0.1.5-1
    usr/lib/libusb-0.1.so.4

In this case, the libusb-compat package needs to be installed.

The error could also mean that the package that you used to install your program does not list the library as a dependency in its PKGBUILD: if it is an official package, report a bug; if it is an AUR package, report it to the maintainer using its page in the AUR website.
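To find every missing library of a binary at once rather than one error at a time, the output of ldd can be filtered for entries marked "not found". This is a sketch; missing_libs is a name invented for this example, and it reads ldd output on standard input so that it also works on saved output.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Expects ldd output on stdin, e.g. lines like "libfoo.so.1 => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}
```

For a hypothetical binary, ldd /usr/bin/program | missing_libs prints one library name per line, each of which can then be looked up with pacman or pkgfile as shown above.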

Message: "file: could not find any magic files!"

If you see this message, it likely indicates that a package update has corrupted the dynamic linker run-time bindings file and your system is now essentially crippled. You will not be able to recompile or reinstall the package responsible or rebuild the initramfs until you fix it.

Problem

A package update likely added an invalid filename.conf to the directory /etc/ld.so.conf.d or edited /etc/ld.so.conf incorrectly. The result is that the dynamic linker run-time bindings file /etc/ld.so.cache is being re-generated with invalid data. This can potentially cause all programs on the system that depend on shared libraries to fail (i.e. almost all of them).

Solution

  1. Boot with the Arch Linux Installation CD.
  2. Mount your root / filesystem on /mnt and your /boot filesystem on /mnt/boot and chroot into the broken system by executing # arch-chroot /mnt.
  3. Examine the file /etc/ld.so.conf and remove any invalid lines found.
  4. Examine the files located in the directory /etc/ld.so.conf.d/ and remove any invalid files.
  5. Rebuild the dynamic linker run-time bindings file /etc/ld.so.cache by executing # ldconfig.
  6. Rebuild the initramfs by executing # mkinitcpio -p linux.
  7. Exit the chroot, unmount filesystems, and reboot back into your installed system.
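When examining the configuration files in steps 3 and 4, one quick heuristic is to flag entries that do not point at an existing directory. The sketch below makes several assumptions: one directory per line, with blank lines, comments and include directives skipped. The function name check_ld_conf is invented for this example, and a flagged entry is only a candidate for review, not proof that the line is invalid.

```shell
# Report entries in ld.so.conf-style files that do not name an existing
# directory, as "file:line: entry".
#   $1   = root to resolve paths against ("" inside the chroot, /mnt outside)
#   $2.. = configuration files to check
# Assumption: one directory per line; comments and "include" lines are skipped.
check_ld_conf() {
    root=$1
    shift
    for conf in "$@"; do
        n=0
        while IFS= read -r line || [ -n "$line" ]; do
            n=$((n + 1))
            entry=${line%%#*}                             # drop comments
            entry=$(printf '%s' "$entry" | tr -d ' \t')   # drop blanks
            case $entry in
                ''|include*) continue ;;
            esac
            [ -d "$root$entry" ] || printf '%s:%s: %s\n' "$conf" "$n" "$entry"
        done < "$conf"
    done
}
```

From the live system this could be run as check_ld_conf /mnt /mnt/etc/ld.so.conf /mnt/etc/ld.so.conf.d/*.conf before rebuilding the cache with ldconfig.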

See also