General troubleshooting
This article explains some methods for general troubleshooting. For application-specific issues, refer to the wiki page for that program.
General procedures
Attention to detail
To resolve an issue, it is crucial to have a firm basic understanding of how the affected subsystem functions. How does it work, and what does it need to run without error? If you cannot comfortably answer these questions, review the ArchWiki article for the subsystem that you are having trouble with. Once you understand it, it will be easier to pinpoint the cause of the problem.
Questions/checklist
The following is a list of questions to ask yourself whenever you deal with a malfunctioning system. Under each question there are notes explaining how to answer it, followed by some light examples of how to gather output and which tools can be used to review logs and the journal.
- What is the issue(s)?
- Be as precise as possible. This will help you not get confused and/or side-tracked when looking up specific information.
- Are there any error messages?
- Copy and paste full outputs that contain error messages related to your issue into a separate file, such as $HOME/issue.log. For example, to forward the output of the following mkinitcpio command to $HOME/issue.log:
$ mkinitcpio -p linux >> $HOME/issue.log
- Can you reproduce the issue?
- If so, give exact step-by-step instructions/commands needed to do so. For command line situations, you might start by simply recording your commands and their output, for example with the script(1) command, which is part of util-linux. To get rid of the non-printable and other control characters embedded in the typescript file, basic Unix tools such as tr can be used. ANSI sequences are more difficult to handle because they are more complex. Using sed for this is described at https://stackpointer.io/unix/unix-linux-remove-ansi-escape-sequences/464/.
- When did you first encounter these issues and what was changed between then and when the system was operating without error?
- If it occurred right after an update, list all packages that were updated, including version numbers, and paste the entire update from pacman.log (/var/log/pacman.log). Also take note of the statuses of any services needed to support the malfunctioning applications using systemd's systemctl tools. For example, to forward the output of the following systemd command to $HOME/issue.log:
$ systemctl status dhcpcd@eth0.service >> $HOME/issue.log
- Note: Using >> will ensure any previous text in $HOME/issue.log will not be overwritten.
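The data-gathering steps in the checklist can be wrapped in two small shell helpers. This is a minimal sketch and not part of the original article: the function names log_cmd and strip_ansi and the ISSUE_LOG variable are hypothetical, and the sed expression only covers common CSI escape sequences (colours, cursor movement), not every terminal sequence. Unlike a plain >> redirection, log_cmd also captures standard error, which is where most error messages are written.

```shell
#!/bin/sh
# Hypothetical helpers; ISSUE_LOG defaults to the file used in the examples above.
ISSUE_LOG="${ISSUE_LOG:-$HOME/issue.log}"

# Append a command line and all of its output (stdout and stderr) to the log.
# Returns the exit status of the command that was run.
log_cmd() {
    printf '$ %s\n' "$*" >> "$ISSUE_LOG"
    "$@" >> "$ISSUE_LOG" 2>&1
}

# Read text on standard input, remove common ANSI CSI sequences with sed,
# then delete remaining control characters (except tab and newline) with tr.
strip_ansi() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" | tr -d '\000-\010\013-\037\177'
}
```

Possible usage: `log_cmd systemctl status dhcpcd@eth0.service` appends the command and its output to the log, and `strip_ansi < typescript > typescript.txt` cleans a recording made with script(1).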
Approach
Rather than approaching an issue by stating,
Application X does not work.
you will find it more helpful to formulate your issue in the context of the system as a whole, as:
Application X produces Y error(s) when performing Z tasks under conditions A and B.
Additional support
With all the information in front of you, you should have a good idea of what is going on with the system, and you can now start working on a proper fix.
If you require any additional support, it can be found on the forums or IRC at irc.freenode.net #archlinux. See IRC channels for other options.
When asking for support post the complete output/logs, not just what you think are the significant sections. Sources of information include:
- Full output of any command involved - do not just select what you think is relevant.
- Output from systemd's journalctl. For more extensive output, use the systemd.log_level=debug boot parameter.
- Log files (have a look in /var/log)
- Relevant configuration files
- Drivers involved
- Versions of packages involved
- Kernel: dmesg. For a boot problem, at least the last 10 lines displayed, preferably more
- Networking: Exact output of commands involved, and any configuration files
- Xorg: /var/log/Xorg.0.log, and prior logs if you have overwritten the problematic one
- Pacman: If a recent upgrade broke something, look in /var/log/pacman.log
One of the better ways to post this information is to use an online pastebin. You can install the pbpst (AUR) or gist package to automatically upload information. For example, to upload the content of your systemd journal from this boot you would do:
# journalctl -xb | pbpst -S
A link will then be output that you can paste to the forum or IRC.
Additionally, before posting your question, you may wish to review how to ask smart questions. See also Code of conduct.
Boot problems
Diagnosing errors during the boot process involves changing the kernel parameters, and rebooting the system.
If booting the system is not possible, boot from a live image and change root to the existing system.
Console messages
After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.
Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using journalctl -k or dmesg. To display all logs from the current boot use journalctl -b.
Flow control
This is basic management that applies to most terminal emulators, including virtual consoles (vc):
- Press Ctrl+s to pause the output
- Press Ctrl+q to resume it
This pauses not only the output, but also programs which try to print to the terminal, as they will block on the write() calls for as long as the output is paused. If your init appears frozen, make sure the system console is not paused.
To see error messages which are already displayed, see Getty#Have boot messages stay on tty1.
Debug output
Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:
- debug enables debug messages for both the kernel and systemd
- ignore_loglevel forces all kernel messages to be printed
Other parameters you can add that might be useful in certain situations are:
- earlyprintk=vga,keep prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change vga to efi for EFI systems
- log_buf_len=16M allocates a larger (16 MiB) kernel message buffer, to ensure that debug output is not overwritten
There are also a number of separate debug parameters for enabling debugging in specific subsystems, e.g. bootmem_debug, sched_debug. Check the kernel parameter documentation for specific information.
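After rebooting, it can be useful to confirm that the debug parameters actually reached the kernel. The following is a minimal sketch under the assumption that /proc/cmdline is readable; the helper name has_kparam is hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel parameter $1 appears in the command
# line given as $2 (defaults to the running kernel's /proc/cmdline).
# Matches both bare flags (debug) and key=value forms (log_buf_len=16M).
has_kparam() {
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    # Word splitting of $cmdline is intentional: one word per parameter.
    for word in $cmdline; do
        case $word in
            "$1"|"$1"=*) return 0 ;;
        esac
    done
    return 1
}

# Report which of the parameters discussed above are currently set.
for p in debug ignore_loglevel log_buf_len; do
    if has_kparam "$p"; then echo "$p: set"; else echo "$p: not set"; fi
done
```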
netconsole
netconsole is a kernel module that sends all kernel log messages (i.e. dmesg) over the network to another computer, without involving user space (e.g. syslogd). The name "netconsole" is a misnomer because it is not really a "console", more like a remote logging service.
It can be used either built-in or as a module. Built-in netconsole initializes immediately after the network cards and will bring up the specified interface as soon as possible. The module is mainly used for capturing kernel panic output from a headless machine, or in other situations where user space is no longer functional.
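The netconsole parameter follows the pattern [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], as given in the kernel documentation. The sketch below only assembles such a string; the helper name netconsole_param is hypothetical, and every port, address, interface name and MAC address shown is a placeholder to be replaced with values from your own network.

```shell
#!/bin/sh
# Hypothetical helper: assemble a netconsole= value from its six fields.
#   $1 source port   $2 source IP   $3 network interface
#   $4 target port   $5 target IP   $6 target MAC address
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values only; adjust to the sending and receiving machines.
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55
```

The resulting string can be appended to the kernel parameters for the built-in case, or passed as a module option when loading netconsole with modprobe. The receiving machine needs something listening for UDP on the target port, for example a netcat variant; the exact netcat options differ between implementations.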
Recovery shells
Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can exit to let the kernel resume what it was doing:
- rescue launches a shell shortly after the root filesystem is remounted read/write
- emergency launches a shell even earlier, before most filesystems are mounted
- init=/bin/sh (as a last resort) changes the init program to a root shell. rescue and emergency both rely on systemd, but this should work even if systemd is broken
Another option is systemd's debug-shell which adds a root shell on tty9 (accessible with Ctrl+Alt+F9). It can be enabled by either adding systemd.debug_shell to the kernel parameters, or by enabling debug-shell.service.
Blank screen with Intel video
This is most likely due to a problem with kernel mode setting. Try disabling modesetting or changing the video port.
Stuck while loading the kernel
Try disabling ACPI by adding the acpi=off kernel parameter.
Debugging kernel modules
See Kernel modules#Obtaining information.
Debugging hardware
- You can display extra debugging information about your hardware by following udev#Debug output.
- Ensure that Microcode updates are applied on your system.
- To test the RAM, see Stress testing#Stressing memory.
Kernel panics
A kernel panic occurs when the Linux kernel enters an unrecoverable failure state. The state typically originates from buggy hardware drivers resulting in the machine being deadlocked, non-responsive, and requiring a reboot. Just prior to deadlock, a diagnostic message is generated, consisting of: the machine state when the failure occurred, a call trace leading to the kernel function that recognized the failure, and a listing of currently loaded modules. Thankfully, kernel panics do not happen very often using mainline versions of the kernel, such as those supplied by the official repositories, but when they do happen, you need to know how to deal with them.
Tip: Pass the kernel parameter oops=panic at boot or write 1 to /proc/sys/kernel/panic_on_oops to force a recoverable oops to issue a panic instead. This is advisable if you are concerned about the small chance of system instability resulting from an oops recovery, which may make future errors difficult to diagnose.
Examine panic message
If a kernel panic occurs very early in the boot process, you may see a message on the console containing "Kernel panic - not syncing:", but once systemd is running, kernel messages will typically be captured and written to the system log. However, when a panic occurs, the diagnostic message output by the kernel is almost never written to the log file on disk because the machine deadlocks before systemd-journald gets the chance. Therefore, the only way to examine the panic message is to view it on the console as it happens (without resorting to setting up a kdump crashkernel). You can do this by booting with the following kernel parameters and attempting to reproduce the panic on tty1:
systemd.journald.forward_to_console=1 console=tty1
Tip: If the panic message scrolls away too quickly to examine, pass the kernel parameter pause_on_oops=seconds at boot.
Example scenario: bad module
It is possible to make a best guess as to what subsystem or module is causing the panic using the information in the diagnostic message. In this scenario, we have a panic on some imaginary machine during boot. Pay attention to the lines marked [1] through [5]:
kernel: BUG: unable to handle kernel NULL pointer dereference at (null) [1]
kernel: IP: fw_core_init+0x18/0x1000 [firewire_core] [2]
kernel: PGD 718d00067
kernel: P4D 718d00067
kernel: PUD 7b3611067
kernel: PMD 0
kernel:
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: Modules linked in: firewire_core(+) crc_itu_t cfg80211 rfkill ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 ... [3]
kernel: CPU: 6 PID: 1438 Comm: modprobe Tainted: P O 4.13.3-1-ARCH #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. H97-D3H/H97-D3H-CF, BIOS F5 06/26/2014
kernel: task: ffff9c667abd9e00 task.stack: ffffb53b8db34000
kernel: RIP: 0010:fw_core_init+0x18/0x1000 [firewire_core]
kernel: RSP: 0018:ffffb53b8db37c68 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffc16d3af4
kernel: RBP: ffffb53b8db37c70 R08: 0000000000000000 R09: ffffffffae113e95
kernel: R10: ffffe93edfdb9680 R11: 0000000000000000 R12: ffffffffc16d9000
kernel: R13: ffff9c6729bf8f60 R14: ffffffffc16d5710 R15: ffff9c6736e55840
kernel: FS: 00007f301fc80b80(0000) GS:ffff9c675dd80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000007c6456000 CR4: 00000000001406e0
kernel: Call Trace:
kernel: do_one_initcall+0x50/0x190 [4]
kernel: ? do_init_module+0x27/0x1f2
kernel: do_init_module+0x5f/0x1f2
kernel: load_module+0x23f3/0x2be0
kernel: SYSC_init_module+0x16b/0x1a0
kernel: ? SYSC_init_module+0x16b/0x1a0
kernel: SyS_init_module+0xe/0x10
kernel: entry_SYSCALL_64_fastpath+0x1a/0xa5
kernel: RIP: 0033:0x7f301f3a2a0a
kernel: RSP: 002b:00007ffcabbd1998 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kernel: RAX: ffffffffffffffda RBX: 0000000000c85a48 RCX: 00007f301f3a2a0a
kernel: RDX: 000000000041aada RSI: 000000000001a738 RDI: 00007f301e7eb010
kernel: RBP: 0000000000c8a520 R08: 0000000000000001 R09: 0000000000000085
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c79208
kernel: R13: 0000000000c8b4d8 R14: 00007f301e7fffff R15: 0000000000000030
kernel: Code: <c7> 04 25 00 00 00 00 01 00 00 00 bb f4 ff ff ff e8 73 43 9c ec 48
kernel: RIP: fw_core_init+0x18/0x1000 [firewire_core] RSP: ffffb53b8db37c68
kernel: CR2: 0000000000000000
kernel: ---[ end trace 71f4306ea1238f17 ]---
kernel: Kernel panic - not syncing: Fatal exception [5]
kernel: Kernel Offset: 0x80000000 from 0xffffffff810000000 (relocation range: 0xffffffff800000000-0xfffffffffbffffffff
kernel: ---[ end Kernel panic - not syncing: Fatal exception
- [1] Indicates the type of error that caused the panic. In this case it was a programmer bug.
- [2] Indicates that the panic happened in a function called fw_core_init in module firewire_core.
- [3] Indicates that firewire_core was the latest module to be loaded.
- [4] Indicates that the function that called function fw_core_init was do_one_initcall.
- [5] Indicates that this oops message is, in fact, a kernel panic and the system is now deadlocked.
We can surmise then, that the panic occurred during the initialization routine of module firewire_core as it was loaded. (We might assume then, that the machine's firewire hardware is incompatible with this version of the firewire driver module due to a programmer error, and will have to wait for a new release.) In the meantime, the easiest way to get the machine running again is to prevent the module from being loaded. We can do this in one of two ways:
- If the module is being loaded during the execution of the initramfs, reboot with the kernel parameter rd.blacklist=firewire_core.
- Otherwise reboot with the kernel parameter module_blacklist=firewire_core.
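If the panic message has been saved as text (for example, typed in from a photo of the console), the module name can be pulled out of it mechanically. This is a minimal sketch: the helper name panic_module is hypothetical, and it assumes the module appears in square brackets on an "IP:" or "RIP:" line, as it does in the scenario above.

```shell
#!/bin/sh
# Hypothetical helper: print the module named in brackets on the first
# "IP:" or "RIP:" line of a panic message read from standard input.
panic_module() {
    sed -n 's/.*IP: .*\[\([A-Za-z_][A-Za-z0-9_]*\)\].*/\1/p' | head -n 1
}

# Three lines taken from the example scenario.
mod=$(panic_module <<'EOF'
kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
kernel: IP: fw_core_init+0x18/0x1000 [firewire_core]
kernel: Oops: 0002 [#1] PREEMPT SMP
EOF
)

# Print the kernel parameter that would keep this module from loading.
echo "module_blacklist=$mod"
```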
Reboot into root shell and fix problem
You will need a root shell to make changes to the system so the panic no longer occurs. If the panic occurs on boot, there are several strategies to obtain a root shell before the machine deadlocks:
- Reboot with the kernel parameter emergency, rd.emergency, or -b to receive a prompt to log in just after the root filesystem is mounted and systemd is started.
- Note: At this point, the root filesystem will be mounted read-only. Execute mount -o remount,rw / as the root user to make changes.
- Reboot with the kernel parameter rescue, rd.rescue, single, s, S, or 1 to receive a prompt to log in just after local filesystems are mounted.
- Reboot with the kernel parameter systemd.debug_shell to obtain a very early root shell on tty9. Switch to it by pressing Ctrl+Alt+F9.
- Experiment by rebooting with different sets of kernel parameters to possibly disable the kernel feature that is causing the panic. Try the "old standbys" acpi=off and nolapic.
- Tip: See kernel-parameters.html for all kernel parameters.
- As a last resort, boot with the Arch Linux Installation CD and mount the root filesystem on /mnt then execute arch-chroot /mnt as the root user.
- Disable the service or program that is causing the panic, roll-back a faulty update, or fix a configuration problem.
Package management
See Pacman#Troubleshooting for general topics, and pacman/Package signing#Troubleshooting for issues with PGP keys.
Fixing a broken system
If you performed a partial upgrade that broke things, try updating all packages, and if successful, possibly reboot:
# pacman -Syu
If you usually boot into a GUI and that's failing, perhaps you can press Ctrl+Alt+F1 through Ctrl+Alt+F6 and get to a working tty to run pacman through.
If the system is broken enough that you are unable to run pacman, boot using a monthly Arch ISO from a USB flash drive, an optical disc or a network with PXE. (Don't follow any of the rest of the installation guide.)
Mount your root filesystem:
[ISO] # mount /dev/rootFilesystemDevice /mnt
Mount any other partitions that you created separately, adding the prefix /mnt to all of them, e.g.:
[ISO] # mount /dev/bootDevice /mnt/boot
Try using your system's pacman:
[ISO] # arch-chroot /mnt
[chroot] # pacman -Syu
If that fails, exit the chroot, and try:
[ISO] # pacman -Syu --sysroot /mnt
If that fails, try:
[ISO] # pacman -Syu --root /mnt --cachedir /mnt/var/cache/pacman/pkg
IRC collaborative debugging
For requesting help from an IRC help channel (like #archlinux), you can use collaborative debugging services (like pastebin) to give IRC users details about problems you are seeing or configuration files you need referenced.
IRC usage
When you tell the people in the chat room what your problem is, they will sometimes need additional information, such as the output of a command or the contents of a configuration file. It is a general rule for IRC channels to never paste more than three lines of text. For anything longer, paste services (e.g. pastebin) provide temporary storage for text. To avoid having to write the information down and then type it manually into an IRC channel, use a collaborative debugging program that can send the information to a paste service. Several tools are available for this.
Output errors/messages to file
Many of these programs need a file to upload. To share the output of a program, you can put it in a text file by doing:
program > program-output.txt 2>&1
For example:
fdisk -l > partitions.txt 2>&1
This redirects all output (both standard output and standard error) to a text file, which can then be uploaded to a pastebin service.
Console installer questions
Occasionally you might need to actually show a picture of what your question is about (e.g. if you have a question about a console-based installer). For this you can use fbshot. fbshot is a framebuffer screenshot program. To take a screenshot of the first console (Ctrl+Alt+F1):
fbshot -c 1 console1.png
Then you can use links and an image-hosting website to upload the image.
fuser
fuser is a command-line utility for identifying processes using resources such as files, filesystems and TCP/UDP ports.
fuser is provided by the psmisc package, which should be already installed as a dependency of the base meta package. See fuser(1) for details.
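As a small illustration of what fuser reports, the sketch below holds a temporary file open and then asks fuser which processes are using it. The temporary file and the use of file descriptor 3 are arbitrary choices made for the demonstration; the script does nothing useful if fuser is not installed.

```shell
#!/bin/sh
# Hold a temporary file open on descriptor 3, then ask fuser who is using it.
tmp=$(mktemp)
exec 3<"$tmp"

if command -v fuser >/dev/null 2>&1; then
    # fuser writes PIDs to standard output; file names and access codes
    # go to standard error, which is discarded here.
    pids=$(fuser "$tmp" 2>/dev/null) || true
    echo "PIDs using $tmp:$pids"
else
    echo "fuser not found; it is provided by psmisc" >&2
fi

# Close the descriptor and clean up.
exec 3<&-
rm -f "$tmp"
```

Other commonly used forms are `fuser -vm /mnt`, which lists processes using the filesystem mounted at /mnt, and `fuser -n tcp 80`, which lists processes using TCP port 80; see fuser(1) for the exact semantics.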
Session permissions
(See /usr/lib/udev/rules.d/70-uaccess.rules.)
First, make sure you have a valid local session within X:
$ loginctl show-session $XDG_SESSION_ID
This should contain Remote=no and Active=yes in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
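The two properties can also be checked in a script rather than by eye. This is a minimal sketch; the function name session_is_local_active is hypothetical, and it simply looks for the exact lines Remote=no and Active=yes in whatever it reads on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "loginctl show-session" output on standard input
# and succeed only if it reports a local (Remote=no), active (Active=yes) session.
session_is_local_active() {
    props=$(cat)
    printf '%s\n' "$props" | grep -qx 'Remote=no' &&
        printf '%s\n' "$props" | grep -qx 'Active=yes'
}
```

Possible usage: `loginctl show-session "$XDG_SESSION_ID" | session_is_local_active && echo "local active session"`.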
Basic polkit actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See polkit#Authentication agents for more information on this.
Message: "error while loading shared libraries"
If, while using a program, you get an error similar to:
error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory
Use pacman or pkgfile to search for the package that owns the missing library:
$ pacman -F libusb-0.1.so.4
extra/libusb-compat 0.1.5-1 usr/lib/libusb-0.1.so.4
In this case, the libusb-compat package needs to be installed.
The error could also mean that the package that you used to install your program does not list the library as a dependency in its PKGBUILD: if it is an official package, report a bug; if it is an AUR package, report it to the maintainer using its page in the AUR website.
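To see every unresolved library of a program at once, rather than only the first one named in the error, the output of ldd can be filtered. This is a minimal sketch; the helper name missing_libs is hypothetical, and it assumes the usual ldd output format in which an unresolved library is printed as "name => not found".

```shell
#!/bin/sh
# Hypothetical helper: print the names of libraries that ldd reports as
# "not found" in the ldd output read from standard input.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Possible usage: `ldd /path/to/program | missing_libs`, then pass each name to pacman -F as shown above. Only run ldd on programs you trust, since ldd may execute code from the inspected binary.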
Message: "file: could not find any magic files!"
If you see this message, it likely indicates that a package update has corrupted the dynamic linker run-time bindings file and your system is now essentially crippled. You will not be able to recompile or reinstall the package responsible or rebuild the initramfs until you fix it.
Problem
A package update likely added an invalid filename.conf to the directory /etc/ld.so.conf.d or edited /etc/ld.so.conf incorrectly. The result is that the dynamic linker run-time bindings file /etc/ld.so.cache is being re-generated with invalid data. This can potentially cause all programs on the system that depend on shared libraries to fail (i.e. almost all of them).
Solution
- Boot with the Arch Linux Installation CD.
- Mount your root / filesystem on /mnt and your /boot filesystem on /mnt/boot and chroot into the broken system by executing arch-chroot /mnt as the root user.
- Examine the file /etc/ld.so.conf and remove any invalid lines found.
- Examine the files located in the directory /etc/ld.so.conf.d/ and remove any invalid files.
- Rebuild the dynamic linker run-time bindings file /etc/ld.so.cache by executing ldconfig as the root user.
- Rebuild the initramfs by executing mkinitcpio -p linux as the root user.
- Exit the chroot, unmount filesystems, and reboot back into your installed system.
See also
- A how-to in troubleshooting for newcomers
- List of Tools for UBCD - Memtest-like tools to add to grub.cfg on UltimateBootCD.com
- Wikipedia:BIOS Boot partition
- REISUB
- Debug Logging to a Serial Console on Freedesktop.org
- How to Isolate Linux ACPI Issues on Archive.org