General troubleshooting: Difference between revisions

From ArchWiki
(Add zh-hans link.)
(→‎General procedures: link to Systemd/Journal#Filtering output for the filtering options)
 
(153 intermediate revisions by 29 users not shown)
Line 1: Line 1:
[[Category:System administration]]
[[Category:System administration]]
[[Category:System recovery]]
[[Category:System recovery]]
[[Category:Getting and installing Arch]]
[[Category:Installation process]]
[[es:General troubleshooting]]
[[es:General troubleshooting]]
[[fr:General troubleshooting]]
[[ja:一般的なトラブルシューティング]]
[[ja:一般的なトラブルシューティング]]
[[pt:General troubleshooting]]
[[ru:General troubleshooting]]
[[ru:General troubleshooting]]
[[zh-hans:General troubleshooting]]
[[zh-hans:General troubleshooting]]
Line 9: Line 11:
{{Related|Reporting bug guidelines}}
{{Related|Reporting bug guidelines}}
{{Related|Step-by-step debugging guide}}
{{Related|Step-by-step debugging guide}}
{{Related|Debug - Getting Traces}}
{{Related|Debugging/Getting traces}}
{{Related|IRC Collaborative Debugging}}
{{Related articles end}}
{{Related articles end}}
This article explains some methods for general troubleshooting. For application specific issues, please reference the particular wiki page for that program.
This article explains some methods for general troubleshooting. For application specific issues, please reference the particular wiki page for that program.


== General procedures ==
== General procedures ==


=== Attention to detail ===
{{Expansion|Given the name of this page, basic "no-brainer" solutions should be mentioned somewhere, such as doing a cold boot (maybe explicitly pointing out it's done differently for a basic tower vs a laptop), updating to the latest versions, etc…}}
 
In order to resolve an issue that you are having, it is ''absolutely crucial'' to have a firm basic understanding of how that specific subsystem functions. How does it work, and what does it need to run without error? If you cannot comfortably answer these questions, then you would best review the [[Table of contents|ArchWiki]] article for the subsystem that you are having trouble with. Once you feel like you have understood it, it will be easier for you to pinpoint the cause of the problem.
 
=== Questions/checklist ===
 
The following gives a number of questions for you whenever dealing with a malfunctioning system. Under each question there are notes explaining how you should be answering each question, followed by some light examples on how to easily gather data output and what tools can be used to review logs and the journal.
 
# What is the issue(s)?
#: Be ''as precise as possible''. This will help you not get confused and/or side-tracked when looking up specific information.
# Are there error messages? (if any)
#: Copy and paste ''full outputs'' that contain '''error messages''' related to your issue into a separate file, such as {{ic|$HOME/issue.log}}. For example, to forward the output of the following [[mkinitcpio]] command to {{ic|$HOME/issue.log}}:
#: {{bc|$ mkinitcpio -p linux >> $HOME/issue.log 2>&1}}
# Can you reproduce the issue?
#: If so, give ''exact'' '''step-by-step''' instructions/commands needed to do so.
# When did you first encounter these issues and what was changed between then and when the system was operating without error?
#: If it occurred right after an update, list '''all packages that were updated'''. Include ''version numbers'' and paste the entire update from the [[pacman]] log ({{ic|/var/log/pacman.log}}). Also take note of the statuses of ''any'' service(s) needed to support the malfunctioning application(s) using [[systemd]]'s systemctl tools. For example, to forward the output of the following [[Systemd#Basic_systemctl_usage|systemd]] command to {{ic|$HOME/issue.log}}:
#: {{bc|$ systemctl status dhcpcd@eth0.service >> $HOME/issue.log}}
#: {{Note|Using {{ic|'''>>'''}} will ensure any previous text in {{ic|$HOME/issue.log}} will not be overwritten.}}
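
The redirections used in the checklist above can be wrapped in a small shell function. This is only a sketch: the name {{ic|log_cmd}} and the {{ic|ISSUE_LOG}} variable are hypothetical and not part of any package. It records the command line itself, then appends both standard output and standard error, since error messages are usually written to the latter.

```shell
# Hypothetical helper (not from any package): append a command line and
# all of its output to the issue log.
ISSUE_LOG="${ISSUE_LOG:-$HOME/issue.log}"

log_cmd() {
    # Record which command produced the output that follows.
    printf '$ %s\n' "$*" >> "$ISSUE_LOG"
    # ">>" appends instead of overwriting; "2>&1" sends error messages
    # (stderr) to the same file as the regular output (stdout).
    # The function returns the exit status of the command that was run.
    "$@" >> "$ISSUE_LOG" 2>&1
}
```

For example, {{ic|log_cmd systemctl status dhcpcd@eth0.service}} would append the unit status to {{ic|$HOME/issue.log}}.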
 
=== Approach ===
 
Rather than approaching an issue by stating,
 
''Application X does not work.''


you will find it more helpful to formulate your issue in the context of the system as a whole, as:
It is crucial to ''always'' read any error messages that appear. Sometimes it may be hard, e.g. with graphical applications, to get a proper error message.


''Application X produces Y error(s) when performing Z tasks under conditions A and B.''
# Run the application in a terminal so it is possible to inspect the output.
## Increase the verbosity (usually {{ic|--verbose}}/{{ic|-v}}/{{ic|-V}} or {{ic|--debug}}/{{ic|-d}}) if there is still not enough information to debug.
## Sometimes there is no such parameter and it needs to be specified as a directive in the application's configuration file.
## An application may also use log files, which are usually located in {{ic|/var/log}}, {{ic|$HOME/.cache}} or {{ic|$HOME/.local}}.
## If there is no way to increase the verbosity, it is always possible to run [[strace]] and similar.
# Check the [[journal]]. It is possible that an error may also leave traces in the journal, especially if it depends on other applications.
## ''dmesg'' reads from the kernel ring buffer. This is useful if the disk is for some reason inaccessible but this may also result in incomplete logs because the kernel ring buffer is not infinite in size. Use ''journalctl'' if possible.
## ''journalctl'' has more [[systemd/Journal#Filtering output|filtering options]] than ''dmesg'' and uses human-readable timestamps by default.
# It is always recommended to check the relevant issue trackers to see if there are known issues with already existing solutions.
## Depending on upstream's choices, there is usually an issue tracker and sometimes also a forum or, e.g., an IRC channel.
## There is the [https://gitlab.archlinux.org/groups/archlinux/packaging/-/issues Arch Linux bug tracker], which should be primarily used for packaging bugs.
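
When it is unclear which log file an application writes to, one option is to reproduce the issue and then look for files that changed in the last few minutes in the usual log locations. The following is a minimal sketch; the function name {{ic|recent_logs}} is hypothetical.

```shell
# Hypothetical helper: list regular files modified within the last
# N minutes under the given directories.
recent_logs() {
    minutes="$1"
    shift
    # "-mmin -N" matches files modified less than N minutes ago.
    # "2>/dev/null" hides "Permission denied" noise from unreadable directories.
    find "$@" -type f -mmin "-$minutes" 2>/dev/null
}
```

For example, {{ic|recent_logs 10 /var/log "$HOME/.cache" "$HOME/.local"}} lists candidates touched in the last ten minutes.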


=== Additional support ===
=== Additional support ===


With all the information in front of you, you should have a good idea as to what is going on with the system and you can now start working on a proper fix.
If you require any additional support, you may ask on [https://bbs.archlinux.org the forums] or on [[Arch IRC channels#Collaborative debugging|IRC]].
 
If you require any additional support, it can be found on [https://bbs.archlinux.org the forums] or IRC at irc.freenode.net #archlinux. See [[IRC channels]] for other options.


{{Note|[[Code of conduct#Arch Linux distribution support .2Aonly.2A|Support is provided for Arch Linux ''only'']] and not [[Arch-based distributions]].}}
{{Note|[[Code of conduct#arch-linux-distribution-support-only|Support is provided for Arch Linux ONLY]] and not [[Arch-based distributions]].}}


When asking for support post the '''complete''' output/logs, not just what you think are the significant sections. Sources of information include:
When asking for support post the '''complete''' output/logs, not just what you think are the significant sections. Sources of information include:


* Full output of any command involved - do not just select what you think is relevant.
* Full output of any command involved - do not just select what you think is relevant.
* Output from systemd's {{ic|journalctl}}. For more extensive output, use the {{ic|1=systemd.log_level=debug}} boot parameter.
* systemd's [[journal]].
* Log files (have a look in {{ic|/var/log}})
** For more extensive output, use the {{ic|1=systemd.log_level=debug}} boot parameter. This will produce a tremendous amount of output, so only enable it if it is really needed.
** Do not use the {{ic|-x}} parameter because this needlessly clutters the output and makes it harder to read.
** Use {{ic|-b}} unless you need logs from a previous boot. Not specifying this may lead to extremely large pastes that may even be too big for any pastebins.
* Relevant configuration files
* Relevant configuration files
* Drivers involved
* Drivers involved
* Versions of packages involved
* Versions of packages involved
* Kernel: {{ic|dmesg}}. For a boot problem, at least the last 10 lines displayed, preferably more
* Kernel: {{ic|journalctl -k}} or {{ic|dmesg}} (both with root privileges).
* Networking: Exact output of commands involved, and any configuration files
* [[Xorg]]: depending on the setup the [[display manager]] in use is relevant here, too.
* Xorg: {{ic|/var/log/Xorg.0.log}}, and prior logs if you have overwritten the problematic one
** {{ic|Xorg.log}} may be located in one of several places: the system journal, {{ic|/var/log/}} or {{ic|$HOME/.local/share/xorg/}}.
* Pacman: If a recent upgrade broke something, look in {{ic|/var/log/pacman.log}}
** Some display managers like [[LightDM]] may also place the {{ic|Xorg.log}} in its own log directory.
 
* [[Pacman]]: If a recent upgrade broke something, look in {{ic|/var/log/pacman.log}}.
One of the better ways to post this information is to use an online pastebin. You can [[install]] the {{pkg|pbpst}} or {{pkg|gist}} package to automatically upload information. For example, to upload the content of your systemd journal from this boot you would do:
** It may be useful to use ''pacman''<nowiki/>'s {{ic|--debug}} parameter.


# journalctl -xb | pbpst -S
One of the better ways to post this information is to use a [[Arch IRC channels#Collaborative debugging|pastebin]].


A link will then be output that you can paste to the forum or IRC.
A link will then be output that you can paste to the forum or IRC.


Additionally, before posting your question, you may wish to review [http://www.catb.org/esr/faqs/smart-questions.html how to ask smart questions]. See also [[Code of conduct]].
Additionally, you may wish to review [http://jdebp.info/FGA/problem-report-standard-litany.html how to properly report issues] before asking.


== Boot problems ==
== Boot problems ==


Diagnosing errors during the [[boot process]] involves changing the [[kernel parameters]], and rebooting the system.
{{Expansion|From [[Talk:Installation guide#Buggy graphics driver]], maybe add something about trying {{ic|nomodeset}} on some hardware?}}
 
When diagnosing boot problems, it is very important to know in which stage the boot fails.
 
# Firmware (UEFI or BIOS)
## Usually only has very basic tools for debugging.
## Make sure [[Secure Boot]] is disabled.
# [[Boot loader]]
## One of the most common things done here is the changing of kernel parameters.
# [[initramfs]]
## Usually provides an emergency shell.
## Depending on the hooks chosen, either the dmesg or the journal is available within it.
# The actual system
## Depending on how badly it is broken, a simple invocation of the [[#Recovery shells|debug shell]] may suffice here.


If booting the system is not possible, boot from a [https://www.archlinux.org/download/ live image] and [[change root]] to the existing system.
If the debugging tools provided by any stage are not enough to fix the broken component, try using, e.g., a [[Install Arch Linux on a removable medium|USB stick with the latest Arch Linux ISO]] on it.


=== Console messages ===
=== Console messages ===
Line 86: Line 85:
After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.
After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.


Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using {{ic|dmesg}} or all logs from the current boot with {{ic|journalctl -b}}.
Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using {{ic|journalctl -k}} or {{ic|dmesg}}. To display all logs from the current boot use {{ic|journalctl -b}}.


==== Flow control ====
==== Flow control ====


This is basic management that applies to most terminal emulators, including virtual consoles (vc):
This is basic management that applies to most terminal emulators, including virtual consoles (VC):


* Press {{ic|Ctrl+S}} to pause the output
* Press {{ic|Ctrl+s}} to pause the output.
* And {{ic|Ctrl+Q}} to resume it
* And {{ic|Ctrl+q}} to resume it.


This pauses not only the output, but also programs which try to print to the terminal, as they will block on the {{ic|write()}} calls for as long as the output is paused. If your ''init'' appears frozen, make sure the system console is not paused.
This pauses not only the output, but also programs which try to print to the terminal, as they will block on the {{ic|write()}} calls for as long as the output is paused. If your ''init'' appears frozen, make sure the system console is not paused.
Line 99: Line 98:
To see error messages which are already displayed, see [[Getty#Have boot messages stay on tty1]].
To see error messages which are already displayed, see [[Getty#Have boot messages stay on tty1]].


==== Scrollback ====
==== Printing more kernel messages ====
 
Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:
 
* {{ic|debug}}, which has the following effects:
** The kernel will raise its console [https://docs.kernel.org/core-api/printk-basics.html logging level] such that all messages in the kernel log buffer will be printed to the console. [https://docs.kernel.org/admin-guide/kernel-parameters.html]
** [[systemd]] will raise its log level such that it will log debug messages that otherwise would not be produced anywhere. [https://github.com/systemd/systemd/blob/v251/src/basic/log.c#L1125-L1126]
* {{ic|ignore_loglevel}}, which has the same effect on the kernel as {{ic|debug}} or {{ic|1=loglevel=8}} (since debug messages are at {{ic|7}}), but prevents the log level from being raised later in the boot.
 
Other parameters you can add that might be useful in certain situations are:
 
* {{ic|1=earlyprintk=vga,keep}} prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change {{ic|vga}} to {{ic|efi}} for [[EFI]] systems.
* {{ic|1=log_buf_len=16M}} allocates a larger (16 MiB) kernel message buffer, to ensure that debug output is not overwritten.
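
After rebooting, it can be worth confirming that the parameters listed above actually reached the kernel, since a boot loader misconfiguration can silently drop them. The kernel exposes its command line in {{ic|/proc/cmdline}}. The sketch below uses a hypothetical function name, {{ic|has_kparam}}, and deliberately ignores quoting, so values that contain spaces are not handled.

```shell
# Hypothetical helper: succeed if a kernel parameter is present.
# $1: parameter name, $2: optional file to read instead of /proc/cmdline.
# Limitation: the command line is split on whitespace, so quoted values
# containing spaces are not parsed correctly.
has_kparam() {
    for word in $(cat "${2:-/proc/cmdline}"); do
        case "$word" in
            # Match a bare flag ("debug") or a key=value pair ("log_buf_len=16M").
            "$1"|"$1="*) return 0 ;;
        esac
    done
    return 1
}
```

For example, {{ic|has_kparam debug && echo "debug is set"}}.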
 
==== Producing debug kernel messages ====
 
[[#Printing more kernel messages]] indicates how to print the kernel log buffer to the console, but that buffer itself will not contain any messages that it did not already contain (aside from the debug systemd output). This section discusses methods for getting more detailed information out of the kernel log.
 
===== Dynamic debugging =====
 
Messages printed with [https://docs.kernel.org/core-api/printk-basics.html#c.pr_debug pr_debug] or related functions such as {{ic|dev_dbg()}}, {{ic|drm_dbg()}}, and {{ic|bt_dev_dbg()}} will not be produced unless you either:
 
* Modify the kernel source to define {{ic|DEBUG}} where desired.
* Utilize the kernel's [https://docs.kernel.org/admin-guide/dynamic-debug-howto.html dynamic debug] feature to enable debugging messages.
 
This section will discuss how to use dynamic debug, which is useful if you have already looked at your kernel log with everything up to informational logs, and would like even more debugging information from a particular location.
 
Firstly, you must be running a kernel that was compiled with the {{ic|CONFIG_DYNAMIC_DEBUG}} kernel configuration option set. This is already the case for {{Pkg|linux}}, so no action is required if you are using that kernel.
 
Then, you need to know where you want to see debug messages from. A couple of options are:
* Going with the kernel module name, if the issue seems to be isolated to a module. For example, to troubleshoot [[Intel graphics]], you might concern yourself with the {{ic|i915}} DRM [[kernel module]].
* Going with a directory in the kernel that corresponds with functionality you are interested in. You will want to check out (or navigate online) the [[kernel]] source code to understand the structure. As an example, to inspect debug messages for all DRM kernel modules, you could go with the path [https://github.com/torvalds/linux/tree/v5.19/drivers/gpu/drm drivers/gpu/drm].
 
Using that "source" of messages, you have to come up with a dynamic debug query that indicates which debug messages to enable, of the format:
 
''match_type'' ''match_parameter'' ''flags''
 
Where:
* ''match_type'' is the type of match to make. Corresponding to the two options given earlier, this could be {{ic|module}} or {{ic|file}}.
* ''match_parameter'' is the module or file path to watch. In the latter case, using asterisks for wildcards is permissible.
* ''flags'' dictates what to do with the match. This could be {{ic|+p}} to start printing its messages, or {{ic|-p}} to undo that.
 
Some examples of queries are:
 
* {{ic|module i915 +p}} to print debug messages from the {{ic|i915}} kernel module.
* {{ic|file drivers/gpu/drm/* +p}} to print debug messages from DRM drivers.
* {{ic|file * +p}} to print debug messages from every source file.


Scrollback allows the user to go back and view text which has scrolled off the screen of a text console. This is made possible by a buffer created between the video adapter and the display device called the scrollback buffer.  By default, the key combinations of {{ic|Shift+PageUp}} and {{ic|Shift+PageDown}} scroll the buffer up and down.
Finally, to actually enact the query, you can either:


If scrolling up all the way does not show you enough information, you need to expand your scrollback buffer to hold more output. This is done by tweaking the kernel's framebuffer console (fbcon) with the [[kernel parameter]] {{ic|1=fbcon=scrollback:Nk}} where {{ic|N}} is the desired buffer size in kilobytes. The default size is 32k.
* Do so during runtime, by running:


If this does not work, your framebuffer console may not be properly enabled. Check the [https://www.kernel.org/doc/Documentation/fb/fbcon.txt Framebuffer Console documentation] for other parameters, e.g. for changing the framebuffer driver.
# echo "''query''" > /sys/kernel/debug/dynamic_debug/control


==== Debug output ====
:This assumes that [[Wikipedia:debugfs|debugfs]] is mounted at {{ic|/sys/kernel/debug/}}, which you can verify using {{ic|mount | grep debugfs}}. [https://stackoverflow.com/a/63682160]
* Do so at boot, by adding the {{ic|1=dyndbg="''query''"}} [[kernel parameter]]


Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:
This is a greatly simplified overview of dynamic debug's capabilities; see [https://docs.kernel.org/admin-guide/dynamic-debug-howto.html#command-language-reference the documentation] for further details.
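
The runtime method described above can be wrapped in a small function that assembles the query from its three parts and rejects match types other than the two covered here. This is a sketch only: the name {{ic|dyndbg}} and the {{ic|DYNDBG_CONTROL}} variable are hypothetical, and the default path assumes debugfs is mounted at {{ic|/sys/kernel/debug/}}.

```shell
# Assumption: debugfs is mounted at /sys/kernel/debug/.
DYNDBG_CONTROL="${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

# Hypothetical helper: write a dynamic debug query to the control file.
# $1: match type (module or file)
# $2: match parameter (module name or source path, wildcards allowed)
# $3: flags (+p to enable printing, -p to disable it)
dyndbg() {
    case "$1" in
        module|file) ;;
        *) echo "match type must be 'module' or 'file'" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' "$1" "$2" "$3" > "$DYNDBG_CONTROL"
}
```

Run as root, {{ic|dyndbg module i915 +p}} enables the debug messages of the {{ic|i915}} module and {{ic|dyndbg module i915 -p}} disables them again.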
 
===== Subsystem-specific debugging =====


* {{ic|debug}} enables debug messages for both the kernel and [[systemd]]
There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. {{ic|bootmem_debug}}, {{ic|sched_debug}}. Also, {{ic|initcall_debug}} can be useful to investigate boot freezes. (Look for calls that did not return.) Check the [https://docs.kernel.org/admin-guide/kernel-parameters.html kernel parameter documentation] for specific information.
* {{ic|ignore_loglevel}} forces ''all'' kernel messages to be printed


Other parameters you can add that might be useful in certain situations are:
==== netconsole ====
* {{ic|1=earlyprintk=vga,keep}} prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change {{ic|vga}} to {{ic|efi}} for [[EFI]] systems
* {{ic|1=log_buf_len=16M}} allocates a larger (16MB) kernel message buffer, to ensure that debug output is not overwritten


There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. {{ic|bootmem_debug}}, {{ic|sched_debug}}. Check the [https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt kernel parameter documentation] for specific information.
[https://docs.kernel.org/networking/netconsole.html netconsole] is a kernel module that sends all kernel log messages (i.e. dmesg) over the network to another computer, without involving user space (e.g. syslogd). The name "netconsole" is a misnomer because it is not really a "console", but rather a remote logging service.


{{Note|If you cannot scroll back far enough to view the desired boot output, you should increase the size of the [[#Scrollback|scrollback buffer]].}}
It can be used either built-in or as a module. Built-in ''netconsole'' initializes immediately after the NICs and will bring up the specified interface as soon as possible. The module is mainly used for capturing kernel panic output from a headless machine, or in other situations where user space is no longer functional.


=== Recovery shells ===
=== Recovery shells ===


Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can {{ic|exit}} to let the kernel resume what it was doing:
Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can {{ic|exit}} to let the kernel resume what it was doing:
* {{ic|rescue}} launches a shell shortly after the root filesystem is remounted read/write
* {{ic|emergency}} launches a shell even earlier, before most filesystems are mounted
* {{ic|1=init=/bin/sh}} (as a last resort) changes the init program to a root shell. {{ic|rescue}} and {{ic|emergency}} both rely on [[systemd]], but this should work even if ''systemd'' is broken
Another option is systemd's debug-shell which adds a root shell on {{ic|tty9}} (accessible with Ctrl+Alt+F9). It can be enabled by either adding {{ic|systemd.debug-shell}} to the [[kernel parameters]], or by [[enabling]] {{ic|debug-shell.service}}. Take care to disable the service when done to avoid the security risk of leaving a root shell open on every boot.
=== Blank screen with Intel video ===


This is most likely due to a problem with [[kernel mode setting]]. Try [[Kernel mode setting#Disabling modesetting|disabling modesetting]] or changing the [[Intel#KMS Issue: console is limited to small area|video port]].
* {{ic|rescue}} launches a shell shortly after the root file system is remounted read/write
* {{ic|emergency}} launches a shell even earlier, before most file systems are mounted
* {{ic|1=init=/bin/sh}} (as a last resort) changes the init program to a root shell. {{ic|rescue}} and {{ic|emergency}} both rely on [[systemd]], but this should work even if ''systemd'' is broken.


=== Stuck while loading the kernel ===
Another option is systemd's debug-shell which adds a root shell on {{ic|tty9}} (accessible with {{ic|Ctrl+Alt+F9}}). It can be enabled by either adding {{ic|systemd.debug_shell}} to the [[kernel parameters]], or by [[enabling]] {{ic|debug-shell.service}}.


Try disabling ACPI by adding the {{ic|1=acpi=off}} kernel parameter.
{{Warning|Remember to disable the service when done to avoid the security risk of leaving a root shell open on every boot.}}


=== Debugging kernel modules ===
=== Debugging kernel modules ===
Line 147: Line 187:
* You can display extra debugging information about your hardware by following [[udev#Debug output]].
* You can display extra debugging information about your hardware by following [[udev#Debug output]].
* Ensure that [[Microcode]] updates are applied on your system.
* Ensure that [[Microcode]] updates are applied on your system.
* Test your device's RAM with [http://www.memtest.org/ Memtest86+]. Unstable RAM may lead to some extremely odd issues, ranging from random crashes to data corruption.
* To test the RAM, see [[Stress testing#MemTest86+]].
* To see if your system is overheating, use [[lm_sensors]].
* To check your storage health, see [[S.M.A.R.T.]]
 
== Debugging freezes ==
 
Unfortunately, freezes are usually hard to debug and some of them take a lot of time to reproduce. There are some types of freezes which are easier to debug than others:
 
* Is sound still playing? If so, just the display may be frozen. This may be a problem with the video driver.
* Is the machine still responding? Try [[SSH]] if switching to another [[Tty|TTY]] does not work.
* Is the disk activity LED (if present) indicating that a lot is being written to disk? Heavy swapping may temporarily freeze the system. See [https://unix.stackexchange.com/a/340567 this StackExchange answer] for information about freezes on large writes.


== Kernel panics ==
If nothing else helps, try a '''clean''' shutdown. Pressing the power button ''once'' may unfreeze the system and show the classic "shutdown screen" which displays all the units that are getting stopped. Alternatively, using the magic [[SysRq]] keys may also help to achieve a clean shutdown. This is very important because the [[journal]] may contain hints as to why the machine froze. The journal may not be written to disk on an unclean shutdown. Hard freezes in which the whole machine is unresponsive are harder to debug since logs cannot be written to disk in time.


A ''kernel panic'' occurs when the Linux kernel enters an unrecoverable failure state. The state typically originates from buggy hardware drivers resulting in the machine being deadlocked, non-responsive, and requiring a reboot. Just prior to deadlock, a diagnostic message is generated, consisting of: the ''machine state'' when the failure occurred, a ''call trace'' leading to the kernel function that recognized the failure, and a listing of currently loaded modules. Thankfully, kernel panics do not happen very often using ''mainline'' versions of the kernel, such as those supplied by the official repositories, but when they do happen, you need to know how to deal with them.
Remote logging may help if the freeze does not permit writing anything to disk. A crude remote logging solution, which needs to be invoked from another device, can be used for basic debugging:
{{Note|Kernel panics are sometimes referred to as ''oops'' or ''kernel oops''.  While both panics and oops occur as the result of a failure state, an ''oops'' is more general in that it does not ''necessarily'' result in a deadlocked machine--sometimes the kernel can recover from an oops by killing the offending task and carrying on.}}


{{Tip|Pass the kernel parameter {{ic|1=oops=panic}} at boot or write {{ic|1}} to {{ic|/proc/sys/kernel/panic_on_oops}} to force a recoverable oops to issue a panic instead. This is advisable if you are concerned about the small chance of system instability resulting from an oops recovery which may make future errors difficult to diagnose.}}
  $ ssh ''freezing_host'' journalctl -f


=== Examine panic message ===
Many fatal freezes, in which the whole system stops responding and requires a forced shutdown, may be related to buggy firmware, drivers or hardware. Trying a different kernel (see [[Kernel#Debugging regressions]]) or even a different Linux distribution or operating system, updating the firmware and running hardware diagnostics may help to find the problem.


If a kernel panic occurs very early in the boot process, you may see a message on the console containing "Kernel panic - not syncing:", but once [[systemd]] is running, kernel messages will typically be captured and written to the system log. However, when a panic occurs, the diagnostic message output by the kernel is ''almost never'' written to the log file on disk because the machine deadlocks before {{ic|systemd-journald}} gets the chance. Therefore, the only way to examine the panic message is to view it on the console as it happens (without resorting to setting up a ''kdump crashkernel''). You can do this by booting with the following kernel parameters and attempting to reproduce the panic on tty1:
{{Tip|It is recommended to try to update the firmware of the device, since these updates may fix strange issues.}}


{{bc|1=systemd.journald.forward_to_console=1 console=tty1}}
If a freeze does not permit gathering any kind of logs or other information required for debugging, try reproducing the freeze in a live environment. If a graphical environment is required to reproduce the freeze or if the freeze can be reproduced on the archiso, use the live environment of a different distribution, which is preferably not based on Arch Linux to eliminate the possibility that the freeze is related to the version or patches of the kernel.
Should the freeze still happen in a live environment, chances are that it ''may'' be hardware-related. If it does not happen anymore, it is necessary to be aware of the differences of both systems. Different configurations, differences in versions and kernel parameters and other, similar changes may have fixed the freeze.


{{Tip|In the event that the panic message scrolls away too quickly to examine, try passing the kernel parameter {{ic|1=pause_on_oops=''seconds''}} at boot.}}
Note that a blinking caps lock LED may indicate a [[kernel panic]]. Some setups may not show the TTY when a kernel panic occurs, which may be confusing and can be interpreted as another kind of freeze.


==== Example scenario: bad module ====
== Debugging regressions ==


It is possible to make a best guess as to what subsystem or module is causing the panic using the information in the diagnostic message. In this scenario, we have a panic on some imaginary machine during boot. Pay attention to the lines highlighted in '''bold''':
{{Warning|This tends to cause [[partial upgrade]]s, which are a necessary evil in this specific case. Proceed with caution and prepare a [[Installation guide#Prepare an installation medium|method to recover your system]], just in case this scenario prevents a normal boot.}}


{{bc|'''kernel: BUG: unable to handle kernel NULL pointer dereference at (null)''' [1]
If an update causes an issue but [[downgrading]] the specific package fixes it, it is likely a [[wikipedia:Software regression|regression]]. If this happened after a normal full system upgrade, check your {{ic|pacman.log}} to determine which package(s) may have caused the issue. The most important part of debugging regressions is checking if the issue was already fixed, as this can save much time. To do so, first ensure the application is fully updated (e.g. ensure the application is the same version as in the [[official repositories]]). If it already is or if updating it does not fix the issue, try using the actual latest version, usually a [[Arch User Repository#What is the difference between foo and foo-git packages?|-git]] version, which may already be packaged in the [[AUR]]. If this fixes the issue and the version with the fixes is not yet in the official repositories, wait until the new version arrives in them and then switch back to it.
'''kernel: IP: fw_core_init+0x18/0x1000 [firewire_core]''' [2]
kernel: PGD 718d00067
kernel: P4D 718d00067
kernel: PUD 7b3611067
kernel: PMD 0
kernel:
kernel: Oops: 0002 [#1] PREEMPT SMP
'''kernel: Modules linked in: firewire_core(+) crc_itu_t cfg80211 rfkill ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 ...''' [3]  
kernel: CPU: 6 PID: 1438 Comm: modprobe Tainted: P          O    4.13.3-1-ARCH #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. H97-D3H/H97-D3H-CF, BIOS F5 06/26/2014
kernel: task: ffff9c667abd9e00 task.stack: ffffb53b8db34000
kernel: RIP: 0010:fw_core_init+0x18/0x1000 [firewire_core]
kernel: RSP: 0018:ffffb53b8db37c68 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffc16d3af4
kernel: RBP: ffffb53b8db37c70 R08: 0000000000000000 R09: ffffffffae113e95
kernel: R10: ffffe93edfdb9680 R11: 0000000000000000 R12: ffffffffc16d9000
kernel: R13: ffff9c6729bf8f60 R14: ffffffffc16d5710 R15: ffff9c6736e55840
kernel: FS:  00007f301fc80b80(0000) GS:ffff9c675dd80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000007c6456000 CR4: 00000000001406e0
kernel: Call Trace:
'''kernel:  do_one_initcall+0x50/0x190''' [4]
kernel:  ? do_init_module+0x27/0x1f2
kernel:  do_init_module+0x5f/0x1f2
kernel:  load_module+0x23f3/0x2be0
kernel:  SYSC_init_module+0x16b/0x1a0
kernel:  ? SYSC_init_module+0x16b/0x1a0
kernel:  SyS_init_module+0xe/0x10
kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa5
kernel: RIP: 0033:0x7f301f3a2a0a
kernel: RSP: 002b:00007ffcabbd1998 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kernel: RAX: ffffffffffffffda RBX: 0000000000c85a48 RCX: 00007f301f3a2a0a
kernel: RDX: 000000000041aada RSI: 000000000001a738 RDI: 00007f301e7eb010
kernel: RBP: 0000000000c8a520 R08: 0000000000000001 R09: 0000000000000085
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c79208
kernel: R13: 0000000000c8b4d8 R14: 00007f301e7fffff R15: 0000000000000030
kernel: Code: <c7> 04 25 00 00 00 00 01 00 00 00 bb f4 ff ff ff e8 73 43 9c ec 48
kernel: RIP: fw_core_init+0x18/0x1000 [firewire_core] RSP: ffffb53b8db37c68
kernel: CR2: 0000000000000000
kernel: ---[ end trace 71f4306ea1238f17 ]---
'''kernel: Kernel panic - not syncing: Fatal exception''' [5]
kernel: Kernel Offset: 0x80000000 from 0xffffffff810000000 (relocation range: 0xffffffff800000000-0xfffffffffbffffffff
kernel: ---[ end Kernel panic - not syncing: Fatal exception}}


* [1] Indicates the type of error that caused the panic. In this case it was a programmer bug.
If the issue still persists, debug the issue and/or [[bisect]] the application and report the bug on the upstream bug tracker so it can be fixed.
* [2] Indicates that the panic happened in a function called ''fw_core_init'' in module ''firewire_core''.
* [3] Indicates that ''firewire_core'' was the latest module to be loaded.
* [4] Indicates that the function that called function ''fw_core_init'' was ''do_one_initcall''.
* [5] Indicates that this ''oops'' message is, in fact, a kernel panic and the system is now deadlocked.


We can surmise then, that the panic occurred during the initialization routine of module ''firewire_core'' as it was loaded.  (We might assume then, that the machine's firewire hardware is incompatible with this version of the firewire driver module due to a programmer error, and will have to wait for a new release.) In the meantime, the easiest way to get the machine running again is to prevent the module from being loaded. We can do this in one of two ways:
{{Note|The kernel needs a [[Kernel#Debugging regressions|slightly different approach]] when debugging regressions.}}


* If the module is being loaded during the execution of the ''initramfs'', reboot with the kernel parameter {{ic|1=rd.blacklist=firewire_core}}.
== Cannot use some peripherals after kernel upgrade ==
* Otherwise reboot with the kernel parameter {{ic|1=module_blacklist=firewire_core}}.


=== Reboot into root shell and fix problem ===
This will manifest commonly (but probably not only) as:


You will need a root shell to make changes to the system so the panic no longer occurs. If the panic occurs on boot, there are several strategies to obtain a root shell before the machine deadlocks:
* newly plugged USB devices showing up with ''dmesg'' but not in {{ic|/dev/}},
* file systems unable to be mounted if they were not already used before the kernel update,
* the inability to use a wired/wireless connection on a laptop if it was not already used before the kernel update,  
* {{ic|FATAL: Module ''module'' not found in directory /lib/modules/''kernelversion''}} when using ''modprobe'' to load a module that was not already used before the kernel package update.


* Reboot with the kernel parameter {{ic|emergency}}, {{ic|rd.emergency}}, or {{ic|-b}} to receive a prompt to login just after the root filesystem is mounted and {{ic|systemd}} is started.
As partially covered in [[System maintenance#Restart or reboot after upgrades]], the kernel is not updated when you update the package but only once you reboot afterwards. Meanwhile, the kernel modules, located in {{ic|/usr/lib/modules/''kernelversion''/}}, are removed by pacman when installing the new kernel. As explained in {{Bug|16702}}, this approach avoids leaving files on the system not handled by the package manager but leads to the aforementioned symptoms. To fix them, reboot systematically after updating the kernel. The long-term evolution, yet to be implemented, will be to use versioned kernel packages; the main blocker is how to handle the removal of previous kernel versions once they are no longer needed.
: {{Note|At this point, the root filesystem will be mounted '''read-only'''. Execute {{ic|# mount -o remount,rw /}} to make changes.}}
* Reboot with the kernel parameter {{ic|rescue}}, {{ic|rd.rescue}}, {{ic|single}}, {{ic|s}}, {{ic|S}}, or {{ic|1}} to receive a prompt to login just after local filesystems are mounted.
* Reboot with the kernel parameter {{ic|1=systemd.debug-shell=1}} to obtain a very early root shell on tty9. Switch to it by pressing {{ic|Ctrl-Alt-F9}}.
* Experiment by rebooting with different sets of kernel parameters to possibly disable the kernel feature that is causing the panic.  Try the "old standbys" {{ic|1=acpi=off}} and {{ic|nolapic}}.
: {{Tip|See {{ic|Documentation/admin-guide/kernel-parameters.txt}} in the Linux kernel source tree for all parameters.}}
* As a last resort, boot with the '''Arch Linux Installation CD''' and mount the root filesystem on {{ic|/mnt}} then execute {{ic|# arch-chroot /mnt}}.


Disable the service or program that is causing the panic, roll-back a faulty update, or fix a configuration problem.
Another solution is available as {{Pkg|kernel-modules-hook}}: two pacman hooks use rsync to keep the kernel modules on the file system after the kernel update, and {{ic|linux-modules-cleanup.service}}, once [[enable]]d, marks the old modules for removal four weeks later.


== Package management ==
== Package management ==


See [[Pacman#Troubleshooting]] for general topics, and [[pacman/Package signing#Troubleshooting]] for issues with PGP keys.
See [[Pacman#Troubleshooting]] for general topics, and [[pacman/Package signing#Troubleshooting]] for issues with PGP keys.
=== Fixing a broken system ===
If a [[partial upgrade]] was performed, try updating your whole system. A reboot may be required.
# pacman -Syu
If you usually boot into a GUI and that is failing, perhaps you can press {{ic|Ctrl+Alt+F1}} through {{ic|Ctrl+Alt+F6}} and get to a working tty to run ''pacman'' through.
If the system is broken enough that you are unable to run ''pacman'', [[Installation guide#Pre-installation|boot using a monthly Arch ISO from a USB flash drive, an optical disc or a network with PXE]].  (Do not follow any of the rest of the installation guide.)
Mount your root file system:
[ISO] # mount /dev/''rootFileSystemDevice'' /mnt
Mount any other partitions that you created separately, adding the prefix {{ic|/mnt}} to all of them, e.g.:
[ISO] # mount /dev/''bootDevice'' /mnt/boot
Try using your system's ''pacman'':
[ISO] # arch-chroot /mnt
[chroot] # pacman -Syu
If that fails, exit the ''chroot'', and try:
[ISO] # pacman -Syu --sysroot /mnt
If that fails, try:
[ISO] # pacman -Syu --root /mnt --cachedir /mnt/var/cache/pacman/pkg


== fuser ==
== fuser ==
Line 248: Line 276:
{{Expansion|Needs more information about its usage}}
{{Expansion|Needs more information about its usage}}


''fuser'' is a command-line utility for identifying processes using resources such as files, filesystems and TCP/UDP ports.
''fuser'' is a command-line utility for identifying processes using resources such as files, file systems and TCP/UDP ports.
 
''fuser'' is provided by the {{Pkg|psmisc}} package, which should be already installed as part of the {{Grp|base}} group. See {{man|1|fuser}} for detail.
''fuser'' is provided by the {{Pkg|psmisc}} package, which should be already installed as a dependency of the {{Pkg|base}} [[meta package]]. See {{man|1|fuser}} for details.


== Session permissions ==
== Session permissions ==


{{Note|You must be using [[systemd]] as your init system for local sessions to work.[https://www.archlinux.org/news/d-bus-now-launches-user-buses/] It is required for polkit permissions and ACLs for various devices (see {{ic|/usr/lib/udev/rules.d/70-uaccess.rules}} and  [http://enotty.pipebreaker.pl/2012/05/23/linux-automatic-user-acl-management/])}}
{{Note|You must be using [[systemd]] as your init system for local sessions to work.[https://archlinux.org/news/d-bus-now-launches-user-buses/] It is required for polkit permissions and ACLs for various devices (see {{ic|/usr/lib/udev/rules.d/70-uaccess.rules}} and  [https://enotty.pipebreaker.pl/2012/05/23/linux-automatic-user-acl-management/])}}


First, make sure you have a valid local session within X:
First, make sure you have a valid local session within X:
Line 261: Line 289:


This should contain {{ic|1=Remote=no}} and {{ic|1=Active=yes}} in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
This should contain {{ic|1=Remote=no}} and {{ic|1=Active=yes}} in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
A D-Bus session should also be started along with X. See [[D-Bus#Starting the user session]] for more information on this.


Basic [[polkit]] actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See [[polkit#Authentication agents]] for more information on this.
Basic [[polkit]] actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See [[polkit#Authentication agents]] for more information on this.
Line 268: Line 294:
== Message: "error while loading shared libraries" ==
== Message: "error while loading shared libraries" ==


{{Accuracy|Or the program needs to be rebuilt after a [[System_maintenance#Partial_upgrades_are_unsupported|soname bump]].}}
If, while using a program, you get an error similar to:
 
If, while using a program, you get an error similar to:  


  error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory
  error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory
Line 276: Line 300:
Use [[pacman]] or [[pkgfile]] to search for the package that owns the missing library:
Use [[pacman]] or [[pkgfile]] to search for the package that owns the missing library:


{{hc|$ pacman -Fs libusb-0.1.so.4|
{{hc|$ pacman -F libusb-0.1.so.4|
extra/libusb-compat 0.1.5-1
extra/libusb-compat 0.1.5-1
     usr/lib/libusb-0.1.so.4
     usr/lib/libusb-0.1.so.4
}}
}}


In this case, the {{Pkg|libusb-compat}} package needs to be [[installed]].
In this case, the {{Pkg|libusb-compat}} package needs to be [[install]]ed. Alternatively, the program requesting the library may need to be rebuilt following a [[System maintenance#Partial upgrades are unsupported|soname bump]].


The error could also mean that the package that you used to install your program does not list the library as a dependency in its [[PKGBUILD]]: if it is an official package, [[report a bug]]; if it is an [[AUR]] package, report it to the maintainer using its page in the AUR website.
The error could also mean that the package that you used to install your program does not list the library as a dependency in its [[PKGBUILD]]: if it is an official package, [[report a bug]]; if it is an [[AUR]] package, report it to the maintainer using its page in the AUR website.
== Message: "file: could not find any magic files!" ==
If you see this message, it likely indicates that a package update has corrupted the dynamic linker run-time bindings file and your system is now essentially crippled.  You will not be able to recompile or reinstall the package responsible or rebuild the [[initramfs]] until you fix it.
=== Problem ===
A package update likely added an invalid {{ic|''filename''.conf}} to the directory {{ic|/etc/ld.so.conf.d}} or edited {{ic|/etc/ld.so.conf}} incorrectly. The result is that the dynamic linker run-time bindings file {{ic|/etc/ld.so.cache}} is being re-generated with invalid data. This can potentially cause all programs on the system that depend on shared libraries to fail (i.e. almost all of them).
=== Solution ===
# Boot with the '''''Arch Linux Installation CD'''''.
# Mount your root {{ic|/}} filesystem on {{ic|/mnt}} and your {{ic|/boot}} filesystem on {{ic|/mnt/boot}} and chroot into the broken system by executing {{ic|# arch-chroot /mnt}}.
# Examine the file {{ic|/etc/ld.so.conf}} and remove any invalid lines found.
# Examine the files located in the directory {{ic|/etc/ld.so.conf.d/}} and remove any invalid files.
# Rebuild the dynamic linker run-time bindings file {{ic|/etc/ld.so.cache}} by executing {{ic|# ldconfig}}.
# Rebuild the [[Initramfs]] by executing {{ic|# mkinitcpio -p linux}}.
# Exit the chroot, unmount filesystems, and reboot back into your installed system.


== See also ==
== See also ==


* [http://www.tuxradar.com/content/how-fix-most-common-linux-problems Fix the Most Common Problems]{{Dead link|2018|02|05}}
* [https://www.reddit.com/r/archlinux/comments/tjjwr/archlinux_a_howto_in_troubleshooting_for_newcomers/ A how-to in troubleshooting for newcomers]
* [https://www.reddit.com/r/archlinux/comments/tjjwr/archlinux_a_howto_in_troubleshooting_for_newcomers/ A how-to in troubleshooting for newcomers]
* [http://wiki.ultimatebootcd.com/index.php?title=Tools List of Tools for UBCD] - Memtest-like tools to add to grub.cfg on UltimateBootCD.com
* [https://wiki.ultimatebootcd.com/index.php?title=Tools List of Tools for UBCD] - Memtest-like tools to add to grub.cfg on UltimateBootCD.com
* [[Wikipedia:BIOS Boot partition]]
* [[Wikipedia:BIOS Boot partition]]
* [[REISUB]]
* [[REISUB]]
* [http://freedesktop.org/wiki/Software/systemd/Debugging#Debug_Logging_to_a_Serial_Console Debug Logging to a Serial Console] on Freedesktop.org
* [https://freedesktop.org/wiki/Software/systemd/Debugging/#Debug_Logging_to_a_Serial_Console Debug Logging to a Serial Console] on Freedesktop.org
* [https://web.archive.org/web/20120217124742/http://www.lesswatts.org/projects/acpi/debug.php How to Isolate Linux ACPI Issues] on Archive.org
* [https://web.archive.org/web/20120217124742/http://www.lesswatts.org/projects/acpi/debug.php How to Isolate Linux ACPI Issues] on Archive.org

Latest revision as of 06:24, 12 March 2024

This article explains some methods for general troubleshooting. For application specific issues, please reference the particular wiki page for that program.

General procedures

This article or section needs expansion.

Reason: Given the name of this page, basic "no-brainer" solutions should be mentioned somewhere, such as doing a cold boot (maybe explicitly pointing out it's done differently for a basic tower vs a laptop), updating to the latest versions, etc… (Discuss in Talk:General troubleshooting)

It is crucial to always read any error messages that appear. Sometimes it may be hard, e.g. with graphical applications, to get a proper error message.

  1. Run the application in a terminal so it is possible to inspect the output.
    1. Increase the verbosity (usually --verbose/-v/-V or --debug/-d) if there is still not enough information to debug.
    2. Sometimes there is no such parameter and it needs to be specified as a directive in the application's configuration file.
    3. An application may also use log files, which are usually located in /var/log, $HOME/.cache or $HOME/.local.
    4. If there is no way to increase the verbosity, it is always possible to run strace and similar.
  2. Check the journal. It is possible that an error may also leave traces in the journal, especially if it depends on other applications.
    1. dmesg reads from the kernel ring buffer. This is useful if the disk is for some reason inaccessible but this may also result in incomplete logs because the kernel ring buffer is not infinite in size. Use journalctl if possible.
    2. journalctl has more filtering options than dmesg and uses human-readable timestamps by default.
  3. It is always recommended to check the relevant issue trackers to see if there are known issues with already existing solutions.
    1. Depending on upstream's choices, there is usually an issue tracker and sometimes also a forum or even, e.g., an IRC channel.
    2. There is the Arch Linux bug tracker, which should be primarily used for packaging bugs.
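
The first step above (running the application in a terminal and keeping its output) can be sketched as a small shell helper. This is only an illustration: the function name run_logged and the log file path are made up for this example and are not part of any package.

```shell
# Hypothetical helper: run a command, show its output in the terminal,
# and keep a copy (stdout and stderr combined) in a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log_file=$1
    shift
    # 2>&1 merges stderr into stdout so that tee records both streams.
    # Note: the exit status of the pipeline is that of tee, not of the command.
    "$@" 2>&1 | tee "$log_file"
}
```

For instance, run_logged /tmp/app.log application --verbose (with application standing in for the program being debugged) would leave the complete output in /tmp/app.log, ready to be attached to a bug report.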

Additional support

If you require any additional support, you may ask on the forums or on IRC.

When asking for support post the complete output/logs, not just what you think are the significant sections. Sources of information include:

  • Full output of any command involved - do not just select what you think is relevant.
  • systemd's journal.
    • For more extensive output, use the systemd.log_level=debug boot parameter. This will produce a tremendous amount of output, so only enable it if it is really needed.
    • Do not use the -x parameter because this needlessly clutters the output and makes it harder to read.
    • Use -b unless you need logs from a previous boot. Not specifying this may lead to extremely large pastes that may even be too big for any pastebins.
  • Relevant configuration files
  • Drivers involved
  • Versions of packages involved
  • Kernel: journalctl -k or dmesg (both with root privileges).
  • Xorg: depending on the setup the display manager in use is relevant here, too.
    • Xorg.log may be located in one of several places: the system journal, /var/log/ or $HOME/.local/share/xorg/.
    • Some display managers like LightDM may also place the Xorg.log in its own log directory.
  • Pacman: If a recent upgrade broke something, look in /var/log/pacman.log.
    • It may be useful to use pacman's --debug parameter.

One of the better ways to post this information is to use a pastebin.

A link will then be output that you can paste to the forum or IRC.

Additionally, you may wish to review how to properly report issues before asking.

Boot problems

This article or section needs expansion.

Reason: From Talk:Installation guide#Buggy graphics driver, maybe add something about trying nomodeset on some hardware? (Discuss in Talk:General troubleshooting)

When diagnosing boot problems, it is very important to know in which stage the boot fails.

  1. Firmware (UEFI or BIOS)
    1. Usually only has very basic tools for debugging.
    2. Make sure Secure Boot is disabled.
  2. Boot loader
    1. One of the most common things done here is the changing of kernel parameters.
  3. initramfs
    1. Usually provides an emergency shell.
    2. Depending on the hooks chosen, either the dmesg or the journal is available within it.
  4. The actual system
    1. Depending on how badly it is broken, a simple invocation of the debug shell may suffice here.

If the debugging tools provided by any stage are not enough to fix the broken component, try using, for example, a USB stick with the latest Arch Linux ISO on it.

Console messages

After the boot process, the screen is cleared and the login prompt appears, leaving users unable to read init output and error messages. This default behavior may be modified using methods outlined in the sections below.

Note that regardless of the chosen option, kernel messages can be displayed for inspection after booting by using journalctl -k or dmesg. To display all logs from the current boot use journalctl -b.

Flow control

This is basic management that applies to most terminal emulators, including virtual consoles (VC):

  • Press Ctrl+s to pause the output.
  • And Ctrl+q to resume it.

This pauses not only the output, but also programs which try to print to the terminal, as they will block on the write() calls for as long as the output is paused. If your init appears frozen, make sure the system console is not paused.

To see error messages which are already displayed, see Getty#Have boot messages stay on tty1.

Printing more kernel messages

Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:

  • debug, which has the following effects:
    • The kernel will raise its console logging level such that all messages in the kernel log buffer will be printed to the console. [1]
    • systemd will raise its log level such that it will log debug messages that otherwise would not be produced anywhere. [2]
  • ignore_loglevel, which has the same effect on the kernel as debug or loglevel=8 (since debug messages are at 7), but prevents the log level from being raised later in the boot.

Other parameters you can add that might be useful in certain situations are:

  • earlyprintk=vga,keep prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change vga to efi for EFI systems.
  • log_buf_len=16M allocates a larger (16 MiB) kernel message buffer, to ensure that debug output is not overwritten.
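
After rebooting, it can be worth confirming that a parameter actually reached the kernel. The following is a minimal sketch that reads the command line from /proc/cmdline; the function name has_kernel_param is hypothetical, and the check is deliberately simple (it splits on whitespace, so quoted values containing spaces are not handled).

```shell
# Hypothetical helper: succeed if PARAM is present on the kernel command
# line read from FILE (default: /proc/cmdline). Matches both bare
# parameters ("debug") and key=value forms ("log_buf_len=16M") by key.
# Usage: has_kernel_param PARAM [FILE]
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split the command line into one word per line, then compare each word.
    tr -s '[:space:]' '\n' < "$cmdline_file" | (
        while IFS= read -r word; do
            case $word in
                "$param"|"$param"=*) exit 0 ;;
            esac
        done
        exit 1
    )
}
```

For example, has_kernel_param ignore_loglevel || echo "not set" would report a parameter that was forgotten in the boot loader configuration.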

Producing debug kernel messages

#Printing more kernel messages describes how to print the kernel log buffer to the console, but the buffer itself will not contain any messages beyond those it already held (aside from the systemd debug output). This section discusses methods for getting more detailed information out of the kernel log.

Dynamic debugging

Messages printed with pr_debug or related functions such as dev_dbg(), drm_dbg(), and bt_dev_dbg() will not be produced unless you either:

  • Modify the kernel source to define DEBUG where desired.
  • Utilize the kernel's dynamic debug feature to enable debugging messages.

This section will discuss how to use dynamic debug, which is useful if you have already looked at your kernel log with everything up to informational logs, and would like even more debugging information from a particular location.

Firstly, you must be running a kernel that was compiled with the CONFIG_DYNAMIC_DEBUG kernel configuration option set. This is already the case for linux, so no action is required if you are using that kernel.

Then, you need to know where you want to see debug messages from. A couple of options are:

  • Going with the kernel module name, if the issue seems to be isolated to a module. For example, to troubleshoot Intel graphics, you might concern yourself with the i915 DRM kernel module.
  • Going with a directory in the kernel that corresponds with functionality you are interested in. You will want to check out (or navigate online) the kernel source code to understand the structure. As an example, to inspect debug messages for all DRM kernel modules, you could go with the path drivers/gpu/drm.

Using that "source" of messages, you have to come up with a dynamic debug query that indicates which debug messages to enable, of the format:

match_type match_parameter flags

Where:

  • match_type is the type of match to make. Corresponding to the two options given earlier, this could be module or file.
  • match_parameter is the module or file path to watch. In the latter case, using asterisks for wildcards is permissible.
  • flags dictates what to do with the match. This could be +p to start printing its messages, or -p to undo that.

Some examples of queries are:

  • module i915 +p to print debug messages from the i915 kernel module.
  • file drivers/gpu/drm/* +p to print debug messages from DRM drivers.
  • file * +p to print debug messages from every source file.

Finally, to actually enact the query, you can either:

  • Do so during runtime, by running:
# echo "query" > /sys/kernel/debug/dynamic_debug/control
This assumes that debugfs is mounted at /sys/kernel/debug/, which you can verify using mount. [3]

This is a greatly simplified overview of dynamic debug's capabilities; see the documentation for further details.
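
As a rough sketch of the runtime method, the helper below assembles a query from the three parts described above and writes it to the control file. The name dyndbg_set is hypothetical, and the function deliberately accepts only the match types and flags covered in this section; the real interface supports more.

```shell
# Hypothetical helper: build a dynamic debug query and write it to a control
# file. CONTROL defaults to the debugfs path mentioned above; writing there
# requires root and a kernel built with CONFIG_DYNAMIC_DEBUG.
# Usage: dyndbg_set MATCH_TYPE MATCH_PARAMETER FLAGS [CONTROL]
dyndbg_set() {
    match_type=$1
    match_parameter=$2
    flags=$3
    control=${4:-/sys/kernel/debug/dynamic_debug/control}

    # Only the two match types discussed in this section are accepted here.
    case $match_type in
        module|file) ;;
        *) echo "unsupported match type: $match_type" >&2; return 1 ;;
    esac
    # Only +p (start printing) and -p (stop printing) are accepted here.
    case $flags in
        +p|-p) ;;
        *) echo "unsupported flags: $flags" >&2; return 1 ;;
    esac

    printf '%s %s %s\n' "$match_type" "$match_parameter" "$flags" > "$control"
}
```

Run as root, dyndbg_set module i915 +p would enable the debug messages of the i915 module, and dyndbg_set module i915 -p would disable them again.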

Subsystem-specific debugging

There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. bootmem_debug, sched_debug. Also, initcall_debug can be useful to investigate boot freezes. (Look for calls that did not return.) Check the kernel parameter documentation for specific information.

netconsole

netconsole is a kernel module that sends all kernel log messages (i.e. dmesg) over the network to another computer, without involving user space (e.g. syslogd). The name "netconsole" is a misnomer because it is not really a "console", more like a remote logging service.

It can be used either built-in or as a module. Built-in netconsole initializes immediately after the NICs and will bring up the specified interface as soon as possible. The module is mainly used for capturing kernel panic output from a headless machine, or in other situations where user space is no longer functional.

Recovery shells

Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can exit to let the kernel resume what it was doing:

  • rescue launches a shell shortly after the root file system is remounted read/write
  • emergency launches a shell even earlier, before most file systems are mounted
  • init=/bin/sh (as a last resort) changes the init program to a root shell. rescue and emergency both rely on systemd, but this should work even if systemd is broken.

Another option is systemd's debug-shell which adds a root shell on tty9 (accessible with Ctrl+Alt+F9). It can be enabled by either adding systemd.debug_shell to the kernel parameters, or by enabling debug-shell.service.

Warning: Remember to disable the service when done to avoid the security risk of leaving a root shell open on every boot.

Debugging kernel modules

See Kernel modules#Obtaining information.

Debugging hardware

Debugging freezes

Unfortunately, freezes are usually hard to debug and some of them take a lot of time to reproduce. There are some types of freezes which are easier to debug than others:

  • Is sound still playing? If so, just the display may be frozen. This may be a problem with the video driver.
  • Is the machine still responding? Try SSH if switching to another TTY does not work.
  • Is the disk activity LED (if present) indicating that a lot is being written to disk? Heavy swapping may temporarily freeze the system. See this StackExchange answer for information about freezes on large writes.

If nothing else helps, try a clean shutdown. Pressing the power button once may unfreeze the system and show the classic "shutdown screen" which displays all the units that are getting stopped. Alternatively, using the magic SysRq keys may also help to achieve a clean shutdown. This is very important because the journal may contain hints why the machine froze. The journal may not be written to disk on an unclean shutdown. Hard freezes in which the whole machine is unresponsive are harder to debug since logs cannot be written to disk in time.

Remote logging may help if the freeze does not permit writing anything to disk. A crude remote logging solution, which needs to be invoked from another device, can be used for basic debugging:

$ ssh freezing_host journalctl -f

Many fatal freezes, in which the whole system stops responding and a forced shutdown is required, may be related to buggy firmware, drivers or hardware. Trying a different kernel (see Kernel#Debugging regressions) or even a different Linux distribution or operating system, updating the firmware and running hardware diagnostics may help find the problem.

Tip: It is recommended to try to update the firmware of the device, since these updates may fix strange issues.

If a freeze does not permit gathering any kind of logs or other information required for debugging, try reproducing the freeze in a live environment. If a graphical environment is required to reproduce the freeze or if the freeze can be reproduced on the archiso, use the live environment of a different distribution, which is preferably not based on Arch Linux to eliminate the possibility that the freeze is related to the version or patches of the kernel. Should the freeze still happen in a live environment, chances are that it may be hardware-related. If it does not happen anymore, it is necessary to be aware of the differences of both systems. Different configurations, differences in versions and kernel parameters and other, similar changes may have fixed the freeze.

However, a blinking caps lock LED may indicate a kernel panic. Some setups may not show the TTY when a kernel panic occurred, which may be confusing and can be interpreted as another kind of freeze.

Debugging regressions

Warning: This tends to cause partial upgrades, which are a necessary evil in this specific case. Proceed with caution and prepare a method to recover your system, just in case this scenario prevents a normal boot.

If an update causes an issue but downgrading the specific package fixes it, it is likely a regression. If this happened after a normal full system upgrade, check your pacman.log to determine which package(s) may have caused the issue. The most important part of debugging regressions is checking if the issue was already fixed, as this can save much time. To do so, first ensure the application is fully updated (e.g. ensure the application is the same version as in the official repositories). If it already is or if updating it does not fix the issue, try using the actual latest version, usually a -git version, which may already be packaged in the AUR. If this fixes the issue and the version with the fixes is not yet in the official repositories, wait until the new version arrives in them and then switch back to it.

If the issue still persists, debug the issue and/or bisect the application and report the bug on the upstream bug tracker so it can be fixed.

Note: The kernel needs a slightly different approach when debugging regressions.
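
To narrow down which package(s) a given upgrade touched, the pacman log can be filtered by date. The sketch below is an illustration only: the function name upgraded_on is made up, the package versions in the comment are invented sample data, and it assumes the ISO 8601 style timestamps written by current pacman versions (older logs with a space inside the timestamp would need a different field layout).

```shell
# Hypothetical helper: list packages upgraded on a given day according to a
# pacman log. Assumes log lines of the form (sample data, not real output):
#   [2024-03-12T06:24:05+0000] [ALPM] upgraded linux (6.7.8.arch1-1 -> 6.7.9.arch1-1)
# Usage: upgraded_on YYYY-MM-DD [LOGFILE]
upgraded_on() {
    day=$1
    log_file=${2:-/var/log/pacman.log}
    # Field 1 is the timestamp, field 2 the "[ALPM]" tag, field 3 the action,
    # field 4 the package name; fields 5 to 7 are "(old -> new)".
    awk -v day="[$day" '
        index($1, day) == 1 && $2 == "[ALPM]" && $3 == "upgraded" {
            print $4, $5, $6, $7
        }
    ' "$log_file"
}
```

Each output line names a package together with its old and new version, which gives the list of candidates to downgrade one at a time.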

Cannot use some peripherals after kernel upgrade

This will manifest commonly (but probably not only) as:

  • newly plugged USB devices showing up with dmesg but not in /dev/,
  • file systems unable to be mounted if they were not already used before the kernel update,
  • the inability to use a wired/wireless connection on a laptop if it was not already used before the kernel update,
  • FATAL: Module module not found in directory /lib/modules/kernelversion when using modprobe to load a module that was not already used before the kernel package update.

As partially covered in System maintenance#Restart or reboot after upgrades, the kernel is not updated when you update the package but only once you reboot afterwards. Meanwhile, the kernel modules, located in /usr/lib/modules/kernelversion/, are removed by pacman when installing the new kernel. As explained in FS#16702, this approach avoids leaving files on the system not handled by the package manager but leads to the aforementioned symptoms. To fix them, reboot systematically after updating the kernel. The long-term evolution, yet to be implemented, will be to use versioned kernel packages; the main blocker is how to handle the removal of previous kernel versions once they are no longer needed.

Another solution is available as kernel-modules-hook: two pacman hooks use rsync to keep the kernel modules on the file system after the kernel update, and linux-modules-cleanup.service, once enabled, marks the old modules for removal four weeks later.
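
A quick way to tell whether the symptoms above stem from a pending reboot is to check that the module directory of the running kernel still exists. The following is a minimal sketch; the function name kernel_modules_present is hypothetical, and the optional ROOT argument exists only so the check can be pointed at a mounted system or a test tree.

```shell
# Hypothetical helper: check whether the module directory of a kernel release
# exists. RELEASE defaults to the running kernel (uname -r); ROOT defaults to
# the live system.
# Usage: kernel_modules_present [RELEASE] [ROOT]
kernel_modules_present() {
    release=${1:-$(uname -r)}
    root=${2:-}
    [ -d "$root/usr/lib/modules/$release" ]
}
```

If kernel_modules_present fails on the running system, the kernel package was most likely upgraded since the last boot and a reboot should restore module loading.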

Package management

See Pacman#Troubleshooting for general topics, and pacman/Package signing#Troubleshooting for issues with PGP keys.

Fixing a broken system

If a partial upgrade was performed, try updating your whole system. A reboot may be required.

# pacman -Syu

If you usually boot into a GUI and that is failing, try switching to a virtual console with Ctrl+Alt+F1 through Ctrl+Alt+F6 and run pacman from a working tty.

If the system is broken enough that you are unable to run pacman, boot using a monthly Arch ISO from a USB flash drive, an optical disc or a network with PXE. (Do not follow any of the rest of the installation guide.)

Mount your root file system:

[ISO] # mount /dev/rootFileSystemDevice /mnt

Mount any other partitions that you created separately, prefixing each mount point with /mnt, e.g.:

[ISO] # mount /dev/bootDevice /mnt/boot

Try using your system's pacman:

[ISO] # arch-chroot /mnt
[chroot] # pacman -Syu

If that fails, exit the chroot, and try:

[ISO] # pacman -Syu --sysroot /mnt

If that fails, try:

[ISO] # pacman -Syu --root /mnt --cachedir /mnt/var/cache/pacman/pkg

fuser

fuser is a command-line utility for identifying processes using resources such as files, file systems and TCP/UDP ports.

fuser is provided by the psmisc package, which should be already installed as a dependency of the base meta package. See fuser(1) for details.
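
A few common invocations, wrapped in small functions whose names are chosen here for illustration only. The options used (-v, -m and -n tcp) are described in fuser(1):

```shell
# busy_by PATH: show which processes have PATH open.
# -v adds the user, PID, access type and command name to the output.
busy_by() {
    fuser -v "$1"
}

# busy_on_mount PATH: -m reports every process using any file on the
# file system that contains PATH, e.g. to find out why umount fails.
busy_on_mount() {
    fuser -vm "$1"
}

# port_owner PORT: show which process uses the given TCP port.
# Seeing sockets that belong to other users generally requires root.
port_owner() {
    fuser -v -n tcp "$1"
}

# Examples:
#   busy_on_mount /mnt
#   port_owner 22
```

Adding -k to any of these makes fuser kill the listed processes, so inspect the output without it first.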

Session permissions

Note: You must be using systemd as your init system for local sessions to work.[4] It is required for polkit permissions and ACLs for various devices (see /usr/lib/udev/rules.d/70-uaccess.rules and [5]).

First, make sure you have a valid local session within X:

$ loginctl show-session $XDG_SESSION_ID

This should contain Remote=no and Active=yes in the output. If it does not, make sure that X runs on the same tty where the login occurred. This is required in order to preserve the logind session.
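
The same check can be scripted. This is only a sketch: the function name session_is_local is hypothetical, and it relies on loginctl printing one Key=value pair per line:

```shell
# session_is_local: read "Key=value" lines (as printed by loginctl show-session)
# on standard input and succeed only if both Remote=no and Active=yes are present.
session_is_local() {
    props=$(cat)
    printf '%s\n' "$props" | grep -qx 'Remote=no' &&
        printf '%s\n' "$props" | grep -qx 'Active=yes'
}

# Example (-p limits the output to the named properties):
#   loginctl show-session "$XDG_SESSION_ID" -p Remote -p Active | session_is_local \
#       && echo "local, active session"
```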

Basic polkit actions do not require further set-up. Some polkit actions require further authentication, even with a local session. A polkit authentication agent needs to be running for this to work. See polkit#Authentication agents for more information on this.

Message: "error while loading shared libraries"

If, while using a program, you get an error similar to:

error while loading shared libraries: libusb-0.1.so.4: cannot open shared object file: No such file or directory

Use pacman or pkgfile to search for the package that owns the missing library:

$ pacman -F libusb-0.1.so.4
extra/libusb-compat 0.1.5-1
    usr/lib/libusb-0.1.so.4

In this case, the libusb-compat package needs to be installed. Alternatively, the program requesting the library may need to be rebuilt following a soname bump.

The error could also mean that the package that you used to install your program does not list the library as a dependency in its PKGBUILD: if it is an official package, report a bug; if it is an AUR package, report it to the maintainer using its page in the AUR website.
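
When a program fails like this, more than one library may be missing. A minimal sketch for listing all of them at once from ldd output follows; the function name missing_libs is hypothetical, and ldd should only be run on binaries you trust:

```shell
# missing_libs: read ldd output on standard input and print the name of every
# library that the dynamic loader could not resolve ("name => not found").
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example, with /usr/bin/program standing in for the failing binary:
#   ldd /usr/bin/program | missing_libs
```

Each name printed can then be passed to pacman -F as shown above.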

See also