Difference between revisions of "Boot debugging"

From ArchWiki
(Mounting and Chrooting broken system)
(Recovery shells: the two things do the same thing)
 
(92 intermediate revisions by 21 users not shown)
Line 1: Line 1:
 
[[Category:Boot process]]
 
 
[[Category:System recovery]]
 
[[it:Boot Debugging]]
+
[[it:Boot debugging]]
{{Note|Content moved from [[GRUB#Advanced_Debugging]].}}
+
[[ja:ブートデバッグ]]
The kernel provides a convenient way to configure all sorts of advanced settings, letting you quickly boot into your existing system with varying levels of debugging output, through [http://www.kernel.org/doc/Documentation/kernel-parameters.txt extended kernel parameters]. It is easy and useful to create several levels of debugging just by adding entries to your bootloader configuration. If you ever run into problems down the road due to a power failure or hardware failure, this can save you hours of trouble, and nothing beats debugging output when it comes to learning about your system.
+
{{Related articles start}}
 +
{{Related|Arch boot process}}
 +
{{Related|Boot loaders}}
 +
{{Related|Netconsole}}
 +
{{Related|Kernel modules}}
 +
{{Related|mkinitcpio}}
 +
{{Related|Kernel Mode Setting}}
 +
{{Related|systemd}}
 +
{{Related articles end}}
  
== Useful Entries ==
+
A lot happens during the boot process, so it is a common time for errors to manifest. There are many methods for diagnosing and fixing boot problems, but most involve changing the kernel parameters and rebooting the system. Ensure that you are familiar with how to change your [[kernel parameters]]. For common issues, see [[General troubleshooting#Boot problems]].
  
If you are interested in debugging, some GRUB entries for power users are worth having. Here are a few examples that you can add to your bootloader configuration (the grub-legacy config {{ic|/boot/grub/menu.lst}} is used as an example).
+
== Console clearing ==
  
{{bc|<nowiki>
+
If all you want is to be able to see error messages that are already being displayed, you should [[disable clearing of boot messages]].
title Shutdown the Computer
+
halt
+
  
title Reboot the Computer
+
== Debug output ==
reboot
+
  
title Command Line
+
Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:
commandline
+
  
title Install GRUB to hd0 MBR
+
* {{ic|debug}} enables debug messages for both the kernel and [[systemd]]
root (hd0,0)
+
* {{ic|ignore_loglevel}} forces ''all'' kernel messages to be printed
setup (hd0)
+
  
title Matrix
+
Other parameters you can add that might be useful in certain situations are:
color green/black light-green/green
+
* {{ic|1=earlyprintk=vga,keep}} prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change {{ic|vga}} to {{ic|efi}} for [[EFI]] systems
 +
* {{ic|1=log_buf_len=16M}} allocates a larger (16MB) kernel message buffer, to ensure that debug output is not overwritten
  
title Scan for /boot/grub/menu.lst
+
There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. {{ic|bootmem_debug}}, {{ic|sched_debug}}. Check the [https://www.kernel.org/doc/Documentation/kernel-parameters.txt kernel parameter documentation] for specific information.
find --set-root --ignore-floppies /boot/grub/menu.lst
+
configfile /boot/grub/menu.lst
+
  
title Scan for /boot/menu.lst
+
{{Note|If you cannot scroll back far enough to view the desired boot output, you should increase the size of the [[scrollback buffer]].}}
find --set-root --ignore-floppies /menu.lst
+
configfile /boot/menu.lst
+
  
#x86test
+
== Recovery shells ==
#make sure that you place x86test_zImage.bin at /boot/x86test_zImage.bin first
+
#before uncommenting the following lines, assuming that you have a separate
+
#boot-partition /boot on (hd0,0)
+
  
#title    Run x86test (CPU Info)
+
Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can {{ic|exit}} to let the kernel resume what it was doing:
#root (hd0,0)
+
* {{ic|rescue}} launches a shell shortly after the root filesystem is remounted read/write
#kernel /x86test_zImage.bin
+
* {{ic|emergency}} launches a shell even earlier, before most filesystems are mounted
 +
* {{ic|1=init=/bin/sh}} (as a last resort) changes the init program to a root shell. {{ic|rescue}} and {{ic|emergency}} both rely on [[systemd]], but this should work even if ''systemd'' is broken
  
# http://www.memtest.org/
+
Another option is systemd's debug-shell which adds a root shell on {{ic|tty9}} (accessible with Ctrl+Alt+F9). It can be enabled by either adding {{ic|systemd.debug-shell}} to the [[kernel parameters]], or by [[enabling]] {{ic|debug-shell.service}}. Take care to disable the service when done to avoid the security risk of leaving a root shell open on every boot.
# make sure you install memtest86+ before uncommenting
+
#one of the following lines, assuming that you have a separate
+
#boot-partition /boot on (hd0,0):
+
#the easiest way to get memtest86+ is to install the memtest86+  
+
#package from the extra repositories and then edit your /boot/grub/menu.lst
+
#as pacman tells you to: pacman creates a directory /boot/memtest86+
+
#with the memtest binary memtest.bin in it. (/boot/memtest86+/memtest.bin).
+
#After installing you should be fine uncommenting the following three lines.
+
  
#title    Run memtest86+ (Memory Testing)
+
== See also ==
#root (hd0,0)
+
#kernel /memtest86+/memtest.bin
+
  
#alternatively you can download the latest copy 'latest_copy.bin.gz'
+
* [http://www.memtest.org/ Memtest86+]
#from http://www.memtest.org/, unzip it with gunzip, move it to
+
#/boot and uncomment the following three lines
+
 
+
#title    Run memtest86+ (Memory Testing)
+
#root (hd0,0)
+
#kernel /latest_copy.bin
+
</nowiki>}}
+
 
+
== Light Debug ==
+
 
+
A quick way to see more verbose messages on your console is to boot up your bootloader entry after appending '''verbose''' to the kernel line.  This simple word added to your kernel line turns on more logging thanks to the {{ic|/etc/rc.sysinit}} file, which at the top of the file runs:
+
 
+
if /bin/grep -q " verbose" /proc/cmdline; then /bin/dmesg -n 8; fi
+
 
+
This is a very simple way to get a few more messages and some debug output in your logs.
+
 
+
title  Arch Linux DEBUG Light
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootwait verbose
+
initrd /initramfs-linux.img
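Whether {{ic|dmesg -n 8}} took effect can be checked by reading {{ic|/proc/sys/kernel/printk}}, whose first field is the current console loglevel. The following is a minimal shell sketch, assuming a Linux system with procfs mounted; the function name {{ic|console_loglevel}} and its optional file argument are made up for illustration and are not part of any init script.

```shell
#!/bin/sh
# Print the current console loglevel (first field of the printk sysctl).
# The optional argument (hypothetical, mainly for testing) overrides the file read.
console_loglevel() {
    file="${1:-/proc/sys/kernel/printk}"
    # The file holds four whitespace-separated integers; the first one is
    # the console loglevel that "dmesg -n" changes.
    read -r level _rest < "$file" || return 1
    printf '%s\n' "$level"
}
```

Calling {{ic|console_loglevel}} without arguments after booting the entry above should print the level set by the init script, if that script ran as described.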
+
 
+
== Medium Debug ==
+
 
+
This example {{ic|menu.lst}} entry turns on real logging that is set by the kernel and not in an init script. The '''debug''' kernel parameter is recognized by a lot of Linux internals and enables quite a bit of debugging compared to the default.
+
 
+
title Arch Linux DEBUG Medium
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootdelay=5 panic=10 debug
+
initrd /initramfs-linux.img
+
 
+
== Heavy Debug ==
+
 
+
An even more impressive kernel parameter is '''ignore_loglevel''', which causes the kernel to ignore the console loglevel and print every message to the console, so that dmesg cannot lower the debug level.
+
 
+
title Arch Linux DEBUG Heavy
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootdelay=5 panic=10 debug ignore_loglevel
+
initrd /initramfs-linux.img
+
 
+
== Extreme Debug ==
+
 
+
If "Heavy Debug" seemed like a lot of output, that is about half of the logging that occurs with this example. It does a couple of things. The '''earlyprintk''' parameter sets up your kernel for "early" "printing" of messages to your "vga" screen, and '''keep''' stops the early console from being disabled when the real console takes over. This lets you see logs that are normally hidden during the boot-up process.
+
 
+
This also changes the log buffer length to 10MB, and also instructs that any fatal signals be printed with '''print_fatal_signals'''.  The last one, '''sched_debug''', you can look up in the very excellent kernel documentation on [http://www.kernel.org/doc/Documentation/kernel-parameters.txt kernel parameters].
+
 
+
title Arch Linux DEBUG Extreme
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 loglevel=8 earlyprintk=vga,keep sched_debug
+
initrd /initramfs-linux.img
+
 
+
== Insane Debug ==
+
 
+
The first few debugging examples showed some really nice kernel parameters for turning on very verbose debugging. This kind of debugging is critical if you want to max out your system or just learn more about what is going on behind the scenes. But there is a final trick that is my favorite: the ability to set both environment variables and, more importantly, module parameters at boot.
+
 
+
As an example, here is the boot line that I am using at the moment on an older Dell Desktop, just to illustrate module parameters and environment vars. 
+
 
+
title  Arch Linux X-256
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootwait pause_on_oops=5 panic=60 i915.modeset=1 no_console_suspend ipv6.disable=1 TERM=xterm-256color quiet 5
+
initrd /initramfs-linux.img
+
 
+
Since it is low on both memory and CPU, I disable ipv6. I also turn on kernel modesetting for the i915 video card, set my terminal to be xterm-256color, and boot straight into [[Xorg|X]]. This lets me use a very optimized Arch Linux configuration that is amazingly fast, thanks to using [[SLiM|slim]] as the login manager, [[Ratpoison|ratpoison]] as my [[Display Manager|window manager]], and terminal with [[Tmux|tmux]] as my login shell, all from boot, as the pstree shows (plus [[Synergy]]!).
+
 
+
{{bc|<nowiki>
+
init,1 
+
  |-slim,3096
+
  |  |-X,3098 -nolisten tcp vt07 -auth /var/run/slim.auth
+
  |  `-ratpoison,3107,askapache
+
  |      |-terminal,5341 -x sh -c exec /usr/bin/tmux -2 -l -u -q attach -d -t tmux-askapache
+
  |      |  |-bash,11165
+
  |      |  |-tmux,5345 -2 -l -u -q attach -d -t tmux-askapache
+
  |      |  `-{terminal},5346
+
  |      `-xscreensaver,3113 -no-splash
+
  |-synergyc,6121,galileo -f --name galileo-fire --restart 10.66.66.2:26666
+
  |
+
  `-tmux,5348,askapache -2 -l -u -q attach -d -t tmux-askapache
+
      |-bash,5351
+
      |  `-ssh,9969 lug@askapache.com
+
      `-bash,5868
+
        `-vim,11149 -p sda1/grub/menu.lst /boot/grub/menu.lst
+
</nowiki>}}
+
 
+
That kind of optimized system is only possible if you first figure out your system by debugging the kernel as previously illustrated, debugging the init process, and most importantly, debugging the modules enabled for your system's hardware, firmware, and software. Debugging modules is challenging but worth the effort, and it lets you do some truly insane debugging from GRUB, as in the following example. Note that the actual GRUB kernel entry is all on one line; I split it into 4 lines so you can see it all. This turns on the maximum debug level for every module on this little Dell desktop. There is so much logging when I boot this that the system grinds to a halt and is slower than a TI-89 calculator (see [[Improve Boot Performance]]).
+
 
+
title  Arch Linux DEBUG INSANE
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootwait ignore_loglevel debug debug_locks_verbose=1 sched_debug initcall_debug mminit_loglevel=4 udev.log_priority=8
+
        loglevel=8 earlyprintk=vga,keep log_buf_len=10M print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1 scsi_logging_level=1 usbserial.debug=Y
+
        option.debug=Y pl2303.debug=Y firewire_ohci.debug=1 hid.debug=1 pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y shpchp.shpchp_debug=Y apic=debug
+
        show_lapic=all hpet=verbose lmb=debug pause_on_oops=5 panic=10 sysrq_always_enabled
+
initrd /initramfs-linux.img
+
 
+
A couple of key items from that GRUB entry: '''sysrq_always_enabled''' forces on the magic SysRq key, which is a lifesaver when debugging at this level, because the machine will sometimes freeze or stop responding and it is nice to use SysRq to kill all tasks, change the loglevel, unmount all filesystems, or do a hard reboot. Another key parameter is '''initcall_debug''', which traces the kernel's initialization calls in excruciating detail and is very useful at times. The last parameter I find very useful is '''udev.log_priority=8''', which turns on [[Udev|udev]] logging.
+
 
+
=== Module Parameters ===
+
 
+
In [[Kernel modules#Parameters]] you can find a nice bash function to be run as root that will show a list of all the loaded modules and all of their parameters, including the current value of the parameter.
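
As a rough illustration of what such a function does (this is not the function from that article), the sketch below walks {{ic|/sys/module/*/parameters}} and prints each value in the same {{ic|module.parameter=value}} form used on the kernel line. The name {{ic|show_mod_params}} and its optional directory argument are assumptions made for this example; some parameter files are only readable by root.

```shell
#!/bin/sh
# List the parameters of every loaded module together with their current values.
# The optional argument (hypothetical, mainly for testing) replaces /sys/module.
show_mod_params() {
    root="${1:-/sys/module}"
    for dir in "$root"/*/parameters; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$dir" ] || continue
        mod=$(basename "$(dirname "$dir")")
        for p in "$dir"/*; do
            [ -f "$p" ] || continue
            # Some parameters cannot be read by the current user; show a placeholder.
            val=$(cat "$p" 2>/dev/null) || val='(unreadable)'
            printf '%s.%s=%s\n' "$mod" "$(basename "$p")" "$val"
        done
    done
}
```

Running {{ic|show_mod_params}} as root and filtering the output with grep is a quick way to find the exact name of a debug parameter before adding it to the kernel line.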
+
 
+
=== Break Into Init ===
+
 
+
For instance, if you add '''break=y''' to your kernel command line, init will pause early in the [[Arch Boot Process|boot process]] (after loading modules) and launch an interactive sh shell that can be used for troubleshooting. (Normal boot continues after logout.) This is very similar to the shell that shows up if your computer is turned off before it can shut down properly, but this parameter lets you enter that mode at will.
+
 
+
title  Arch Linux Init Break
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro rootwait break=y
+
initrd /initramfs-linux.img
+
 
+
=== Debugging init ===
+
 
+
The parameter '''udev.log_priority=8''' does the same thing as editing the file {{ic|/etc/udev/udev.conf}}, except that it takes effect earlier, turning on debugging output for [[Udev|udev]]. If you want to know your hardware, that is the key parameter. Another trick: if you change {{ic|/etc/udev/udev.conf}} to be verbose, you can include that file in your initrd image to turn on verbose udev debugging there as well, by adding it to your {{ic|/etc/mkinitcpio.conf}} like:
+
FILES="/etc/modprobe.d/modprobe.conf /etc/udev/udev.conf"
+
Then regenerate the image, which on Arch is as easy as
+
 
+
# mkinitcpio -p linux
+
 
+
Debugging [[Udev|udev]] is key because the [[Initrd|initrd]] performs a [[Change Root|root change]] at the end of its run, usually launching a program such as /sbin/init as part of a chroot. Unless the new file system has a valid /dev directory, udev must be initialized before invoking chroot in order to provide {{ic|/dev/console}}.
+
 
+
exec chroot . /sbin/init <dev/console >dev/console 2>&1
+
 
+
So basically, you are not able to view the logs that are generated before /dev/console is initialized by udev or by a special initrd you compiled yourself.  One method the kernel developers use to be able to still get the log messages generated before /dev/console is available is to provide an alternative console that you can enable or disable from grub.
+
 
+
=== Net Console ===
+
 
+
If you read through the [http://lxr.linux.no/linux+v2.6.32/Documentation/networking/netconsole.txt kernel documentation regarding debugging], you will find [[Netconsole|Netconsole]], which can be configured from the kernel line in GRUB, compiled into your kernel, or loaded at runtime as a module. Having a netconsole entry in your {{ic|menu.lst}} is excellent for debugging slower computers like old laptops or thin clients, and it is easy to use. Set up a second computer (running Arch) to accept syslog messages on a network port, which takes one line in syslog.conf. You can then use a log colorizer like ccze to view the incoming logs, or just tail your everything.log. On the laptop, boot and select the netconsole entry from the GRUB menu, and as much logging as you want will start arriving on the syslog system. Netconsole only starts once the network interface is up, so it does not show the very earliest messages that earlyprintk=vga does, but it captures most of the boot process and keeps the log on another machine.
+
 
+
title  Arch Linux DEBUG Netconsole
+
kernel /vmlinuz-linux root=/dev/disk/by-label/ROOT ro netconsole=@/,514@10.0.0.2/12:34:56:78:9a:bc debug ignore_loglevel
+
initrd /initramfs-linux.img
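
The kernel's netconsole documentation describes the parameter as a source part and a target part separated by a comma, roughly {{ic|1=netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]}}, where empty source fields fall back to kernel defaults. The helper below is a small sketch for assembling that string; the name {{ic|netconsole_param}} is made up for this example, and the address values in the usage note are the ones from the entry above.

```shell
#!/bin/sh
# Build a netconsole= kernel parameter for a remote log receiver.
# Arguments: target port, target IP, optional target MAC address.
# Source port, source IP and device are left empty so the kernel
# falls back to its defaults.
netconsole_param() {
    tgt_port="$1"
    tgt_ip="$2"
    tgt_mac="${3:-}"
    if [ -z "$tgt_ip" ]; then
        echo "netconsole_param: target IP required" >&2
        return 1
    fi
    printf 'netconsole=@/,%s@%s/%s\n' "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

For example, {{ic|netconsole_param 514 10.0.0.2 12:34:56:78:9a:bc}} produces a string that can be pasted onto the kernel line.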
+
 
+
=== Hijacking cmdline ===
+
 
+
If you do not have access to GRUB or the kernel boottime cmdline, like on a server or virtual machine, as long as you have root permissions you can still enable this kind of simplistic verbose logging using a neat hack.  While you cannot modify the {{ic|/proc/cmdline}} even as root, you can place your own cmdline file on top of /proc/cmdline, so that accessing /proc/cmdline actually accesses your file.
+
 
+
For example if I '''cat /proc/cmdline''', I have the following:
+
 
+
root=/dev/disk/by-label/ROOT ro console=tty1 logo.nologo quiet
+
 
+
So I use a simple sed command to replace '''quiet''' with '''verbose''' like:
+
 
+
sed 's/ quiet/ verbose/' /proc/cmdline > /root/cmdline
+
 
+
Then I bind mount /root/cmdline so that it becomes /proc/cmdline, using the '''-n''' option to mount so that this mount is not recorded in the system's mtab.
+
 
+
mount -n --bind -o ro /root/cmdline /proc/cmdline
+
 
+
Now if I '''cat /proc/cmdline''', I have the following:
+
 
+
root=/dev/disk/by-label/ROOT ro console=tty1 logo.nologo verbose
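
The sed step can be wrapped in a small helper that only swaps whole words, so that a parameter which merely contains the old word is left alone. This is a sketch under the assumption that the words contain no slashes or regular-expression metacharacters; the function name {{ic|make_cmdline_copy}} is hypothetical, and the helper itself mounts nothing.

```shell
#!/bin/sh
# Write a modified copy of a kernel command line, swapping one whole word.
# Arguments: source file, destination file, old word, new word.
# Assumes the words contain no "/" or regex metacharacters.
make_cmdline_copy() {
    src="$1"; dst="$2"; old="$3"; new="$4"
    # Pad the line with spaces so that only whole words match,
    # replace the word, then strip the padding again.
    sed -e 's/^/ /' -e 's/$/ /' \
        -e "s/ $old / $new /" \
        -e 's/^ //' -e 's/ $//' "$src" > "$dst"
}

# To activate the override and to undo it again (both require root):
#   mount -n --bind -o ro /root/cmdline /proc/cmdline
#   umount /proc/cmdline
```

Unmounting {{ic|/proc/cmdline}} removes the bind mount and makes the real kernel command line visible again.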
+
 
+
== Troubleshooting ==
+
 
+
=== Repairing with Arch live-cd ===
+
In case grub is unable to boot your kernel, or if your initramfs is broken, you can boot into a safe system using an [https://www.archlinux.org/download/ Arch live-cd].  Once finished with repairs, unmount the broken system and reboot.
+
 
+
==== Mounting and Chrooting broken system ====
+
Once booted and at a console prompt, use the following to mount and repair your broken system (where {{ic|/dev/sda3}} is {{ic|/}} and {{ic|/dev/sda1}} is {{ic|/boot}}):
+
 
+
First create the mount-point and mount your root {{ic|/}} filesystem to it, then cd into it.
+
# mkdir /mnt/arch
+
# mount /dev/sda3 /mnt/arch
+
# cd /mnt/arch
+
Now mount the proc, sysfs, and dev filesystems
+
# mount -t proc proc proc/
+
# mount -t sysfs sys sys/
+
# mount -o bind /dev dev/
+
Next mount the boot partition if you use one.
+
# mount /dev/sda1 boot/
+
Finally chroot into {{ic|/mnt/arch}} which will become {{ic|/}}.
+
# chroot .
+
Turn on networking
+
 
+
==== Reinstalling with Pacman ====
+
The author uses pacman in a chrooted broken system to reinstall the kernel, grub, initramfs, udev, and any other packages that may be broken and/or needed to get the system up and running.
+
 
+
This will reinstall the kernel and initramfs so check that {{ic|/etc/mkinitcpio.conf}} is correct or remove the file entirely and re-install mkinitcpio.
+
# pacman -Syyu mkinitcpio linux udev
+
 
+
Afterwards, unmount and reboot.
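
The mount, chroot, and unmount steps of this section can be collected in one place. The sketch below only prints the commands instead of running them, so the sequence can be reviewed first; the function name {{ic|print_repair_plan}} is made up for this example, the device names are the ones assumed earlier in this section, and {{ic|umount -R}} (recursive unmount from util-linux) is used here as one way to undo all the mounts at once.

```shell
#!/bin/sh
# Print (but do not run) the mount, chroot and unmount sequence for a repair.
# Arguments: root device, mount point, optional boot device.
print_repair_plan() {
    root_dev="$1"
    mnt="$2"
    boot_dev="${3:-}"
    echo "mkdir -p $mnt"
    echo "mount $root_dev $mnt"
    echo "mount -t proc proc $mnt/proc"
    echo "mount -t sysfs sys $mnt/sys"
    echo "mount -o bind /dev $mnt/dev"
    # The boot partition is only mounted when one was given.
    if [ -n "$boot_dev" ]; then
        echo "mount $boot_dev $mnt/boot"
    fi
    echo "chroot $mnt"
    # After leaving the chroot, unmount everything below the mount point.
    echo "umount -R $mnt"
}
```

For the layout used above, {{ic|print_repair_plan /dev/sda3 /mnt/arch /dev/sda1}} lists the commands in the order they would be run as root.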
+
 
+
==See Also==
+
* [[Netconsole]]
+
* [[Syslinux]]
+
* [[GRUB2]]
+
* [[GRUB Legacy]]
+
* [[Kernel modules]]
+
* [[mkinitcpio]]
+
* [[Fstab]]
+
* [[Kernel Mode Setting]]
+
* [[LILO]]
+
* [[GUID Partition Table]]
+
* [[Systemd]]
+
 
+
==External Links==
+
* [http://www.memtest.org/ Memtest]
+
 
* [http://wiki.ultimatebootcd.com/index.php?title=Tools List of Tools for UBCD] - Can be added to custom menu.lst like memtest
 
* Official GRUB2 Manual - https://www.gnu.org/software/grub/manual/grub.html
 
* Ubuntu wiki page for GRUB2 - https://help.ubuntu.com/community/Grub2
 
* GRUB2 wiki page describing steps to compile for UEFI systems - https://help.ubuntu.com/community/UEFIBooting
 
 
* Wikipedia's page on [[Wikipedia:BIOS Boot partition|BIOS Boot partition]]
 
 +
* [https://fedoraproject.org/wiki/QA/Sysrq QA/Sysrq] - Using sysrq
 +
* systemd documentation: [http://freedesktop.org/wiki/Software/systemd/Debugging#Debug_Logging_to_a_Serial_Console Debug Logging to a Serial Console]
 +
* [https://web.archive.org/web/20120217124742/http://www.lesswatts.org/projects/acpi/debug.php How to Isolate Linux ACPI Issues]

Latest revision as of 16:32, 6 March 2016

A lot happens during the boot process, so it is a common time for errors to manifest. There are many methods for diagnosing and fixing boot problems, but most involve changing the kernel parameters and rebooting the system. Ensure that you are familiar with how to change your kernel parameters. For common issues, see General troubleshooting#Boot problems.

Console clearing

If all you want is to be able to see error messages that are already being displayed, you should disable clearing of boot messages.

Debug output

Most kernel messages are hidden during boot. You can see more of these messages by adding different kernel parameters. The simplest ones are:

  • debug enables debug messages for both the kernel and systemd
  • ignore_loglevel forces all kernel messages to be printed

Other parameters you can add that might be useful in certain situations are:

  • earlyprintk=vga,keep prints kernel messages very early in the boot process, in case the kernel would crash before output is shown. You must change vga to efi for EFI systems
  • log_buf_len=16M allocates a larger (16MB) kernel message buffer, to ensure that debug output is not overwritten

There are also a number of separate debug parameters for enabling debugging in specific subsystems e.g. bootmem_debug, sched_debug. Check the kernel parameter documentation for specific information.

Note: If you cannot scroll back far enough to view the desired boot output, you should increase the size of the scrollback buffer.
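
After rebooting, it can be worth confirming that a debug parameter actually reached the kernel, since a typo in the bootloader entry is easy to miss. The following is a minimal shell sketch that looks for a parameter in /proc/cmdline; the function name cmdline_has and its optional file argument are assumptions made for this example, and quoted parameter values containing spaces are not handled.

```shell
#!/bin/sh
# Succeed if a word appears as a whole parameter on the kernel command line,
# either bare ("debug") or with a value ("log_buf_len=16M").
# The optional second argument (hypothetical, mainly for testing) overrides the file.
cmdline_has() {
    word="$1"
    file="${2:-/proc/cmdline}"
    # Word splitting on whitespace is intentional here.
    for arg in $(cat "$file"); do
        case "$arg" in
            "$word"|"$word"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `cmdline_has ignore_loglevel && echo "all kernel messages are printed"` reports whether that parameter was passed at boot.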

Recovery shells

Getting an interactive shell at some stage in the boot process can help you pinpoint exactly where and why something is failing. There are several kernel parameters for doing so, but they all launch a normal shell which you can exit to let the kernel resume what it was doing:

  • rescue launches a shell shortly after the root filesystem is remounted read/write
  • emergency launches a shell even earlier, before most filesystems are mounted
  • init=/bin/sh (as a last resort) changes the init program to a root shell. rescue and emergency both rely on systemd, but this should work even if systemd is broken

Another option is systemd's debug-shell which adds a root shell on tty9 (accessible with Ctrl+Alt+F9). It can be enabled by either adding systemd.debug-shell to the kernel parameters, or by enabling debug-shell.service. Take care to disable the service when done to avoid the security risk of leaving a root shell open on every boot.

See also