This article aims to help users implement services to actively monitor, log, and report hardware errors. A machine check exception (MCE) is an error generated by the CPU when it detects that a hardware error or failure has occurred.
MCEs can occur for a variety of reasons, ranging from undesired or out-of-spec voltages from the power supply to cosmic radiation flipping bits in memory DIMMs, as well as other miscellaneous faults, including faulty software triggering hardware errors.
The mcelog daemon, written by Andi Kleen, is one way to handle MCEs. The mcelog package is available in the official repositories and can be installed with pacman:
pacman -S mcelog
Make sure the configuration and logrotate files are owned by root:root:
chown root:root /etc/mcelog.conf
chown root:root /etc/logrotate.d/mcelog  # (or /etc/logrotate.d/mcelog.logrotate)
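The ownership requirement can also be checked without changing anything. This is a minimal sketch, assuming GNU coreutils `stat` (standard on Arch), whose `-c '%U:%G'` format prints the owner and group names; the function name `check_root_owned` is hypothetical. Files that do not exist are skipped, because the logrotate file may be installed under either name.

```shell
#!/bin/sh
# Hypothetical helper: report every existing file that is not owned by root:root.
# Assumes GNU coreutils stat, where -c '%U:%G' prints "owner:group".
check_root_owned() {
    status=0
    for f in "$@"; do
        # Skip missing files; the logrotate file may be installed under either name.
        [ -e "$f" ] || continue
        owner=$(stat -c '%U:%G' "$f")
        if [ "$owner" != "root:root" ]; then
            echo "$f is owned by $owner, expected root:root"
            status=1
        fi
    done
    # Non-zero exit status if any file had the wrong owner.
    return "$status"
}

check_root_owned /etc/mcelog.conf /etc/logrotate.d/mcelog /etc/logrotate.d/mcelog.logrotate
```

Any file it reports can then be fixed with the chown commands above.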
Running mcelog as a daemon
Additional configuration options
The following settings are suggested for /etc/mcelog.conf:
syslog = yes
syslog-error = yes
[server]
socket-path = /var/run/mcelog-client
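To confirm that an existing configuration file already contains these settings, a grep-based check can list any that are missing. This is a minimal sketch: the function name `missing_mcelog_options` is hypothetical, it assumes the `key = value` layout used above, and it only matches exact lines, so it does not verify which section a key appears in.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each suggested mcelog option that is not
# set in the given file (default /etc/mcelog.conf).
# Only exact "key = value" lines count; commented-out lines and section placement are not checked.
missing_mcelog_options() {
    conf=${1:-/etc/mcelog.conf}
    for opt in 'syslog = yes' 'syslog-error = yes' 'socket-path = /var/run/mcelog-client'; do
        key=${opt%% =*}     # text before " ="
        value=${opt#*= }    # text after "= "
        # Allow optional whitespace around "="; grep errors (e.g. a missing file) are silenced.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*${value}[[:space:]]*\$" "$conf" 2>/dev/null; then
            echo "$key"
        fi
    done
}

missing_mcelog_options /etc/mcelog.conf
```

Each missing option name is printed on its own line; empty output means all three settings are present.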
As of 2011-09-29, the mcelog package from the official repositories does not install a default or example configuration file. The upstream example configuration file (as of 2011-09-29) can be found below for reference:
As of 2011-09-29, the mcelog package from the official repositories does not install a default logrotate file under /etc/logrotate.d/ either. The upstream example logrotate file (as of 2011-09-29) can be found below for reference:
Hardware documentation from CPU manufacturers
- AMD64 Architecture Programmer's Manual, Volume 2: System Programming
- BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors