Machine-check exception

From ArchWiki
Revision as of 14:17, 2 March 2014 by Fengchao (RC info already changed to systemd.)


This article aims to help users set up services that actively monitor, log, and report hardware errors. A machine check exception (MCE) is an error the CPU raises when it detects that a hardware error or failure has occurred.

Introduction

Machine check exceptions (MCEs) can have many causes, including undesired or out-of-spec voltages from the power supply, cosmic radiation flipping bits in memory DIMMs or in the CPU, and other miscellaneous faults, such as faulty software triggering hardware errors.

Installing mcelog

The mcelog daemon, written by Andi Kleen, is one of the tools you can use to gather MCE information.

Install the mcelog package from the official repositories.

Configuring mcelog

mcelog's configuration file is located at /etc/mcelog/mcelog.conf.

Running mcelog as a daemon

Upstream recommends always running mcelog as a daemon. To do so, edit /etc/mcelog/mcelog.conf and set daemon = yes.
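If you prefer to script this change instead of editing the file by hand, the following is a minimal sketch. The function name set_mcelog_option is hypothetical, and it assumes that mcelog.conf uses simple key = value lines, that # starts a comment, and that global options such as daemon appear before the first [section] header. It rewrites the first matching global line (commented out or not), or inserts the option just before the first section header if no such line exists.

```shell
# Hypothetical helper: set a global "key = value" option in an mcelog-style
# config file. Assumes global options appear before the first [section] header.
set_mcelog_option() {
    local key=$1 value=$2 file=$3 tmp status
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        BEGIN { done = 0; in_global = 1 }
        # A section header ends the global part; insert the option here
        # if no existing (possibly commented) line was found before it.
        /^[ \t]*\[/ {
            if (in_global && !done) { print key " = " value; done = 1 }
            in_global = 0
        }
        # Replace the first global line that sets the key, commented or not.
        in_global && !done && $0 ~ ("^[# \t]*" key "[ \t]*=") {
            print key " = " value; done = 1; next
        }
        { print }
        # File has no section headers and no matching line: append at the end.
        END { if (!done) print key " = " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run it as root, for example: set_mcelog_option daemon yes /etc/mcelog/mcelog.conf. The result is written back with cat rather than mv so that the file keeps its original owner and permissions.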

Finally, start the mcelog service and enable it to start automatically on boot:

# systemctl start mcelog
# systemctl enable mcelog

Additional configuration options

It is probably also worth setting the following option, which makes mcelog log decoded machine checks to syslog:

syslog = yes
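To check which global options are actually in effect after editing the file, a sketch like the one below prints the uncommented key = value lines that appear before the first [section] header. The function name show_mcelog_globals is hypothetical, and its parsing rules are assumptions based on the key = value layout used on this page, not on mcelog's own configuration parser.

```shell
# Hypothetical helper: list the uncommented global "key = value" lines of an
# mcelog-style config file (default: /etc/mcelog/mcelog.conf).
show_mcelog_globals() {
    awk '
        # Stop at the first [section] header; later options are not global.
        /^[ \t]*\[/ { exit }
        # Print option lines; lines starting with "#" do not match.
        /^[ \t]*[A-Za-z0-9_-]+[ \t]*=/ { print }
    ' "${1:-/etc/mcelog/mcelog.conf}"
}
```

With the settings described on this page, running show_mcelog_globals should list both daemon = yes and syslog = yes.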

Hardware documentation from CPU manufacturers

See Also