Difference between revisions of "Machine-check exception"

From ArchWiki
(removed example config files because mcelog-1.0pre3-3 fixes the packing issue)
(Running mcelog as a daemon: update for systemd)
(18 intermediate revisions by 6 users not shown)

Revision as of 04:01, 16 November 2013

This article or section is out of date.

Reason: mentions rc.conf (Discuss in Talk:Machine-check exception)

This article helps users set up services that actively monitor, log, and report hardware errors. A machine check exception (MCE) is an exception the CPU raises when it detects that a hardware error or failure has occurred.

Introduction

Machine check exceptions (MCEs) can have many causes, including undesired or out-of-spec voltages from the power supply, cosmic radiation flipping bits in memory DIMMs or the CPU, and other miscellaneous faults, such as faulty software triggering hardware errors.

Installing mcelog

The mcelog daemon (http://www.mcelog.org/), written by Andi Kleen, is one of the tools you can use to gather MCE information.

Install the mcelog package from the official repositories.
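Installing the package is a single pacman command run as root. The sketch below wraps it so it only runs when no `mcelog` binary is already on `PATH`; the helper names `have_mcelog` and `install_mcelog` are hypothetical and not part of mcelog or pacman.

```shell
#!/bin/sh
# Sketch only: have_mcelog and install_mcelog are hypothetical helper names.

# Succeeds if an mcelog executable can be found on PATH.
have_mcelog() {
    command -v mcelog >/dev/null 2>&1
}

# Installs mcelog with pacman unless it is already present (needs root).
install_mcelog() {
    if have_mcelog; then
        echo "mcelog already installed: $(command -v mcelog)"
    else
        # --needed tells pacman not to reinstall an up-to-date package.
        pacman -S --needed mcelog
    fi
}
```

Call `install_mcelog` from a root shell; on a system that already has mcelog it only prints where the binary lives.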

Configuring mcelog

mcelog's configuration file is located at /etc/mcelog/mcelog.conf.
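To see what a setting is currently set to, you can read it straight from the file. This is a minimal sketch that assumes simple `key = value` lines with `#` comments; it ignores any `[section]` headers and prints only the first active match. `mcelog_conf_get` is a hypothetical helper, not something mcelog provides.

```shell
#!/bin/sh
# Sketch only: mcelog_conf_get is a hypothetical helper, not part of mcelog.
# Usage: mcelog_conf_get KEY [FILE]
# Prints the value of the first active "KEY = value" line; ignores sections.
mcelog_conf_get() {
    awk -v key="$1" '
        { sub(/#.*/, "") }                       # drop "#" comments
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/[ \t]/, "", k)                 # trim the option name
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim the value
            if (k == key) { print v; exit }
        }
    ' "${2:-/etc/mcelog/mcelog.conf}"
}
```

For example, `mcelog_conf_get daemon` prints `yes` once the daemon option described below has been set, and prints nothing if the option is absent.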

Running mcelog as a daemon

Upstream recommends always running mcelog as a daemon, so edit /etc/mcelog/mcelog.conf and set daemon = yes.
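If you prefer to script the edit rather than open an editor, the sketch below sets an option in place: it replaces the first active `key = value` line, drops later active duplicates of the same key, and appends the line if the key is missing. It assumes the same simple `key = value` layout with `#` comments and ignores sections; `mcelog_conf_set` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: mcelog_conf_set is a hypothetical helper, not part of mcelog.
# Usage: mcelog_conf_set KEY VALUE [FILE]   (run as root for the real file)
mcelog_conf_set() {
    conf=${3:-/etc/mcelog/mcelog.conf}
    tmpf=$(mktemp) || return 1
    awk -v key="$1" -v value="$2" '
        {
            line = $0
            sub(/#.*/, "", line)                  # ignore comments when matching
            eq = index(line, "=")
            k = substr(line, 1, eq - 1)
            gsub(/[ \t]/, "", k)
            if (eq > 0 && k == key) {
                # Replace the first active setting; drop later duplicates.
                if (!done) print key " = " value
                done = 1
                next
            }
            print                                 # keep every other line as-is
        }
        END { if (!done) print key " = " value }  # append if the key was missing
    ' "$conf" > "$tmpf" && cat "$tmpf" > "$conf"
    status=$?
    rm -f "$tmpf"
    return "$status"
}
```

Running `mcelog_conf_set daemon yes` as root applies the setting above. The result is copied back with `cat` rather than `mv` so the file keeps its original owner and permissions.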

Finally, start the mcelog service and enable it to start automatically on boot:

# systemctl start mcelog
# systemctl enable mcelog
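To confirm that both commands took effect, `systemctl is-active` and `systemctl is-enabled` report the unit's state through their exit codes. The sketch below combines the two checks; `mcelog_service_status` is a hypothetical helper name, and it assumes the unit is named `mcelog` as in the commands above.

```shell
#!/bin/sh
# Sketch only: mcelog_service_status is a hypothetical helper name.
# Reports whether the mcelog unit is running now and enabled for boot.
mcelog_service_status() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; is this a systemd system?" >&2
        return 2
    fi
    # --quiet suppresses systemctl's own output; only exit codes are used.
    if systemctl is-active --quiet mcelog; then active=yes; else active=no; fi
    if systemctl is-enabled --quiet mcelog; then enabled=yes; else enabled=no; fi
    echo "mcelog: is-active=$active is-enabled=$enabled"
}
```

After the two commands above succeed, running `mcelog_service_status` should report `is-active=yes is-enabled=yes`.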

Additional configuration options

The following option, which makes mcelog log decoded machine checks to syslog, is also recommended:

syslog = yes
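Taken together, the two options this article recommends would look roughly like this in /etc/mcelog/mcelog.conf. This is a fragment, not a complete file; it assumes every other setting keeps the value shipped with the package.

```ini
# Run mcelog as a background daemon (recommended by upstream)
daemon = yes
# Log decoded machine checks to syslog
syslog = yes
```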

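Hardware documentation from CPU manufacturers

* AMD64 Architecture Programmer's Manual, Volume 2: System Programming (http://support.amd.com/us/Processor_TechDocs/APM_v2_24593.pdf)
* BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors (http://support.amd.com/us/Processor_TechDocs/26094.PDF)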

See Also