Difference between revisions of "S.M.A.R.T."

From ArchWiki

Revision as of 02:27, 14 March 2014

S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a supplementary component built into many modern storage devices through which devices monitor, store, and analyze the health of their operation. Statistics are collected (temperature, number of reallocated sectors, seek errors, etc.) which software can use to measure the health of a device, predict possible device failure, and provide notifications on unsafe values.

Smartmontools

The smartmontools package contains two utility programs (smartctl and smartd) to analyze and monitor storage devices. Install smartmontools from the official repositories.

Detect if device has SMART support

To check if the device has SMART capability (it may be necessary to add -d ata to specify that it is an ATA-derived device):

# smartctl -i /dev/<device>

(where <device> is sda, hda, ...). This gives general information about the device; the last two lines show whether SMART is supported:

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If SMART is not enabled, it can be enabled by doing:

# smartctl -s on /dev/<device>
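
The check above can also be scripted. The following is a minimal sketch, assuming the "SMART support is:" lines appear in the smartctl -i output in the form shown above; the function name smart_support_state is hypothetical and not part of smartmontools.

```shell
#!/bin/sh
# smart_support_state (hypothetical helper): read `smartctl -i` output on
# stdin and print "enabled", "disabled" or "unsupported".
# Assumes the "SMART support is:" lines look like the ones shown above.
smart_support_state() {
    awk '
        /^SMART support is: *Available/ { available = 1 }
        /^SMART support is: *Enabled/   { enabled = 1 }
        END {
            if (enabled)        print "enabled"
            else if (available) print "disabled"
            else                print "unsupported"
        }
    '
}
```

It could be used as smartctl -i /dev/<device> | smart_support_state, run as root.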

Test the device health

Three types of health tests can be performed on the device (all are safe for user data):

  1. Short (runs tests that have a high probability of detecting device problems)
  2. Extended (or Long; a short check with complete disk surface examination)
  3. Conveyance (identifies whether damage was incurred during transportation of the device)

To view the device's available tests and the time each test will take, run:

# smartctl -c /dev/<device>

To run the tests, use:

# smartctl -t short /dev/<device>
# smartctl -t long /dev/<device>
# smartctl -t conveyance /dev/<device>
Note: Some disks (e.g. SSDs) may not support all types of test. You can see what your device supports with smartctl --capabilities /dev/<device>

Results

To view the device's overall health status (compiled from all tests):

# smartctl -H /dev/<device>

To view the self-test log, including any errors found:

# smartctl -l selftest /dev/<device>

To view all SMART information for the device, including detailed test results:

# smartctl -a /dev/<device>

If no errors are reported, the device is likely healthy. A few errors may or may not indicate a problem and should be investigated further. When a device starts to fail, back up the data and replace the device.
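
Besides its printed output, smartctl reports problems through a bitmask in its exit status. The sketch below decodes that bitmask, using the bit meanings listed in the EXIT STATUS section of the smartctl manpage (paraphrased here); the function name describe_smartctl_status is hypothetical.

```shell
#!/bin/sh
# describe_smartctl_status STATUS (hypothetical helper): print one line for
# each bit set in a smartctl exit status. Messages paraphrase smartctl(8).
describe_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error was found" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test whether the current bit is set in the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

For example, after running smartctl -H /dev/<device>, calling describe_smartctl_status $? on the next line would list any conditions that were flagged.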

Monitor devices

Devices can be monitored in the background with the smartmontools daemon, smartd, which checks devices periodically and optionally emails any potential problems. To have devices monitored on boot, enable the smartd service:

 # systemctl enable smartd.service

The smartd daemon can be configured more precisely by editing /etc/smartd.conf.

Tip: /etc/smartd.conf is configured with somewhat esoteric command-line style options. See the comments and examples in the file, and refer to the manpage (http://smartmontools.sourceforge.net/man/smartd.conf.5.html) for a full explanation. What follows are some examples of the monitoring options.

Define the devices to monitor

To monitor all attributes of all disks specify:

 DEVICESCAN

Alternatively, enable monitoring of all attributes on individual disks:

#DEVICESCAN
/dev/<first_device> -a
/dev/<second_device> -a
Tip: If you want to specify different monitoring options for different disks, you'll need to define them separately rather than use DEVICESCAN.

Email potential problems

To have an email sent when a failure or new error occurs, use the -m option:

DEVICESCAN -m address@domain.com

To be able to send the email externally (i.e. not to the root mail account), an MTA (Mail Transfer Agent) or an MUA (Mail User Agent) will need to be installed and configured. Common lightweight, send-only options are MSMTP and SSMTP; full MTAs include sendmail and Postfix. Configuring S-nail alone is enough if you do not want anything else.

Once the mail agent is set up, the -M test option can be used to check whether an email will be sent (restart the daemon to trigger the test email immediately):

DEVICESCAN -m address@domain.com -M test

Power management

If you use a computer under the control of power management, you should instruct smartd how to handle disks in low-power mode. Usually, in response to SMART commands issued by smartd, the disk platters are spun up. So if the -n option is not used, a disk which is in a low-power mode may be spun up and put into a higher-power mode when it is periodically polled by smartd.

DEVICESCAN -n standby,15,q

More info on the smartmontools wiki: http://sourceforge.net/apps/trac/smartmontools/wiki/Powermode

Schedule self-tests

smartd can tell disks to perform self-tests on a schedule. The following /etc/smartd.conf configuration will start a short self-test every day between 2 and 3 am, and an extended self-test weekly on Saturdays between 3 and 4 am:

 DEVICESCAN -s (S/../.././02|L/../../6/03)
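
According to the smartd.conf manpage, the -s argument is an extended regular expression matched against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). A schedule can be tried out before handing it to smartd with a sketch like the following; the function name schedule_matches is hypothetical, and anchoring the pattern with ^(...)$ is an assumption made for this illustration.

```shell
#!/bin/sh
# schedule_matches REGEX STAMP (hypothetical helper): succeed if STAMP,
# written as T/MM/DD/d/HH, matches REGEX as a whole.
# The ^(...)$ anchoring is an assumption of this sketch.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}
```

For instance, schedule_matches '(S/../.././02|L/../../6/03)' 'L/03/15/6/03' succeeds (a long test on a Saturday at 03:00), while the same call with 'L/03/14/5/03' (a Friday) fails.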

Alert on temperature changes

smartd can track disk temperatures and alert if they rise too quickly or hit a high limit. The following will log changes of 4 degrees or more, log when the temperature reaches 35 degrees, and log and email a warning when it reaches 40 (all values in degrees Celsius):

 DEVICESCAN -W 4,35,40
Tip: You can determine the current disk temperature with the command smartctl -A /dev/<device> | grep Temperature_Celsius
Tip: If you have some disks that run a lot hotter/cooler than others, remove DEVICESCAN and define a separate configuration for each device with appropriate temperature settings.
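
To help pick suitable limits for each disk, the current temperature can be extracted and compared against candidate values. This is a minimal sketch, assuming the Temperature_Celsius row of the smartctl -A attribute table carries the raw value in its tenth column (some devices report temperature under a different attribute name); both function names are hypothetical.

```shell
#!/bin/sh
# disk_temperature (hypothetical helper): read `smartctl -A` output on stdin
# and print the raw value of the Temperature_Celsius attribute.
# Assumes the raw value is the 10th whitespace-separated column.
disk_temperature() {
    awk '$2 == "Temperature_Celsius" { print $10; exit }'
}

# classify_temperature TEMP INFO CRIT (hypothetical helper): compare TEMP
# with the INFO and CRIT limits of a "-W DIFF,INFO,CRIT" directive.
classify_temperature() {
    if [ "$1" -ge "$3" ]; then
        echo "critical"
    elif [ "$1" -ge "$2" ]; then
        echo "info"
    else
        echo "ok"
    fi
}
```

For example, classify_temperature "$(smartctl -A /dev/<device> | disk_temperature)" 35 40 shows how a disk would be treated under the -W 4,35,40 setting above.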

Complete smartd.conf example

Putting together all of the above gives the following example configuration:

  • DEVICESCAN (smartd scans for disks and monitors all it finds)
  • -a (monitor all attributes)
  • -o on (enable automatic online data collection)
  • -S on (enable automatic attribute autosave)
  • -n standby,q (don't check if disk is in standby, and suppress log message to that effect so as not to cause a write to disk)
  • -s ... (schedule short and long self-tests)
  • -W ... (monitor temperature)
  • -m ... (mail alerts)
 DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03) -W 4,35,40 -m <username or email>

Start/reload the smartd service and check status

 # systemctl start smartd

or

 # systemctl reload smartd

Check status:

 # systemctl status smartd

Full smartd log:

 # journalctl -u smartd

GUI Applications

  • Gsmartcontrol: a GNOME frontend for the smartctl hard disk drive health inspection tool. Homepage: http://gsmartcontrol.berlios.de/home/index.php/en/Home (package: gsmartcontrol)

Resources