RAID

From ArchWiki
Revision as of 21:57, 16 August 2013

Summary: This article explains what RAID is and how to install software RAID using mdadm, configure and maintain it.

Required software: mdadm (http://neil.brown.name/blog/mdadm)

Introduction

See the Wikipedia article on RAID for background information.

Redundant Array of Independent Disks (RAID) is a storage technology that combines multiple disk drive components (typically disk drives or partitions thereof) into a logical unit. Depending on the RAID implementation, this logical unit can be a file system or an additional transparent layer that can hold several partitions. Data is distributed across the drives in one of several ways called "RAID levels", depending on the level of redundancy and performance required. The chosen RAID level can thus prevent data loss in the event of a hard disk failure, increase performance, or do both.

Despite the redundancy implied by most RAID levels, RAID does not guarantee that data is safe. RAID will not protect data if there is a fire, the computer is stolen, or multiple hard drives fail at once. Furthermore, installing a system with RAID is a complex process that may destroy data.
Warning: Therefore, be sure to back up all data before proceeding.

There are many different levels of RAID; the most commonly used ones are described below.

Standard RAID levels

RAID 0
Uses striping to combine disks. Although it does not provide redundancy, it is still considered RAID. It does, however, provide a large speed benefit. If you think the speed increase is worth the possibility of data loss (for your swap partition, for example), choose this RAID level. On a server, RAID 1 and RAID 5 arrays are more appropriate. The size of a RAID 0 array block device is the size of the smallest component partition times the number of component partitions.
RAID 1
The most straightforward RAID level: straight mirroring. As with other RAID levels, it only makes sense if the partitions are on different physical disk drives. If one of those drives fails, the block device provided by the RAID array will continue to function as normal. This example uses RAID 1 for everything except swap and temporary data. Note that with a software implementation, RAID 1 is the only option for the boot partition, because bootloaders that read the boot partition do not understand RAID, but a RAID 1 component partition can be read like a normal partition. The size of a RAID 1 array block device is the size of the smallest component partition.
RAID 5
Requires 3 or more physical drives, and provides the redundancy of RAID 1 combined with the speed and size benefits of RAID 0. RAID 5 uses striping, like RAID 0, but also stores parity blocks distributed across each member disk. In the event of a failed disk, these parity blocks are used to reconstruct the data on a replacement disk. RAID 5 can withstand the loss of one member disk.
Note: RAID 5 is a common choice due to its combination of speed and data redundancy. The caveat is that if one drive fails and another drive fails before the first has been replaced, all data will be lost.

Nested RAID levels

RAID 1+0
Commonly referred to as RAID 10, this is a nested RAID level that combines two of the standard RAID levels to gain performance and additional redundancy. It is the best alternative to RAID 5 when redundancy is crucial.

RAID level comparison

RAID level | Data redundancy | Physical drive utilization | Read performance                 | Write performance | Min drives
0          | No              | 100%                       | nX (best)                        | nX (best)         | 2
1          | Yes             | 50%                        | nX theoretically, 1X in practice | 1X                | 2
5          | Yes             | 67% - 94%                  | (n−1)X (superior)                | (n−1)X (superior) | 3
6          | Yes             | 50% - 88%                  | (n−2)X                           | (n−2)X            | 4
10         | Yes             | 50%                        | (n−2)X                           | (n−2)X            | 4

* Where n stands for the number of dedicated disks.

Implementation

The RAID devices can be managed in different ways:

Software RAID
This is the easiest implementation as it does not rely on obscure proprietary firmware and software. The array is managed by the operating system, either:
  • by an abstraction layer (e.g. mdadm);
    Note: This is the method we will use later in this guide. If you want to use this one too, read on.
  • by a logical volume manager (e.g. LVM);
  • by a component of a file system (e.g. ZFS).
Hardware RAID
The array is directly managed by a dedicated hardware card installed in the computer, to which the disks are directly connected. The RAID logic runs on an on-board processor, independently of the host processor (CPU). Although this solution is independent of any operating system, the latter requires a driver in order to function properly with the hardware RAID controller. The RAID array can be configured either via an option ROM interface or, depending on the manufacturer, with a dedicated application once the OS has been installed. The configuration is transparent to the Linux kernel: it does not see the disks separately.
FakeRAID
This type of RAID is properly called BIOS or Onboard RAID, but is falsely advertised as hardware RAID. The array is managed by pseudo-RAID controllers where the RAID logic is implemented in an option ROM or in the firmware itself with an EFI SataDriver (in the case of UEFI), but these are not full hardware RAID controllers with all RAID features implemented. Therefore, this type of RAID is sometimes called FakeRAID. dmraid, from the official repositories, is used to deal with these controllers. Some examples of FakeRAID controllers: Intel Rapid Storage, JMicron JMB36x RAID ROM, AMD RAID, ASMedia 106x, ...

Which type of RAID do I have?

Because setting up software RAID is completely user driven, determining whether you are using this implementation is straightforward.

However, discerning between FakeRAID and true hardware RAID can be more difficult. As stated, manufacturers often incorrectly distinguish these two types of RAID, and false advertising is always possible. The best solution in this instance is to run the lspci command and look through the output to find your RAID controller, then search for information about that controller. True hardware RAID controllers are often rather expensive (~$400+), so if you customized a computer, choosing a hardware RAID setup very likely made a noticeable change in its price.
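For example, the output can be filtered for likely controllers (the grep pattern here is only a rough assumption; inspect the full lspci output if nothing matches):

$ lspci | grep -i raid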

Installation

We will need to install mdadm and parted, both available from the official repositories.
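For example, using pacman:

# pacman -S mdadm parted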

mdadm is used for administering pure software RAID using plain block devices: the underlying hardware does not provide any RAID logic, just a supply of disks. mdadm will work with any collection of block devices. Even if it is unusual, you can thus make a RAID array from a collection of thumb drives.

Prepare the device

Warning: These steps erase everything on a device, so type carefully.

To prevent possible issues, each device in the RAID should be securely wiped. Additionally, the following steps can be taken.

Erase any old RAID configuration information on the device:

# mdadm --zero-superblock /dev/<drive>

Verify that the kernel clears old entries:

# partprobe -s

With a software RAID, disabling the hard disk cache will help prevent data loss during power loss, as long as you do not use a UPS. Repeat the command for each drive in the array. Note however, that this decreases performance.

# hdparm -W 0 /dev/<drive>
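To repeat this for every member drive in one go, a small shell loop can be used (a sketch; the device names below are placeholders):

# for drive in /dev/sdb /dev/sdc /dev/sdd; do hdparm -W 0 "$drive"; done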

Create the partition table

It is recommended to create partitions on the disks you want to use in the array. It is also possible to create a RAID directly on the raw disks (without partitions), but this is not recommended because it can cause problems when swapping a failed disk.

When replacing a failed disk of a RAID, the new disk has to be exactly the same size as the failed disk or bigger — otherwise the array recreation process will not work. Even hard drives of the same manufacturer and model can have small size differences. By leaving a little space at the end of the disk unallocated one can compensate for the size differences between drives, which makes choosing a replacement drive model easier. Therefore, it is good practice to leave about 100 MB of unallocated space at the end of the disk.
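Partition one of the drives in the array with your favorite tool. For example:

# cfdisk /dev/<drive>

Tip: Using GParted to create the partitions and align them to the cylinder will give optimized disk alignment. This can be done with the Gnome Partition Editor Live Media (http://gparted.sourceforge.net/livecd.php).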

Partition code

On GPT partition tables, the preferred RAID partition type is GUID A19D880F-05FC-4D3B-A006-743F0F84911E, which in GPT-capable fdisk variants (such as gdisk) is the hex code fd00.
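As a sketch (the device name is a placeholder), a single RAID partition spanning a whole GPT disk can be created and flagged with parted:

# parted /dev/<drive> mklabel gpt
# parted /dev/<drive> mkpart primary 0% 100%
# parted /dev/<drive> set 1 raid on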

The two partition types on MBR block devices that are applicable to RAID devices are Non-FS data and Linux RAID auto. Non-FS data is recommended, as the array is then not auto-assembled during boot. With Linux RAID auto, one may run into trouble when booting from a live CD or when installing the degraded RAID in a different system (possibly with other degraded RAIDs, in the worst case), as Linux will try to automatically assemble and resync the array, which could render the data on the array unreadable if it fails.

Note: cfdisk and mkpart use a set of "filesystem types" to set the partition codes. Each type corresponds to a partition code (see Parted User's Manual). It uses the da type to denote Non-FS data and fd for Linux RAID auto.

Once you have selected a partition type follow the Beginner's Guide to prepare the storage drive.

Copy the partition table

Once you have properly partitioned the disk, copy the partition table to the other disks in the RAID.

Verify your partitions meet basic requirements:

# sfdisk -lRV /dev/<drive>

Dump the partition table from the formatted disk to a file:

# sfdisk -d /dev/<drive> > ~/partitions.dump

Copy the partition table from the dump file to all other disks in the array:

# sfdisk /dev/<drive> < ~/partitions.dump
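If the array has several more members, the dump can be applied in a loop (a sketch; the device names below are placeholders):

# for drive in /dev/sdc /dev/sdd; do sfdisk "$drive" < ~/partitions.dump; done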

After repeating the command for every other disk of the array, verify that the disks are identical with fdisk -l or sfdisk -l -u S.

Build the array

Use mdadm to build the array.

Warning: Make sure to change the bold values below to match your setup.
Note: If this is a RAID 1 array which you intend to boot from using Syslinux you need to change the metadata value to 1.0 (Syslinux as of version 4.07 does not understand md 1.2 metadata)
 # mdadm --create --verbose --level=5 --metadata=1.2 --chunk=256 --raid-devices=5 /dev/md/<raid-device-name> /dev/<disk1> /dev/<disk2> /dev/<disk3> /dev/<disk4> /dev/<disk5> 
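As a concrete sketch (the array name and device names are placeholders), a two-disk RAID 1 array could be created with the following; per the note above, use --metadata=1.0 instead if you intend to boot from it with Syslinux:

# mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md/<array> /dev/<disk1> /dev/<disk2>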

The array is created under the virtual device /dev/md/<array>, assembled and ready to use (in degraded mode). You can directly start using it while mdadm resyncs the array in the background. It can take a long time to restore parity. Check the progress with:

$ cat /proc/mdstat

Update configuration file

Since the installer builds the initrd using /etc/mdadm.conf in the target system, you should update the default configuration file. The default file can be overwritten using the redirection operator, because it only contains explanatory comments.

Redirect the contents of the metadata stored on the named devices to the configuration file:

# mdadm --detail --scan > /etc/mdadm.conf
Note: If you are updating your RAID configuration from within the Arch Installer by swapping to another TTY, you will need to ensure that you are writing to the correct mdadm.conf file:
# mdadm --detail --scan > /mnt/etc/mdadm.conf
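The resulting configuration file contains one ARRAY line per array and looks roughly like the following (the name and UUID here are purely illustrative):

ARRAY /dev/md/<array> metadata=1.2 name=<hostname>:<array> UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx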

Once the configuration file has been updated the array can be assembled using mdadm:

# mdadm --assemble --scan

Configure filesystem

The array can now be formatted like any other disk, just keep in mind that:

  • Due to the large volume size, not all filesystems are suitable (see: File system limits).
  • The filesystem should support growing and shrinking while online (see: File system features).
  • The biggest performance gain you can achieve on a RAID array is to make sure you format the volume aligned to your RAID stripe size (see: RAID Math), as in the example below.
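As a hedged example of stripe alignment for ext4, assuming the 5-disk RAID 5 with a 256 KiB chunk created above and a 4 KiB block size: stride = 256 KiB / 4 KiB = 64, and stripe-width = stride × (5 − 1 data disks) = 256:

# mkfs.ext4 -b 4096 -E stride=64,stripe-width=256 /dev/md/<array>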

Assemble array on boot

If you selected the Non-FS data partition code, the array will not be automatically assembled after the next boot. To assemble the array, issue the following command:

 # mdadm --assemble --scan /dev/your_array --uuid=your_array_uuid 

or write it to rc.local.

Add to kernel image

Add mdadm_udev to the HOOKS section of the Mkinitcpio file before the filesystems hook. This will add support for mdadm directly into the init image.

HOOKS="base udev autodetect block mdadm_udev filesystems usbinput fsck"

Add the raid456 module and the module for the filesystem used on the RAID (e.g. ext4) to the MODULES section. This will build these modules into the kernel image. For example,

MODULES="ext4 raid456"

Next regenerate the initramfs image (see Image creation and activation).
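With the stock kernel this is typically:

# mkinitcpio -p linux

(or mkinitcpio -p linux-lts if using the linux-lts kernel).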

Mounting from a Live CD

If you want to mount your RAID partition from a Live CD, use

# mdadm --assemble /dev/md/<array> /dev/<disk1> /dev/<disk2> /dev/<disk3> /dev/<disk4>
Note: Live CDs like SystemrescueCD assemble the RAIDs automatically at boot time if you used the partition type fd at the install of the array.

Removing device, stop using the array

You can remove a device from the array after you mark it as faulty.

# mdadm --fail /dev/md0 /dev/sdxx

Then you can remove it from the array.

# mdadm -r /dev/md0 /dev/sdxx

To remove a device permanently (for example, to use it individually from now on), issue the two commands described above and then:

# mdadm --zero-superblock /dev/sdxx

After this you can use the disk as you did before creating the array.

Warning: If you reuse the removed disk without zeroing the superblock, you will LOSE all your data on the next boot, because mdadm will try to use it as part of the RAID array. DO NOT issue this command on linear or RAID 0 arrays or you will LOSE all the data on the array.

Stop using an array:

  1. Unmount the target array
  2. Stop the array with: mdadm --stop /dev/md0
  3. Repeat the three commands described at the beginning of this section on each device.
  4. Remove the corresponding line from /etc/mdadm.conf

Adding a device to the array

Adding new devices with mdadm can be done on a running system with the devices mounted. Partition the new device /dev/sdx using the same layout as one of the devices already in the array, /dev/sda:

# sfdisk -d /dev/sda > table
# sfdisk /dev/sdx < table

Assemble the RAID arrays if they are not already assembled:

# mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1
# mdadm --assemble /dev/md2 /dev/sda2 /dev/sdb2 /dev/sdc2
# mdadm --assemble /dev/md0 /dev/sda3 /dev/sdb3 /dev/sdc3

First, add the new device as a Spare Device to all of the arrays. We will assume you have followed the guide and use separate arrays for /boot RAID 1 (/dev/md1), swap RAID 1 (/dev/md2) and root RAID 5 (/dev/md0).

# mdadm --add /dev/md1 /dev/sdx1
# mdadm --add /dev/md2 /dev/sdx2
# mdadm --add /dev/md0 /dev/sdx3

This should not take long for mdadm to do. Check the progress with:

# cat /proc/mdstat

Check that the device has been added with the command:

# mdadm --misc --detail /dev/md0

It should be listed as a Spare Device.

Tell mdadm to grow the arrays from 3 devices to 4 (or however many devices you want to use):

# mdadm --grow -n 4 /dev/md1
# mdadm --grow -n 4 /dev/md2
# mdadm --grow -n 4 /dev/md0

This will probably take several hours. You need to wait for it to finish before you can continue. Check the progress in /proc/mdstat. The RAID 1 arrays should automatically sync /boot and swap, but you need to install GRUB on the MBR of the new device manually (see Installing with Software RAID or LVM#Install Grub on the Alternate Boot Drives).

The rest of this guide will explain how to resize the underlying LVM and filesystem on the RAID 5 array.

Note: It is unclear whether this can be done with the volumes mounted; the following assumes you are booting from a live CD/USB.

If you have encrypted your LVM volumes with LUKS, you need to resize the LUKS volume first. Otherwise, skip this step.

# cryptsetup luksOpen /dev/md0 cryptedlvm
# cryptsetup resize cryptedlvm

Activate the LVM volume groups:

# vgscan
# vgchange -ay

Resize the LVM Physical Volume /dev/md0 (or e.g. /dev/mapper/cryptedlvm if using LUKS) to take up all the available space on the array. You can list them with the command "pvdisplay".

# pvresize /dev/md0

Resize the Logical Volume you wish to allocate the new space to. You can list them with "lvdisplay". Assuming you want to put it all to your /home volume:

# lvresize -l +100%FREE /dev/array/home

To resize the filesystem to allocate the new space use the appropriate tool. If using ext2 you can resize a mounted filesystem with ext2online. For ext3 you can use resize2fs or ext2resize but not while mounted.

You should check the filesystem before resizing.

# e2fsck -f /dev/array/home
# resize2fs /dev/array/home

Read the manuals for lvresize and resize2fs if you want to customize the sizes for the volumes.

Monitoring

A simple one-liner that prints out the status of your RAID devices:

awk '/^md/ {printf "%s: ", $1}; /blocks/ {print $NF}' </proc/mdstat
md1: [UU]
md0: [UU]

Watch mdstat

watch -t 'cat /proc/mdstat'

Or preferably using tmux

tmux split-window -l 12 "watch -t 'cat /proc/mdstat'"

Track IO with iotop

The iotop package lets you view the input/output statistics for processes. Use this command to view the I/O for RAID threads.

iotop -a -p $(sed 's, , -p ,g' <<<`pgrep "_raid|_resync|jbd2"`)

Track IO with iostat

iostat (part of the sysstat package) lets you view input/output statistics for devices and partitions.

 iostat -dmy 1 /dev/md0
 iostat -dmy 1 # all

Mailing on events

You need an SMTP mail server (e.g. sendmail) or at least an email forwarder (e.g. ssmtp or msmtp). Be sure you have configured an email address in /etc/mdadm.conf.
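For example, mail can be directed with a MAILADDR line in /etc/mdadm.conf (the address is a placeholder):

MAILADDR user@example.com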

# mdadm --monitor --scan --test

When the test works, you can enable the service:

# systemctl enable mdadm.service

Troubleshooting

If you get an error on reboot about "invalid raid superblock magic" and you have additional hard drives other than the ones you installed to, check that your hard drive order is correct. During installation, your RAID devices may be hdd, hde and hdf, but during boot they may be hda, hdb and hdc. Adjust your kernel line accordingly.

Start arrays read-only

When an md array is started, the superblock will be written and resync may begin. To start the array read-only, set the md_mod kernel module parameter start_ro. When this is set, new arrays get an 'auto-ro' mode, which disables all internal I/O (superblock updates, resync, recovery) and automatically switches to 'rw' when the first write request arrives.

Note: The array can be set to true 'ro' mode using mdadm -r before the first write request, or resync can be started without a write using mdadm -w.

To set the parameter at boot, add md_mod.start_ro=1 to your kernel line.

Or set it at module load time from a file in /etc/modprobe.d/, or directly via /sys/:

echo 1 > /sys/module/md_mod/parameters/start_ro
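A minimal modprobe.d sketch (the filename is an assumption):

# /etc/modprobe.d/md_start_ro.conf
options md_mod start_ro=1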

Recovering from a broken or missing drive in the raid

You might also get the above-mentioned error when one of the drives breaks for whatever reason. In that case you will have to force the RAID to start even with one disk missing. Type this (adjust where needed):

# mdadm --manage /dev/md0 --run

Now you should be able to mount it again with something like this (if you had it in fstab):

# mount /dev/md0

Now the RAID should be working again and available to use, although with one disk missing. To add a replacement disk, partition it as described above in Prepare the device. Once that is done, you can add the new disk to the RAID:

# mdadm --manage --add /dev/md0 /dev/sdd1

If you type:

# cat /proc/mdstat

you will probably see that the RAID is now active and rebuilding.

You also might want to update your configuration (see: #Update configuration file).

Benchmarking

There are several tools for benchmarking a RAID. The most notable improvement is the speed increase when multiple threads are reading from the same RAID volume.

Tiobench specifically benchmarks these performance improvements by measuring fully-threaded I/O on the disk.

Bonnie++ tests database type access to one or more files, and creation, reading, and deleting of small files which can simulate the usage of programs such as Squid, INN, or Maildir format e-mail. The enclosed ZCAV program tests the performance of different zones of a hard drive without writing any data to the disk.

hdparm should NOT be used to benchmark a RAID, because it provides very inconsistent results.

See also

  • Software RAID Install on the Gentoo Wiki (http://en.gentoo-wiki.com/wiki/Software_RAID_Install)
  • Software RAID in the new Linux 2.4 kernel, Part 1 (http://www.gentoo.org/doc/en/articles/software-raid-p1.xml) and Part 2 (http://www.gentoo.org/doc/en/articles/software-raid-p2.xml) in the Gentoo Linux Docs
  • Linux RAID wiki entry on The Linux Kernel Archives (http://raid.wiki.kernel.org/index.php/Linux_Raid)
  • How Bitmaps Work (https://raid.wiki.kernel.org/index.php/Write-intent_bitmap)
  • Arch Linux software RAID installation guide on Linux 101 (http://linux-101.org/howto/arch-linux-software-raid-installation-guide)
  • Chapter 15: Redundant Array of Independent Disks (RAID) of the Red Hat Enterprise Linux 6 Documentation (http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-raid.html)
  • Introduction to RAID (http://www.linux-mag.com/id/7924/), Nested-RAID: RAID-5 and RAID-6 Based Configurations (http://www.linux-mag.com/id/7931/), Intro to Nested-RAID: RAID-01 and RAID-10 (http://www.linux-mag.com/id/7928/), and Nested-RAID: The Triple Lindy (http://www.linux-mag.com/id/7932/) in Linux Magazine
  • HowTo: Speed Up Linux Software Raid Building And Re-syncing (http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html)
  • RAID5-Server to hold all your data (http://fomori.org/blog/?p=94)

mdadm

Forum threads

  • Raid Performance Improvements with bitmaps (http://forums.overclockers.com.au/showthread.php?t=865333)
  • GRUB and GRUB2 (https://bbs.archlinux.org/viewtopic.php?id=125445)
  • Can't install grub2 on software RAID (https://bbs.archlinux.org/viewtopic.php?id=123698)
  • Use RAID metadata 1.2 in boot and root partition (http://forums.gentoo.org/viewtopic-t-888624-start-0.html)

RAID with encryption

  • Linux/Fedora: Encrypt /home and swap over RAID with dm-crypt by Justin Wells (http://www.shimari.com/dm-crypt-on-raid/)