{{Stub|Under construction}}
[[Category:File systems]]
{{Article summary start}}
{{Article summary text|This article covers some basic tasks and usage of ZFS. It differs somewhat from the parent article linked below in that the examples herein are demonstrated on a zpool built from virtual disks. So long as users do not place any critical data on the resulting zpool, they are free to experiment without fear of actual data loss.}}
{{Article summary heading|Related}}
{{Article summary wiki|ZFS}}
{{Article summary end}}
  
Under construction
[[User:Graysky|Graysky]] ([[User talk:Graysky|talk]]) 11:42, 20 October 2013 (UTC)

As the article summary notes, the examples in this article are demonstrated on a set of virtual disks, known in ZFS terms as VDEVs. Users may create their VDEVs either on an existing physical disk or in tmpfs (a RAM disk), depending on the amount of free memory on the system.

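For those opting to keep the virtual disks in RAM, a tmpfs mount can hold the image files used throughout this article. This is only a sketch: the {{ic|/scratch}} directory (reused in the examples below) and the 8G size are illustrative choices, and any directory on a physical disk works just as well.
 # mkdir -p /scratch
 # mount -t tmpfs -o size=8G tmpfs /scratch
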
== Install the ZFS Family of Packages ==
Due to licensing differences, the ZFS binaries and kernel modules are easy to distribute in source form, but not so easy to ship as pre-compiled packages. Read and follow the [[ZFS#Installation]] section to build and install the needed binaries and modules.

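Once built and installed, a quick sanity check (separate from anything in the parent article) is to load the kernel module and confirm it registered:
 # modprobe zfs
 $ lsmod | grep zfs
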
== Creating and Destroying Zpools ==
Management of ZFS is quite simple, requiring only two utilities:
* {{ic|/usr/bin/zpool}}
* {{ic|/usr/bin/zfs}}

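Both utilities ship man pages that document every subcommand used in the examples below; they are worth keeping open while experimenting:
 $ man zpool
 $ man zfs
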
=== RAIDZ1 ===
The minimum number of drives for a RAIDZ1 is three. In his [https://pthree.org/2012/12/13/zfs-administration-part-viii-zpool-best-practices-and-caveats/ blog] on ZFS (an excellent read), Aaron Toponce recommends following the "power of two plus parity" rule, both for storage-space efficiency and for hitting the "sweet spot" in performance. For RAIDZ1, use three (2+1), five (4+1), or nine (8+1) disks. This example uses the simplest set of three (2+1).

Create three 2G files to serve as virtual hard drives:
 $ for i in {1..3}; do truncate -s 2G /scratch/$i.img; done

Assemble the RAIDZ1:
 # zpool create myraidz1 raidz1 /scratch/1.img /scratch/2.img /scratch/3.img

Notice that a 3.91G zpool has been created and mounted for us:
{{hc|# zfs list|<nowiki>
NAME       USED  AVAIL  REFER  MOUNTPOINT
myraidz1   139K  3.91G  38.6K  /myraidz1
</nowiki>}}

The status of the device can be queried:
{{hc|# zpool status myraidz1|<nowiki>
  pool: myraidz1
 state: ONLINE
  scan: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        myraidz1            ONLINE       0     0     0
          raidz1-0          ONLINE       0     0     0
            /scratch/1.img  ONLINE       0     0     0
            /scratch/2.img  ONLINE       0     0     0
            /scratch/3.img  ONLINE       0     0     0

errors: No known data errors
</nowiki>}}

To destroy a zpool:
 # zpool destroy myraidz1

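A destroyed pool is gone for good, so as an alternative while experimenting, a pool can also be exported and later re-imported. Note that pools backed by plain image files are not found by the default device scan; {{ic|zpool import}} must be pointed at the directory containing the images (here the illustrative {{ic|/scratch}}):
 # zpool export myraidz1
 # zpool import -d /scratch myraidz1
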
=== RAIDZ2 and RAIDZ3 ===
Higher-level RAIDZ pools can be assembled in like fashion by adjusting the for statement to create additional image files, by specifying "raidz2" or "raidz3" in the creation step, and by appending the additional image files to the creation step, as shown in the sketch after the list below.

Summarizing Toponce's guidance:
* RAIDZ2 should use four (2+2), six (4+2), ten (8+2), or eighteen (16+2) disks.
* RAIDZ3 should use five (2+3), seven (4+3), eleven (8+3), or nineteen (16+3) disks.

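For example, a six-disk (4+2) RAIDZ2 can be built from image files in the same way. This is only a sketch; the pool name {{ic|myraidz2}} and the {{ic|/scratch}} location are illustrative:
 $ for i in {1..6}; do truncate -s 2G /scratch/$i.img; done
 # zpool create myraidz2 raidz2 /scratch/1.img /scratch/2.img /scratch/3.img /scratch/4.img /scratch/5.img /scratch/6.img
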
== Displaying and Setting Properties ==
Even without specifying properties in the creation step, users can set properties on their zpool at any time after creation using {{ic|/usr/bin/zfs}}.

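For reference, properties can also be supplied at creation time via {{ic|zpool create}}'s {{ic|-O}} option, which sets file system properties on the pool's root dataset. A sketch of how the RAIDZ1 pool above could have been created with access-time updates disabled from the start (the property choice is only an example):
 # zpool create -O atime=off myraidz1 raidz1 /scratch/1.img /scratch/2.img /scratch/3.img
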
=== Show Properties ===
To see the current properties of a given zpool:
{{hc|# zfs get all myraidz1|<nowiki>
NAME      PROPERTY              VALUE                  SOURCE
myraidz1  type                  filesystem             -
myraidz1  creation              Sun Oct 20  8:46 2013  -
myraidz1  used                  139K                   -
myraidz1  available             3.91G                  -
myraidz1  referenced            38.6K                  -
myraidz1  compressratio         1.00x                  -
myraidz1  mounted               yes                    -
myraidz1  quota                 none                   default
myraidz1  reservation           none                   default
myraidz1  recordsize            128K                   default
myraidz1  mountpoint            /myraidz1              default
myraidz1  sharenfs              off                    default
myraidz1  checksum              on                     default
myraidz1  compression           off                    default
myraidz1  atime                 on                     default
myraidz1  devices               on                     default
myraidz1  exec                  on                     default
myraidz1  setuid                on                     default
myraidz1  readonly              off                    default
myraidz1  zoned                 off                    default
myraidz1  snapdir               hidden                 default
myraidz1  aclinherit            restricted             default
myraidz1  canmount              on                     default
myraidz1  xattr                 on                     default
myraidz1  copies                1                      default
myraidz1  version               5                      -
myraidz1  utf8only              off                    -
myraidz1  normalization         none                   -
myraidz1  casesensitivity       sensitive              -
myraidz1  vscan                 off                    default
myraidz1  nbmand                off                    default
myraidz1  sharesmb              off                    default
myraidz1  refquota              none                   default
myraidz1  refreservation        none                   default
myraidz1  primarycache          all                    default
myraidz1  secondarycache        all                    default
myraidz1  usedbysnapshots       0                      -
myraidz1  usedbydataset         38.6K                  -
myraidz1  usedbychildren        99.9K                  -
myraidz1  usedbyrefreservation  0                      -
myraidz1  logbias               latency                default
myraidz1  dedup                 off                    default
myraidz1  mlslabel              none                   default
myraidz1  sync                  standard               default
myraidz1  refcompressratio      1.00x                  -
myraidz1  written               38.6K                  -
myraidz1  snapdev               hidden                 default
</nowiki>}}

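Individual properties, or a comma-separated list of them, can be queried instead of dumping the full table:
 # zfs get compression,atime,mountpoint myraidz1
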
=== Modify Properties ===
Disable the recording of access time in the zpool:
 # zfs set atime=off myraidz1

Verify that the property has been set on the zpool:
{{hc|# zfs get atime|<nowiki>
NAME      PROPERTY  VALUE  SOURCE
myraidz1  atime     off    local
</nowiki>}}

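Should a locally set property need to be reverted to its default later, {{ic|zfs inherit}} clears the local value. A quick sketch, using the atime property set above:
 # zfs inherit atime myraidz1
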
== Add Content to the Zpool and Query Compression Performance ==
Fill the zpool with files. For this example, first enable compression. ZFS supports several compression algorithms, including lzjb, gzip, gzip-N, zle, and lz4. A setting of simply "on" selects the default algorithm (lzjb), but lz4 is a nice alternative. See the zfs man page for more.

 # zfs set compression=lz4 myraidz1

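The new setting can be confirmed before copying any data in (a locally set property reports a SOURCE of local):
 # zfs get compression myraidz1
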
In this example, the Linux source tarball is downloaded and extracted onto the zpool; since lz4 compression has been enabled on the zpool, the corresponding compression ratio can be queried as well.

 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.xz
 $ tar xJf linux-3.11.tar.xz -C /myraidz1

To see the compression ratio achieved:
{{hc|# zfs get compressratio|<nowiki>
NAME      PROPERTY       VALUE  SOURCE
myraidz1  compressratio  2.32x  -
</nowiki>}}

== Simulate a Disk Failure and Rebuild the Zpool ==
To simulate a catastrophic disk failure (i.e. one of the HDDs in the zpool stops functioning), zero out one of the VDEVs:
 $ dd if=/dev/zero of=/scratch/2.img bs=4M count=1 2>/dev/null

Since dd truncates its output file by default and we used a block size (bs) of 4M with a count of 1, the once 2G image file is now a mere 4M:
{{hc|$ ls -lh /scratch|<nowiki>
total 317M
-rw-r--r-- 1 facade users 2.0G Oct 20 09:13 1.img
-rw-r--r-- 1 facade users 4.0M Oct 20 09:09 2.img
-rw-r--r-- 1 facade users 2.0G Oct 20 09:13 3.img
</nowiki>}}

The zpool remains online despite the corruption. Note that if a physical disk had actually failed, dmesg and related logs would be full of errors. To detect when damage occurs, users must execute a scrub operation:
 # zpool scrub myraidz1

Depending on the size and speed of the underlying media as well as the amount of data in the zpool, the scrub may take hours to complete.
The status of the scrub can be queried:
{{hc|# zpool status myraidz1|<nowiki>
  pool: myraidz1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Oct 20 09:13:39 2013
config:

        NAME                STATE     READ WRITE CKSUM
        myraidz1            DEGRADED     0     0     0
          raidz1-0          DEGRADED     0     0     0
            /scratch/1.img  ONLINE       0     0     0
            /scratch/2.img  UNAVAIL      0     0     0  corrupted data
            /scratch/3.img  ONLINE       0     0     0

errors: No known data errors
</nowiki>}}

Since we zeroed out one of our VDEVs, let's simulate adding a new 2G HDD by creating a new image file and using it to replace the failed one in the zpool:
 $ truncate -s 2G /scratch/new.img
 # zpool replace myraidz1 /scratch/2.img /scratch/new.img

Upon replacing the VDEV with a new one, ZFS rebuilds (resilvers) the data from the data and parity information on the remaining two good VDEVs. Check the status of this process:
{{hc|# zpool status myraidz1|<nowiki>
  pool: myraidz1
 state: ONLINE
  scan: resilvered 117M in 0h0m with 0 errors on Sun Oct 20 09:21:22 2013
config:

        NAME                  STATE     READ WRITE CKSUM
        myraidz1              ONLINE       0     0     0
          raidz1-0            ONLINE       0     0     0
            /scratch/1.img    ONLINE       0     0     0
            /scratch/new.img  ONLINE       0     0     0
            /scratch/3.img    ONLINE       0     0     0

errors: No known data errors
</nowiki>}}
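
When finished experimenting, the test pool and its backing files can be removed. Assuming nothing of value was stored on the pool and that the image files live in the illustrative {{ic|/scratch}} directory used throughout this article:
 # zpool destroy myraidz1
 $ rm /scratch/*.img
If {{ic|/scratch}} was a tmpfs mount, it can simply be unmounted instead:
 # umount /scratch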
