Install Arch Linux on ZFS

From ArchWiki

This article details the steps required to install Arch Linux onto a ZFS root filesystem.

Note: This guide assumes the reader is somewhat familiar with ZFS. If you are not, it is recommended to first read and understand ZFS#Concepts and come back to this guide later. It might also be helpful to install ZFS on an existing system and play around with the commands first.

Since ZFS kernel modules are out-of-tree (i.e. not included in the mainline kernel) and Arch Linux is a rolling release distribution, there will often be brief periods when the kernel-specific packages in the external repository are not in sync with those in the Arch repositories. This can sometimes result in the ZFS modules (DKMS packages) failing to compile with the latest kernel. If you always want to use the most recent kernel packages, installing Arch on ZFS might not be ideal.

See ZFS#Installation for possible solutions.

Acquiring installation medium

To install Arch Linux on ZFS, you need to use an installation medium with the ZFS modules. The easiest way is to use an alternative ISO (assuming you trust such ISOs). You can also add the modules to the official ISO or create a custom image (see below).

Use an unofficial archiso that includes ZFS modules

An unofficial archiso exists that can be used directly, without the need to manually create an entire image or add ZFS modules once booted. Do note however that it includes only the linux-lts kernel and zfs-linux-lts module.

See r-maerz/archlinux-lts-zfs.

Get ZFS module on archiso system

A script exists that installs and loads the ZFS module on a running archiso system. It should work on any archiso version.

See eoli3n/archiso-zfs.

Embedding ZFS module into custom archiso

To build a custom archiso, see ZFS#Create an Archiso image with ZFS support.

Selecting boot method

Since the initrd tools and boot loaders you choose to use will affect later steps of the installation process, you should decide which combinations of them to use before proceeding with installation.

Initrd tools

By default, neither dracut nor mkinitcpio supports booting from a ZFS root, since they do not include the necessary kernel modules and userspace tools in the initrd. You will need to use dracut modules or mkinitcpio hooks to make initrds that can boot from a ZFS root. The initrd tool you choose will in turn affect the syntax of the kernel parameters/cmdlines for specifying ZFS roots.

Here are the options:

zfs hook

The zfs hook is the only option when using the default busybox based initrd.

To configure the zfs hook, simply add zfs before the filesystems hook in your mkinitcpio.conf(5).

The possible kernel parameter syntaxes are:

  • root=zfs, which determines the root filesystem using the bootfs property
  • root=ZFS=<pool/dataset>, which uses a pool or a dataset as root. When a pool is specified, the root filesystem is determined based on the mountpoint property
  • zfs=auto: same effect as root=zfs
  • zfs=<pool/dataset>: same effect as root=ZFS=<pool/dataset>

Additionally, the following kernel parameters can be set to adjust the behavior of the initrd:

  • zfs_force=1 makes the zpool import command use the -f flag
  • zfs_wait=<seconds> waits for the devices to show up before running zpool import
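To make the syntaxes above concrete, here is a minimal sketch that reads a kernel command line and reports which root the zfs hook would be asked to use. The function name zfs_hook_root is hypothetical, and the behaviour when several forms are given at once (here the last one wins) is an assumption of this sketch, not something the hook is known to guarantee.

```shell
# Print the root dataset requested on a kernel command line, following the
# zfs hook syntaxes listed above. Prints "bootfs" when the pool's bootfs
# property decides (root=zfs or zfs=auto). Illustrative helper only.
zfs_hook_root() {
    result=
    for arg in $1; do
        case $arg in
            root=ZFS=*)        result=${arg#root=ZFS=} ;;
            root=zfs|zfs=auto) result=bootfs ;;
            zfs=*)             result=${arg#zfs=} ;;
        esac
    done
    # Fail (and print nothing) when no ZFS root parameter was found.
    [ -n "$result" ] && printf '%s\n' "$result"
}
```

For example, zfs_hook_root "zfs=zroot/ROOT/default rw zfs_wait=5" prints zroot/ROOT/default, since zfs_wait and zfs_force only tune the import and do not select a root.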

sd-zfs hook

The zfs hook is not compatible with systemd based initrds. Instead you should use the sd-zfs hook.

There are 2 choices: one shipped with zfs-utils-poscat from archlinuxcn and one shipped with mkinitcpio-sd-zfsAUR. The former is actively maintained while the latter seems to be abandoned.

zfs-utils-poscat

To configure this hook, simply add it anywhere in the HOOKS array of your mkinitcpio.conf. A typical configuration could look like this:

HOOKS=(systemd sd-zfs autodetect microcode modconf kms keyboard sd-vconsole block filesystems fsck)

The supported cmdline formats are:

  • root=zfs, which imports all pools in initrd, searches for the first pool with the bootfs property set, and then mounts bootfs as root.
  • root=zfs:poolname, which imports only the specified pool and then mounts the pool's bootfs as root.
  • root=zfs:poolname/dataset, which imports only the specified pool and then mounts the specified dataset as root.
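The three formats above differ only in how much they pin down. A small sketch that splits such a parameter into its pool and dataset parts may help when writing boot entries; the function name sd_zfs_root and the "-" placeholder for "left to auto-detection" are inventions of this example and not part of the hook.

```shell
# Split an sd-zfs style root= parameter into "<pool> <dataset>".
# "-" marks a part that is left to auto-detection via bootfs.
sd_zfs_root() {
    case $1 in
        root=zfs)     echo "- -" ;;
        root=zfs:*/*) spec=${1#root=zfs:}; echo "${spec%%/*} $spec" ;;
        root=zfs:*)   echo "${1#root=zfs:} -" ;;
        *)            return 1 ;;
    esac
}
```

With this helper, sd_zfs_root root=zfs:zroot reports the pool zroot with the dataset left to bootfs, while root=zfs:zroot/ROOT/default fixes both.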


mkinitcpio-sd-zfs

Refer to the github repository for documentation on configuration.

zfs module

If instead you'd like to use dracut for initrd then you should use the zfs dracut module shipped with zfs-utilsAUR. Check the documentation https://openzfs.github.io/openzfs-docs/man/master/7/dracut.zfs.7.html for how to configure the zfs module.

Boot loaders

Since importing ZFS pools, mounting the root filesystem and pivot_rooting into the new root are all handled by the UKI or vmlinuz+initrd, there are no requirements on which boot loader you can use. Indeed, even an EFI boot stub should suffice, provided the kernel parameters are configured properly for the tools you used for your initrd (see the above section #Initrd tools).

Note: No boot loader except grub2 can read files from ZFS filesystems. This means that unless you are using grub2, you will need to put your UKI or vmlinuz+initrd on a separate filesystem that is readable by your boot loader of choice. When using UEFI boot, you can reuse your ESP for storing them (in fact it is recommended to put your UKI in the ESP). Just remember to reserve enough space for your ESP.

Using GRUB2

Grub2 is able to read ZFS filesystems, provided that the pools are created with only a limited set of features enabled (see ZFS#GRUB-compatible pool creation). It is thus possible to place the UKI/initrd on the ZFS root when using Grub2.

Warning: It is doubtful that putting UKI/initrd on ZFS root achieves much given that it should entirely be rebuildable from the ZFS root (see #Layout supporting full system rollback). Additionally, Grub2's ZFS implementation, which is entirely independent of the OpenZFS implementation, is not known for reliability.

Partition the destination drive

Partitioning is done similarly to other filesystems. See the partitioning page or the installation guide for what layout to use and how to partition disks.

Note: ZFS does not support swap files, and using a zvol for swap has significant drawbacks: the system might deadlock under high memory pressure and it is not possible to hibernate to a zvol swap. It is therefore recommended to use a separate swap partition.

Layout supporting full system rollback

To be able to use ZFS to snapshot everything you need to rebuild the UKI or vmlinuz+initrd (so that you can roll back your full system state), you can use the following partition layout:

  • Do not mount anything on /boot. This way the vmlinuz is placed on your root, which is a ZFS filesystem.
  • If you use the UKI, mount ESP on /efi and point the UKI target to /efi/EFI/Linux/<name of image>.efi.
  • If you use vmlinuz+initrd, mount ESP(UEFI) or boot partition(BIOS) on /efi and point the initrd target to /efi/<name of initrd>.img. Set up a pacman hook that automatically copies vmlinuz from /boot/vmlinuz-* to /efi/.

To perform a rollback, roll back your ZFS root filesystem and then either regenerate your UKI, or regenerate the initrd and copy vmlinuz from /boot/vmlinuz-* to /efi/ manually.
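The copy step for the vmlinuz+initrd layout can be wrapped in a small function, which a pacman hook could call and which can also be run by hand after a rollback. This is a minimal sketch: the name sync_kernels is hypothetical, the defaults assume the layout described above (kernels in /boot on the ZFS root, ESP or boot partition on /efi), and the hook file that would invoke it is not shown.

```shell
# Copy every kernel image from the boot directory to the ESP.
# Usage: sync_kernels [source-dir] [destination-dir]
sync_kernels() {
    src=${1:-/boot}
    dst=${2:-/efi}
    for image in "$src"/vmlinuz-*; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$image" ] || continue
        cp -f -- "$image" "$dst"/ || return 1
    done
}
```

Running sync_kernels with no arguments copies /boot/vmlinuz-* to /efi/. Passing explicit directories makes it possible to try the function on scratch directories first.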

Set up the ZFS filesystem

First, make sure the ZFS modules are loaded:

# modprobe zfs

Create the root zpool

Create your pool and set all default dataset options. All datasets created on the zpool will inherit every -O option set at zpool creation. Default options are detailed in Debian Buster Root on ZFS. Step 2: Disk Formatting.

Note: Use -o ashift=9 for disks with a 512 byte physical sector size or -o ashift=12 for disks with a 4096 byte physical sector size. See lsblk -S -o NAME,PHY-SEC to get the physical sector size of each SCSI/SATA disk. Remove -S if you want the same value from all devices. For NVMe drives, use nvme id-ns /dev/nvmeXnY -H | grep "LBA Format" to get which LBA format is in use. Most NVMe drives ship with 512-byte sectors, see OpenZFS: NVMe low level formatting to switch to 4096-byte sectors.
Warning: Keep in mind that most modern devices use a 4096 byte physical sector size, even though some report 512. This is especially true for SSDs. Selecting ashift=9 on a 4096 byte sector size (even if it reports 512) will incur a performance penalty. Selecting ashift=12 on a 512 byte sector size may incur a capacity penalty, but no performance penalty. If in doubt, for a modern drive, err on the side of ashift=12, or research your particular device for the appropriate value. Refer to OpenZFS issue #967 for a related discussion, and OpenZFS issue #2497 for a consequence of a higher ashift value.
# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=auto       \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             zroot /dev/disk/by-id/id-to-partition-partx
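The ashift value is the base-2 logarithm of the sector size, which is why 512 bytes maps to 9 and 4096 bytes to 12. A minimal sketch of that arithmetic, assuming the input is a power of two (the function name ashift_for_sector is made up for this example):

```shell
# Print the ashift (log2) for a sector size given in bytes.
# Assumes the size is a power of two, e.g. 512, 4096 or 8192.
ashift_for_sector() {
    size=$1
    bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}
```

The sector size to feed in would come from the lsblk or nvme commands in the note above. Since some drives report 512 while using 4096 internally, the result is a lower bound rather than a recommendation.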

Compression and native encryption

This will enable compression and native encryption by default on all datasets:

# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=auto       \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             -O compression=lz4        \
             -O encryption=aes-256-gcm \
             -O keyformat=passphrase   \
             -O keylocation=prompt     \
             zroot /dev/disk/by-id/id-to-partition-partx

The options after -O control ZFS behavior. A detailed explanation of them can be found in the zfsprops(7) man page.

Warning:
  • Always use id names when working with ZFS, otherwise import errors will occur.
  • Instead of by-id, consider using by-partuuid or by-uuid, as these will stay consistent even if an internal drive is moved into a USB enclosure or vice-versa (this is only possible if ZFS is used with a partition, not with a whole disk)
  • GRUB users should keep in mind that the zpool-create command normally enables all features, some of which may not be supported by GRUB. See: ZFS#GRUB-compatible pool creation.
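To find the persistent name to pass to zpool create, it can help to list which links in a directory such as /dev/disk/by-id point at a given partition. The following is a sketch only: links_for_device is a hypothetical helper, and it relies on readlink -f as provided by GNU coreutils.

```shell
# List the links in a by-id style directory that resolve to a device.
# Usage: links_for_device <device> [link-directory]
links_for_device() {
    target=$(readlink -f -- "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f -- "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
    return 0
}
```

For instance, links_for_device /dev/sda2 prints the by-id names of that partition, and links_for_device /dev/sda2 /dev/disk/by-partuuid does the same for the by-partuuid directory mentioned above.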

Create your datasets

Instead of using conventional disk partitions, ZFS has the concept of datasets to manage your storage. Unlike disk partitions, datasets have no fixed size and allow for different attributes, such as compression, to be applied per dataset. Normal ZFS datasets are mounted automatically by ZFS whilst legacy datasets are required to be mounted using fstab or with the traditional mount command.

One of the most useful features of ZFS is boot environments. Boot environments allow you to create a bootable snapshot of your system that you can revert to at any time instantly by simply rebooting and booting from that boot environment. This can make doing system updates much safer and is also incredibly useful for developing and testing software. In order to be able to use a boot environment manager such as beadm, zectlAUR (systemd-boot), or zedenvAUR (GRUB) to manage boot environments, your datasets must be configured properly. Key to this are that you split your data directories (such as /home) into datasets that are distinct from your system datasets and that you do not place data in the root of the pool as this cannot be moved afterwards.

You should always create a dataset for at least your root filesystem, and in nearly all cases you will also want /home to be in a separate dataset. You may decide you want your logs to persist over boot environments. If you are running any software that stores data outside of /home (as is the case for database servers), you should structure your datasets so that the data directories of that software are separated out from the root dataset.

With these example commands, we will create a basic boot environment compatible configuration comprising just root and /home datasets. It inherits default options from zpool creation.

# zfs create -o mountpoint=none zroot/data
# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default
# zfs create -o mountpoint=/home zroot/data/home

You can also create your ROOT dataset without setting its mountpoint to /, since GRUB will mount it to / anyway. That makes it possible to boot into older versions of root just by cloning them and adding them as GRUB menu entries. In that case, you can create ROOT with the following command:

# zfs create -o mountpoint=/roots/default zroot/ROOT/default

You can store /root in your zroot/data/home dataset.

# zfs create -o mountpoint=/root zroot/data/home/root

You will need to enable some options for datasets which hold specific directories:

Options required by specific directories
Directory         | Dataset option   | Details
/var/log/journal  | acltype=posixacl | systemd#systemd-tmpfiles-setup.service fails to start at boot

System datasets

To create datasets for system directories, use canmount=off.

For some examples, please read Debian-Buster-Root-on-ZFS#step-3-system-installation.

Note: Consider using zfs-mount-generator instead of zfs-mount.service if you mount a dataset, e.g. zroot/var/log, to /var/log. It fixes the filesystem mount ordering as described in Step 5.7 of Debian Buster Root on ZFS.
# zfs create -o mountpoint=/var -o canmount=off     zroot/var
# zfs create                                        zroot/var/log
# zfs create -o mountpoint=/var/lib -o canmount=off zroot/var/lib
# zfs create                                        zroot/var/lib/libvirt
# zfs create                                        zroot/var/lib/docker

Export/Import your pools

To validate your configurations, export then reimport all your zpools.

Warning: Do not skip this, otherwise you will be required to use -f when importing your pools. This unloads the imported pool.
Note: This might fail if you added a swap partition. You need to turn it off with the swapoff command.
# zpool export zroot
# zpool import -d /dev/disk/by-id -R /mnt zroot -N
Note: -d is not the actual device ID, but the /dev/disk/by-id directory containing the symbolic links.

If this command fails and you are asked to import your pool via its numeric ID, run zpool import to find out the ID of your pool then use a command such as:

# zpool import 9876543212345678910 -R /mnt zroot

If you used native encryption, load the ZFS key.

# zfs load-key zroot

Manually mount your rootfs dataset because it uses canmount=noauto, then mount all other datasets.

# zfs mount zroot/ROOT/default
# zfs mount -a
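Why the root dataset needs its own mount command becomes visible when filtering datasets by the properties that zfs mount -a looks at. The sketch below reads the tab-separated output of zfs list -H -o name,canmount,mountpoint and keeps only datasets with canmount=on and a real mountpoint. The function name is hypothetical, and the filter is a simplification that ignores other conditions such as unloaded encryption keys.

```shell
# Read "name<TAB>canmount<TAB>mountpoint" lines on stdin and print the
# datasets that would be picked up by "zfs mount -a".
automounted_datasets() {
    awk -F '\t' '$2 == "on" && $3 != "none" && $3 != "legacy" { print $1 }'
}
```

On the installation system this would be used as zfs list -H -o name,canmount,mountpoint -r zroot | automounted_datasets. With the layout created earlier, zroot/ROOT/default does not appear in the output because it is set to canmount=noauto.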

The ZFS filesystem is now ready to use.

Configure the root filesystem

If you used legacy datasets, they must be listed in /etc/fstab.

Set the bootfs property on the descendant root filesystem so the boot loader knows where to find the operating system.

# zpool set bootfs=zroot/ROOT/default zroot

If you do not have /etc/zfs/zpool.cache, create it:

# zpool set cachefile=/etc/zfs/zpool.cache zroot

Be sure to bring the zpool.cache file into your new system. This is required later for the ZFS daemon to start.

# mkdir -p /mnt/etc/zfs
# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

Install and configure Arch Linux

Follow the steps below using the Installation guide. Steps that need special consideration for ZFSonLinux are noted.

  • First mount any legacy or non-ZFS boot or system partitions using the mount command.
  • Install the base system.
  • The procedure described in Installation guide#Fstab is usually overkill for ZFS. ZFS usually auto mounts its own partitions, so we do not need ZFS partitions in the fstab file, unless the user made legacy datasets of system directories. To generate the fstab for filesystems, use:
# genfstab -U -p /mnt >> /mnt/etc/fstab
# arch-chroot /mnt
  • Edit the /etc/fstab:
Note:
  • If you chose to create legacy datasets for system directories, keep them in this fstab!
  • Comment out all non-legacy datasets apart from the swap file and the EFI system partition. It is a convention to replace the swap's uuid with /dev/zvol/zroot/swap.
  • When creating the initial ramdisk, first edit /etc/mkinitcpio.conf. Add zfs to MODULES:
MODULES=(zfs)

Then in HOOKS, add zfs before filesystems. Also, move keyboard hook before zfs so you can type in console if something goes wrong. You may also remove fsck (if you are not using Ext3 or Ext4). Your HOOKS line should look something like the following:

HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)
  • When using systemd in the initrd, you need to install mkinitcpio-sd-zfsAUR and add the sd-zfs hook after the systemd hook instead of the zfs hook. Keep in mind that this hook uses different kernel parameters than the default zfs hook, more information can be found at the project page.
Note:
  • sd-zfs does not support native encryption yet, see dasJ/sd-zfs/issues/4.
  • If you are using a separate dataset for /usr and have followed the instructions below, you must make sure you have the usr hook enabled after zfs, or your system will not boot.
  • When you generate the initramfs, the zpool.cache is copied into the initrd. If you did not generate it before, or needed to regenerate it, remember to regenerate the initramfs again.
  • You can also use a legacy mountpoint to let fstab mount it.
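Hook ordering mistakes are easy to make and only show up at boot, so it can be worth checking the HOOKS line before regenerating the initramfs. A minimal sketch follows, with a hypothetical helper named hook_before that takes the HOOKS line as a string rather than parsing mkinitcpio.conf itself.

```shell
# Succeed if hook $2 appears before hook $3 in the HOOKS line $1.
hook_before() {
    seen_first=no
    for hook in $(printf '%s\n' "$1" | tr -d '()' | sed 's/^HOOKS=//'); do
        if [ "$hook" = "$2" ]; then
            seen_first=yes
        fi
        if [ "$hook" = "$3" ]; then
            if [ "$seen_first" = yes ]; then return 0; else return 1; fi
        fi
    done
    # The second hook was never found.
    return 1
}
```

Against the example line above, hook_before "$line" zfs filesystems and hook_before "$line" keyboard zfs both succeed, matching the two ordering rules given in the text.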

Install and configure the boot loader

In principle, the boot loader configuration does not differ much if the kernel and initrd reside on a non-ZFS partition. Once the kernel has the zfs module, the initrd has been built with the hook, and the kernel cmdline has the zfs kernel parameter, the system can boot. For example, zfs=zroot/ROOT/default rw or even just zfs=zroot rw (provided the bootfs property has been set up correctly as per this page) should work for most cases. You do not need a root parameter as zfs will mount the root. See your boot loader documentation on how to set kernel parameters.

However, if you need to load the kernel image and/or initrd from ZFS, you need to use a boot loader which can read ZFS and configure it accordingly.

Considering the above, configuring your boot loader should be quite straightforward. Here are some examples, but this list is by no means exhaustive.
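As one illustration, a systemd-boot loader entry could be generated as follows. This is a sketch under several assumptions: the zfs mkinitcpio hook is in use (hence the zfs= parameter), the kernel and initramfs are the files produced by the linux package and sit at the top level of the ESP, and the entry title and function name are arbitrary.

```shell
# Print a systemd-boot loader entry that boots the given root dataset.
# File names assume the "linux" kernel package; adjust for other kernels.
print_loader_entry() {
    dataset=$1
    cat <<EOF
title   Arch Linux (ZFS)
linux   /vmlinuz-linux
initrd  /initramfs-linux.img
options zfs=$dataset rw
EOF
}
```

With the ESP mounted on /efi, the output of print_loader_entry zroot/ROOT/default could be redirected to a file such as /efi/loader/entries/arch.conf (the file name is an example).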


Configure systemd ZFS mounts

For your system to be able to reboot without issues, you need to enable the zfs.target to auto mount the pools and set the hostid.

Note: The instructions in this section assume you are still in arch-chroot

For each pool you want automatically mounted execute:

# zpool set cachefile=/etc/zfs/zpool.cache pool

Enable zfs.target

In order to mount zfs pools automatically on boot, you need to enable zfs-import-cache.service, zfs-mount.service and zfs-import.target.
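The units named in this section can be enabled in one loop. The sketch below is an illustration rather than a required procedure: the function name and the SYSTEMCTL override are inventions of this example, the latter so that the commands can be previewed before running them inside the chroot.

```shell
# Units mentioned in this section.
ZFS_UNITS="zfs.target zfs-import.target zfs-import-cache.service zfs-mount.service"

# Enable each unit. Set SYSTEMCTL="echo systemctl" to preview only.
enable_zfs_units() {
    for unit in $ZFS_UNITS; do
        ${SYSTEMCTL:-systemctl} enable "$unit" || return 1
    done
}
```

Running SYSTEMCTL="echo systemctl" enable_zfs_units prints the four systemctl enable commands without changing anything.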

When running ZFS on root, the machine's hostid will not be available at the time of mounting the root filesystem. There are two solutions to this. The first is to place your spl hostid in the kernel parameters in your boot loader, for example by adding spl.spl_hostid=0x00bab10c. Use the hostid command to get your number.

The other, and suggested, solution is to make sure that there is a hostid in /etc/hostid, and then regenerate the initramfs image which will copy the hostid into the initramfs image. To write the hostid file safely you need to use the zgenhostid command.

To use the libc-generated hostid (recommended):

# zgenhostid $(hostid)

To use a custom hostid (must be hexadecimal and 8 characters long):

# zgenhostid deadbeef

To let the tool generate a hostid:

# zgenhostid

Do not forget to regenerate the initramfs.
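When choosing a custom hostid, the rule stated above (hexadecimal, 8 characters) can be checked before calling zgenhostid. The helper below is a hypothetical sketch that encodes only that rule and nothing else about what zgenhostid itself accepts.

```shell
# Succeed if the argument is exactly 8 hexadecimal characters.
is_valid_hostid() {
    case $1 in
        *[!0-9a-fA-F]*) return 1 ;;
    esac
    [ "${#1}" -eq 8 ]
}
```

For example, is_valid_hostid deadbeef succeeds, while a value written with a 0x prefix, such as the kernel parameter form 0x00bab10c, fails this particular check because of the extra characters.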

Unmount and restart

We are almost done! If you have a legacy boot partition:

# umount /mnt/boot

Then, in every case, unmount the ZFS datasets and export the pool:

# zfs umount -a
# zpool export zroot

Now reboot.

Warning: If you do not properly export the zpool, the pool will refuse to import in the ramdisk environment and you will be stuck at the busybox terminal.

Loading password from USB-Stick

It is possible to store the password on a USB stick and load it when booting.

Save the password in the first bytes of the USB stick:

# dd if=your_password_file bs=32 count=1 of=/dev/disk/by-id/usb_stick

To create the encrypted dataset, you can either use the previously described method with a password prompt or pipe the password in with dd:

# dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs create -o encryption=on -o keyformat=passphrase zroot/ROOT

The next step is modifying the zfs hook. By default, zfs prompts for a password. You have to change it so that the password is piped in with dd from your USB stick. To do so, edit /usr/lib/initcpio/hooks/zfs and change the line:

# ! eval zfs load-key "${encryptionroot}"; do

to:

# ! eval dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs load-key "${encryptionroot}"; do

You are modifying your zfs hook, so do not forget to regenerate the initramfs. zfs should now load the password from your USB stick on boot.
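Before rebooting, it may be worth confirming that the bytes at the start of the stick really match the password file, since a failed write would leave the system unable to unlock its root. A minimal sketch, assuming a password of at most 32 bytes as in the dd commands above (the function name stick_holds_key is hypothetical):

```shell
# Succeed if the first bytes of the stick equal the password file.
# Only the first 32 bytes are compared, matching bs=32 count=1 above.
stick_holds_key() {
    keyfile=$1
    stick=$2
    len=$(( $(wc -c < "$keyfile") ))
    [ "$len" -le 32 ] || len=32
    # Compare hex dumps so that arbitrary bytes survive the comparison.
    on_stick=$(head -c "$len" "$stick" | od -An -tx1)
    in_file=$(head -c "$len" "$keyfile" | od -An -tx1)
    [ "$on_stick" = "$in_file" ]
}
```

It would be called as stick_holds_key your_password_file /dev/disk/by-id/usb_stick, using the same placeholder names as the commands above.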

Troubleshooting

System fails to boot due to: cannot import zroot: no such pool available

You can try the following steps and see if they can help.

  • Use the kernel modules from the archzfs repo instead of the dkms version. You can go back to the dkms version after a successful boot.
  • Remove the /etc/zfs/zpool.cache and run:
# zpool set cachefile=none zroot

See also