Install Arch Linux on ZFS

This article details the steps required to install Arch Linux onto a ZFS root filesystem.

Note Blindly copying and pasting the commands in this article will not work. Take the time to understand the boot process and what is done when creating the pool and datasets.

Since ZFS kernel modules are out-of-tree (i.e. not included in the mainline kernel) and Arch Linux is a rolling release distribution, there will often be brief periods when the kernel-specific packages in the external repository are not in sync with those in the Arch repositories. This can sometimes result in the ZFS modules (DKMS packages) failing to compile with the latest kernel. If you always want to use the most recent kernel packages, installing Arch on ZFS might not be ideal.

See ZFS#Installation for possible solutions.

Installation

To install Arch Linux on ZFS, you need to use an installation medium with the ZFS modules. This may be done from an existing Arch Linux installation or, if you do not have one, from a virtual machine such as VirtualBox or VMware with a bi-directional shared directory defined and enabled.

Embedding ZFS module into custom archiso

To build a custom ISO, install archiso. Follow archiso#Prepare a custom profile to prepare a custom profile (named archlive in the following example).

First, edit the package list file packages.x86_64 to add the linux-lts kernel and the following ZFS packages:

packages.x86_64
...
linux-lts
linux-lts-headers
libunwind
zfs-utils
zfs-dkms

Make sure to remove linux and broadcom-wl (which would pull in the linux package) from the list.

You will also need to edit pacman.conf and append:

...
[archzfs]
SigLevel = Optional TrustAll
Server = http://archzfs.com/$repo/$arch

You will also need to edit the following files to change vmlinuz-linux and initramfs-linux.img entries to vmlinuz-linux-lts and initramfs-linux-lts:

archlive/airootfs/etc/mkinitcpio.d/linux.preset
archlive/efiboot/loader/entries/01-archiso-x86_64-linux.conf
archlive/efiboot/loader/entries/02-archiso-x86_64-speech-linux.conf
archlive/syslinux/archiso_pxe-linux.cfg
archlive/syslinux/archiso_sys-linux.cfg
archlive/grub/loopback.cfg
Warning If you fail to edit these entries your ISO will not boot.
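
If you prefer, these substitutions can be scripted; a minimal sketch with sed, assuming the files still contain the default vmlinuz-linux and initramfs-linux.img entries (review each file afterwards to confirm nothing was missed):

# sed -i 's|vmlinuz-linux|vmlinuz-linux-lts|g; s|initramfs-linux\.img|initramfs-linux-lts.img|g' \
    archlive/airootfs/etc/mkinitcpio.d/linux.preset \
    archlive/efiboot/loader/entries/01-archiso-x86_64-linux.conf \
    archlive/efiboot/loader/entries/02-archiso-x86_64-speech-linux.conf \
    archlive/syslinux/archiso_pxe-linux.cfg \
    archlive/syslinux/archiso_sys-linux.cfg \
    archlive/grub/loopback.cfg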

Now create the isobuild output directory and start the build process (mkarchiso creates the /tmp/archiso-tmp work directory itself):

# mkdir isobuild
# mkarchiso -v -r -w /tmp/archiso-tmp -o isobuild ~/archlive

You should now have an archlinux-YYYY.MM.DD.x86_64.iso in the isobuild directory.

Burn this file to your installation media of choice.
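
For example, to write the image to a USB flash drive; a sketch, assuming /dev/disk/by-id/usb-your-stick is your device (double-check it first, as all existing data on it will be destroyed):

# dd if=isobuild/archlinux-YYYY.MM.DD.x86_64.iso of=/dev/disk/by-id/usb-your-stick bs=4M status=progress oflag=sync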

Note
  • You may wish to recreate this media occasionally if the zpool status -v output shows, after an update, that the zpool needs to be updated to support newer features.
  • You should use linux-lts and linux-lts-headers for the actual installation for best compatibility. You should also use the Unofficial archzfs Repository for binary zfs-utils and zfs-dkms updates, not the AUR packages. If you do choose the AUR packages, there will be times when a new mainline or zen Linux kernel does not support ZFS, which may break your system. For this reason, the linux-lts kernel is strongly recommended for the actual installation section.
Warning Do not redistribute these custom ISOs as they will violate the GPL and CDDL licenses!

Be sure to test your new ISO in a virtual machine before using it. Boot it and run:

# modprobe zfs
# zpool status

If either command fails, your zfs module did not build correctly and you will have to try again.

Partition the destination drive

ZFS supports GPT and MBR partition tables. See Partitioning#Choosing between GPT and MBR for information on determining the partition table type to use.

ZFS manages its own partitions, so only a basic partition table scheme is required. The partition that will contain the ZFS filesystem should be of the type bf00, or "Solaris Root".

Partition scheme

Although it was possible in the past to create a bootable ZFS root partition on legacy machines using MBR partitioning, this is not recommended due to the differences from GPT partitioning. It is recommended to use a separate /boot partition to avoid issues with boot loaders and to ensure the best compatibility.

Using GRUB on a BIOS (or UEFI machine in legacy boot mode) machine but using a GPT partition table:

Part     Size   Type
----     ----   -------------------------
   1       2M   BIOS boot partition (ef02)
   2       1G   Linux partition (8300)
   3     XXXG   Solaris Root (bf00)

You may also use the following layout with GRUB or rEFInd on UEFI machines. This is the recommended method for the best compatibility with all systems.

Part     Size   Type
----     ----   -------------------------
   1       1G   EFI system partition (ef00)
   2     XXXG   Solaris Root (bf00)
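
As an illustration, the UEFI layout above could be created with sgdisk; a sketch, assuming /dev/disk/by-id/your-disk is the target drive (--zap-all wipes any existing partition table on it):

# sgdisk --zap-all /dev/disk/by-id/your-disk
# sgdisk -n1:0:+1G -t1:ef00 /dev/disk/by-id/your-disk
# sgdisk -n2:0:0 -t2:bf00 /dev/disk/by-id/your-disk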

You can use either a separate Linux swap partition or a zvol (see ZFS#Swap volume) as swap.

If you wish to create a traditional swap partition, see Partitioning#Example layouts.

Format the destination disk

If you have opted for a boot partition as well as any other non-ZFS system partitions then format them. Do not do anything to the Solaris partition nor to the BIOS boot partition. ZFS will manage the first, and your boot loader the second.
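
For example, with the UEFI layout above, the EFI system partition would be formatted as FAT32 (a sketch, assuming it is the first partition of your disk):

# mkfs.fat -F 32 /dev/disk/by-id/your-disk-part1

With the BIOS/GPT layout, the 1G Linux partition for /boot can instead be formatted with a regular Linux filesystem such as ext4.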

Setup the ZFS filesystem

First, make sure the ZFS modules are loaded:

# modprobe zfs

Create the root zpool

Create your pool and set all default dataset options. All datasets created on the zpool will inherit each -O option set at zpool creation. Default options are detailed in Debian Buster Root on ZFS, Step 2: Disk Formatting.

Note Use -o ashift=9 for disks with a 512 byte physical sector size or -o ashift=12 for disks with a 4096 byte physical sector size. Use lsblk -S -o NAME,PHY-SEC to get the physical sector size of each SCSI/SATA disk; remove -S to include all devices. For NVMe drives, use nvme id-ns /dev/nvmeXnY -H | grep "LBA Format" to see which LBA format is in use. Most NVMe drives ship with 512-byte sectors; see OpenZFS: NVMe low level formatting to switch to 4096-byte sectors.
Warning Keep in mind that most modern devices use a 4096 byte physical sector size, even though some report 512. This is especially true for SSDs. Selecting ashift=9 on a 4096 byte sector size (even if it reports 512) will incur a performance penalty. Selecting ashift=12 on a 512 byte sector size may incur a capacity penalty, but no performance penalty. If in doubt, for a modern drive, err on the side of ashift=12, or research your particular device for the appropriate value. Refer to OpenZFS issue #967 for a related discussion, and OpenZFS issue #2497 for a consequence of a higher ashift value.
# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=auto         \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             zroot /dev/disk/by-id/id-to-partition-partx

Compression and native encryption

This will enable compression and native encryption by default on all datasets:

# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=auto         \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             -O compression=lz4        \
             -O encryption=aes-256-gcm \
             -O keyformat=passphrase   \
             -O keylocation=prompt     \
             zroot /dev/disk/by-id/id-to-partition-partx

The options after -O control ZFS behavior. A detailed explanation of them can be found in the zfsprops(7) man page.

Warning
  • Always use id names when working with ZFS, otherwise import errors will occur.
  • Instead of by-id, consider using by-partuuid or by-uuid, as these will stay consistent even if an internal drive is moved into a USB enclosure or vice-versa (this is only possible if ZFS is used with a partition, not with a whole disk).

Create your datasets

Instead of using conventional disk partitions, ZFS has the concept of datasets to manage your storage. Unlike disk partitions, datasets have no fixed size and allow for different attributes, such as compression, to be applied per dataset. Normal ZFS datasets are mounted automatically by ZFS whilst legacy datasets are required to be mounted using fstab or with the traditional mount command.
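
For illustration, a dataset is switched to legacy mounting like this (a sketch using the zroot/data/home dataset created later in this guide); afterwards it is mounted via fstab or the mount command rather than by ZFS:

# zfs set mountpoint=legacy zroot/data/home
# mount -t zfs zroot/data/home /home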

One of the most useful features of ZFS is boot environments. Boot environments allow you to create a bootable snapshot of your system that you can revert to at any time instantly by simply rebooting and booting from that boot environment. This can make doing system updates much safer and is also incredibly useful for developing and testing software. In order to be able to use a boot environment manager such as beadm, zectlAUR (systemd-boot), or zedenvAUR (GRUB) to manage boot environments, your datasets must be configured properly. Key to this are that you split your data directories (such as /home) into datasets that are distinct from your system datasets and that you do not place data in the root of the pool as this cannot be moved afterwards.

You should always create a dataset for at least your root filesystem and, in nearly all cases, you will also want /home to be in a separate dataset. You may decide you want your logs to persist over boot environments. If you are running any software that stores data outside of /home (as is the case for database servers), you should structure your datasets so that the data directories of that software are separated from the root dataset.

With these example commands, we will create a basic boot environment compatible configuration comprising just root and /home datasets. They inherit default options from the zpool creation.

# zfs create -o mountpoint=none zroot/data
# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default
# zfs create -o mountpoint=/home zroot/data/home

You can also create your ROOT dataset without specifying mountpoint=/, since GRUB will mount it to / anyway. That gives you the possibility to boot into older versions of root simply by cloning them and adding them as GRUB menu entries. In that case, you can create ROOT with the following command:

# zfs create -o mountpoint=/roots/default zroot/ROOT/default

You can store /root in your zroot/data/home dataset.

# zfs create -o mountpoint=/root zroot/data/home/root
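
You can verify the resulting dataset layout and mountpoints with:

# zfs list -r zroot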

System datasets

To create datasets for system directories, use canmount=off.

For some examples, please read Debian-Buster-Root-on-ZFS#step-3-system-installation.

Note Consider using zfs-mount-generator instead of zfs-mount.service if you mount a dataset, e.g. zroot/var/log, to /var/log. It fixes the filesystem mount ordering as described in Step 5.7 of Debian Buster Root on ZFS.
# zfs create -o mountpoint=/var -o canmount=off                 zroot/var
# zfs create                                                    zroot/var/log
# zfs create -o mountpoint=/var/log/journal -o acltype=posixacl zroot/var/log/journal
# zfs create -o mountpoint=/var/lib -o canmount=off             zroot/var/lib
# zfs create                                                    zroot/var/lib/libvirt
# zfs create                                                    zroot/var/lib/docker
Note systemd-journald requires its mountpoint to be created, otherwise systemd-journald.service will fail at boot; see systemd#systemd-tmpfiles-setup.service fails to start at boot.

Export/Import your pools

To validate your configurations, export then reimport all your zpools.

Warning Do not skip this, otherwise you will be required to use -f when importing your pools. Exporting unloads the imported pool.
Note This might fail if you added a swap partition. You need to turn it off with the swapoff command.
# zpool export zroot
# zpool import -d /dev/disk/by-id -R /mnt zroot -N
Note -d is not the actual device ID, but the /dev/disk/by-id directory containing the symbolic links.

If this command fails and you are asked to import your pool via its numeric ID, run zpool import to find out the ID of your pool then use a command such as:

# zpool import 9876543212345678910 -R /mnt zroot

If you used native encryption, load the ZFS key:

# zfs load-key zroot

Manually mount your rootfs dataset because it uses canmount=noauto, then mount all other datasets:

# zfs mount zroot/ROOT/default
# zfs mount -a

The ZFS filesystem is now ready to use.

Configure the root filesystem

If you used legacy datasets, they must be listed in /etc/fstab.
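
For example, a legacy dataset mounted at /home would need an entry along these lines (a sketch, reusing the hypothetical zroot/data/home dataset from earlier):

/etc/fstab
zroot/data/home    /home    zfs    defaults    0 0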

Set the bootfs property on the descendant root filesystem so the boot loader knows where to find the operating system.

# zpool set bootfs=zroot/ROOT/default zroot

If you do not have /etc/zfs/zpool.cache, create it:

# zpool set cachefile=/etc/zfs/zpool.cache zroot

Be sure to bring the zpool.cache file into your new system. This is required later for the ZFS daemon to start.

# mkdir -p /mnt/etc/zfs
# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

Install and configure Arch Linux

Follow the steps in the Installation guide. Special considerations for ZFSonLinux are noted where needed.

  • First mount any legacy or non-ZFS boot or system partitions using the mount command.
  • Install the base system.
Note You may wish to opt for linux-lts for your kernel of choice during the pacstrap portion. Make sure to also add libunwind to pacstrap to resolve a dependency for zfs-utils.
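A minimal pacstrap invocation along these lines (a sketch; add firmware, networking tools and an editor as needed):
# pacstrap /mnt base linux-lts linux-lts-headers libunwind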
  • The procedure described in Installation guide#Fstab is usually overkill for ZFS. ZFS usually mounts its own partitions automatically, so ZFS datasets do not need to be in the fstab file unless the user made legacy datasets of system directories. To generate the fstab for other filesystems, use:
# genfstab -U -p /mnt >> /mnt/etc/fstab
# arch-chroot /mnt
  • Edit the /etc/fstab:
Note
  • If you chose to create legacy datasets for system directories, keep them in this fstab!
  • Comment out all non-legacy datasets apart from the swap file and the EFI system partition. It is a convention to replace the swap's uuid with /dev/zvol/zroot/swap.
Note For simplicity and better compatibility, this guide recommends using the zfs-dkms package with linux-lts.
  • When creating the initial ramdisk, first edit /etc/mkinitcpio.conf. Add zfs to MODULES:
MODULES=(zfs)

Then in HOOKS, add zfs before filesystems. Also, move keyboard hook before zfs so you can type in console if something goes wrong. You may also remove fsck (if you are not using Ext3 or Ext4). Your HOOKS line should look something like the following:

HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)
  • Add ZFS to your kernel command line

You can now set up your boot loader. You also need to add a kernel parameter to make ZFS bootable:

root=ZFS=zroot/ROOT/default
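
For example, with systemd-boot, a loader entry could look like this (a sketch; the file name arch.conf is arbitrary, and the image names assume the linux-lts kernel used in this guide):

/boot/loader/entries/arch.conf
title   Arch Linux
linux   /vmlinuz-linux-lts
initrd  /initramfs-linux-lts.img
options root=ZFS=zroot/ROOT/default rw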

Configure systemd ZFS mounts

For your system to be able to reboot without issues, you need to enable the zfs.target to auto mount the pools and set the hostid.

Note
  • The instructions in this section assume you are still in arch-chroot.
  • zpool.cache has been considered a candidate for deprecation (see https://github.com/openzfs/zfs/issues/1035), but it is still required by zfs-utils to achieve a bootable system, and at this time it remains the default and recommended method.

For each pool you want automatically mounted execute:

# zpool set cachefile=/etc/zfs/zpool.cache pool

Enable zfs.target

In order to mount ZFS pools automatically on boot you need to enable zfs-import-cache.service, zfs-mount.service, zfs-import.target and zfs.target.

When running ZFS on root, the machine's hostid will not be available at the time of mounting the root filesystem. There are two solutions to this. You can place your SPL hostid in the kernel parameters in your boot loader, for example by adding spl.spl_hostid=0x00bab10c (use the hostid command to get your number).

The other, and suggested, solution is to make sure that there is a hostid in /etc/hostid, and then regenerate the initramfs image which will copy the hostid into the initramfs image. To write the hostid file safely you need to use the zgenhostid command.

To use the libc-generated hostid (recommended):

# zgenhostid $(hostid)

To use a custom hostid (must be hexadecimal and 8 characters long):

# zgenhostid deadbeef

To let the tool generate a hostid:

# zgenhostid

Do not forget to regenerate the initramfs.
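
For example, to regenerate the images for all installed presets:

# mkinitcpio -P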

Unmount and restart

We are almost done! If you have a legacy boot partition:

# umount /mnt/boot

Otherwise:

# zfs umount -a
# zfs umount zroot/ROOT/default
# zpool export zroot

Now reboot.

Warning If you do not properly export the zpool, the pool will refuse to import in the ramdisk environment and you will be stuck at the busybox terminal.

Loading password from USB-Stick

It is possible to store the password on a USB stick and load it when booting:

Save the password to the first bytes of the USB stick:

# dd if=your_password_file bs=32 count=1 of=/dev/disk/by-id/usb_stick

To create the encrypted dataset, you can either use the previously described method with a password prompt, or pipe the key from the USB stick with dd:

# dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs create -o encryption=on -o keyformat=passphrase zroot/ROOT

The next step is modifying the zfs hook. By default, zfs prompts for the password. You have to change it so that the password is piped in with dd from your USB stick. To do so, modify /usr/lib/initcpio/hooks/zfs and change the line:

! eval zfs load-key "${encryptionroot}"; do

to:

! eval dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs load-key "${encryptionroot}"; do

You are modifying your zfs hook, so do not forget to regenerate the initramfs. Now zfs should load the password from your USB stick at boot.

Troubleshooting

System fails to boot due to: cannot import zroot: no such pool available

You can try the following steps and see if they can help.

  • Use the kernel modules from the archzfs repo instead of the DKMS version. You can go back to the DKMS version after a successful boot.
  • Remove the /etc/zfs/zpool.cache and run:
    # zpool set cachefile=none zroot
  • Remove the /etc/hostid.
  • Regenerate the initramfs.

Zpool refuses to export saying it's busy

arch-chroot mounts a number of kernel and API filesystems (such as /proc and /sys) inside the new system. If these are not unmounted, the zpool may refuse to export properly. If this happens, remount the ZFS partition and run findmnt -R /mnt

Then run umount -f /path/to/partition against the partition still mounted.

This should allow the zpool to export.
