Difference between revisions of "Systemd-nspawn"

From ArchWiki
(→‎nsswitch.conf: remove useless section - settings enabled by default)
 
(58 intermediate revisions by 29 users not shown)
Line 1: Line 1:
 
{{Lowercase title}}
 
{{Lowercase title}}
 
[[Category:Virtualization]]
 
[[Category:Virtualization]]
 +
[[Category:Sandboxing]]
 +
[[es:Systemd-nspawn]]
 
[[ja:Systemd-nspawn]]
 
[[ja:Systemd-nspawn]]
 
[[ru:Systemd-nspawn]]
 
[[ru:Systemd-nspawn]]
 +
[[zh-hans:Systemd-nspawn]]
 
{{Related articles start}}
 
{{Related articles start}}
 
{{Related|systemd}}
 
{{Related|systemd}}
Line 26: Line 29:
  
 
=== Create and boot a minimal Arch Linux distribution in a container ===
 
=== Create and boot a minimal Arch Linux distribution in a container ===
{{Tip|You can use [[mkosi]] to do this for arch and other distributions fully automatically and with easy further customization.}}
 
  
 
First install {{Pkg|arch-install-scripts}}.
 
First install {{Pkg|arch-install-scripts}}.
Line 32: Line 34:
 
Next, create a directory to hold the container. In this example we will use {{ic|~/MyContainer}}.  
 
Next, create a directory to hold the container. In this example we will use {{ic|~/MyContainer}}.  
  
Next, we use pacstrap to install a basic arch-system into the container. At minimum we need to install the {{Grp|base}} group.  
+
Next, we use pacstrap to install a basic arch-system into the container. At minimum we need to install the {{Pkg|base}} package.  
 
   
 
   
  # pacstrap -i -c -d ~/MyContainer base [additional pkgs/groups]
+
  # pacstrap -c ~/MyContainer base [additional pkgs/groups]
  
{{Tip|The {{ic|-i}} option will '''avoid''' auto-confirmation of package selection. As you do not need to install the Linux kernel in the container, you can remove it from the package list selection to save space. See [[Pacman#Usage]].}}
+
{{Note|The {{Pkg|base}} package does not depend on the {{Pkg|linux}} kernel package and is container-ready.}}
 
 
{{Note|The package {{Pkg|linux-firmware}} required by {{Pkg|linux}}, which is included in the {{Grp|base}} group and isn't necessary to run the container, causes some issues to {{ic|systemd-tmpfiles-setup.service}} during the booting process with {{ic|systemd-nspawn}}. It's possible to install the {{Grp|base}} group but excluding the {{Pkg|linux}} package and its dependencies when building the container with {{ic|# pacstrap -i -c -d ~/MyContainer base --ignore linux [additional pkgs/groups]}}. The {{ic|--ignore}} flag will be simply passed to {{Pkg|pacman}}. See {{Bug|46591}} for more information.}}
 
  
 
Once your installation is finished, boot into the container:
 
Once your installation is finished, boot into the container:
Line 47: Line 47:
  
 
After the container starts, log in as "root" with no password.
 
After the container starts, log in as "root" with no password.
 +
 +
{{Tip|If the login fails with "Login incorrect", the problem is likely the {{ic|securetty}} TTY device whitelist. Add {{ic|pts/0}} through {{ic|pts/9}} to the container's version of the file ({{ic|~/MyContainer/etc/securetty}}) and retry. See {{Bug|45903}} for details.}}
  
 
The container can be powered off by running {{ic|poweroff}} from within the container. From the host, containers can be controlled by the [[#machinectl|machinectl]] tool.
 
The container can be powered off by running {{ic|poweroff}} from within the container. From the host, containers can be controlled by the [[#machinectl|machinectl]] tool.
  
 
{{Note|To terminate the ''session'' from within the container, hold {{ic|Ctrl}} and rapidly press {{ic|]}} three times. Non-US keyboard users should use {{ic|%}} instead of {{ic|]}}.}}
 
{{Note|To terminate the ''session'' from within the container, hold {{ic|Ctrl}} and rapidly press {{ic|]}} three times. Non-US keyboard users should use {{ic|%}} instead of {{ic|]}}.}}
 
==== Bootstrap Arch Linux i686 inside x86_64 host ====
 
 
It is possible to install a minimal i686 Arch Linux inside a subdirectory and use it as systemd-nspawn container instead of [[chroot]] or [[virtualization]]. This is useful for testing {{ic|PKGBUILD}} compilation for i686 and other tasks. Make sure you use a {{ic|pacman.conf}} '''without''' {{ic|multilib}} repository.
 
 
  # pacman_conf=/tmp/pacman.conf # this is pacman.conf without multilib
 
  # mkdir /mnt/i686-archlinux
 
  # linux32 pacstrap -C "$pacman_conf" -di /mnt/i686-archlinux base base-devel
 
 
You may deselect {{ic|linux}} from {{ic|base}} group, since the resulting bootstrap directory is not meant to be booted on real or virtualized hardware.
 
 
To start the resulting i686 Arch Linux systemd-nspawn instance, just issue the following command.
 
 
  # linux32 systemd-nspawn -D /mnt/i686-archlinux
 
  
 
=== Create a Debian or Ubuntu environment ===
 
=== Create a Debian or Ubuntu environment ===
  
Install {{Pkg|debootstrap}}, {{Aur|gnupg1}}, and one or both of {{Pkg|debian-archive-keyring}} and {{Aur|ubuntu-keyring}} (obviously install the keyrings for the distros you want).
+
Install {{Pkg|debootstrap}}, and one or both of {{Pkg|debian-archive-keyring}} and {{Pkg|ubuntu-keyring}} (obviously install the keyrings for the distros you want).
  
 
{{Note|''systemd-nspawn'' requires that the operating system in the container has systemd running as PID 1 and ''systemd-nspawn'' is installed in the container. This means Ubuntu before 15.04 will not work out of the box and requires additional configuration to switch from upstart to systemd. Also make sure that the {{ic|systemd-container}} package is installed on the container system.}}
 
{{Note|''systemd-nspawn'' requires that the operating system in the container has systemd running as PID 1 and ''systemd-nspawn'' is installed in the container. This means Ubuntu before 15.04 will not work out of the box and requires additional configuration to switch from upstart to systemd. Also make sure that the {{ic|systemd-container}} package is installed on the container system.}}
Line 75: Line 63:
  
 
  # cd /var/lib/machines
 
  # cd /var/lib/machines
  # debootstrap <codename> myContainer <repository-url>
+
  # debootstrap --include=systemd-container --components=main,universe <codename> myContainer <repository-url>
  
 
For Debian valid code names are either the rolling names like "stable" and "testing" or release names like "stretch" and "sid", for Ubuntu the code name like "xenial" or "zesty" should be used. A complete list of codenames is in {{ic|/usr/share/debootstrap/scripts}}. In case of a Debian image the "repository-url" can be {{ic|http://deb.debian.org/debian/}}. For an Ubuntu image, the "repository-url" can be {{ic|http://archive.ubuntu.com/ubuntu/}}.
 
For Debian valid code names are either the rolling names like "stable" and "testing" or release names like "stretch" and "sid", for Ubuntu the code name like "xenial" or "zesty" should be used. A complete list of codenames is in {{ic|/usr/share/debootstrap/scripts}}. In case of a Debian image the "repository-url" can be {{ic|http://deb.debian.org/debian/}}. For an Ubuntu image, the "repository-url" can be {{ic|http://archive.ubuntu.com/ubuntu/}}.
Line 85: Line 73:
 
  # logout
 
  # logout
  
If the above didn't work. One can start the container and use these commands instead:
+
If the above did not work. One can start the container and use these commands instead:
 +
 
 
  # systemd-nspawn -b -D myContainer  #Starts the container
 
  # systemd-nspawn -b -D myContainer  #Starts the container
 
  # machinectl shell root@myContainer /bin/bash  #Get a root bash shell
 
  # machinectl shell root@myContainer /bin/bash  #Get a root bash shell
Line 95: Line 84:
 
''systemd-nspawn'' supports unprivileged containers, though the containers need to be booted as root.
 
''systemd-nspawn'' supports unprivileged containers, though the containers need to be booted as root.
  
{{Note|This feature requires {{man|7|user_namespaces}}, which are disabled in the official Arch kernels due to security reasons presented in {{Bug|36969}}. Unofficial packages {{AUR|linux-userns}} and {{AUR|linux-lts-userns}} are available.}}
+
{{Note|This feature requires {{man|7|user_namespaces}}, for further info see [[Linux Containers#Enable support to run unprivileged containers (optional)]]}}
  
 
The easiest way to do this is to let ''systemd-nspawn'' decide everything:
 
The easiest way to do this is to let ''systemd-nspawn'' decide everything:
Line 126: Line 115:
 
First [[enable]] the {{ic|machines.target}} target, then {{ic|systemd-nspawn@''myContainer''.service}}, where {{ic|myContainer}} is an nspawn container in {{ic|/var/lib/machines}}.
 
First [[enable]] the {{ic|machines.target}} target, then {{ic|systemd-nspawn@''myContainer''.service}}, where {{ic|myContainer}} is an nspawn container in {{ic|/var/lib/machines}}.
  
{{Tip|To customize the startup of a container, [[edit]] the {{ic|systemd-nspawn@''myContainer''}} unit instance. See {{man|1|systemd-nspawn}} for all options.}}
+
{{Tip|To customize the startup of a container, edit {{ic|/etc/systemd/nspawn/''myContainer''.nspawn}}. See {{man|5|systemd.nspawn}} for all options.}}
  
 
=== Build and test packages ===
 
=== Build and test packages ===
Line 185: Line 174:
  
 
  $ systemd-cgtop
 
  $ systemd-cgtop
 +
 +
=== Resource control ===
 +
 +
You can take advantage of control groups to implement limits and resource management of your containers with {{ic|systemctl set-property}}, see {{man|5|systemd.resource-control}}. For example, you may want to limit the memory amount or CPU usage. To limit the memory consumption of your container to 2 GiB:
 +
 +
# systemctl set-property systemd-nspawn@''myContainer''.service MemoryMax=2G
 +
 +
Or to limit the CPU time usage to roughly the equivalent of 2 cores:
 +
 +
# systemctl set-property systemd-nspawn@''myContainer''.service CPUQuota=200%
 +
 +
This will create permanent files in {{ic|/etc/systemd/system.control/systemd-nspawn@''myContainer''.service.d/}}.
 +
 +
According to the documentation, {{ic|MemoryHigh}} is the preferred method to keep in check memory consumption, but it will not be hard-limited as is the case with {{ic|MemoryMax}}. You can use both options leaving {{ic|MemoryMax}} as the last line of defense. Also take in consideration that you will not limit the number of CPUs the container can see, but you will achieve similar results by limiting how much time the container will get at maximum, relative to the total CPU time.
 +
 +
{{Tip|If you do not want this changes to be preserved after reboot you can pass the option {{ic|--runtime}} to make the changes temporary. You can check their results with {{ic|systemd-cgtop}}.}}
  
 
== Tips and tricks ==
 
== Tips and tricks ==
  
 
=== Use an X environment ===
 
=== Use an X environment ===
 +
 +
{{Accuracy|The note about the systemd version at the end of this section seems to be obsolete. For me (systemd version 239) X applications also work if {{ic|/tmp/.X11-unix}} is bound rw.|section=/tmp/.X11-unix contents have to be bind-mounted as read-only - still relevant?}}
  
 
See [[Xhost]] and [[Change root#Run graphical applications from chroot]].
 
See [[Xhost]] and [[Change root#Run graphical applications from chroot]].
Line 194: Line 201:
 
You will need to set the {{ic|DISPLAY}} environment variable inside your container session to connect to the external X server.
 
You will need to set the {{ic|DISPLAY}} environment variable inside your container session to connect to the external X server.
  
X stores some required files in the {{ic|/tmp}} directory. In order for your container to display anything, it needs access to those files. To do so, append the {{ic|--bind<nowiki>=</nowiki>/tmp/.X11-unix:/tmp/.X11-unix}} option when starting the container.
+
X stores some required files in the {{ic|/tmp}} directory. In order for your container to display anything, it needs access to those files. To do so, append the {{ic|--bind-ro<nowiki>=</nowiki>/tmp/.X11-unix}} option when starting the container.
 +
 
 +
{{Note|Since systemd version 235, {{ic|/tmp/.X11-unix}} contents [https://github.com/systemd/systemd/issues/7093 have to be bind-mounted as read-only], otherwise they will disappear from the filesystem. The read-only mount flag does not prevent using {{ic|connect()}} syscall on the socket. If you binded also {{ic|/run/user/1000}} then you might want to explicitly bind {{ic|/run/user/1000/bus}} as read-only to protect the dbus socket from being deleted. }}
 +
 
 +
==== Avoiding xhost ====
 +
 
 +
{{ic|xhost}} only provides rather coarse access rights to the X server. More fine-grained access control is possible via the {{ic|$XAUTHORITY}} file. Unfortunately, just making the {{ic|$XAUTHORITY}} file accessible in the container will not do the job:
 +
your {{ic|$XAUTHORITY}} file is specific to your host, but the container is a different host.
 +
The following trick adapted from [https://stackoverflow.com/a/25280523 stackoverflow] can be used to make your X server accept the {{ic|$XAUTHORITY}} file from an X application run inside the container:
 +
 
 +
$ XAUTH=/tmp/container_xauth
 +
$ xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
 +
$ sudo systemd-nspawn -D myContainer --bind=/tmp/.X11-unix --bind="$XAUTH" \
 +
                      -E DISPLAY="$DISPLAY" -E XAUTHORITY="$XAUTH" --as-pid2 /usr/bin/xeyes
 +
 
 +
The second line above sets the connection family to "FamilyWild", value {{ic|65535}}, which causes the entry to match every display. See {{man|7|Xsecurity}} for more information.
  
 
=== Run Firefox ===
 
=== Run Firefox ===
Line 221: Line 243:
 
{{Style}}
 
{{Style}}
  
 
+
For the most simple setup, allowing outgoing connections to the internet, you can use [[systemd-networkd]] for network management and DHCP and [[systemd-resolved]] for DNS.
For the most simple setup, allowing outgoing connections to the internet, you can use [[systemd-networkd]] for network management and DHCP and {{ic|systemd-resolved}} for DNS.
 
 
 
# systemctl enable --now systemd-networkd systemd-resolved
 
# ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf # let systemd-resolved manage /etc/resolv.conf
 
  
 
This assumes you have started {{ic|systemd-nspawn}} with the {{ic|-n}} switch, creating a virtual Ethernet link to the host.
 
This assumes you have started {{ic|systemd-nspawn}} with the {{ic|-n}} switch, creating a virtual Ethernet link to the host.
  
Instead of using {{ic|systemd-resolved}} you can also manually [[textedit|edit]] your container's {{ic|/etc/resolv.conf}} by adding your DNS server's IP address.
+
Instead of using [[systemd-resolved]] you can also manually [[textedit|edit]] your container's {{ic|/etc/resolv.conf}} by adding your DNS server's IP address.
  
 
Note the canonical [[systemd-networkd]] host and container .network files are from https://github.com/systemd/systemd/tree/master/network .
 
Note the canonical [[systemd-networkd]] host and container .network files are from https://github.com/systemd/systemd/tree/master/network .
  
 
See [[systemd-networkd#Usage with containers]] for more complex examples.
 
See [[systemd-networkd#Usage with containers]] for more complex examples.
 
==== nsswitch.conf ====
 
 
{{Merge|systemd-networkd}}
 
 
To make it easier to connect to a container from the host, you can enable local DNS resolution for container names. In {{ic|/etc/nsswitch.conf}}, add {{ic|mymachines}} to the {{ic|hosts:}} section, e.g.
 
 
hosts: files mymachines dns myhostname
 
 
Then, any DNS lookup for hostname {{ic|foo}} on the host will first consult {{ic|/etc/hosts}}, then the names of local containers, then upstream DNS etc.
 
  
 
==== Use host networking ====
 
==== Use host networking ====
  
To disable private networking used by containers started with {{ic|machinectl start MyContainer}} [[edit]] the configuration of {{ic|systemd-nspawn@.service}} with
+
To disable private networking used by containers started with {{ic|machinectl start MyContainer}}, add a {{ic|MyContainer.nspawn}} file to the {{ic|/etc/systemd/nspawn}} directory (create the directory if needed) with the following content:
 
 
# systemctl edit systemd-nspawn@.service
 
 
 
and set the {{ic|1=ExecStart=}} option without the {{ic|--network-veth}} parameter unlike the original service:
 
  
{{hc|/etc/systemd/system/systemd-nspawn@.service.d/override.conf|<nowiki>
+
{{hc|/etc/systemd/nspawn/MyContainer.nspawn|<nowiki>
[Service]
+
[Network]
ExecStart=
+
VirtualEthernet=no
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --machine=%I
 
 
</nowiki>}}
 
</nowiki>}}
  
The newly started containers will use the hosts networking.
+
Parameters set in the {{ic|MyContainer.nspawn}} file will override the defaults used in {{ic|systemd-nspawn@.service}} and the newly started containers will use the host's networking.
  
 
==== Virtual Ethernet interfaces ====
 
==== Virtual Ethernet interfaces ====
Line 270: Line 273:
  
 
For example, a host virtual Ethernet interface shown as {{ic|ve-foo@if2}} will connect to container {{ic|foo}}, and inside the container to the second network interface -- the one shown with index 2 when running {{ic|ip link}} inside the container. Similarly, in the container, the interface named {{ic|host0@if9}} will connect to the 9th slot on the host.
 
For example, a host virtual Ethernet interface shown as {{ic|ve-foo@if2}} will connect to container {{ic|foo}}, and inside the container to the second network interface -- the one shown with index 2 when running {{ic|ip link}} inside the container. Similarly, in the container, the interface named {{ic|host0@if9}} will connect to the 9th slot on the host.
 +
 +
{{Note|If you use a firewall, the traffic of your virtual interface may be blocked as a result. You will have to add the necessary rules to let it pass through your firewall.}}
 +
 +
==== Use a network bridge ====
 +
 +
If you have configured a network bridge on the host system in order to have an IP address assigned to the container as if it was a physical machine in your local network (see, for example, [[systemd-networkd#DHCP with two distinct IP]] or [[systemd-networkd#Static IP network]]) you can make systemd-nspawn use it by using the option {{ic|1=--network-bridge=''br0''}}.
  
 
=== Run on a non-systemd system ===
 
=== Run on a non-systemd system ===
Line 302: Line 311:
  
 
After powering off the container, the btrfs subvolume that was created is immediately removed.
 
After powering off the container, the btrfs subvolume that was created is immediately removed.
 +
 +
=== Run docker in systemd-nspawn ===
 +
 +
[[Docker]] requires {{ic|rw}} permission of {{ic|/sys/fs/cgroup}} to run its containers, which is mounted read-only by {{ic|systemd-nspawn}} by default due to cgroup namespace. However, it is possible to run Docker in a systemd-nspawn container by bind-mounting {{ic|/sys/fs/cgroup}} from host os and enabling necessary capabilities and permissions.
 +
 +
{{Note|The following steps are essentially sharing the cgroup namespace to the container, giving kernel keyring permissions and making it a privileged container, which is likely to increase the attack surface and decrease security level. You should always evaluate the actual benefits by doing so before following the steps.}}
 +
 +
First, cgroup namespace should be disabled by {{ic|systemctl edit systemd-nspawn@myContainer}}
 +
 +
{{hc|systemctl edit systemd-nspawn@myContainer|<nowiki>
 +
[Service]
 +
Environment=SYSTEMD_NSPAWN_USE_CGNS=0
 +
</nowiki>}}
 +
 +
Then, edit {{ic|/etc/systemd/nspawn/myContainer.nspawn}} (create if absent) and add the following configurations.
 +
 +
{{hc|/etc/systemd/nspawn/myContainer.nspawn|<nowiki>
 +
[Exec]
 +
Capability=all
 +
SystemCallFilter=add_key keyctl
 +
 +
[Files]
 +
Bind=/sys/fs/cgroup
 +
</nowiki>}}
 +
 +
This grants all capabilities to the container, whitelists two system calls {{ic|add_key}} and {{ic|keyctl}} (related to kernel keyring and required by Docker), and bind-mounts {{ic|/sys/fs/cgroup}} from host to the container. After editing these files, you need to poweroff and restart your container for them to take effect.
 +
 +
{{Note|You might need to load the {{ic|overlay}} module on the host before starting Docker inside the systemd-nspawn to use the {{ic|overlay2}} storage driver (default storage driver of Docker) properly. Failure to load the driver will cause Docker to choose the inefficient driver {{ic|vfs}} which copies everything for every layer of Docker containers. Consult [[Kernel modules#Automatic module loading with systemd]] on how to load the module automatically.}}
  
 
== Troubleshooting ==
 
== Troubleshooting ==
  
=== root login fails ===
+
=== Root login fails ===
  
 
If you get the following error when you try to login (i.e. using {{ic|machinectl login <name>}}):
 
If you get the following error when you try to login (i.e. using {{ic|machinectl login <name>}}):
Line 321: Line 358:
  
 
It can sometimes be impossible to upgrade some packages on the container, {{Pkg|filesystem}} being a perfect example. The issue is due to {{ic|/sys}} being mounted as Read Only. The workaround is to remount the directory in Read Write when running {{ic|mount -o remount,rw -t sysfs sysfs /sys}}, do the upgrade then reboot the container.
 
It can sometimes be impossible to upgrade some packages on the container, {{Pkg|filesystem}} being a perfect example. The issue is due to {{ic|/sys}} being mounted as Read Only. The workaround is to remount the directory in Read Write when running {{ic|mount -o remount,rw -t sysfs sysfs /sys}}, do the upgrade then reboot the container.
 +
 +
=== execv(...) failed: Permission denied ===
 +
 +
When trying to boot the container via {{ic|systemd-nspawn -bD ''/path/to/container''}} (or executing something in the container), and the following error comes up:
 +
 +
execv(/usr/lib/systemd/systemd, /lib/systemd/systemd, /sbin/init) failed: Permission denied
 +
 +
even though the permissions of the files in question (i.e. {{ic|/lib/systemd/systemd}}) are correct, this can be the result of having mounted the file system on which the container is stored as non-root user. For example, if you mount your disk manually with an entry in [[fstab]] that has the options {{ic|noauto,user,...}}, ''systemd-nspawn'' will not allow executing the files even if they are owned by root.
 +
 +
=== Reboot not working ===
 +
 +
When trying to reboot the container via machinectl or within the container, the container does not reboot.
 +
 +
Workaround: edit {{ic|/usr/lib/systemd/system/systemd-nspawn@.service}} and remove {{ic|--keep-unit}}
 +
 +
Reference: https://github.com/systemd/systemd/issues/2809
 +
 +
=== Mounting a NFS share inside the container ===
 +
 +
Not possible at this time (June 2019).
  
 
== See also ==
 
== See also ==
Line 330: Line 387:
 
* [https://www.youtube.com/results?search_query=systemd-nspawn&aq=f Presentation by Lennart Pottering on systemd-nspawn]
 
* [https://www.youtube.com/results?search_query=systemd-nspawn&aq=f Presentation by Lennart Pottering on systemd-nspawn]
 
* [http://dabase.com/e/12009/ Running Firefox in a systemd-nspawn container]
 
* [http://dabase.com/e/12009/ Running Firefox in a systemd-nspawn container]
 +
* [https://patrickskiba.com/sysytemd-nspawn/2019/03/21/graphical-applications-in-systemd-nspawn.html Graphical applications in systemd-nspawn]

Latest revision as of 10:51, 13 October 2019

systemd-nspawn is like the chroot command, but it is a chroot on steroids.

systemd-nspawn may be used to run a command or OS in a light-weight namespace container. It is more powerful than chroot since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name.

systemd-nspawn limits access to various kernel interfaces in the container to read-only, such as /sys, /proc/sys or /sys/fs/selinux. Network interfaces and the system clock may not be changed from within the container. Device nodes may not be created. The host system cannot be rebooted and kernel modules may not be loaded from within the container.

This mechanism differs from Lxc-systemd or Libvirt-lxc, as it is a much simpler tool to configure.

Installation

systemd-nspawn is part of and packaged with systemd.

Examples

Create and boot a minimal Arch Linux distribution in a container

First install arch-install-scripts.

Next, create a directory to hold the container. In this example we will use ~/MyContainer.

Next, we use pacstrap to install a basic Arch system into the container. At minimum, we need to install the base package.

# pacstrap -c ~/MyContainer base [additional pkgs/groups]
Note: The base package does not depend on the linux kernel package and is container-ready.

Once your installation is finished, boot into the container:

# systemd-nspawn -b -D ~/MyContainer

The -b option will boot the container (i.e. run systemd as PID=1), instead of just running a shell, and -D specifies the directory that becomes the container's root directory.

After the container starts, log in as "root" with no password.

Tip: If the login fails with "Login incorrect", the problem is likely the securetty TTY device whitelist. Add pts/0 through pts/9 to the container's version of the file (~/MyContainer/etc/securetty) and retry. See FS#45903 for details.

The container can be powered off by running poweroff from within the container. From the host, containers can be controlled by the machinectl tool.

Note: To terminate the session from within the container, hold Ctrl and rapidly press ] three times. Non-US keyboard users should use % instead of ].

Create a Debian or Ubuntu environment

Install debootstrap, and one or both of debian-archive-keyring and ubuntu-keyring (obviously install the keyrings for the distros you want).

Note: systemd-nspawn requires that the operating system in the container has systemd running as PID 1 and systemd-nspawn is installed in the container. This means Ubuntu before 15.04 will not work out of the box and requires additional configuration to switch from upstart to systemd. Also make sure that the systemd-container package is installed on the container system.

From there it is rather easy to set up Debian or Ubuntu environments:

# cd /var/lib/machines
# debootstrap --include=systemd-container --components=main,universe <codename> myContainer <repository-url>

For Debian, valid code names are either the rolling names like "stable" and "testing" or release names like "stretch" and "sid". For Ubuntu, use a release code name like "xenial" or "zesty". A complete list of codenames is in /usr/share/debootstrap/scripts. For a Debian image, the "repository-url" can be http://deb.debian.org/debian/. For an Ubuntu image, it can be http://archive.ubuntu.com/ubuntu/.
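
As a concrete illustration of the placeholders above, the following sketch assembles the debootstrap command line for a Debian or an Ubuntu container. The variable names (distro, codename, machine) are hypothetical, and the command is only printed so that it can be reviewed before being run as root from /var/lib/machines. The universe component is passed only for Ubuntu here, on the assumption that it does not exist on a Debian mirror.

```shell
#!/bin/sh
# Hypothetical values; adjust to the distribution you want to bootstrap.
distro=debian        # "debian" or "ubuntu"
codename=stable      # e.g. "stable" for Debian, "xenial" for Ubuntu
machine=myContainer  # target directory, relative to /var/lib/machines

# Pick the mirror and components matching the distribution.
case "$distro" in
    debian) url=http://deb.debian.org/debian/     components=main ;;
    ubuntu) url=http://archive.ubuntu.com/ubuntu/ components=main,universe ;;
    *) echo "unknown distribution: $distro" >&2; exit 1 ;;
esac

# Print the command instead of running it, so it can be checked first.
cmd="debootstrap --include=systemd-container --components=$components $codename $machine $url"
echo "$cmd"
```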

Unlike Arch, Debian and Ubuntu will not let you log in without a password on first login. To set the root password, start the container without the -b option and set a password:

# systemd-nspawn -D myContainer
# passwd
# logout

If the above did not work, start the container and use these commands instead:

# systemd-nspawn -b -D myContainer  #Starts the container
# machinectl shell root@myContainer /bin/bash  #Get a root bash shell
# passwd
# logout

Creating private users (unprivileged containers)

systemd-nspawn supports unprivileged containers, though the containers need to be booted as root.

The easiest way to do this is to let systemd-nspawn decide everything:

# systemd-nspawn -UD myContainer
# passwd
# logout
# systemd-nspawn -bUD myContainer

Here systemd-nspawn checks whether the UID/GID of the directory's owner is already in use. If it is not, that ID is used as the base, together with the 65536 IDs above it. If it is in use, systemd-nspawn randomly picks an unused range of 65536 IDs from 524288 - 1878982656 and uses it.

Note:
  • The base of the range chosen is always a multiple of 65536.
  • -U and --private-users=pick are equivalent if the kernel supports user namespaces. --private-users=pick also implies --private-users-chown; see systemd-nspawn(1) for details.
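
The arithmetic behind these ranges can be checked with plain shell arithmetic. The sketch below takes the manually chosen base 1354956800 used in the next example and checks that it is a multiple of 65536 and lies inside the range quoted above for the automatic pick. The variable names are illustrative only.

```shell
#!/bin/sh
base=1354956800   # first UID/GID of the container's range
count=65536       # number of IDs in the range

remainder=$((base % count))     # 0 when the base is aligned to 65536
last=$((base + count - 1))      # highest ID that belongs to the range

echo "remainder: $remainder"
echo "range: $base-$last"

# The range that --private-users=pick draws from, as quoted above.
if [ "$base" -ge 524288 ] && [ "$base" -le 1878982656 ]; then
    in_pick_range=yes
else
    in_pick_range=no
fi
echo "inside pick range: $in_pick_range"
```

A remainder of 0 means the base is aligned as the note requires.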

You can also specify the UID/GID of the container manually:

# systemd-nspawn -D myContainer --private-users=1354956800:65536 --private-users-chown
# passwd
# logout
# systemd-nspawn -bUD myContainer

While booting the container you could still use --private-users=1354956800:65536 with --private-users-chown, but this is unnecessarily complicated. Let -U handle it once the IDs have been assigned.

Enable container on boot

When using a container frequently, you may want to start it on boot.

First enable the machines.target target, then systemd-nspawn@myContainer.service, where myContainer is an nspawn container in /var/lib/machines.

Tip: To customize the startup of a container, edit /etc/systemd/nspawn/myContainer.nspawn. See systemd.nspawn(5) for all options.
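
A minimal sketch of such a settings file is shown below. It writes the file into a scratch directory so that it can be inspected without root privileges; copying it to /etc/systemd/nspawn/myContainer.nspawn is left as a manual step. The chosen settings (PrivateUsers=pick as the counterpart of --private-users=pick, and a read-only bind of /tmp/.X11-unix) are only examples borrowed from other sections of this article, not recommended defaults.

```shell
#!/bin/sh
# Scratch directory standing in for /etc/systemd/nspawn (assumption: review first, copy later).
dir=$(mktemp -d)
file="$dir/myContainer.nspawn"

# Quoted heredoc: the content is written literally, without shell expansion.
cat > "$file" <<'EOF'
[Exec]
PrivateUsers=pick

[Files]
BindReadOnly=/tmp/.X11-unix
EOF

echo "wrote $file"
cat "$file"
```

After reviewing the result, it could be installed as root with, for example, install -Dm644 "$file" /etc/systemd/nspawn/myContainer.nspawn.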

Build and test packages

See Creating packages for other distributions for example uses.

Management

machinectl

Note: The machinectl tool requires systemd and dbus to be installed in the container. See [1] for detailed discussion.

Managing your containers is essentially done with the machinectl command. See machinectl(1) for details.

Examples:

Spawn a new shell inside a running container:

$ machinectl login MyContainer

Show detailed information about a container:

$ machinectl status MyContainer

Reboot a container:

$ machinectl reboot MyContainer

Power off a container:

$ machinectl poweroff MyContainer
Tip: Poweroff and reboot operations can be performed from within a container session using the systemctl poweroff or reboot commands.

Download an image:

# machinectl pull-tar URL name

systemd toolchain

Much of the core systemd toolchain has been updated to work with containers. Tools that support this usually provide a -M, --machine= option which takes a container name as argument.

Examples:

See journal logs for a particular machine:

$ journalctl -M MyContainer

Show control group contents:

$ systemd-cgls -M MyContainer

See startup time of container:

$ systemd-analyze -M MyContainer

For an overview of resource usage:

$ systemd-cgtop

Resource control

You can take advantage of control groups to implement limits and resource management of your containers with systemctl set-property, see systemd.resource-control(5). For example, you may want to limit the memory amount or CPU usage. To limit the memory consumption of your container to 2 GiB:

# systemctl set-property systemd-nspawn@myContainer.service MemoryMax=2G

Or to limit the CPU time usage to roughly the equivalent of 2 cores:

# systemctl set-property systemd-nspawn@myContainer.service CPUQuota=200%

This will create permanent files in /etc/systemd/system.control/systemd-nspawn@myContainer.service.d/.

According to the documentation, MemoryHigh is the preferred way to keep memory consumption in check, but it is not a hard limit as MemoryMax is. You can use both options, leaving MemoryMax as the last line of defense. Also take into consideration that this does not limit the number of CPUs the container can see; you achieve a similar result by limiting the maximum CPU time the container gets, relative to the total CPU time.

Tip: If you do not want these changes to be preserved after reboot, you can pass the option --runtime to make them temporary. You can check the results with systemd-cgtop.
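
To combine the soft and hard memory limits with a CPU quota, the values can be derived from a core count and a memory budget. The following sketch only prints the resulting systemctl set-property call. The 3/4 ratio between MemoryHigh and MemoryMax and the variable names are arbitrary assumptions made for illustration.

```shell
#!/bin/sh
unit=systemd-nspawn@myContainer.service
cores=2            # CPU time budget, expressed in cores
max_mib=2048       # hard memory limit in MiB (2 GiB)

quota=$((cores * 100))           # CPUQuota is a percentage of one CPU
high_mib=$((max_mib * 3 / 4))    # soft limit below the hard limit (assumed ratio)

# Printed for review; run it as root, adding --runtime for a temporary change.
cmd="systemctl set-property $unit MemoryHigh=${high_mib}M MemoryMax=${max_mib}M CPUQuota=${quota}%"
echo "$cmd"
```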

Tips and tricks

Use an X environment

The factual accuracy of this article or section is disputed.

Reason: The note about the systemd version at the end of this section seems to be obsolete. For me (systemd version 239) X applications also work if /tmp/.X11-unix is bound rw. (Discuss in Talk:Systemd-nspawn#/tmp/.X11-unix contents have to be bind-mounted as read-only - still relevant?)

See Xhost and Change root#Run graphical applications from chroot.

You will need to set the DISPLAY environment variable inside your container session to connect to the external X server.

X stores some required files in the /tmp directory. In order for your container to display anything, it needs access to those files. To do so, append the --bind-ro=/tmp/.X11-unix option when starting the container.

Note: Since systemd version 235, the contents of /tmp/.X11-unix have to be bind-mounted read-only, otherwise they will disappear from the filesystem. The read-only mount flag does not prevent using the connect() syscall on the socket. If you also bind-mounted /run/user/1000, you might want to explicitly bind /run/user/1000/bus as read-only to protect the D-Bus socket from being deleted.

Avoiding xhost

xhost only provides rather coarse access rights to the X server. More fine-grained access control is possible via the $XAUTHORITY file. Unfortunately, just making the $XAUTHORITY file accessible in the container will not do the job: your $XAUTHORITY file is specific to your host, but the container is a different host. The following trick adapted from stackoverflow can be used to make your X server accept the $XAUTHORITY file from an X application run inside the container:

$ XAUTH=/tmp/container_xauth
$ xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
$ sudo systemd-nspawn -D myContainer --bind=/tmp/.X11-unix --bind="$XAUTH" \
                      -E DISPLAY="$DISPLAY" -E XAUTHORITY="$XAUTH" --as-pid2 /usr/bin/xeyes

The second line above sets the connection family to "FamilyWild", value 65535, which causes the entry to match every display. See Xsecurity(7) for more information.
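
To see what the sed expression does in isolation, here is a small sketch that applies it to a made-up entry. The hexadecimal values are illustrative and not a real cookie, and the function name is hypothetical. Only the first four characters, which hold the connection family, are rewritten.

```shell
#!/bin/sh
# Made-up entry in the numeric format exchanged by "xauth nextract" and
# "xauth nmerge"; the values are illustrative, not a real cookie.
sample='0100 0006 6d79686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 00112233445566778899aabbccddeeff'

# Replace the first four characters (the family field) with ffff (65535).
wildcard_family() {
    sed -e 's/^..../ffff/'
}

# The printed line starts with "ffff"; the rest of the entry is unchanged.
printf '%s\n' "$sample" | wildcard_family
```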

Run Firefox

See Firefox tweaks.

Access host filesystem

See --bind and --bind-ro in systemd-nspawn(1).

If both the host and the container are Arch Linux, then one could, for example, share the pacman cache:

# systemd-nspawn --bind=/var/cache/pacman/pkg

Or you can specify per-container bind using the file:

/etc/systemd/nspawn/my-container.nspawn
[Files]
Bind=/var/cache/pacman/pkg

See #Specify per-container settings.

Configure networking

This article or section needs language, wiki syntax or style improvements. See Help:Style for reference.

For the simplest setup, which allows outgoing connections to the internet, you can use systemd-networkd for network management and DHCP, and systemd-resolved for DNS.

This assumes you have started systemd-nspawn with the -n switch, creating a virtual Ethernet link to the host.

Instead of using systemd-resolved you can also manually edit your container's /etc/resolv.conf by adding your DNS server's IP address.

Note that the canonical systemd-networkd host and container .network files are available at https://github.com/systemd/systemd/tree/master/network .

See systemd-networkd#Usage with containers for more complex examples.

Use host networking

To disable the private networking used by containers started with machinectl start MyContainer, add a MyContainer.nspawn file to the /etc/systemd/nspawn directory (create the directory if needed) with the following content:

/etc/systemd/nspawn/MyContainer.nspawn
[Network]
VirtualEthernet=no

Parameters set in the MyContainer.nspawn file will override the defaults used in systemd-nspawn@.service and the newly started containers will use the host's networking.

Virtual Ethernet interfaces

If a container is started with systemd-nspawn ... -n, systemd will automatically create one virtual Ethernet interface on the host, and one in the container, connected by a virtual Ethernet cable.

If the name of the container is foo, the name of the virtual Ethernet interface on the host is ve-foo. The name of the virtual Ethernet interface in the container is always host0.

When examining the interfaces with ip link, interface names will be shown with a suffix, such as ve-foo@if2 and host0@if9. The @ifN is not actually part of the name of the interface; instead, ip link appends this information to indicate which "slot" the virtual Ethernet cable connects to on the other end.

For example, a host virtual Ethernet interface shown as ve-foo@if2 will connect to container foo, and inside the container to the second network interface -- the one shown with index 2 when running ip link inside the container. Similarly, in the container, the interface named host0@if9 will connect to the 9th slot on the host.
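
The naming scheme described above can be taken apart with plain shell parameter expansion. The following is a minimal sketch; the function name is hypothetical and it expects the name exactly as printed by ip link, without the trailing colon.

```shell
#!/bin/sh
# Split a name as printed by "ip link" (e.g. ve-foo@if2) into the real
# interface name and the index of the peer interface on the other end.
split_veth_name() {
    name=${1%%@*}      # everything before the first "@"
    peer=${1##*@if}    # the number after "@if"
    printf '%s %s\n' "$name" "$peer"
}

split_veth_name ve-foo@if2    # prints "ve-foo 2"
split_veth_name host0@if9     # prints "host0 9"
```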

Note: If you use a firewall, the traffic of your virtual interface may be blocked. You will have to add the necessary rules to let it through your firewall.

Use a network bridge

If you have configured a network bridge on the host system in order to have an IP address assigned to the container as if it were a physical machine in your local network (see, for example, systemd-networkd#DHCP with two distinct IP or systemd-networkd#Static IP network), you can make systemd-nspawn use it with the option --network-bridge=br0.

Run on a non-systemd system

See Init#systemd-nspawn.

Specify per-container settings

To specify settings for a single container rather than overrides for all containers (e.g. to bind a directory into only one container), .nspawn files can be used. See systemd.nspawn(5) for details.
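
As a minimal sketch, a per-container file with a single bind mount could be generated by a small hypothetical helper such as the one below. The function name and the NSPAWN_DIR variable are assumptions of this example; the variable defaults to /etc/systemd/nspawn and exists only so that the helper can be tried in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: create a .nspawn file that bind-mounts one host
# directory into a single container.
# Arguments: container name, host path to bind.
write_nspawn_bind() {
    dir=${NSPAWN_DIR:-/etc/systemd/nspawn}
    mkdir -p "$dir" || return 1
    printf '[Files]\nBind=%s\n' "$2" > "$dir/$1.nspawn"
}

# Example, as root:
#   write_nspawn_bind my-container /var/cache/pacman/pkg
```

Note that the helper overwrites an existing file of the same name, so it is only suitable for containers that have no other per-container settings yet.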

Use Btrfs subvolume as container root

To use a Btrfs subvolume as a template for the container's root, use the --template flag. This takes a snapshot of the subvolume and populates the root directory for the container with it.

Note: If the template path specified is not the root of a subvolume, the entire tree is copied. This will be very time consuming.

For example, to use a snapshot located at /.snapshots/403/snapshot:

# systemd-nspawn --template=/.snapshots/403/snapshot -b -D my-container

where my-container is the name of the directory that will be created for the container. After powering off, the newly created subvolume is retained.
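
Since a template path that is not the root of a subvolume is copied in full, it can be worth checking the path beforehand. The following is a sketch under two assumptions: that GNU stat is available, and that a subvolume root can be recognized by residing on a btrfs filesystem and having inode number 256. The function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the given path looks like the root of a Btrfs subvolume.
# Sketch assuming GNU stat: the path must be on a btrfs filesystem and
# have inode number 256.
is_btrfs_subvolume() {
    fstype=$(stat -f -c %T "$1" 2>/dev/null) || return 1
    inode=$(stat -c %i "$1" 2>/dev/null) || return 1
    [ "$fstype" = btrfs ] && [ "$inode" -eq 256 ]
}

# Example:
#   is_btrfs_subvolume /.snapshots/403/snapshot || echo "the whole tree would be copied"
```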

Use temporary Btrfs snapshot of container

One can use the --ephemeral or -x flag to create a temporary btrfs snapshot of the container and use it as the container root. Any changes made while booted in the container will be lost. For example:

# systemd-nspawn -D my-container -xb

where my-container is the directory of an existing container or system. For example, if / is a btrfs subvolume one could create an ephemeral container of the currently running host system by doing:

# systemd-nspawn -D / -xb 

After powering off the container, the btrfs subvolume that was created is immediately removed.

Run docker in systemd-nspawn

Docker requires read-write access to /sys/fs/cgroup to run its containers, but systemd-nspawn mounts it read-only by default because of the cgroup namespace. However, it is possible to run Docker in a systemd-nspawn container by bind-mounting /sys/fs/cgroup from the host OS and enabling the necessary capabilities and permissions.

Note: The following steps essentially share the host's cgroup namespace with the container, grant kernel keyring permissions and make it a privileged container, which is likely to increase the attack surface and reduce security. Always weigh the actual benefits against these risks before following the steps.

First, disable the cgroup namespace by running systemctl edit systemd-nspawn@myContainer and adding:

systemctl edit systemd-nspawn@myContainer
[Service]
Environment=SYSTEMD_NSPAWN_USE_CGNS=0

Then, edit /etc/systemd/nspawn/myContainer.nspawn (create it if absent) and add the following configuration:

/etc/systemd/nspawn/myContainer.nspawn
[Exec]
Capability=all
SystemCallFilter=add_key keyctl

[Files]
Bind=/sys/fs/cgroup

This grants all capabilities to the container, whitelists the two system calls add_key and keyctl (related to the kernel keyring and required by Docker), and bind-mounts /sys/fs/cgroup from the host into the container. After editing these files, you need to power off and restart the container for the changes to take effect.

Note: You might need to load the overlay module on the host before starting Docker inside the container in order to use the overlay2 storage driver (Docker's default storage driver) properly. If the module is not loaded, Docker falls back to the inefficient vfs driver, which copies everything for every layer of Docker containers. Consult Kernel modules#Automatic module loading with systemd on how to load the module automatically.
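
As a minimal sketch of loading the module automatically, a modules-load.d entry could be written on the host with a hypothetical helper like the one below. The function name, the file name overlay.conf and the MODULES_LOAD_DIR variable are assumptions of this example; the variable defaults to /etc/modules-load.d.

```shell
#!/bin/sh
# Hypothetical helper: write a modules-load.d entry naming the overlay
# module, so that it is loaded automatically at boot.
write_overlay_autoload() {
    dir=${MODULES_LOAD_DIR:-/etc/modules-load.d}
    mkdir -p "$dir" || return 1
    printf 'overlay\n' > "$dir/overlay.conf"
}

# On the host, as root:
#   modprobe overlay          # load the module for the current boot
#   write_overlay_autoload    # load it automatically on later boots
```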

Troubleshooting

Root login fails

If you get the following error when you try to log in (e.g. using machinectl login <name>):

arch-nspawn login: root
Login incorrect

And journalctl shows:

pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !

Add pts/0 to the list of terminal names in /etc/securetty on the container filesystem, see [2]. You can also opt to delete /etc/securetty in the container to allow root to log in on any tty, see [3].

Unable to upgrade some packages on the container

It can sometimes be impossible to upgrade some packages in the container, filesystem being a perfect example. The issue is due to /sys being mounted read-only. The workaround is to remount it read-write by running mount -o remount,rw -t sysfs sysfs /sys, do the upgrade, then reboot the container.

execv(...) failed: Permission denied

When trying to boot the container via systemd-nspawn -bD /path/to/container (or executing something in the container), the following error may come up:

execv(/usr/lib/systemd/systemd, /lib/systemd/systemd, /sbin/init) failed: Permission denied

Even if the permissions of the files in question (e.g. /lib/systemd/systemd) are correct, this can be the result of having mounted the file system on which the container is stored as a non-root user. For example, if you mount your disk manually with an entry in fstab that has the options noauto,user,..., systemd-nspawn will not allow executing the files even if they are owned by root.
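
The user mount option implies noexec (see mount(8)), which explains the refusal. As a minimal sketch, the effective mount options of the file system holding the container can be checked for noexec; the function name is hypothetical and the findmnt call is shown only as a comment.

```shell
#!/bin/sh
# Return 0 if a comma-separated mount option string contains "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using the options of the file system that holds the container:
#   opts=$(findmnt -no OPTIONS --target /path/to/container)
#   has_noexec "$opts" && echo "mounted noexec: the container's init cannot be executed"
```

If noexec is set, adding exec after user in the fstab options, or mounting the file system as root, should allow the container to start.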

Reboot not working

When trying to reboot the container via machinectl or within the container, the container does not reboot.

Workaround: edit /usr/lib/systemd/system/systemd-nspawn@.service and remove --keep-unit

Reference: https://github.com/systemd/systemd/issues/2809

Mounting an NFS share inside the container

Not possible at this time (June 2019).

See also