systemd-nspawn is like the chroot command, but it is a chroot on steroids.
systemd-nspawn may be used to run a command or OS in a light-weight namespace container. It is more powerful than chroot since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name.
systemd-nspawn limits access to various kernel interfaces in the container to read-only, such as
/sys/fs/selinux. Network interfaces and the system clock may not be changed from within the container. Device nodes may not be created. The host system cannot be rebooted and kernel modules may not be loaded from within the container.
systemd-nspawn is part of and packaged with.
Create and boot a minimal Arch Linux container
Next, create a directory to hold the container. In this example we will use
Next, we use pacstrap to install a basic Arch system into the container. At minimum we need to install the base package.
# pacstrap -c ~/MyContainer base [additional packages/groups]
Once your installation is finished, chroot into the container, and set a root password:
# systemd-nspawn -D ~/MyContainer
# passwd
# logout
Finally, boot into the container:
# systemd-nspawn -b -D ~/MyContainer
The -b option will boot the container (i.e. run systemd as PID 1), instead of just running a shell, and -D specifies the directory that becomes the container's root directory.
After the container starts, log in as "root" with your password.
securettyTTY device whitelist. See #Root login fails.
The container can be powered off by running
poweroff from within the container. From the host, containers can be controlled by the machinectl tool.
Ctrland rapidly press
]three times. Non-US keyboard users should use
Create a Debian or Ubuntu environment
Install, and one or both of and depending on which distribution you want.
From there it is rather easy to set up Debian or Ubuntu environments:
# cd /var/lib/machines
# debootstrap --include=systemd-container --components=main,universe codename container-name repository-url
For Debian, valid code names are either the rolling names like "stable" and "testing" or release names like "stretch" and "sid"; for Ubuntu, a code name like "xenial" or "zesty" should be used. A complete list of code names is in
/usr/share/debootstrap/scripts. For a Debian image, the "repository-url" can be https://deb.debian.org/debian. For an Ubuntu image, the "repository-url" can be http://archive.ubuntu.com/ubuntu. In both cases, "repository-url" should not contain a trailing slash.
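The trailing-slash rule is easy to trip over when pasting a mirror URL. The following is a minimal sketch, assuming a hypothetical helper named debootstrap_cmd (it is not part of debootstrap), that strips a trailing slash and prints the resulting command line instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a debootstrap command line with a
# single trailing slash removed from the repository URL.
# $1: code name, $2: container name, $3: repository URL
debootstrap_cmd() {
    codename=$1
    name=$2
    url=${3%/}    # drop one trailing slash, if present
    printf 'debootstrap --include=systemd-container --components=main,universe %s %s %s\n' \
        "$codename" "$name" "$url"
}

debootstrap_cmd stable container-name https://deb.debian.org/debian/
```

The printed command can be reviewed and then run as root from /var/lib/machines.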
Just like Arch, Debian and Ubuntu will not let you log in without a password. To set the root password, run systemd-nspawn without the -b option:
# cd /var/lib/machines
# systemd-nspawn -D ./container-name
# passwd
# logout
Build and test packages
See Creating packages for other distributions for example uses.
Containers located in
/var/lib/machines/ can be controlled by the machinectl command, which internally controls instances of the
systemd-nspawn@.service unit. The subdirectories in
/var/lib/machines/ correspond to the container names, i.e.
/var/lib/machines/for some reason, it can be symlinked. See for details.
Default systemd-nspawn options
It is important to realize that containers started via machinectl or
systemd-nspawn@.service use different default options than containers started manually by the systemd-nspawn command. The extra options used by the service are:
--boot – Managed containers automatically search for an init program and invoke it as PID 1.
--private-network – Managed containers get a virtual network interface and are disconnected from the host network. See #Networking for details.
-U – Managed containers use the feature by default if supported by the kernel. See #Unprivileged containers for implications.
The behaviour can be overridden in per-container configuration files, see #Configuration for details.
Containers can be managed by the
machinectl subcommand container-name command. For example, to start a container:
$ machinectl start container-name
Similarly, there are subcommands such as
show. See for detailed explanations.
Other common commands are:
machinectl list – show a list of currently running containers
machinectl login container-name – open an interactive login session in a container
machinectl shell [username@]container-name – open an interactive shell session in a container (this immediately invokes a user process without going through the login process in the container)
machinectl enable container-name and machinectl disable container-name – enable or disable a container to start at boot, see #Enable container to start at boot for details
machinectl also has subcommands for managing container (or virtual machine) images and image transfers. Seeand for details.
Much of the core systemd toolchain has been updated to work with containers. Tools that do usually provide a
-M, --machine= option which will take a container name as argument.
See journal logs for a particular machine:
# journalctl -M container-name
Show control group contents:
$ systemd-cgls -M container-name
See startup time of container:
$ systemd-analyze -M container-name
For an overview of resource usage:
To specify per-container settings and not global overrides, the .nspawn files can be used. Seefor details.
- .nspawn files may be removed unexpectedly from /etc/systemd/nspawn/ when you run machinectl remove.
- The interaction of network options specified in the .nspawn file and on the command line does not work correctly when there is --settings=override (which is specified in the systemd-nspawn@.service file). As a workaround, you need to include the option VirtualEthernet=on, even though the service specifies
Enable container to start at boot
When using a container frequently, you may want to start it at boot.
First make sure that the
machines.target is enabled.
Containers discoverable by machinectl can be enabled or disabled:
$ machinectl enable container-name
You can take advantage of control groups to implement limits and resource management of your containers with
systemctl set-property, see . For example, you may want to limit the memory amount or CPU usage. To limit the memory consumption of your container to 2 GiB:
# systemctl set-property systemd-nspawn@myContainer.service MemoryMax=2G
Or to limit the CPU time usage to roughly the equivalent of 2 cores:
# systemctl set-property systemd-nspawn@myContainer.service CPUQuota=200%
This will create permanent files in
According to the documentation, MemoryHigh is the preferred method to keep memory consumption in check, but unlike MemoryMax it is not a hard limit. You can use both options, leaving MemoryMax as the last line of defense. Also take into consideration that this does not limit the number of CPUs the container can see, but you will achieve similar results by limiting how much CPU time the container gets at most, relative to the total CPU time.
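Combining a soft and a hard memory limit could look like the following sketch. The helper name memory_limit_cmd and the 1500M value are illustrative assumptions; the unit name follows the systemd-nspawn@.service template mentioned earlier. The command is only printed, not executed:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a set-property command that applies
# a soft limit (MemoryHigh) and a hard limit (MemoryMax) to the unit of a
# managed container.
# $1: container name, $2: MemoryHigh value, $3: MemoryMax value
memory_limit_cmd() {
    container=$1
    high=$2
    max=$3
    printf 'systemctl set-property systemd-nspawn@%s.service MemoryHigh=%s MemoryMax=%s\n' \
        "$container" "$high" "$max"
}

memory_limit_cmd myContainer 1500M 2G
```

Keeping MemoryHigh somewhat below MemoryMax lets the kernel throttle and reclaim memory before the hard limit is reached.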
--runtime. You can check their results with systemd-cgtop.
systemd-nspawn containers can use either host networking or private networking:
- In the host networking mode, the container has full access to the host network. This means that the container will be able to access all network services on the host and packets coming from the container will appear to the outside network as coming from the host (i.e. sharing the same IP address).
- In the private networking mode, the container is disconnected from the host's network. This makes all network interfaces unavailable to the container, with the exception of the loopback device and those explicitly assigned to the container. There are a number of different ways to set up network interfaces for the container:
- an existing interface can be assigned to the container (e.g. if you have multiple Ethernet devices),
- a virtual network interface associated with an existing interface (i.e. VLAN interface) can be created and assigned to the container,
- a virtual Ethernet link between the host and the container can be created.
- In the latter case the container's network is fully isolated (from the outside network as well as other containers) and it is up to the administrator to configure networking between the host and the containers. This typically involves creating a network bridge to connect multiple (physical or virtual) interfaces or setting up a Network Address Translation between multiple interfaces.
The host networking mode is suitable for application containers which do not run any networking software that would configure the interface assigned to the container. Host networking is the default mode when you run systemd-nspawn from the shell.
On the other hand, the private networking mode is suitable for system containers that should be isolated from the host system. The creation of virtual Ethernet links is a very flexible tool that allows creating complex virtual networks. This is the default mode for containers started by machinectl or
The following subsections describe common scenarios. Seefor details about the available systemd-nspawn options.
Use host networking
To disable private networking and the creation of a virtual Ethernet link used by containers started with machinectl, add a .nspawn file with the following option:
This will override the
--network-veth option used in
systemd-nspawn@.service and the newly started containers will use the host networking mode.
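As a minimal sketch, assuming the setting meant here is VirtualEthernet=no in the [Network] section (the .nspawn counterpart of --network-veth according to systemd.nspawn(5)), such a file could be generated as follows. The helper name write_host_net_nspawn is hypothetical, and the file is written to a scratch directory by default so the sketch can be tried without root; the real location is /etc/systemd/nspawn/:

```shell
#!/bin/sh
# Hypothetical helper: write a per-container .nspawn file that turns off the
# virtual Ethernet link, and print the path of the file that was written.
# $1: container name
# $2: target directory (optional; defaults to a new scratch directory)
write_host_net_nspawn() {
    name=$1
    dir=${2:-$(mktemp -d)}
    cat > "$dir/$name.nspawn" <<'EOF'
[Network]
VirtualEthernet=no
EOF
    printf '%s\n' "$dir/$name.nspawn"
}

# Writes container-name.nspawn into a scratch directory and prints its path.
write_host_net_nspawn container-name
```

To apply it for real, pass /etc/systemd/nspawn as the second argument (as root) and restart the container.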
If a container is started with the
--network-veth option, systemd-nspawn will create a virtual Ethernet link between the host and the container. The host side of the link will be available as a network interface named
ve-container-name. The container side of the link will be named
host0. Note that this option implies
- If the container name is too long, the interface name will be shortened (e.g. for ve-long-container-name) to fit into the 15-character limit. The full name will be set as the altname property of the interface (see ) and can still be used to reference the interface.
- When examining the interfaces with ip link, interface names will be shown with a suffix such as @ifN. This suffix is not actually part of the interface name; instead, ip link appends this information to indicate which "slot" the virtual Ethernet cable connects to on the other end.
- For example, a host virtual Ethernet interface shown as ve-foo@if2 is connected to the container foo, and inside the container to the second network interface – the one shown with index 2 when running ip link inside the container. Similarly, the interface named host0@if9 in the container is connected to the 9th network interface on the host.
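Whether a given container name will hit the 15-character limit can be checked in advance. The sketch below only tests the length of the ve- name; it does not try to reproduce how systemd-nspawn shortens it. The helper name veth_name_fits is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if "ve-<container-name>" fits into the
# 15-character interface name limit, fail otherwise.
veth_name_fits() {
    ifname="ve-$1"
    [ "${#ifname}" -le 15 ]
}

for c in foo long-container-name; do
    if veth_name_fits "$c"; then
        echo "ve-$c: fits"
    else
        echo "ve-$c: will be shortened"
    fi
done
```

Since the ve- prefix takes three characters, container names of up to 12 characters keep their full interface name.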
When you start the container, an IP address has to be assigned to both interfaces (on the host and in the container). If you use systemd-networkd on the host as well as in the container, this is done out-of-the-box:
- the /usr/lib/systemd/network/80-container-ve.network file on the host matches the ve-container-name interface and starts a DHCP server, which assigns IP addresses to the host interface as well as the container,
- the /usr/lib/systemd/network/80-container-host0.network file in the container matches the host0 interface and starts a DHCP client, which receives an IP address from the host.
To give the container access to the outside network, you can configure NAT as described in Internet sharing#Enable NAT. If you use systemd-networkd, this is done (partially) automatically via the
IPMasquerade=both option in
/usr/lib/systemd/network/80-container-ve.network. However, this issues just one iptables (or nftables) rule such as
-t nat -A POSTROUTING -s 192.168.163.192/28 -j MASQUERADE
The filter table has to be configured manually as shown in Internet sharing#Enable NAT. You can use a wildcard to match all interfaces starting with ve-:
# iptables -A FORWARD -i ve-+ -o internet0 -j ACCEPT
Additionally, the rule
-A FORWARD -i ve-+ -o internet0 -j ACCEPT may not work as described in Internet sharing#Enable NAT. If that is the case, try
-A FORWARD -i ve-+ -j ACCEPT.
Use a network bridge
If you have configured a network bridge on the host system, you can create a virtual Ethernet link for the container and add its host side to the network bridge. This is done with the
--network-bridge=bridge-name option. Note that --network-bridge implies
--network-veth, i.e. the virtual Ethernet link is created automatically. However, the host side of the link will use the
vb- prefix instead of
ve-, so the systemd-networkd options for starting the DHCP server and IP masquerading will not be applied.
The bridge management is left to the administrator. For example, the bridge can connect virtual interfaces with a physical interface, or it can connect only virtual interfaces of several containers. See systemd-networkd#Network bridge with DHCP and systemd-networkd#Network bridge with static IP addresses for example configurations using systemd-networkd.
There is also a
--network-zone=zone-name option which is similar to
--network-bridge but the network bridge is managed automatically by systemd-nspawn and systemd-networkd. The bridge interface named
vz-zone-name is automatically created when the first container configured with
--network-zone=zone-name is started, and is automatically removed when the last container configured with
--network-zone=zone-name exits. Hence, this option makes it easy to place multiple related containers on a common virtual network. Note that
vz-* interfaces are managed by systemd-networkd in the same way as
ve-* interfaces using the options from the
Use a "macvlan" or "ipvlan" interface
Instead of creating a virtual Ethernet link (whose host side may or may not be added to a bridge), you can create a virtual interface on an existing physical interface (i.e. VLAN interface) and add it to the container. The virtual interface will be bridged with the underlying host interface and thus the container will be exposed to the outside network, which allows it to obtain a distinct IP address via DHCP from the same LAN as the host is connected to.
systemd-nspawn offers 2 options:
--network-macvlan=interface – the virtual interface will have a different MAC address than the underlying physical interface and will be named
--network-ipvlan=interface – the virtual interface will have the same MAC address as the underlying physical interface and will be named
Both options imply
Use an existing interface
If the host system has multiple physical network interfaces, you can use the --network-interface=interface option to assign interface to the container (and make it unavailable to the host while the container is running). Note that
When private networking is enabled, individual ports on the host can be mapped to ports on the container using the
--port option or by using the
Port setting in an .nspawn file. This is done by issuing iptables rules to the
nat table, but the
FORWARD chain in the
filter table needs to be configured manually as shown in #Use a virtual Ethernet link.
For example, to map a TCP port 8000 on the host to the TCP port 80 in the container:
loopback interface when mapping ports. Hence, for the example above, localhost:8000 connects to the host and not to the container. Only connections to other interfaces are subjected to port mapping. See  for details.
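For the mapping of host TCP port 8000 to container port 80, the value takes the form protocol:hostport:containerport, as documented in systemd-nspawn(1). The sketch below builds that value and prints both spellings; the helper name port_spec is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build the value shared by the --port= option and the
# Port= setting.
# $1: protocol (tcp or udp), $2: host port, $3: container port
port_spec() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

spec=$(port_spec tcp 8000 80)
echo "command line: --port=$spec"
echo ".nspawn file: Port=$spec (in the [Network] section)"
```

Either form only takes effect when the container runs with private networking.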
Domain name resolution
Domain name resolution in the container can be configured by the
--resolv-conf option of systemd-nspawn or the corresponding option
ResolvConf= for the .nspawn files. There are many possible values which are described in .
The default value is
auto, which means that:
- If --private-network is enabled, the /etc/resolv.conf is left as it is in the container.
- Otherwise, if systemd-resolved is running on the host, its stub resolv.conf file is copied or bind-mounted into the container.
- Otherwise, the /etc/resolv.conf file is copied or bind-mounted from the host to the container.
In the last two cases, the file is copied if the container root is writable, and bind-mounted if it is read-only.
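The decision described for the auto value can be written down as a small model. This is only an illustration of the rules as stated in this section, not the actual systemd-nspawn implementation; the function name resolv_conf_auto and its yes/no arguments are assumptions:

```shell
#!/bin/sh
# Illustrative model of the "auto" behaviour described in the text.
# $1: private networking enabled (yes/no)
# $2: systemd-resolved running on the host (yes/no)
# $3: container root writable (yes/no)
resolv_conf_auto() {
    private_net=$1
    resolved=$2
    writable=$3

    if [ "$private_net" = yes ]; then
        echo "leave container /etc/resolv.conf untouched"
        return
    fi

    # Copy into a writable root, bind-mount into a read-only one.
    if [ "$writable" = yes ]; then
        how=copy
    else
        how=bind-mount
    fi

    if [ "$resolved" = yes ]; then
        echo "$how systemd-resolved stub resolv.conf"
    else
        echo "$how host /etc/resolv.conf"
    fi
}

resolv_conf_auto no yes no
```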
Tips and tricks
systemd-nspawn supports unprivileged containers, though the containers need to be booted as root.
The easiest way to do this is to let systemd-nspawn automatically choose an unused range of UIDs/GIDs by using the
# systemd-nspawn -bUD ~/MyContainer
If the kernel supports user namespaces, the -U option is equivalent to --private-users=pick --private-users-chown. This implies that files and directories in the container are chowned to the selected range of private UIDs/GIDs when the container starts. See for details.
Once you have started a container with a private UID/GID range, you need to keep using it that way to avoid permission errors. Alternatively, it is possible to undo the effect of --private-users-chown (or -U) on the file system by specifying a range of IDs starting at 0:
# systemd-nspawn -D ~/MyContainer --private-users=0 --private-users-chown
Use an X environment
You will need to set the
DISPLAY environment variable inside your container session to connect to the external X server.
X stores some required files in the
/tmp directory. In order for your container to display anything, it needs access to those files. To do so, append the
--bind-ro=/tmp/.X11-unix option when starting the container.
/tmp/.X11-unix contents have to be bind-mounted as read-only, otherwise they will disappear from the filesystem. The read-only mount flag does not prevent using the connect() syscall on the socket. If you have also bound /run/user/1000, then you might want to explicitly bind /run/user/1000/bus as read-only to protect the dbus socket from being deleted.
xhost only provides rather coarse access rights to the X server. More fine-grained access control is possible via the
$XAUTHORITY file. Unfortunately, just making the
$XAUTHORITY file accessible in the container will not do the job:
$XAUTHORITY file is specific to your host, but the container is a different host.
The following trick adapted from stackoverflow can be used to make your X server accept the
$XAUTHORITY file from an X application run inside the container:
$ XAUTH=/tmp/container_xauth
$ xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
# systemd-nspawn -D myContainer --bind=/tmp/.X11-unix --bind="$XAUTH" -E DISPLAY="$DISPLAY" -E XAUTHORITY="$XAUTH" --as-pid2 /usr/bin/xeyes
The second line above sets the connection family to "FamilyWild", value
65535, which causes the entry to match every display. See for more information.
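What the sed expression does can be seen on a single entry. The sample line below is made up for illustration (a FamilyLocal entry for a host called "host" with a dummy cookie), written in the numeric format that xauth nextract prints; the helper name wildcard_family is hypothetical:

```shell
#!/bin/sh
# Replace the first four hex digits (the connection family) of each entry
# with ffff, i.e. 65535.
wildcard_family() {
    sed -e 's/^..../ffff/'
}

# Made-up sample entry; the fields are: family, address length, address,
# display number length, display number, name length, name, data length, data.
sample='0100 0004 686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 00112233445566778899aabbccddeeff'

printf '%s\n' "$sample" | wildcard_family
```

Only the family field changes; the address, display number and cookie are passed through untouched.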
Using X nesting/Xephyr
Another simple way to run X applications and avoid the risks of a shared X desktop is using X nesting. The advantages are that interaction between in-container and non-container applications is avoided entirely and that a different desktop environment or window manager can be run; the downsides are lower performance and the lack of hardware acceleration when using Xephyr.
Start Xephyr outside of the container using:
# Xephyr :1 -resizeable
Then start the container with the following options:
No other binds are necessary.
You might still need to manually set
DISPLAY=:1 in the container under some circumstances (mostly if used with
To run as PID 1
# systemd-nspawn --setenv=DISPLAY=:0 \
    --setenv=XAUTHORITY=~/.Xauthority \
    --bind-ro=$HOME/.Xauthority:/root/.Xauthority \
    --bind=/tmp/.X11-unix \
    -D ~/containers/firefox \
    firefox
Alternatively you can boot the container and let e.g. systemd-networkd set up the virtual network interface:
# systemd-nspawn --bind-ro=$HOME/.Xauthority:/root/.Xauthority \
    --bind=/tmp/.X11-unix \
    -D ~/containers/firefox \
    --network-veth -b
Once your container is booted, run the firefox binary like so:
# systemd-run -M firefox --setenv=DISPLAY=:0 firefox
Access host filesystem
--bind-ro in .
If both the host and the container are Arch Linux, then one could, for example, share the pacman cache:
# systemd-nspawn --bind=/var/cache/pacman/pkg
Or you can specify per-container bind using the file:
To bind the directory to a different path within the container, append that path, separated by a colon. For example:
# systemd-nspawn --bind=/path/to/host_dir:/path/to/container_dir
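Both forms of the option can be produced by one small helper, sketched here under the assumption that neither path contains a colon. The name bind_arg is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build a --bind= argument.
# $1: directory on the host
# $2: path inside the container (optional; same as $1 when omitted)
bind_arg() {
    if [ -n "${2:-}" ]; then
        printf '%s\n' "--bind=$1:$2"
    else
        printf '%s\n' "--bind=$1"
    fi
}

bind_arg /var/cache/pacman/pkg
bind_arg /path/to/host_dir /path/to/container_dir
```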
Run on a non-systemd system
Use Btrfs subvolume as container root
To use a Btrfs subvolume as a template for the container's root, use the
--template flag. This takes a snapshot of the subvolume and populates the root directory for the container with it.
For example, to use a snapshot located at
# systemd-nspawn --template=/.snapshots/403/snapshots -b -D my-container
my-container is the name of the directory that will be created for the container. After powering off, the newly created subvolume is retained.
Use temporary Btrfs snapshot of container
One can use the
-x flag to create a temporary btrfs snapshot of the container and use it as the container root. Any changes made while booted in the container will be lost. For example:
# systemd-nspawn -D my-container -xb
where my-container is the directory of an existing container or system. For example, if
/ is a btrfs subvolume one could create an ephemeral container of the currently running host system by doing:
# systemd-nspawn -D / -xb
After powering off the container, the btrfs subvolume that was created is immediately removed.
Run docker in systemd-nspawn
Docker requires rw permission on /sys/fs/cgroup to run its containers, which systemd-nspawn mounts read-only by default due to the cgroup namespace. However, it is possible to run Docker in a systemd-nspawn container by bind-mounting /sys/fs/cgroup from the host system and enabling the necessary capabilities and permissions.
First, the cgroup namespace should be disabled by running systemctl edit systemd-nspawn@myContainer.
Then, edit /etc/systemd/nspawn/myContainer.nspawn (create if absent) and add the following configuration.
[Exec]
Capability=all
SystemCallFilter=add_key keyctl
PrivateUsers=no

[Files]
Bind=/sys/fs/cgroup
This grants all capabilities to the container, disables user namespacing, whitelists two system calls, add_key and keyctl (related to the kernel keyring and required by Docker), and bind-mounts /sys/fs/cgroup from the host to the container. After editing these files, you need to power off and restart your container for them to take effect. If your container had user namespaces enabled before this change (which is the default if the systemd-nspawn@.service unit is used), you will also need to undo permission changes caused by user namespacing to avoid permission errors. See #Unprivileged containers for details.
- You might need to load the overlay module on the host before starting Docker inside the systemd-nspawn container to use the overlay2 storage driver (default storage driver of Docker) properly. Failure to load the module will cause Docker to choose the inefficient vfs driver, which copies everything for every layer of Docker containers. Consult Kernel modules#Automatic module loading with systemd on how to load the module automatically.
- As of November 2020, cgroups v2 seems to break Docker inside systemd-nspawn. If you want to use Docker in this way, do not set the kernel parameter
Root login fails
If you get the following error when you try to login (i.e. using
machinectl login <name>):
arch-nspawn login: root Login incorrect
And the journal shows:
pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !
/usr/share/factory/etc/securetty on the container file system. Optionally add them to NoExtract in
/etc/pacman.conf to prevent them from getting reinstalled. See FS#45903 for details.
execv(...) failed: Permission denied
When trying to boot the container via
systemd-nspawn -bD /path/to/container (or executing something in the container), and the following error comes up:
execv(/usr/lib/systemd/systemd, /lib/systemd/systemd, /sbin/init) failed: Permission denied
even though the permissions of the files in question (i.e.
/lib/systemd/systemd) are correct, this can be the result of having mounted the file system on which the container is stored as non-root user. For example, if you mount your disk manually with an entry in fstab that has the options
noauto,user,..., systemd-nspawn will not allow executing the files even if they are owned by root.
Terminal type in TERM is incorrect (broken colors)
When logging into the container via
machinectl login, the colors and keystrokes in the terminal within the container might be broken. This may be due to an incorrect terminal type in
TERM environment variable. The environment variable is not inherited from the shell on the host, but falls back to a default fixed in systemd (
vt220), unless explicitly configured. To configure, within the container create a configuration overlay for the
container-getty@.service systemd service that launches the login getty for
machinectl login, and set
TERM to the value that matches the host terminal you are logging in from:
machinectl shell. It properly inherits the
TERM environment variable from the terminal.
Not possible at this time (June 2019).