Linux Containers (简体中文)
Linux Containers (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host). It does not provide a virtual machine, but rather provides a virtual environment that has its own CPU, memory, block I/O, network, etc. space and the resource control mechanism. This is provided by the namespaces and cgroups features in the Linux kernel on the LXC host. It is similar to a chroot, but offers much more isolation.
LXD can be used as manager for LXC. This page deals with using LXC directly.
Privileged containers or unprivileged containers
LXCs can be setup to run in either privileged or unprivileged configurations.
In general, running an unprivileged container is considered safer than running a privileged container, since unprivileged containers have an increased degree of isolation by virtue of their design. Key to this is the mapping of the root UID within the container to a non-root UID on the host, which makes it more difficult for a hack inside the container to lead to consequences on the host system. In other words, if an attacker manages to escape the container, they should find themselves with limited or no rights on the host.
The Arch, and kernel packages currently provide out-of-the-box support for unprivileged containers. Similarly, with the package, unprivileged containers are only available for the system administrator; with additional kernel configuration changes required, as user namespaces are disabled by default for normal users there.
An example to illustrate unprivileged containers
To illustrate the power of UID mapping, consider the output below from a running, unprivileged container. Therein, we see the containerized processes owned by the containerized root user in the output of
[root@unprivileged_container /]# ps -ef | head -n 5 UID PID PPID C STIME TTY TIME CMD root 1 0 0 17:49 ? 00:00:00 /sbin/init root 14 1 0 17:49 ? 00:00:00 /usr/lib/systemd/systemd-journald dbus 25 1 0 17:49 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation systemd+ 26 1 0 17:49 ? 00:00:00 /usr/lib/systemd/systemd-networkd
On the host, however, those containerized root processes are actually shown to be running as the mapped user (ID>100000), rather than the host's actual root user:
[root@host /]# lxc-info -Ssip --name sandbox State: RUNNING PID: 26204 CPU use: 10.51 seconds BlkIO use: 244.00 KiB Memory use: 13.09 MiB KMem use: 7.21 MiB
[root@host /]# ps -ef | grep 26204 | head -n 5 UID PID PPID C STIME TTY TIME CMD 100000 26204 26200 0 12:49 ? 00:00:00 /sbin/init 100000 26256 26204 0 12:49 ? 00:00:00 /usr/lib/systemd/systemd-journald 100081 26282 26204 0 12:49 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 100000 26284 26204 0 12:49 ? 00:00:00 /usr/lib/systemd/systemd-logind
安装和 可以允许宿主系统运行具有特权的 lxcs。
（注：非特权（unprivileged），指的是不具有特权，既 root 权限的对象。）
lxc.idmap = u 0 100000 65536 lxc.idmap = g 0 100000 65536
/etc/subgid 来包含从需要拥有运行容器权利的用户到容器的 uid/gid 元组（pairs）的映射。下面的示例就是很简单的给用户 root（和 systemd (简体中文) 系统单元）用的：
此外，仅仅当你提前代表了 cgroup（the cgroup2 delegation model enforces this restriction, not liblxc），你才能作为一个非特权的用户，运行一个非特权容器。使用下面的 systemd 命令来代表 cgroup（per LXC - Getting started: Creating unprivileged containers as a user）：
$ systemd-run --unit=myshell --user --scope -p "Delegate=yes" lxc-start container_name
This works similarly for other lxc commands.
Alternatively, delegate unprivileged cgroups by creating a systemd unit (per Rootless Containers: Enabling CPU, CPUSET, and I/O delegation):
[Service] Delegate=cpu cpuset io memory pids
Unprivileged containers on linux-hardened and custom kernels
Users wishing to run unprivileged containers onor their custom kernel need to complete several additional setup steps.
Firstly, a kernel is required that has support for User Namespaces (a kernel with
CONFIG_USER_NS). All Arch Linux kernels have support for
CONFIG_USER_NS. However, due to more general security concerns, the kernel does ship with User Namespaces enabled only for the root user. There are two options to create unprivileged containers there:
- Start the unprivileged containers only as root. Also give the sysctl setting
user.max_user_namespacesa positive value to suit your environment if its current value is
Failed to clone process in new user namespaceerrors seen in
lxc info --show-log container_name).
- Enable the sysctl setting
kernel.unprivileged_userns_cloneto allow normal users to run unprivileged containers. This can be done for the current session by running
sysctl kernel.unprivileged_userns_clone=1as root and can be made permanent with .
Host network configuration
LXCs support different virtual network types and devices (see). A bridge device on the host is required for most types of virtual networking which is illustrated in this section.
There are several main setups to consider:
- A host bridge
- A NAT bridge
The host bridge requires the host's network manager to manage a shared bridge interface. The host and any lxc will be assigned an IP address in the same network (for example 192.168.1.x). This might be more simplistic in cases where the goal is to containerize some network-exposed service like a webserver, or VPN server. The user can think of the lxc as just another PC on the physical LAN, and forward the needed ports in the router accordingly. The added simplicity can also be thought of as an added threat vector, again, if WAN traffic is being forwarded to the lxc, having it running on a separate range presents a smaller threat surface.
The NAT bridge does not require the host's network manager to manage the bridge.
lxc-net which creates a NAT bridge called
lxcbr0. The NAT bridge is a standalone bridge with a private network that is not bridged to the host's ethernet device or to a physical network. It exists as a private subnet in the host.
Using a host bridge
See Network bridge.
Using a NAT bridge
Install which is a dependency for
lxc-net and before starting the bridge, first create a configuration file for it:
# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your # containers. Set to "false" if you'll use virbr0 or another existing # bridge, or mavlan to your host's NIC. USE_LXC_BRIDGE="true" # If you change the LXC_BRIDGE to something other than lxcbr0, then # you will also need to update your /etc/lxc/default.conf as well as the # configuration (/var/lib/lxc/<container>/config) for any containers # already created using the default config to reflect the new bridge # name. # If you have the dnsmasq daemon installed, you'll also have to update # /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon. LXC_BRIDGE="lxcbr0" LXC_ADDR="10.0.3.1" LXC_NETMASK="255.255.255.0" LXC_NETWORK="10.0.3.0/24" LXC_DHCP_RANGE="10.0.3.2,10.0.3.254" LXC_DHCP_MAX="253" # Uncomment the next line if you'd like to use a conf-file for the lxcbr0 # dnsmasq. For instance, you can use 'dhcp-host=mail1,10.0.3.100' to have # container 'mail1' always get ip address 10.0.3.100. #LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf # Uncomment the next line if you want lxcbr0's dnsmasq to resolve the .lxc # domain. You can then add "server=/lxc/10.0.3.1' (or your actual $LXC_ADDR) # to your system dnsmasq configuration file (normally /etc/dnsmasq.conf, # or /etc/NetworkManager/dnsmasq.d/lxc.conf on systems that use NetworkManager). # Once these changes are made, restart the lxc-net and network-manager services. # 'container1.lxc' will then resolve on your host. #LXC_DOMAIN="lxc"
Then we need to modify the LXC container template so our containers use our bridge:
lxc.net.0.type = veth lxc.net.0.link = lxcbr0 lxc.net.0.flags = up lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx
Optionally create a configuration file to manually define the IP address of any containers:
Since the lxc is running on the 10.0.3.x subnet, access to services such as ssh, httpd, etc. will need to be actively forwarded to the lxc. In principal, the firewall on the host needs to forward incoming traffic on the expected port on the container.
Example iptables rule
The goal of this rule is to allow ssh traffic to the lxc:
# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.100:22
This rule forwards tcp traffic originating on port 2221 to the IP address of the lxc on port 22.
To ssh into the container from another PC on the LAN, one needs to ssh on port 2221 to the host. The host will then forward that traffic to the container.
$ ssh -p 2221 host.lan
Example ufw rule
/etc/ufw/before.rules to make this persistent:
*nat :PREROUTING ACCEPT [0:0] -A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.100:22 COMMIT
Running containers as non-root user
To create and start containers as a non-root user, extra configuration must be applied.
Create the usernet file under
/etc/lxc/lxc-usernet. According to the
lxc-usernet man page, the entry per line is:
user type bridge number
Configure the file with the user needing to create containers. The bridge will be the same as defined in
A copy of the
/etc/lxc/default.conf is needed in the non-root user's home directory, e.g.
~/.config/lxc/default.conf (create the directory if needed).
Running containers as a non-root user requires
+x permissions on
~/.local/share/. Make that change with chmod before starting a container.
Containers are built using
lxc-create. With the release of lxc-3.0.0-1, upstream has deprecated locally stored templates.
To build an Arch container, invoke like this:
# lxc-create -n playtime -t download -- --dist archlinux --release current --arch amd64
For other distros, invoke like this and select options from the supported distros displayed in the list:
# lxc-create -n playtime -t download
haveged.serviceto avoid a perceived hang during the setup process while waiting for system entropy to be seeded. Without it, the generation of private/GPG keys can add a lengthy wait to the process.
-B btrfsto create a Btrfs subvolume for storing containerized rootfs. This comes in handy if cloning containers with the help of
lxc-clonecommand. ZFS users may use
-B zfs, correspondingly.
The examples below can be used with privileged and unprivileged containers alike. Note that for unprivileged containers, additional lines will be present by default which are not shown in the examples, including the
lxc.idmap = u 0 100000 65536 and the
lxc.idmap = g 0 100000 65536 values optionally defined in the #Enable support to run unprivileged containers (optional)[损坏的链接：无效的章节] section.
Basic configuration with networking
System resources to be virtualized/isolated when a process is using the container are defined in
/var/lib/lxc/CONTAINER_NAME/config. By default, the creation process will make a minimum setup without networking support. Below is an example configuration with networking supplied by
# Template used to create this container: /usr/share/lxc/templates/lxc-archlinux # Parameters passed to the template: # For additional config options, please look at lxc.container.conf(5) # Distribution configuration lxc.include = /usr/share/lxc/config/common.conf lxc.arch = x86_64 # Container specific configuration lxc.rootfs.path = dir:/var/lib/lxc/playtime/rootfs lxc.uts.name = playtime # Network configuration lxc.net.0.type = veth lxc.net.0.link = lxcbr0 lxc.net.0.flags = up lxc.net.0.hwaddr = ee:ec:fa:e9:56:7d
Mounts within the container
For privileged containers, one can select directories on the host to bind mount to the container. This can be advantageous for example if the same architecture is being containerize and one wants to share pacman packages between the host and container. Another example could be shared directories. The syntax is simple:
lxc.mount.entry = /var/cache/pacman/pkg var/cache/pacman/pkg none bind 0 0
Xorg program considerations (optional)
In order to run programs on the host's display, some bind mounts need to be defined so that the containerized programs can access the host's resources. Add the following section to
## for xorg lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir,ro lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file
If still experiencing a permission denied error in the LXC guest, call
xhost + in the host to allow the guest to connect to the host's display server. Take note of the security concerns of opening up the display server by doing this.
In addition, add the following line before the above bind mount lines.
lxc.mount.entry = tmpfs tmp tmpfs defaults
To list all installed LXC containers:
# lxc-ls -f
Users can also start/stop LXCs without systemd. Start a container:
# lxc-start -n CONTAINER_NAME
Stop a container:
# lxc-stop -n CONTAINER_NAME
To login into a container:
# lxc-console -n CONTAINER_NAME
Once logged, treat the container like any other linux system, set the root password, create users, install packages, etc.
To attach to a container:
# lxc-attach -n CONTAINER_NAME --clear-env
That works nearly the same as lxc-console, but it causes starts with a root prompt inside the container, bypassing login. Without the
--clear-env flag, the host will pass its own environment variables into the container (including
$PATH, so some commands will not work when the containers are based on another distribution).
Users with a need to run multiple containers can simplify administrative overhead (user management, system updates, etc.) by using snapshots. The strategy is to setup and keep up-to-date a single base container, then, as needed, clone (snapshot) it. The power in this strategy is that the disk space and system overhead are truly minimized since the snapshots use an overlayfs mount to only write out to disk, only the differences in data. The base system is read-only but changes to it in the snapshots are allowed via the overlayfs.
For example, setup a container as outlined above. We will call it "base" for the purposes of this guide. Now create 2 snapshots of "base" which we will call "snap1" and "snap2" with these commands:
# lxc-copy -n base -N snap1 -B overlayfs -s # lxc-copy -n base -N snap2 -B overlayfs -s
The snapshots can be started/stopped like any other container. Users can optionally destroy the snapshots and all new data therein with the following command. Note that the underlying "base" lxc is untouched:
# lxc-destroy -n snap1 -f
Converting a privileged container to an unprivileged container
Once the system has been configured to use unprivileged containers (see, #Enable support to run unprivileged containers (optional)[损坏的链接：无效的章节]), AUR contains a utility called
uidmapshift which is able to convert an existing privileged container to an unprivileged container to avoid a total rebuild of the image.
- It is recommended to backup the existing image before using this utility!
- This utility will not shift UIDs and GIDs in ACL, users will need to shift them manually!
Invoke the utility to convert over like so:
# uidmapshift -b /var/lib/lxc/foo 0 100000 65536
Additional options are available simply by calling
uidmapshift without any arguments.
Running Xorg programs
Either attach to or SSH into the target container and prefix the call to the program with the DISPLAY ID of the host's X session. For most simple setups, the display is always 0.
An example of running Firefox from the container in the host's display:
$ DISPLAY=:0 firefox
Alternatively, to avoid directly attaching to or connecting to the container, the following can be used on the host to automate the process:
# lxc-attach -n playtime --clear-env -- sudo -u YOURUSER env DISPLAY=:0 firefox
Root login fails
If presented with following error upon trying to login using lxc-console:
login: root Login incorrect
And the container's journal shows:
pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !
/usr/share/factory/etc/securetty on the container file system. Optionally add them to NoExtract in
/etc/pacman.conf to prevent them from getting reinstalled. See FS#45903 for details.
Alternatively, create a new user in lxc-attach and use it for logging in to the system, then switch to root.
# lxc-attach -n playtime [root@playtime]# useradd -m -Gwheel newuser [root@playtime]# passwd newuser [root@playtime]# passwd root [root@playtime]# exit # lxc-console -n playtime [newuser@playtime]$ su
No network-connection with veth in container config
If you cannot access your LAN or WAN with a networking interface configured as veth and setup through
If the virtual interface gets the ip assigned and should be connected to the network correctly.
ip addr show veth0 inet 192.168.1.111/24
You may disable all the relevant static ip formulas and try setting the ip through the booted container-os like you would normaly do.
... lxc.net.0.type = veth lxc.net.0.name = veth0 lxc.net.0.flags = up lxc.net.0.link =
And then assign the IP through a preferred method inside the container, see also Network configuration#Network management.
Error: unknown command
The error may happen when a basic command (ls, cat, etc.) on an attached container is typed hen a different Linux distribution is containerized relative to the host system (e.g. Debian container in Arch Linux host system). Upon attaching, use the argument
# lxc-attach -n container_name --clear-env
Error: Failed at step KEYRING spawning...
Services in an unprivileged container may fail with the following message
some.service: Failed to change ownership of session keyring: Permission denied some.service: Failed to set up kernel keyring: Permission denied some.service: Failed at step KEYRING spawning ....: Permission denied
Create a file
2 blacklist [all] keyctl errno 38
Then add the following line to the container configuration after lxc.idmap
lxc.seccomp.profile = /etc/lxc/unpriv.seccomp
lxc-execute fails due to missing lxc.init.static
lxc-execute fails with the error message
Unable to open lxc.init.static. See FS#63814 for details.
Starting containers using
lxc-start works fine.