Linux Containers

Linux Containers (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host). It does not provide a virtual machine, but rather a virtual environment that has its own CPU, memory, block I/O, and network space, together with resource control mechanisms. This is provided by the namespaces and cgroups features in the Linux kernel on the LXC host. It is similar to a chroot, but offers much more isolation.

Alternatives for using containers are systemd-nspawn, docker, or rkt (AUR).

Privileged containers or unprivileged containers

LXCs can be set up to run in either privileged or unprivileged configurations.

In general, running an unprivileged container is considered safer than running a privileged container, since unprivileged containers have an increased degree of isolation by virtue of their design. Key to this is the mapping of the root UID within the container to a non-root UID on the host, which makes it more difficult for a hack inside the container to lead to consequences on the host system. In other words, attackers who manage to escape the container should find themselves with limited or no rights on the host.

The Arch linux, linux-lts and linux-zen kernel packages currently provide out-of-the-box support for unprivileged containers. With the linux-hardened package, by contrast, unprivileged containers are only available to the system administrator unless additional kernel configuration changes are made, since user namespaces are disabled there by default for normal users. This article contains information for users to run either type of container, but additional steps may be required in order to use unprivileged containers.

An example to illustrate unprivileged containers

To illustrate the power of UID mapping, consider the output below from a running, unprivileged container. Therein, we see the containerized processes owned by the containerized root user in the output of ps:

[root@unprivileged_container /]# ps -ef | head -n 5
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 17:49 ?        00:00:00 /sbin/init
root        14     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus        25     1  0 17:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+    26     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-networkd

On the host, however, those containerized root processes are actually shown to be running as the mapped user (ID>100000), rather than the host's actual root user:

[root@host /]# lxc-info -Ssip --name sandbox
State:          RUNNING
PID:            26204
CPU use:        10.51 seconds
BlkIO use:      244.00 KiB
Memory use:     13.09 MiB
KMem use:       7.21 MiB
[root@host /]# ps -ef | grep 26204 | head -n 5
UID        PID  PPID  C STIME TTY          TIME CMD
100000   26204 26200  0 12:49 ?        00:00:00 /sbin/init
100000   26256 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
100081   26282 26204  0 12:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
100000   26284 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-logind

Setup

Required software

Installing lxc and arch-install-scripts will allow the host system to run privileged lxcs.
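For example, with pacman:

# pacman -S lxc arch-install-scripts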

Enable support to run unprivileged containers (optional)

Enable the control groups PAM module by modifying /etc/pam.d/system-login to additionally contain the following line:

session optional pam_cgfs.so -c freezer,memory,name=systemd,unified

Next, modify /etc/lxc/default.conf to contain the following lines:

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

Finally, create both /etc/subuid and /etc/subgid to contain the mapping to the containerized uid/gid pairs for each user who shall be able to run the containers. The example below is simply for the root user (and systemd system unit):

/etc/subuid
root:100000:65536
/etc/subgid
root:100000:65536
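For a normal (non-root) user, an analogous subordinate range can be allocated with usermod. This is a sketch: the user name and the 165536-231071 range below are placeholders, chosen not to overlap root's range above, and the lxc.idmap values in that user's container configuration would need to match them:

# usermod --add-subuids 165536-231071 --add-subgids 165536-231071 youruser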

Users wishing to run unprivileged containers on linux-hardened or their custom kernel need to complete several additional setup steps.

Firstly, a kernel is required that has support for User Namespaces (a kernel with CONFIG_USER_NS). All Arch Linux kernels have support for CONFIG_USER_NS. However, due to more general security concerns, the linux-hardened kernel ships with User Namespaces enabled only for the root user. You have two options to create unprivileged containers there:

  • Start your unprivileged containers only as root.
  • Enable the sysctl setting kernel.unprivileged_userns_clone to allow normal users to run unprivileged containers. This can be done for the current session with sysctl kernel.unprivileged_userns_clone=1 and can be made permanent with sysctl.d(5), as sketched below.
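To verify that the running kernel was built with user namespace support, and to make the sysctl setting persistent across reboots, something like the following can be used (the drop-in file name is arbitrary; /proc/config.gz is available on the official Arch kernels):

$ zgrep CONFIG_USER_NS /proc/config.gz
CONFIG_USER_NS=y

/etc/sysctl.d/99-userns.conf
kernel.unprivileged_userns_clone = 1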

Host network configuration

LXCs support different virtual network types and devices (see lxc.container.conf(5)). A bridge device on the host is required for most types of virtual networking, as illustrated in this section.

There are several main setups to consider:

  1. A host bridge
  2. A NAT bridge

The host bridge requires the host's network manager to manage a shared bridge interface. The host and any LXC will be assigned IP addresses in the same network (for example 192.168.1.x). This can be simpler when the goal is to containerize some network-exposed service like a web server or VPN server. The user can think of the LXC as just another PC on the physical LAN and forward the needed ports in the router accordingly. However, the added simplicity can also be an added threat vector: if WAN traffic is being forwarded to the LXC, running it on a separate address range presents a smaller attack surface.

The NAT bridge does not require the host's network manager to manage the bridge. lxc ships with lxc-net which creates a NAT bridge called lxcbr0. The NAT bridge is a standalone bridge with a private network that is not bridged to the host's ethernet device or to a physical network. It exists as a private subnet in the host.

Using a host bridge

See Network bridge.
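As a minimal, non-persistent sketch of what the host side involves (the interface names are examples; a proper persistent setup is described in Network bridge):

# ip link add name br0 type bridge
# ip link set br0 up
# ip link set eth0 master br0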

Using a NAT bridge

Install dnsmasq, which is a dependency of lxc-net. Before starting the bridge, first create a configuration file for it:

/etc/default/lxc-net
# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your
# containers.  Set to "false" if you'll use virbr0 or another existing
# bridge, or macvlan to your host's NIC.
USE_LXC_BRIDGE="true"

# If you change the LXC_BRIDGE to something other than lxcbr0, then
# you will also need to update your /etc/lxc/default.conf as well as the
# configuration (/var/lib/lxc/<container>/config) for any containers
# already created using the default config to reflect the new bridge
# name.
# If you have the dnsmasq daemon installed, you'll also have to update
# /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon.
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"
# Uncomment the next line if you'd like to use a conf-file for the lxcbr0
# dnsmasq.  For instance, you can use 'dhcp-host=mail1,10.0.3.100' to have
# container 'mail1' always get ip address 10.0.3.100.
#LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf

# Uncomment the next line if you want lxcbr0's dnsmasq to resolve the .lxc
# domain.  You can then add "server=/lxc/10.0.3.1' (or your actual $LXC_ADDR)
# to your system dnsmasq configuration file (normally /etc/dnsmasq.conf,
# or /etc/NetworkManager/dnsmasq.d/lxc.conf on systems that use NetworkManager).
# Once these changes are made, restart the lxc-net and network-manager services.
# 'container1.lxc' will then resolve on your host.
#LXC_DOMAIN="lxc"
Tip: Make sure the bridge's IP range does not interfere with your local network.

Then we need to modify the LXC container template so our containers use our bridge:

/etc/lxc/default.conf
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx

Optionally create a configuration file to manually define the IP address of any containers:

/etc/lxc/dnsmasq.conf
dhcp-host=playtime,10.0.3.100

Now start and enable lxc-net.service to create the bridge interface.
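For example, both can be done in one step:

# systemctl enable --now lxc-net.service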

Firewall considerations

Since the LXC is running on the 10.0.3.x subnet, access to services such as ssh, httpd, etc. will need to be actively forwarded to the LXC. In principle, the firewall on the host needs to forward incoming traffic on the expected port to the container.

Example iptables rule

The goal of this rule is to allow ssh traffic to the lxc:

# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.100:22

This rule forwards TCP traffic arriving on host port 2221 to port 22 at the IP address of the LXC.

Note: Make sure to allow traffic on 2221/tcp on the host and to allow 22/tcp traffic on the lxc.
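Note that the DNAT happens in PREROUTING, so the corresponding filter rule on the host lives in the FORWARD chain and matches the already-rewritten destination. A sketch, using the addresses from the example above:

# iptables -A FORWARD -p tcp -d 10.0.3.100 --dport 22 -j ACCEPT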

To ssh into the container from another PC on the LAN, one needs to ssh on port 2221 to the host. The host will then forward that traffic to the container.

$ ssh -p 2221 host.lan
Example ufw rule

If using ufw, append the following at the bottom of /etc/ufw/before.rules to make this persistent:

/etc/ufw/before.rules


*nat
:PREROUTING ACCEPT [0:0]
-A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.101:22
COMMIT
Running containers as non-root user

To create and start containers as a non-root user, extra configuration must be applied.

Create the usernet file under /etc/lxc/lxc-usernet. According to the lxc-usernet man page, the entry per line is:

user type bridge number

Configure the file with the user you want to use to create containers. The bridge will be the same one you defined in /etc/default/lxc-net, as in the example below.
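For example, to allow a hypothetical user youruser to create up to 10 veth devices attached to the lxcbr0 bridge defined earlier:

/etc/lxc/lxc-usernet
youruser veth lxcbr0 10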

A copy of the /etc/lxc/default.conf is needed in the non-root user's home directory, e.g. ~/.config/lxc/default.conf (create the directory if needed).

Running containers as a non-root user requires +x permissions on ~/.local/share/. Make that change with chmod before starting a container.
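A sketch of these two steps for a hypothetical non-root user:

$ mkdir -p ~/.config/lxc
$ cp /etc/lxc/default.conf ~/.config/lxc/default.conf
$ chmod a+x ~/.local/share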

Container creation

Containers are built using lxc-create. With the release of lxc-3.0.0-1, upstream has deprecated locally stored templates.

To build an Arch container, invoke like this:

# lxc-create -n playtime -t download -- --dist archlinux --release current --arch amd64

For other distros, invoke like this and select options from the supported distros displayed in the list:

# lxc-create -n playtime -t download
Tip: Users may optionally install haveged and start haveged.service to avoid a perceived hang during the setup process while waiting for system entropy to be seeded. Without it, the generation of private/GPG keys can add a lengthy wait to the process.
Tip: Users of Btrfs can append -B btrfs to create a Btrfs subvolume for storing the containerized rootfs. This comes in handy when cloning containers with the help of the lxc-clone command. ZFS users may use -B zfs, correspondingly.
Note: Users wanting the legacy templates can find them in lxc-templates (AUR); alternatively, users can build their own templates with distrobuilder (AUR).

Container configuration

The examples below can be used with privileged and unprivileged containers alike. Note that for unprivileged containers, additional lines will be present by default which are not shown in the examples, including the lxc.idmap = u 0 100000 65536 and the lxc.idmap = g 0 100000 65536 values optionally defined in the #Enable support to run unprivileged containers (optional) section.

Basic config with networking

Note: With the release of lxc-1:2.1.0-1, many of the configuration options have changed. Existing containers need to be updated; users are directed to the table of these changes in the v2.1 release notes.

System resources to be virtualized/isolated when a process is using the container are defined in /var/lib/lxc/CONTAINER_NAME/config. By default, the creation process will make a minimum setup without networking support. Below is an example config with networking supplied by lxc-net.service:

/var/lib/lxc/playtime/config
# Template used to create this container: /usr/share/lxc/templates/lxc-archlinux
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Distribution configuration
lxc.include = /usr/share/lxc/config/common.conf
lxc.arch = x86_64

# Container specific configuration
lxc.rootfs.path = dir:/var/lib/lxc/playtime/rootfs
lxc.uts.name = playtime

# Network configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = ee:ec:fa:e9:56:7d

Mounts within the container

For privileged containers, one can select directories on the host to bind mount to the container. This can be advantageous, for example, if the same architecture is being containerized and one wants to share pacman packages between the host and container. Another example could be shared directories. The syntax is simple:

lxc.mount.entry = /var/cache/pacman/pkg var/cache/pacman/pkg none bind 0 0
Note: This will not work without filesystem permission modifications on the host if using unprivileged containers.

Xorg program considerations (optional)

In order to run programs on the host's display, some bind mounts need to be defined so that the containerized programs can access the host's resources. Add the following section to /var/lib/lxc/playtime/config:

## for xorg
lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir,ro
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

If you still get a permission denied error in your LXC guest, then you may need to call xhost + in your host to allow the guest to connect to the host's display server. Take note of the security concerns of opening up your display server by doing this. In addition you might need to add the following line before the above bind mount lines.

lxc.mount.entry = tmpfs tmp tmpfs defaults
Note: This will not work if using unprivileged containers.

OpenVPN considerations

Users wishing to run OpenVPN within the container should refer to OpenVPN (client) in Linux containers and/or OpenVPN (server) in Linux containers.

Managing containers

Basic usage

To list all installed LXC containers:

# lxc-ls -f

Systemd can be used to start and to stop LXCs via lxc@CONTAINER_NAME.service. Enable lxc@CONTAINER_NAME.service to have it start when the host system boots.

Warning: See FS#61078 wherein this service unit is currently broken as shipped and will require a Systemd#Drop-in files modification to work properly.
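For example, to have the "playtime" container from this article start at boot (subject to the warning above):

# systemctl enable lxc@playtime.service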

Users can also start/stop LXCs without systemd. Start a container:

# lxc-start -n CONTAINER_NAME

Stop a container:

# lxc-stop -n CONTAINER_NAME

To login into a container:

# lxc-console -n CONTAINER_NAME

If, when logging in, you get pts/0 and lxc/tty1, use:

# lxc-console -n CONTAINER_NAME -t 0

Once logged in, treat the container like any other Linux system: set the root password, create users, install packages, etc.

To attach to a container:

# lxc-attach -n CONTAINER_NAME --clear-env

It works nearly the same as lxc-console, but you automatically get a root prompt inside the container, bypassing login. Without the --clear-env flag, the host will pass its own environment variables into the container (including $PATH, so some commands will not work when the containers are based on another distribution).

Advanced usage

LXC clones

Users with a need to run multiple containers can simplify administrative overhead (user management, system updates, etc.) by using snapshots. The strategy is to set up and keep up-to-date a single base container, then, as needed, clone (snapshot) it. The power in this strategy is that disk space and system overhead are truly minimized, since the snapshots use an overlayfs mount and only write the differences in data out to disk. The base system is read-only, but changes to it in the snapshots are allowed via the overlayfs.

Note: overlayfs for unprivileged containers is not supported in the current mainline Arch Linux kernel due to security considerations.

For example, set up a container as outlined above. We will call it "base" for the purposes of this guide. Now create 2 snapshots of "base" which we will call "snap1" and "snap2" with these commands:

# lxc-copy -n base -N snap1 -B overlayfs -s
# lxc-copy -n base -N snap2 -B overlayfs -s
Note: If a static IP was defined for the "base" LXC, it will need to be manually changed in the config for "snap1" and for "snap2" before starting them. If the process is to be automated, a script using sed can do this, as sketched below.
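A sketch of such a sed call, using hypothetical addresses:

# sed -i 's/192.168.0.3/192.168.0.4/' /var/lib/lxc/snap1/config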

The snapshots can be started/stopped like any other container. Users can optionally destroy the snapshots and all new data therein with the following command. Note that the underlying "base" lxc is untouched:

# lxc-destroy -n snap1 -f

Systemd units and wrapper scripts to manage snapshots for pi-hole and OpenVPN are available to automate the process in lxc-service-snapshots.

Converting a privileged container to an unprivileged container

Once the system has been configured to use unprivileged containers (see #Enable support to run unprivileged containers (optional)), nsexec-bzr (AUR) contains a utility called uidmapshift which is able to convert an existing privileged container to an unprivileged container, avoiding a total rebuild of the image.

Warning:
  • It is recommended to backup the existing image before using this utility!
  • This utility will not shift UIDs and GIDs in ACLs; you will need to shift them on your own.

Invoke the utility to convert over like so:

# uidmapshift -b /var/lib/lxc/foo 0 100000 65536

Additional options are available simply by calling uidmapshift without any arguments.

Running Xorg programs

Either attach to or SSH into the target container and prefix the call to the program with the DISPLAY ID of the host's X session. For most simple setups, the display is always 0.

An example of running Firefox from the container in the host's display:

$ DISPLAY=:0 firefox

Alternatively, to avoid directly attaching to or connecting to the container, the following can be used on the host to automate the process:

# lxc-attach -n playtime --clear-env -- sudo -u YOURUSER env DISPLAY=:0 firefox

Troubleshooting

Root login fails

If you get the following error when you try to login using lxc-console:

login: root
Login incorrect

And the container's journalctl shows:

pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !

Add pts/0 to the list of terminal names in /etc/securetty on the container filesystem (see [1] and the example below). You can also opt to delete /etc/securetty on the container to always allow root to login, see [2].
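For example, appending the entry from the host, using the rootfs path from this article:

# echo "pts/0" >> /var/lib/lxc/playtime/rootfs/etc/securetty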

Alternatively, create a new user in lxc-attach and use it for logging in to the system, then switch to root.

# lxc-attach -n playtime
[root@playtime]# useradd -m -Gwheel newuser
[root@playtime]# passwd newuser
[root@playtime]# passwd root
[root@playtime]# exit
# lxc-console -n playtime
[newuser@playtime]$ su

No network-connection with veth in container config

If you cannot access your LAN or WAN with a networking interface configured as veth and set up through /etc/lxc/containername/config, first check that the virtual interface has an IP assigned and appears to be connected to the network correctly:

ip addr show veth0 
inet 192.168.1.111/24

You may disable all the relevant static IP settings and try setting the IP through the booted container OS like you would normally do.

Example container/config

...
lxc.net.0.type = veth
lxc.net.0.name = veth0
lxc.net.0.flags = up
lxc.net.0.link = bridge
...

And then assign your IP through your preferred method inside the container, see also Network configuration#Network management.

Error: unknown command

The error may happen when you type a basic command (ls, cat, etc.) on an attached container that has a different Linux distribution from the host system (e.g. a Debian container on an Arch Linux host system). When you attach, use the argument --clear-env:

# lxc-attach -n container_name --clear-env

Error: Failed at step KEYRING spawning...

Services in an unprivileged container may fail with the following messages:

some.service: Failed to change ownership of session keyring: Permission denied
some.service: Failed to set up kernel keyring: Permission denied
some.service: Failed at step KEYRING spawning ....: Permission denied

Create a file /etc/lxc/unpriv.seccomp containing:

/etc/lxc/unpriv.seccomp
2
blacklist
[all]
keyctl errno 38

Then add the following line to the container configuration after the lxc.idmap entries:

lxc.seccomp.profile = /etc/lxc/unpriv.seccomp

See also

  • LXC 1.0 Blog Post Series: https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-series/
  • LXD 2.0: Blog post series: https://stgraber.org/2016/03/11/lxd-2-0-blog-post-series-012/
  • LXC@developerWorks: http://www.ibm.com/developerworks/linux/library/l-lxc-containers/
  • LXC articles on l3net: http://l3net.wordpress.com/tag/lxc/