[[Category:Virtualization]]
[[Category:Sandboxing]]
[[ja:Linux Containers]]
[[pt:Linux Containers]]
{{Related articles start}}
{{Related|ABS}}
{{Related|Cgroups}}
{{Related|Docker}}
{{Related|LXD}}
{{Related|OpenVPN}}
{{Related|OpenVPN (client) in Linux containers}}
{{Related|OpenVPN (server) in Linux containers}}
{{Related|PeerGuardian Linux}}
{{Related|systemd-nspawn}}
{{Related articles end}}

[https://linuxcontainers.org/ Linux Containers] (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host). It does not provide a virtual machine, but rather provides a virtual environment that has its own CPU, memory, block I/O, network, etc. space and the resource control mechanism. This is provided by the [[Wikipedia:Linux namespaces|namespaces]] and [[cgroups]] features in the Linux kernel on the LXC host. It is similar to a chroot, but offers much more isolation.

Alternatives for using containers are [[systemd-nspawn]], [[docker]] or {{AUR|rkt}}.
  
== Privileged containers or unprivileged containers ==

LXCs can be set up to run in either ''privileged'' or ''unprivileged'' configurations.

In general, running an ''unprivileged'' container is [https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers considered safer] than running a ''privileged'' container, since ''unprivileged'' containers have an increased degree of isolation by virtue of their design. Key to this is the mapping of the root UID within the container to a non-root UID on the host, which makes it more difficult for a compromise inside the container to lead to consequences on the host system. In other words, if attackers manage to escape the container, they should find themselves with limited or no rights on the host.

The Arch {{pkg|linux}}, {{pkg|linux-lts}} and {{pkg|linux-zen}} kernel packages currently provide out-of-the-box support for ''unprivileged'' containers. With the {{pkg|linux-hardened}} package, however, ''unprivileged'' containers are available only to the system administrator, and additional kernel configuration changes are required, as user namespaces are disabled by default for normal users there. This article contains information for users to run either type of container, but additional steps may be required in order to use ''unprivileged'' containers.
  
=== An example to illustrate unprivileged containers ===

To illustrate the power of UID mapping, consider the output below from a running, ''unprivileged'' container. Therein, we see the containerized processes owned by the containerized root user in the output of {{ic|ps}}:

 [root@unprivileged_container /]# ps -ef | head -n 5
 UID        PID  PPID  C STIME TTY          TIME CMD
 root         1     0  0 17:49 ?        00:00:00 /sbin/init
 root        14     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
 dbus        25     1  0 17:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
 systemd+    26     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-networkd

On the host, however, those containerized root processes are shown to be running as the mapped user (UIDs at or above 100000), rather than the host's actual root user:

 [root@host /]# lxc-info -Ssip --name sandbox
 State:          RUNNING
 PID:            26204
 CPU use:        10.51 seconds
 BlkIO use:      244.00 KiB
 Memory use:     13.09 MiB
 KMem use:       7.21 MiB
 
 [root@host /]# ps -ef | grep 26204 | head -n 5
 UID        PID  PPID  C STIME TTY          TIME CMD
 100000   26204 26200  0 12:49 ?        00:00:00 /sbin/init
 100000   26256 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
 100081   26282 26204  0 12:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
 100000   26284 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-logind
  
== Setup ==

=== Required software ===

Installing {{Pkg|lxc}} and {{Pkg|arch-install-scripts}} will allow the host system to run privileged LXCs.

==== Enable support to run unprivileged containers (optional) ====

Users wishing to run ''unprivileged'' containers on {{pkg|linux-hardened}} or their custom kernel need to complete several additional setup steps.

Firstly, a kernel is required that has support for User Namespaces (a kernel with {{ic|CONFIG_USER_NS}}). All Arch Linux kernels have support for {{ic|CONFIG_USER_NS}}. However, due to more general security concerns, the {{pkg|linux-hardened}} kernel ships with User Namespaces enabled only for the ''root'' user. You have two options to create ''unprivileged'' containers there:
* Start your unprivileged containers only as ''root''.
* Enable the ''sysctl'' setting {{ic|kernel.unprivileged_userns_clone}} to allow normal users to run unprivileged containers. This can be done for the current session with {{ic|1=sysctl kernel.unprivileged_userns_clone=1}} and can be made permanent with {{man|5|sysctl.d}}.
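As an example of making the setting permanent, a drop-in file along these lines could be used (the file name here is an arbitrary choice):

{{hc|/etc/sysctl.d/99-unprivileged-userns.conf|2=
kernel.unprivileged_userns_clone=1
}}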
Enable the [[control groups]] [[PAM]] module by modifying {{ic|/etc/pam.d/system-login}} to additionally contain the following line:
 session optional pam_cgfs.so -c freezer,memory,name=systemd,unified

Secondly, modify {{ic|/etc/lxc/default.conf}} to contain the following lines:

 lxc.idmap = u 0 100000 65536
 lxc.idmap = g 0 100000 65536

Finally, create both {{ic|/etc/subuid}} and {{ic|/etc/subgid}} to contain the mapping to the containerized uid/gid pairs for each user who shall be able to run containers. The example below is simply for the root user (and the systemd system unit):
{{hc|/etc/subuid|
root:100000:65536
}}

{{hc|/etc/subgid|
root:100000:65536
}}
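If a normal user should also be able to run unprivileged containers, give that user its own, distinct range in both files. A sketch (the user name and the range start are placeholder assumptions):

{{hc|/etc/subuid|
root:100000:65536
youruser:165536:65536
}}

{{hc|/etc/subgid|
root:100000:65536
youruser:165536:65536
}}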
CONFIG_CGROUP_SCHED=y
 
CONFIG_CGROUPS=y
 
CONFIG_CGROUP_NS=y
 
CONFIG_CGROUP_FREEZER=y
 
CONFIG_CGROUP_DEVICE=y
 
CONFIG_CPUSETS=y
 
CONFIG_PROC_PID_CPUSET=y
 
CONFIG_CGROUP_CPUACCT=y
 
CONFIG_RESOURCE_COUNTERS=y
 
CONFIG_CGROUP_MEM_RES_CTLR=y
 
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
 
CONFIG_MM_OWNER=y
 
CONFIG_NAMESPACES=y
 
CONFIG_UTS_NS=y
 
CONFIG_IPC_NS=y
 
CONFIG_USER_NS=y
 
CONFIG_PID_NS=y
 
CONFIG_NET_NS=y
 
CONFIG_NET_CLS_CGROUP=y
 
CONFIG_SECURITY_FILE_CAPABILITIES=y
 
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
 
  
==Testing capabilities==
+
=== Host network configuration ===

LXCs support different virtual network types and devices (see {{man|5|lxc.container.conf}}). A bridge device on the host is required for most types of virtual networking, as illustrated in this section.

There are two main setups to consider:

# A host bridge
# A NAT bridge

The host bridge requires the host's network manager to manage a shared bridge interface. The host and any LXC will be assigned IP addresses in the same network (for example, 192.168.1.x). This can be simpler in cases where the goal is to containerize some network-exposed service like a web server or a VPN server: the user can think of the LXC as just another PC on the physical LAN and forward the needed ports in the router accordingly. The added simplicity can also be thought of as an added threat vector: if WAN traffic is being forwarded to the LXC, having it run on a separate range (as with the NAT bridge) presents a smaller attack surface.

The NAT bridge does not require the host's network manager to manage the bridge. {{pkg|lxc}} ships with {{ic|lxc-net}}, which creates a NAT bridge called {{ic|lxcbr0}}. The NAT bridge is a standalone bridge with a private network that is not bridged to the host's ethernet device or to a physical network; it exists as a private subnet on the host.

==== Using a host bridge ====

See [[Network bridge]].

==== Using a NAT bridge ====

[[Install]] {{pkg|dnsmasq}}, which is a dependency of {{ic|lxc-net}}, and before starting the bridge, first create a configuration file for it:
{{hc|/etc/default/lxc-net|
2=# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your
# containers.  Set to "false" if you'll use virbr0 or another existing
# bridge, or mavlan to your host's NIC.
USE_LXC_BRIDGE="true"

# If you change the LXC_BRIDGE to something other than lxcbr0, then
# you will also need to update your /etc/lxc/default.conf as well as the
# configuration (/var/lib/lxc/<container>/config) for any containers
# already created using the default config to reflect the new bridge
# name.
# If you have the dnsmasq daemon installed, you'll also have to update
# /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon.
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"
# Uncomment the next line if you'd like to use a conf-file for the lxcbr0
# dnsmasq.  For instance, you can use 'dhcp-host=mail1,10.0.3.100' to have
# container 'mail1' always get ip address 10.0.3.100.
#LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf

# Uncomment the next line if you want lxcbr0's dnsmasq to resolve the .lxc
# domain.  You can then add "server=/lxc/10.0.3.1' (or your actual $LXC_ADDR)
# to your system dnsmasq configuration file (normally /etc/dnsmasq.conf,
# or /etc/NetworkManager/dnsmasq.d/lxc.conf on systems that use NetworkManager).
# Once these changes are made, restart the lxc-net and network-manager services.
# 'container1.lxc' will then resolve on your host.
#LXC_DOMAIN="lxc"
}}
  
{{Tip|Make sure the bridge's IP range does not interfere with your local network.}}

Then we need to modify the LXC container template so our containers use our bridge:

{{hc|/etc/lxc/default.conf|
2=lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx
}}
  
Optionally create a configuration file to manually define the IP address of any containers:

{{hc|/etc/lxc/dnsmasq.conf|
2=dhcp-host=playtime,10.0.3.100
}}
  
Now [[start]] and [[enable]] {{ic|lxc-net.service}} to create the bridge interface.
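For example, to do both in one step:

 # systemctl enable --now lxc-net.service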
  
===== Firewall considerations =====

Since the LXC is running on the 10.0.3.x subnet, access to services running in it, such as ssh or httpd, will need to be actively forwarded to the LXC. In principle, the firewall on the host needs to forward incoming traffic on the expected port to the container.

====== Example iptables rule ======

The goal of this rule is to allow ssh traffic to the LXC:

 # iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.100:22

This rule forwards tcp traffic arriving on port 2221 to the IP address of the LXC on port 22.

{{Note|Make sure to allow traffic on 2221/tcp on the host and to allow 22/tcp traffic on the LXC.}}
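If the host's default FORWARD policy is not permissive, a companion rule along these lines may also be needed so that the DNAT'ed traffic is actually passed on to the container (a sketch; adjust the interface and address to your setup):

 # iptables -A FORWARD -i eth0 -p tcp -d 10.0.3.100 --dport 22 -j ACCEPT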
  
To ssh into the container from another PC on the LAN, one needs to ssh to the host on port 2221. The host will then forward that traffic to the container:

 $ ssh -p 2221 host.lan

====== Example ufw rule ======

If using {{pkg|ufw}}, append the following at the bottom of {{ic|/etc/ufw/before.rules}} to make this persistent:

{{hc|/etc/ufw/before.rules|
*nat
:PREROUTING ACCEPT [0:0]
-A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.101:22
COMMIT
}}
  
===== Running containers as non-root user =====

To create and start containers as a non-root user, extra configuration must be applied.

Create the usernet file under {{ic|/etc/lxc/lxc-usernet}}. According to the {{ic|lxc-usernet}} man page, the entry per line is:

 user type bridge number

Configure the file with the user you want to use to create containers. The bridge will be the same one you defined in {{ic|/etc/default/lxc-net}}; see the sketch below.
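A minimal example entry (the user name is a placeholder; ''veth'' matches the network type used elsewhere in this article) could look like this, allowing that user to create up to 10 veth devices attached to {{ic|lxcbr0}}:

{{hc|/etc/lxc/lxc-usernet|
youruser veth lxcbr0 10
}}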
  
A copy of {{ic|/etc/lxc/default.conf}} is needed in the non-root user's home directory, e.g. {{ic|~/.config/lxc/default.conf}} (create the directory if needed).

Running containers as a non-root user requires {{ic|+x}} permissions on {{ic|~/.local/share/}}. Make that change with [[chmod]] before starting a container.
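For example, one way to grant those permissions:

 $ chmod +x ~/.local/share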
  
=== Container creation ===

Containers are built using {{ic|lxc-create}}. With the release of lxc-3.0.0-1, upstream has deprecated locally stored templates.

To build an Arch container, invoke like this:

 # lxc-create -n playtime -t download -- --dist archlinux --release current --arch amd64

For other distros, invoke like this and select options from the supported distros displayed in the list:

 # lxc-create -n playtime -t download

{{Tip|Users may optionally install {{Pkg|haveged}} and [[start]] {{ic|haveged.service}} to avoid a perceived hang during the setup process while waiting for system entropy to be seeded. Without it, the generation of private/GPG keys can add a lengthy wait to the process.}}

{{Tip|Users of [[Btrfs]] can append {{ic|-B btrfs}} to create a Btrfs subvolume for storing the containerized rootfs. This comes in handy if cloning containers with the help of the {{ic|lxc-clone}} command. [[ZFS]] users may use {{ic|-B zfs}}, correspondingly.}}

{{Note|Users wanting the legacy templates can find them in {{AUR|lxc-templates}}; alternatively, users can build their own templates with {{AUR|distrobuilder}}.}}

=== Container configuration ===

The examples below can be used with ''privileged'' and ''unprivileged'' containers alike. Note that for unprivileged containers, additional lines will be present by default which are not shown in the examples, including the {{ic|1=lxc.idmap = u 0 100000 65536}} and the {{ic|1=lxc.idmap = g 0 100000 65536}} values optionally defined in the [[#Enable support to run unprivileged containers (optional)]] section.

==== Basic config with networking ====

{{Note|With the release of lxc-1:2.1.0-1, many of the configuration options have changed. Existing containers need to be updated; users are directed to the table of these changes in the [https://discuss.linuxcontainers.org/t/lxc-2-1-has-been-released/487 v2.1 release notes].}}

System resources to be virtualized/isolated when a process is using the container are defined in {{ic|/var/lib/lxc/CONTAINER_NAME/config}}. By default, the creation process will make a minimal setup without networking support. Below is an example config with networking supplied by {{ic|lxc-net.service}}:

{{hc|/var/lib/lxc/playtime/config|<nowiki>
# Template used to create this container: /usr/share/lxc/templates/lxc-archlinux
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Distribution configuration
lxc.include = /usr/share/lxc/config/common.conf
lxc.arch = x86_64

# Container specific configuration
lxc.rootfs.path = dir:/var/lib/lxc/playtime/rootfs
lxc.uts.name = playtime

# Network configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = ee:ec:fa:e9:56:7d
</nowiki>}}
  
==== Mounts within the container ====

For ''privileged'' containers, one can select directories on the host to bind mount to the container. This can be advantageous, for example, if the same architecture is being containerized and one wants to share pacman packages between the host and the container. Another example could be shared directories. The syntax is simple:

 lxc.mount.entry = /var/cache/pacman/pkg var/cache/pacman/pkg none bind 0 0

{{Note|This will not work without filesystem permission modifications on the host if using ''unprivileged'' containers.}}

==== Xorg program considerations (optional) ====

In order to run programs on the host's display, some bind mounts need to be defined so that the containerized programs can access the host's resources. Add the following section to {{ic|/var/lib/lxc/playtime/config}}:

 ## for xorg
 lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
 lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
 lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir,ro
 lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

If you still get a permission denied error in your LXC guest, you may need to call {{ic|xhost +}} on the host to allow the guest to connect to the host's display server. Take note of the security concerns of opening up your display server by doing this. In addition, you might need to add the following line '''before''' the above bind mount lines:

 lxc.mount.entry = tmpfs tmp tmpfs defaults

{{Note|This will not work if using ''unprivileged'' containers.}}
  
==== OpenVPN considerations ====

Users wishing to run [[OpenVPN]] within the container should refer to [[OpenVPN (client) in Linux containers]] or [[OpenVPN (server) in Linux containers]].
  
== Managing containers ==

=== Basic usage ===

To list all installed LXC containers:

 # lxc-ls -f

Systemd can be used to [[start]] and to [[stop]] LXCs via {{ic|lxc@CONTAINER_NAME.service}}. [[Enable]] {{ic|lxc@CONTAINER_NAME.service}} to have it start when the host system boots.
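For example, to have the ''playtime'' container from this article start at boot:

 # systemctl enable lxc@playtime.service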
{{Warning|See {{Bug|61078}}: this service unit is currently broken as shipped and requires a [[Systemd#Drop-in files]] modification to work properly.}}

Users can also start/stop LXCs without systemd.

Start a container:

 # lxc-start -n CONTAINER_NAME

Stop a container:

 # lxc-stop -n CONTAINER_NAME

To log in to a container:

 # lxc-console -n CONTAINER_NAME

If, when logging in, you get {{ic|pts/0}} and {{ic|lxc/tty1}}, use:

 # lxc-console -n CONTAINER_NAME -t 0

Once logged in, treat the container like any other Linux system: set the root password, create users, install packages, etc.

To attach to a container:

 # lxc-attach -n CONTAINER_NAME --clear-env

This works nearly the same as ''lxc-console'', but you are automatically given a root prompt inside the container, bypassing login. Without the {{ic|--clear-env}} flag, the host will pass its own environment variables into the container (including {{ic|$PATH}}, so some commands will not work when the containers are based on another distribution).
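''lxc-attach'' can also run a single command inside the container without opening an interactive shell; for example, to update the ''playtime'' Arch container from the host:

 # lxc-attach -n playtime --clear-env -- pacman -Syu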
  
=== Advanced usage ===

==== LXC clones ====

Users with a need to run multiple containers can simplify administrative overhead (user management, system updates, etc.) by using snapshots. The strategy is to set up and keep up-to-date a single base container, then, as needed, clone (snapshot) it. The power in this strategy is that disk space and system overhead are truly minimized, since the snapshots use an overlayfs mount to write only the differences in data out to disk. The base system is read-only, but changes to it in the snapshots are allowed via the overlayfs.

{{Expansion|The note needs a reference.}}

{{Note|overlayfs for unprivileged containers is not supported in the current mainline Arch Linux kernel due to security considerations.}}

For example, set up a container as outlined above. We will call it "base" for the purposes of this guide. Now create 2 snapshots of "base", which we will call "snap1" and "snap2", with these commands:

 # lxc-copy -n base -N snap1 -B overlayfs -s
 # lxc-copy -n base -N snap2 -B overlayfs -s

{{Note|If a static IP was defined for the "base" lxc, it will need to be changed manually in the config for "snap1" and for "snap2" before starting them. If the process is to be automated, a script using sed can do this automatically, although this is beyond the scope of this wiki section.}}

The snapshots can be started/stopped like any other container. Users can optionally destroy the snapshots and all new data therein with the following command. Note that the underlying "base" lxc is untouched:

 # lxc-destroy -n snap1 -f

Systemd units and wrapper scripts to manage snapshots for [[pi-hole]] and [[OpenVPN]] are available to automate the process in [https://github.com/graysky2/lxc-service-snapshots lxc-service-snapshots].
  
=== Converting a privileged container to an unprivileged container ===

Once the system has been configured to use unprivileged containers (see [[#Enable support to run unprivileged containers (optional)]]), {{AUR|nsexec-bzr}} contains a utility called {{ic|uidmapshift}} which is able to convert an existing ''privileged'' container to an ''unprivileged'' container, avoiding a total rebuild of the image.

{{Warning|
* It is recommended to backup the existing image before using this utility!
* This utility will not shift UIDs and GIDs in [[ACL]]; you will need to shift them on your own.
}}

Invoke the utility to convert over like so:

 # uidmapshift -b /var/lib/lxc/foo 0 100000 65536

Additional options are available simply by calling {{ic|uidmapshift}} without any arguments.
== Running Xorg programs ==

Either attach to or [[SSH]] into the target container and prefix the call to the program with the DISPLAY ID of the host's X session. For most simple setups, the display is always 0.

An example of running Firefox from the container on the host's display:

 $ DISPLAY=:0 firefox

Alternatively, to avoid directly attaching to or connecting to the container, the following can be used on the host to automate the process:

 # lxc-attach -n playtime --clear-env -- sudo -u YOURUSER env DISPLAY=:0 firefox
  
== Troubleshooting ==

=== Root login fails ===

If you get the following error when you try to log in using ''lxc-console'':

 login: root
 Login incorrect

And the container's {{ic|journalctl}} shows:

 pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !

Add {{ic|pts/0}} to the list of terminal names in {{ic|/etc/securetty}} on the '''container''' filesystem, see [http://unix.stackexchange.com/questions/41840/effect-of-entries-in-etc-securetty/41939#41939]. You can also opt to delete {{ic|/etc/securetty}} on the '''container''' to always allow root to log in, see [https://github.com/systemd/systemd/issues/852].
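For example, appending the entry from the host (assuming the default storage path and the ''playtime'' container used in this article):

 # echo "pts/0" >> /var/lib/lxc/playtime/rootfs/etc/securetty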
  
Alternatively, create a new user with ''lxc-attach'' and use it to log in to the system, then switch to root:

 # lxc-attach -n playtime
 [root@playtime]# useradd -m -Gwheel newuser
 [root@playtime]# passwd newuser
 [root@playtime]# passwd root
 [root@playtime]# exit
 # lxc-console -n playtime
 [newuser@playtime]$ su
  
=== No network connection with veth in container config ===

If you cannot access your LAN or WAN with a networking interface configured as '''veth''' and set up through {{ic|/etc/lxc/''containername''/config}}, verify that the virtual interface actually gets its IP assigned and is connected to the network correctly:

 ip addr show veth0
 inet 192.168.1.111/24

If it is, but connectivity still fails, you may disable all the relevant static IP directives in the config and try setting the IP through the booted container OS like you would normally do.

Example {{ic|''container''/config}}:

 ...
 lxc.net.0.type = veth
 lxc.net.0.name = veth0
 lxc.net.0.flags = up
 lxc.net.0.link = bridge
 ...

And then assign your IP through your preferred method '''inside''' the container, see also [[Network configuration#Network management]].
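As a sketch of one such method, a minimal [[systemd-networkd]] configuration inside the container might look like this (the interface name matches the example above; the addresses are placeholders for your network):

{{hc|/etc/systemd/network/veth0.network|2=
[Match]
Name=veth0

[Network]
Address=192.168.1.111/24
Gateway=192.168.1.1
}}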
  
=== Error: unknown command ===

This error may happen when you type a basic command (''ls'', ''cat'', etc.) in an attached container that has a different Linux distribution from the host system (e.g. a Debian container on an Arch Linux host). When attaching, use the argument {{ic|--clear-env}}:

 # lxc-attach -n ''container_name'' --clear-env

=== Error: Failed at step KEYRING spawning... ===

Services in an unprivileged container may fail with the following messages:

 some.service: Failed to change ownership of session keyring: Permission denied
 some.service: Failed to set up kernel keyring: Permission denied
 some.service: Failed at step KEYRING spawning ....: Permission denied

Create a file {{ic|/etc/lxc/unpriv.seccomp}} containing:

{{hc|/etc/lxc/unpriv.seccomp|
2
blacklist
[all]
keyctl errno 38
}}

Then add the following line to the container configuration '''after''' the {{ic|lxc.idmap}} entries:

 lxc.seccomp.profile = /etc/lxc/unpriv.seccomp

== See also ==

* [https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-series/ LXC 1.0 blog post series]
* [https://stgraber.org/2016/03/11/lxd-2-0-blog-post-series-012/ LXD 2.0 blog post series]
* [http://www.ibm.com/developerworks/linux/library/l-lxc-containers/ LXC@developerWorks]
* [http://l3net.wordpress.com/tag/lxc/ LXC articles on l3net]
Latest revision as of 07:42, 11 January 2020

Linux Containers (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host). It does not provide a virtual machine, but rather provides a virtual environment that has its own CPU, memory, block I/O, network, etc. space and the resource control mechanism. This is provided by the namespaces and cgroups features in the Linux kernel on the LXC host. It is similar to a chroot, but offers much more isolation.

Alternatives for using containers are systemd-nspawn, docker or rktAUR.

Privileged containers or unprivileged containers

LXCs can be setup to run in either privileged or unprivileged configurations.

In general, running an unprivileged container is considered safer than running a privileged container, since unprivileged containers have an increased degree of isolation by virtue of their design. Key to this is the mapping of the root UID within the container to a non-root UID on the host, which makes it more difficult for a hack inside the container to lead to consequences on the host system. In other words, if an attacker manages to escape the container, he or she should find themselves with limited or no rights on the host.

The Arch linux, linux-lts and linux-zen kernel packages currently provide out-of-the-box support for unprivileged containers. Similarly, with the linux-hardened package, unprivileged containers are only available for the system administrator; with additional kernel configuration changes required, as user namespaces are disabled by default for normal users there. This article contains information for users to run either type of container, but additional steps may be required in order to use unprivileged containers.

An example to illustrate unprivileged containers

To illustrate the power of UID mapping, consider the output below from a running, unprivileged container. Therein, we see the containerized processes owned by the containerized root user in the output of ps:

[root@unprivileged_container /]# ps -ef | head -n 5
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 17:49 ?        00:00:00 /sbin/init
root        14     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus        25     1  0 17:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+    26     1  0 17:49 ?        00:00:00 /usr/lib/systemd/systemd-networkd

On the host, however, those containerized root processes are actually shown to be running as the mapped user (ID>100000), rather than the host's actual root user:

[root@host /]# lxc-info -Ssip --name sandbox
State:          RUNNING
PID:            26204
CPU use:        10.51 seconds
BlkIO use:      244.00 KiB
Memory use:     13.09 MiB
KMem use:       7.21 MiB
[root@host /]# ps -ef | grep 26204 | head -n 5
UID        PID  PPID  C STIME TTY          TIME CMD
100000   26204 26200  0 12:49 ?        00:00:00 /sbin/init
100000   26256 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-journald
100081   26282 26204  0 12:49 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
100000   26284 26204  0 12:49 ?        00:00:00 /usr/lib/systemd/systemd-logind

Setup

Required software

Installing lxc and arch-install-scripts will allow the host system to run privileged lxcs.

Enable support to run unprivileged containers (optional)

Users wishing to run unprivileged containers on linux-hardened or their custom kernel need to complete several additional setup steps.

Firstly, a kernel is required that has support for User Namespaces (a kernel with CONFIG_USER_NS). All Arch Linux kernels have support for CONFIG_USER_NS. However, due to more general security concerns, the linux-hardened kernel does ship with User Namespaces enabled only for the root user. You have two options to create unprivileged containers there:

  • Start your unprivileged containers only as root.
  • Enable the sysctl setting kernel.unprivileged_userns_clone to allow normal users to run unprivileged containers. This can be done for the current session with sysctl kernel.unprivileged_userns_clone=1 and can be made permanent with sysctl.d(5).

Enable the control groups PAM module by modifying /etc/pam.d/system-login to additionally contain the following line:

session optional pam_cgfs.so -c freezer,memory,name=systemd,unified

Secondly, modify /etc/lxc/default.conf to contain the following lines:

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

Finally, create both /etc/subuid and /etc/subgid to contain the mapping to the containerized uid/gid pairs for each user who shall be able to run the containers. The example below is simply for the root user (and systemd system unit):

/etc/subuid
root:100000:65536
/etc/subgid
root:100000:65536

Host network configuration

LXCs support different virtual network types and devices (see lxc.container.conf(5)). A bridge device on the host is required for most types of virtual networking which is illustrated in this section.

There are several main setups to consider:

  1. A host bridge
  2. A NAT bridge

The host bridge requires the host's network manager to manage a shared bridge interface. The host and any lxc will be assigned an IP address in the same network (for example 192.168.1.x). This might be more simplistic in cases where the goal is to containerize some network-exposed service like a webserver, or VPN server. The user can think of the lxc as just another PC on the physical LAN, and forward the needed ports in the router accordingly. The added simplicity can also be thought of as an added threat vector, again, if WAN traffic is being forwarded to the lxc, having it running on a separate range presents a smaller threat surface.

The NAT bridge does not require the host's network manager to manage the bridge. lxc ships with lxc-net which creates a NAT bridge called lxcbr0. The NAT bridge is a standalone bridge with a private network that is not bridged to the host's ethernet device or to a physical network. It exists as a private subnet in the host.

Using a host bridge

See Network bridge.

Using a NAT bridge

Install dnsmasq which is a dependency for lxc-net and before starting the bridge, first create a configuration file for it:

/etc/default/lxc-net
# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your
# containers.  Set to "false" if you'll use virbr0 or another existing
# bridge, or mavlan to your host's NIC.
USE_LXC_BRIDGE="true"

# If you change the LXC_BRIDGE to something other than lxcbr0, then
# you will also need to update your /etc/lxc/default.conf as well as the
# configuration (/var/lib/lxc/<container>/config) for any containers
# already created using the default config to reflect the new bridge
# name.
# If you have the dnsmasq daemon installed, you'll also have to update
# /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon.
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"
# Uncomment the next line if you'd like to use a conf-file for the lxcbr0
# dnsmasq.  For instance, you can use 'dhcp-host=mail1,10.0.3.100' to have
# container 'mail1' always get ip address 10.0.3.100.
#LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf

# Uncomment the next line if you want lxcbr0's dnsmasq to resolve the .lxc
# domain.  You can then add "server=/lxc/10.0.3.1' (or your actual $LXC_ADDR)
# to your system dnsmasq configuration file (normally /etc/dnsmasq.conf,
# or /etc/NetworkManager/dnsmasq.d/lxc.conf on systems that use NetworkManager).
# Once these changes are made, restart the lxc-net and network-manager services.
# 'container1.lxc' will then resolve on your host.
#LXC_DOMAIN="lxc"
Tip: Make sure the bridge's IP range does not interfere with your local network.

Then we need to modify the LXC container template so our containers use our bridge:

/etc/lxc/default.conf
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx

Optionally create a configuration file to manually define the IP address of any containers:

/etc/lxc/dnsmasq.conf
dhcp-host=playtime,10.0.3.100

Now start and enable lxc-net.service to create the bridge interface.

Firewall considerations

Since the lxc is running on the 10.0.3.x subnet, access to services such as ssh, httpd, etc. will need to be actively forwarded to the lxc. In principal, the firewall on the host needs to forward traffic incoming traffic on the expected port on the container.

Example iptables rule

The goal of this rule is to allow ssh traffic to the lxc:

# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.100:22

This rule forwards tcp traffic originating on port 2221 to the IP address of the lxc on port 22.

Note: Make sure to allow traffic on 2221/tcp on the host and to allow 22/tcp traffic on the lxc.

To ssh into the container from another PC on the LAN, one needs to ssh on port 2221 to the host. The host will then forward that traffic to the container.

$ ssh -p 2221 host.lan
Example ufw rule

If using ufw, append the following at the bottom of /etc/ufw/before.rules to make this persistent:

/etc/ufw/before.rules


*nat
:PREROUTING ACCEPT [0:0]
-A PREROUTING -i eth0 -p tcp --dport 2221 -j DNAT --to-destination 10.0.3.101:22
COMMIT
Running containers as non-root user

To create and start containers as a non-root user, extra configuration must be applied.

Create the usernet file under /etc/lxc/lxc-usernet. According to the lxc-usernet man page, the entry per line is:

user type bridge number

Configure the file with the user you want to use to create containers. The bridge will be the same you defined in /etc/default/lxc-net.

A copy of the /etc/lxc/default.conf is needed in the non-root user's home directory, e.g. ~/.config/lxc/default.conf (create the directory if needed).

Running containers as a non-root user requires +x permissions on ~/.local/share/. Make that change with chmod before starting a container.

Container creation

Containers are built using lxc-create. With the release of lxc-3.0.0-1, upstream has deprecated locally stored templates.

To build an Arch container, invoke like this:

# lxc-create -n playtime -t download -- --dist archlinux --release current --arch amd64

For other distros, invoke like this and select options from the supported distros displayed in the list:

# lxc-create -n playtime -t download
Tip: Users may optionally install haveged and start haveged.service to avoid a perceived hang during the setup process while waiting for system entropy to be seeded. Without it, the generation of private/GPG keys can add a lengthy wait to the process.
Tip: Users of Btrfs can append -B btrfs to create a Btrfs subvolume for storing containerized rootfs. This comes in handy if cloning containers with the help of lxc-clone command. ZFS users may use -B zfs, correspondingly.
Note: Users wanting the legacy templates can find them in lxc-templatesAUR or alternatively, users can build their own templates with distrobuilderAUR.

Container configuration

The examples below can be used with privileged and unprivileged containers alike. Note that for unprivileged containers, additional lines will be present by default which are not shown in the examples, including the lxc.idmap = u 0 100000 65536 and the lxc.idmap = g 0 100000 65536 values optionally defined in the #Enable support to run unprivileged containers (optional) section.

Basic config with networking

Note: With the release of lxc-1:2.1.0-1, many of the configuration options have changed. Existing containers need to be updated; users are directed to the table of these changes in the v2.1 release notes.

System resources to be virtualized/isolated when a process is using the container are defined in /var/lib/lxc/CONTAINER_NAME/config. By default, the creation process will make a minimum setup without networking support. Below is an example config with networking supplied by lxc-net.service:

/var/lib/lxc/playtime/config
# Template used to create this container: /usr/share/lxc/templates/lxc-archlinux
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Distribution configuration
lxc.include = /usr/share/lxc/config/common.conf
lxc.arch = x86_64

# Container specific configuration
lxc.rootfs.path = dir:/var/lib/lxc/playtime/rootfs
lxc.uts.name = playtime

# Network configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = ee:ec:fa:e9:56:7d
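
Optionally, a static IP and gateway may be set here instead of relying on DHCP from lxc-net. The following lines are illustrative only and assume the lxc-net default 10.0.3.0/24 subnet:

lxc.net.0.ipv4.address = 10.0.3.100/24
lxc.net.0.ipv4.gateway = 10.0.3.1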

Mounts within the container

For privileged containers, one can select directories on the host to bind mount into the container. This can be advantageous, for example, if the same architecture is being containerized and one wants to share pacman packages between the host and container, or to share common data directories. The syntax is simple:

lxc.mount.entry = /var/cache/pacman/pkg var/cache/pacman/pkg none bind 0 0
Note: This will not work without filesystem permission modifications on the host if using unprivileged containers.
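
As another illustration, a hypothetical shared directory (both paths are placeholders; note that the container-side path omits the leading slash):

lxc.mount.entry = /srv/shared srv/shared none bind,create=dir 0 0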

Xorg program considerations (optional)

In order to run programs on the host's display, some bind mounts need to be defined so that the containerized programs can access the host's resources. Add the following section to /var/lib/lxc/playtime/config:

## for xorg
lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir,ro
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

If you still get a permission denied error in your LXC guest, you may need to call xhost + on the host to allow the guest to connect to the host's display server. Take note of the security implications of opening up your display server this way. In addition, you might need to add the following line before the above bind mount lines.

lxc.mount.entry = tmpfs tmp tmpfs defaults
Note: This will not work if using unprivileged containers.

OpenVPN considerations

Users wishing to run OpenVPN within the container should refer to OpenVPN (client) in Linux containers or OpenVPN (server) in Linux containers.

Managing containers

Basic usage

To list all installed LXC containers:

# lxc-ls -f

systemd can be used to start and stop LXCs via lxc@CONTAINER_NAME.service. Enable lxc@CONTAINER_NAME.service to have it start when the host system boots.
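
For example, to have a container named playtime start at boot:

# systemctl enable lxc@playtime.service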

Warning: See FS#61078 wherein this service unit is currently broken as shipped and will require a Systemd#Drop-in files modification to work properly.

Users can also start/stop LXCs without systemd. Start a container:

# lxc-start -n CONTAINER_NAME

Stop a container:

# lxc-stop -n CONTAINER_NAME

To login into a container:

# lxc-console -n CONTAINER_NAME

If, when logging in, you get pts/0 and lxc/tty1, use:

# lxc-console -n CONTAINER_NAME -t 0

Once logged in, treat the container like any other Linux system: set the root password, create users, install packages, etc.

To attach to a container:

# lxc-attach -n CONTAINER_NAME --clear-env

It works nearly the same as lxc-console, but you get a root prompt inside the container directly, bypassing login. Without the --clear-env flag, the host passes its own environment variables into the container (including $PATH, so some commands will not work when the container is based on another distribution).
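
lxc-attach can also run a single command inside the container without opening a shell. For example, to update an Arch container named playtime from the host:

# lxc-attach -n playtime --clear-env -- pacman -Syu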

Advanced usage

LXC clones

Users with a need to run multiple containers can simplify administrative overhead (user management, system updates, etc.) by using snapshots. The strategy is to set up and keep up-to-date a single base container, then, as needed, clone (snapshot) it. The power in this strategy is that disk space and system overhead are truly minimized, since the snapshots use an overlayfs mount and write to disk only the differences in data. The base system is read-only, but changes to it in the snapshots are allowed via the overlayfs.

Note: overlayfs for unprivileged containers is not supported in the current mainline Arch Linux kernel due to security considerations.

For example, set up a container as outlined above. We will call it "base" for the purposes of this guide. Now create two snapshots of "base", which we will call "snap1" and "snap2", with these commands:

# lxc-copy -n base -N snap1 -B overlayfs -s
# lxc-copy -n base -N snap2 -B overlayfs -s
Note: If a static IP was defined for the "base" container, it will need to be manually changed in the config for "snap1" and "snap2" before starting them. If the process is to be automated, a script using sed can do this, as sketched below, although this is beyond the scope of this wiki section.
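
As a sketch, assuming "base" was configured with 10.0.3.100 and "snap1" should use 10.0.3.102 (both addresses are hypothetical):

# sed -i 's/10.0.3.100/10.0.3.102/' /var/lib/lxc/snap1/config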

The snapshots can be started/stopped like any other container. Users can optionally destroy the snapshots and all new data therein with the following command. Note that the underlying "base" lxc is untouched:

# lxc-destroy -n snap1 -f

Systemd units and wrapper scripts to manage snapshots for pi-hole and OpenVPN are available to automate the process in lxc-service-snapshots.

Converting a privileged container to an unprivileged container

Once the system has been configured to use unprivileged containers (see #Enable support to run unprivileged containers (optional)), nsexec-bzrAUR contains a utility called uidmapshift which can convert an existing privileged container to an unprivileged one, avoiding a total rebuild of the image.

Warning:
  • It is recommended to backup the existing image before using this utility!
  • This utility will not shift UIDs and GIDs in ACL, you will need to shift them on your own.

Invoke the utility to convert over like so; the arguments shift the IDs under /var/lib/lxc/foo from the range starting at 0 to the range starting at 100000, spanning 65536 IDs (matching the lxc.idmap values shown earlier):

# uidmapshift -b /var/lib/lxc/foo 0 100000 65536
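
To verify the conversion, inspect the numeric ownership of the container's rootfs; after the shift, files should be owned by IDs in the 100000 range rather than by 0:

# ls -ln /var/lib/lxc/foo/rootfs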

Additional options are available simply by calling uidmapshift without any arguments.

Running Xorg programs

Either attach to or SSH into the target container and prefix the call to the program with the DISPLAY ID of the host's X session. For most simple setups, the display is always 0.

An example of running Firefox from the container in the host's display:

$ DISPLAY=:0 firefox

Alternatively, to avoid directly attaching to or connecting to the container, the following can be used on the host to automate the process:

# lxc-attach -n playtime --clear-env -- sudo -u YOURUSER env DISPLAY=:0 firefox

Troubleshooting

Root login fails

If you get the following error when you try to login using lxc-console:

login: root
Login incorrect

And the container's journalctl shows:

pam_securetty(login:auth): access denied: tty 'pts/0' is not secure !

Add pts/0 to the list of terminal names in /etc/securetty on the container filesystem, see [1]. You can also opt to delete /etc/securetty on the container to always allow root to login, see [2].
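
For example, assuming a container named playtime stored in the default location:

# echo "pts/0" >> /var/lib/lxc/playtime/rootfs/etc/securetty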

Alternatively, create a new user in lxc-attach and use it for logging in to the system, then switch to root.

# lxc-attach -n playtime
[root@playtime]# useradd -m -G wheel newuser
[root@playtime]# passwd newuser
[root@playtime]# passwd root
[root@playtime]# exit
# lxc-console -n playtime
[newuser@playtime]$ su

No network connection with veth in container config

If you cannot access your LAN or WAN with a networking interface configured as veth and set up through /etc/lxc/containername/config, first check whether the virtual interface has an IP assigned and appears to be connected to the network correctly:

ip addr show veth0 
inet 192.168.1.111/24

You may disable all the relevant static IP assignments and try setting the IP through the booted container OS like you would normally do.

Example container/config

...
lxc.net.0.type = veth
lxc.net.0.name = veth0
lxc.net.0.flags = up
lxc.net.0.link = bridge
...

And then assign your IP through your preferred method inside the container, see also Network configuration#Network management.

Error: unknown command

The error may happen when you run a basic command (ls, cat, etc.) on an attached container that has a different Linux distribution from the host system (e.g. a Debian container on an Arch Linux host). When you attach, use the argument --clear-env:

# lxc-attach -n container_name --clear-env

Error: Failed at step KEYRING spawning...

Services in an unprivileged container may fail with the following message:

some.service: Failed to change ownership of session keyring: Permission denied
some.service: Failed to set up kernel keyring: Permission denied
some.service: Failed at step KEYRING spawning ....: Permission denied

Create a file /etc/lxc/unpriv.seccomp containing:

/etc/lxc/unpriv.seccomp
2
blacklist
[all]
keyctl errno 38

Then add the following line to the container configuration after the lxc.idmap entries:

lxc.seccomp.profile = /etc/lxc/unpriv.seccomp

See also