Linux Containers

This article is a stub.

Notes: Some parts of this are dated. Ideally this page would be a summary of container tools, discussing LXC, chroot, systemd-nspawn and Docker, plus the basics required to get each going, with a more detailed subpage on each. (Discuss in Talk:Linux Containers)

Linux Containers (LXC) is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (the LXC host).

LXC does not provide a virtual machine, but rather a virtual environment that has its own CPU, memory, block I/O and network space. This is provided by cgroup and namespace features in the Linux kernel on the LXC host. It is similar to a chroot, but offers much more isolation.

Docker is built on top of LXC, enabling easy image management and deployment services for application-specific containers.

This document is intended as an overview of setting up and deploying containers. A certain amount of prerequisite knowledge is required (networking setup, running commands as root, installing packages from the AUR, kernel configuration, mounting filesystems, etc.).

Setup

Virtualization features for LXC containers are provided by the Linux kernel and the LXC userspace tools. This section covers basic information on how to set up an LXC-capable system.

Packages

The lxc package is available in the official repositories. It provides the LXC userspace tools, which are used to manage LXC containers on the LXC host.

It is also highly recommended to install bridge-utils and netctl, which are useful when configuring different network virtualization types. See also Bridge with netctl.

You can also optionally install OpenVPN; see OpenVPN Bridge.

LXC depends on the control group filesystem being mounted. The standard location for it is /sys/fs/cgroup. The cgroup filesystem is mounted automatically by systemd.
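Since systemd mounts it automatically, a quick check with findmnt (part of util-linux) should show it:

$ findmnt /sys/fs/cgroup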

Depending on which Linux OS you want to install on your container, you might need to install additional packages which are used in container templates. If you plan to create Arch Linux containers, installing arch-install-scripts from the official repositories is enough.
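To install it:

# pacman -S arch-install-scripts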

To install containers for other operating systems, you need the corresponding OS-specific packages:

Testing Setup

Once the lxc package is installed, running lxc-checkconfig will print out a list of your system's capabilities. For a correctly configured system, the output should be similar to:

$ lxc-checkconfig
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: missing
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled

If, however, lxc-checkconfig reports missing components, your kernel is most likely not properly configured for full LXC support. The linux kernel package from the official repositories has LXC support. You can check a kernel's LXC configuration before actually booting it by pointing the CONFIG environment variable at that kernel's config file:

$ CONFIG=/path/to/kernel/config /usr/bin/lxc-checkconfig

Network Configuration

This section provides information on the network configuration required on the LXC host before you create LXC containers.

LXC containers support different virtual network types (see Virtual Network Types below). For most of them to work, you will need to configure a bridge device on your host. LXC expects a br0 interface to be available during creation of some containers, and it is also used in the examples below with veth networking. The preferred way to set up a bridge on Arch is with netctl. Make sure you have the netctl package installed:

# pacman -S netctl

Bridge (Simple)

You can set up an empty bridge if you do not need internet access in your LXC containers. For example (the 192.168.100.1/24 address matches the container examples later on this page):

/etc/netctl/lxcbridge
Description="LXC bridge"
Interface=br0
Connection=bridge
BindsToInterfaces=()
IP=static
Address=('192.168.100.1/24')

Enable lxcbridge and start it:

# netctl enable lxcbridge
# netctl start lxcbridge


Note: if you ever change the configuration of a netctl profile, you need to re-enable it by running netctl reenable lxcbridge so that the automatically generated service picks up the changes, then run netctl restart lxcbridge. For more information, consult the Netctl page.

Bridge (Internet-shared)

If you need an internet connection in your LXC containers, or want them to be able to reach the network the LXC host is on, you can bind a host network interface to lxcbridge. In the examples below we add the enp3s0 network interface, which has internet access, to the LXC bridge:

Static IP
/etc/netctl/lxcbridge
Description="LXC bridge"
Interface=br0
Connection=bridge
BindsToInterfaces=('enp3s0')
IP=static
## placeholder addresses; adjust them for your LAN
Address=('192.168.1.100/24')
Gateway='192.168.1.1'
DNS=('192.168.1.1')

DHCP
/etc/netctl/lxcbridge
Description="LXC bridge"
Interface=br0
Connection=bridge
BindsToInterfaces=('enp3s0')
IP=dhcp
IP Forwarding

You will also have to enable IP forwarding on the LXC host:

# sysctl net.ipv4.ip_forward=1

To make changes persist upon reboot:

/etc/sysctl.d/40-ip-forward.conf
net.ipv4.ip_forward = 1

And also apply this iptables rule (make sure you have the iptables package installed):

# iptables -t nat -A POSTROUTING -o enp3s0 -j MASQUERADE

To make changes persist upon reboot:

# iptables-save > /etc/iptables/iptables.rules
# systemctl enable iptables
# systemctl start iptables
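You can verify that the rule is in place by listing the NAT table:

# iptables -t nat -L POSTROUTING -v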

Starting containers on Boot

You can make LXC containers start on boot by enabling the container-specific systemd service:

# systemctl enable lxc@CONTAINER_NAME.service

Container setup

Note: Configuring a container that runs systemd requires the specific configuration discussed below.

There are various ways to do this.

Creating the filesystem

Bootstrap

Bootstrap an install (mkarchroot, debootstrap, rinse, Install From Existing Linux). You can also just copy or reuse an existing installation's complete root filesystem.

For example, to install a small Debian system to /home/lxc/debianfs:

yaourt -S debootstrap # install debootstrap from the AUR
# method 1:
sudo debootstrap wheezy /home/lxc/debianfs http://ftp.us.debian.org/debian  # use a US mirror to install the wheezy release
# or, method 2: use the faster tarball method
sudo debootstrap --make-tarball wheezy.packages.tgz wheezy http://debian.osuosl.org/debian/
sudo debootstrap --unpack-tarball wheezy.packages.tgz wheezy /home/lxc/debianfs

Download existing

You can download a base-install tarball. OpenVZ templates work just fine.

Using the lxc tools

/usr/bin/lxc-debian {create|destroy|purge|help}
/usr/bin/lxc-fedora {create|destroy|purge|help}

Nowadays you can create a small and simple Arch Linux container:

# lxc-create -n containername -t archlinux -- -P vim,dhclient

With the template-specific option -P you can add a comma-separated list of packages to the installation.

Virtual Network Types

LXC containers support the following networking types:

  • empty - creates only the loopback interface and assigns it to the container.
  • veth - a virtual ethernet device is created, with one side assigned to the container and the other side attached to a bridge on the LXC host. If the bridge is not specified, the veth pair device is created but not attached to any bridge. Using veth with a bridge is useful when you want to create virtual networks shared by the LXC containers and the LXC host.
  • macvlan - a macvlan interface is created and assigned to the container. macvlan interfaces can only communicate with other macvlan interfaces on the same LXC host. This is useful when you want separate networks for different LXC containers and do not need to reach the containers from the LXC host over the network.
  • vlan - a vlan interface is linked with the interface specified in the container's configuration and is assigned to the container.
  • phys - an already existing interface is assigned to the container. This is useful when you want to dedicate a physical network interface to an LXC container.
  • none - causes the container to use the host's network namespace.

It is possible to configure a container with several network virtualization types at the same time, as in the sketch below; for simplicity, this page configures only one at a time.
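For example, a container could combine a bridged veth interface with a macvlan interface by repeating the network section in its configuration (a sketch only; br0 and enp3s0 are the interfaces used elsewhere on this page):

 lxc.network.type = veth
 lxc.network.link = br0
 lxc.network.flags = up

 lxc.network.type = macvlan
 lxc.network.link = enp3s0
 lxc.network.flags = up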

In your container config file, you will need to assign an IP address:

lxc.network.ipv4 = 192.168.100.2/24
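Putting it together, a minimal veth section using the lxcbridge bridge from above might look like this (a sketch; adjust names and addresses to your setup):

 lxc.network.type = veth
 lxc.network.flags = up
 lxc.network.link = br0
 lxc.network.ipv4 = 192.168.100.2/24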

When you enter your container, you must set the default gateway to the bridge address configured in netctl, which in this example is 192.168.100.1. In any container that includes the ip tool, the following command will work:

ip route add default via 192.168.100.1

Or on distributions such as Ubuntu that use /etc/network (make sure the script is executable):

/etc/network/if-up.d/routes
#! /bin/sh
route add default gw 192.168.100.1
exit 0

Container configuration

Configuration file

The main configuration files are used to describe how to originally create a container. Though these files may be located anywhere, /etc/lxc is probably a good place.

23/Aug/2010: Be aware that the kernel may not handle additional whitespace in the configuration file. This has been experienced on "lxc.cgroup.devices.allow" settings but may also be true on other settings. If in doubt use only one space wherever whitespace is required.

Basic settings

lxc.utsname = $CONTAINER_NAME
lxc.mount = $CONTAINER_FSTAB
lxc.rootfs = $CONTAINER_ROOTFS
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.hwaddr = $CONTAINER_MACADDR
lxc.network.ipv4 = $CONTAINER_IPADDR
lxc.network.name = $CONTAINER_DEVICENAME
Basic settings explained

lxc.utsname : This will be the name of the cgroup for the container. Once the container is started, you should be able to see a new directory named after the container under the cgroup filesystem mount point (/sys/fs/cgroup).

Furthermore, this will also be the value returned by hostname from within the container. Assuming you have not removed access, the container may overwrite this with its init script.

lxc.mount : This points to an fstab-formatted file listing the mount points used when lxc-start is called. The file is explained further in the Configuring fstab section below.

Terminal settings

The following configuration is optional. You may add it to your main configuration file if you wish to log in via lxc-console, or through a terminal (e.g. Ctrl+Alt+F1).

The container can be configured with virtual consoles (tty devices). These may be devices from the host that the container is given permission to use (by its configuration file) or they may be devices created locally within the container.

The host's virtual consoles are accessed using the key sequence Alt+Fn (or Ctrl+Alt+Fn from within an X11 session). The left Alt key reaches consoles 1 through 12 and the right Alt key reaches consoles 13 through 24. Further virtual consoles may be reached by the Alt+→ key sequence which steps to the next virtual console.

The container's local virtual consoles may be accessed using the "lxc-console" command.

Host Virtual Consoles

The container may access the host's virtual consoles if the host is not using them and the container's configuration allows it. Typical container configuration would deny access to all devices and then allow access to specific devices like this:

 lxc.cgroup.devices.deny = a          # Deny all access to devices
 lxc.cgroup.devices.allow = c 4:0 rwm # /dev/tty0
 lxc.cgroup.devices.allow = c 4:1 rwm # /dev/tty1
 lxc.cgroup.devices.allow = c 4:2 rwm # /dev/tty2

For a container to be able to use a host's virtual console it must not be in use by the host. This will most likely require the host's /etc/inittab to be modified to ensure no getty or other process runs on any virtual console that is to be used by the container.

After editing the host's /etc/inittab file, issuing killall -HUP init will terminate any getty processes that are no longer configured, freeing up those virtual consoles for use by the container.
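For example, to free the host's tty2 for a container, comment out its getty line in the host's /etc/inittab (the agetty invocation here mirrors the example later in this page):

 #c2:2345:respawn:/sbin/agetty -8 38400 tty2 linux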

Note that local virtual consoles take precedence over host virtual consoles. This is described in the next section.

Local Virtual Consoles

The number of local virtual consoles that the container has is defined in the container's configuration file (normally on the host in /etc/lxc). It is defined thus:

 lxc.tty = n

where n is the number of local virtual consoles required.

The local virtual consoles are numbered starting at tty1 and take precedence over any of the host's virtual consoles that the container might be entitled to use. This means that, for example, if n = 2 then the container will not be able to use the host's tty1 and tty2 devices even if entitled to do so by its configuration file. Setting n to 0 will prevent local virtual consoles from being created, thus allowing full access to any of the host's virtual consoles that the container might be entitled to use.

/dev/tty Device Files

The container must have a tty device file (e.g. /dev/tty1) for each virtual console (host or local). These can be created thus:

# mknod -m 666 /dev/tty1 c 4 1
# mknod -m 666 /dev/tty2 c 4 2

and so on...

In the above, c means character device, 4 is the major device number (tty devices) and 1, 2, 3, etc., is the minor device number (specific tty device). Note that /dev/tty0 is special and always refers to the current virtual console.

For further info on tty devices, read this: http://www.kernel.org/pub/linux/docs/device-list/devices.txt

If a virtual console's device file does not exist in the container, then the container cannot use the virtual console.

Configuring Log-In Ability

The container's virtual consoles may be used for login sessions if the container runs "getty" services on their tty devices. This is normally done by the container's "init" process and is configured in the container's /etc/inittab file using lines like this:

 c1:2345:respawn:/sbin/agetty -8 38400 tty1 linux

There is one line per device. The first part c1 is just a unique label, the second part defines applicable run levels, the third part tells init to start a new getty when the current one terminates and the last part gives the command line for the getty. For further information refer to man init.

If there is no getty process on a virtual console it will not be possible to log in via that virtual console. A getty is not required on a virtual console unless it is to be used to log in.

If a virtual console is to allow root logins it also needs to be listed in the container's /etc/securetty file.
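For example, to allow root logins on the first two consoles, the container's /etc/securetty should contain lines like:

 tty1
 tty2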

Troubleshooting virtual consoles

If lxc.tty is set to a number, n, then no host devices numbered n or below will be accessible even if the above configuration is present because they will be replaced with local virtual consoles instead.

A tty device file's major number will change from 4 to 136 if it is a local virtual console. This change is visible within the container but not when viewing the container's devices from the host's filesystem. This information is useful when troubleshooting.

This can be checked from within a container thus:

 # ls -Al /dev/tty*
 crw------- 1 root root 136, 10 Aug 21 21:28 /dev/tty1
 crw------- 1 root root   4, 2  Aug 21 21:28 /dev/tty2
Pseudo Terminals
 lxc.pseudo = 1024

The maximum number of pseudo terminals that may be created in /dev/pts. Currently, assuming the kernel was compiled with CONFIG_DEVPTS_MULTIPLE_INSTANCES, this tells lxc-start to mount the devpts filesystem with the newinstance flag.

Host device access settings

lxc.cgroup.devices.deny = a             # Deny all access to devices
lxc.cgroup.devices.allow = c 1:3 rwm    # dev/null
lxc.cgroup.devices.allow = c 1:5 rwm    # dev/zero
lxc.cgroup.devices.allow = c 5:1 rwm    # dev/console
lxc.cgroup.devices.allow = c 5:0 rwm    # dev/tty
lxc.cgroup.devices.allow = c 4:0 rwm    # dev/tty0
lxc.cgroup.devices.allow = c 1:9 rwm    # dev/urandom
lxc.cgroup.devices.allow = c 1:8 rwm    # dev/random
lxc.cgroup.devices.allow = c 136:* rwm  # dev/pts/*
lxc.cgroup.devices.allow = c 5:2 rwm    # dev/pts/ptmx
lxc.cgroup.devices.allow = c 254:0 rwm  # No idea what this is .. dev/bsg/0:0:0:0 ???
Host device access settings explained

lxc.cgroup.devices.deny : By setting this to a, we are stating that the container has access to no devices unless explicitly defined within the configuration file.

Configuration file notes

At runtime /dev/ttyX devices are recreated

If you have enabled multiple DevPTS instances in your kernel, lxc-start will recreate lxc.tty amount of /dev/ttyX devices when it is executed.

This means that you will have lxc.tty amount of pseudo ttys. If you are planning on accessing the container via a "real" terminal (Ctrl+Alt+FX), make sure that X is a number greater than lxc.tty.

To tell whether a device has been re-created, just log in to the container via either lxc-console or SSH and perform ls -Al on the tty. Devices with a major number of 4 are "real" tty devices, whereas a major number of 136 indicates a pts.

Be aware that this is only visible from within the container itself and not from the host.

Containers have access to host's TTY nodes

If you do not properly restrict the container's access to the /dev/tty nodes, the container may have access to the host's.

Taking into consideration that, as previously mentioned, lxc-start recreates lxc.tty amount of /dev/tty devices, any tty nodes present in the container that are of a greater minor number than lxc.tty will be linked to the host's.

To access the container from a host TTY
  1. On the host, verify no getty is started for that tty by checking /etc/inittab.
  2. In the container, start a getty for that tty.
To prevent access to the host TTY

Please have a look at the configuration statements found in host device access settings.

Via lxc.cgroup.devices.deny = a we are preventing access to all host-level devices. Then, through lxc.cgroup.devices.allow = c 4:1 rwm, we are allowing access to the host's /dev/tty1. In the above example, simply removing all allow statements for major number 4 and minor > 1 should be sufficient.

To test this access

The output of the ls command below shows both the major and minor device numbers; they are located after the owner and group, represented here as: 4, 2.

  1. Set lxc.tty to 1
  2. Make sure that the container has /dev/tty1 and /dev/tty2
  3. lxc-start the container
  4. lxc-console into the container
  5. ls -Al /dev/tty2
    crw------- 1 root root 4, 2 Dec 2 00:20 /dev/tty2
  6. echo "test output" > /dev/tty2
  7. Ctrl+Alt+F2 to view the host's second terminal
  8. You should see "test output" printed on the screen

Configuration troubleshooting

console access denied: Permission denied

If, when executing lxc-console, you receive the error lxc-console: console access denied: Permission denied you have most likely either omitted lxc.tty or set it to 0.
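For example, setting the following in the container's configuration provides one local virtual console:

 lxc.tty = 1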

lxc-console does not provide a login prompt

Though you are reaching a tty on the container, it most likely is not running a getty. You will want to double check that you have a getty defined in the container's /etc/inittab for the specific tty.

If using systemd, chances are that a problem with the getty@.service unit will bite you: the unit only starts a getty if /dev/tty0 exists, and since this condition is not met in the container, you get no getty. Use this patch to let lxc-console work:

--- /usr/lib/systemd/system/getty@.service.orig 2013-05-30 12:55:28.000000000 +0000
+++ /usr/lib/systemd/system/getty@.service      2013-06-16 23:05:49.827146901 +0000
@@ -20,7 +20,8 @@
 # On systems without virtual consoles, don't start any getty. (Note
 # that serial gettys are covered by serial-getty@.service, not this
 # unit
-ConditionPathExists=/dev/tty0
+ConditionVirtualization=|lxc
+ConditionPathExists=|/dev/tty0
 
 [Service]
 # the VT is cleared by TTYVTDisallocate

For more than one getty you have to explicitly enable each needed service (and adjust lxc.tty in the container configuration accordingly) by doing this:

# ln -sf /usr/lib/systemd/system/getty@.service /etc/systemd/system/getty.target.wants/getty@ttyX.service

The ttyX should be replaced by the tty you want to use, such as tty2. On a real system, a configurable number of getty services is automatically created by systemd-logind.service.

Configuring fstab

none $CONTAINER_ROOTFS/dev/pts devpts defaults 0 0
none $CONTAINER_ROOTFS/proc    proc   defaults 0 0
none $CONTAINER_ROOTFS/sys     sysfs  defaults 0 0
none $CONTAINER_ROOTFS/dev/shm tmpfs  defaults 0 0

This fstab is used by lxc-start when mounting the container. As such, you can define any mount that would be possible on the host, such as bind mounting part of the host's own filesystem. However, be aware of any and all security implications this may have.
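For example, a bind mount of a host directory could be added with a line like this (a sketch; /srv/shared is a hypothetical host path):

/srv/shared $CONTAINER_ROOTFS/srv/shared none bind 0 0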

Warning : You certainly do not want to bind mount the host's /dev to the container as this would allow it to, amongst other things, reboot the host.

Container Creation and Destruction

Creation

lxc-create -f $CONTAINER_CONFIGPATH -n $CONTAINER_NAME

lxc-create will create /var/lib/lxc/$CONTAINER_NAME with a new copy of the container configuration file found in $CONTAINER_CONFIGPATH.

As such, if you need to make modifications to the container's configuration file, it's advisable to modify only the original file and then perform lxc-destroy and lxc-create operations afterwards. No data will be lost by doing this.
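For example, after editing the original configuration file, a sketch of that cycle:

# lxc-destroy -n $CONTAINER_NAME
# lxc-create -f $CONTAINER_CONFIGPATH -n $CONTAINER_NAME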

Note : When copying the file over, lxc-create will strip all comments from the file.

Note : As of lxc-git from at least 2009-12-01, performing lxc-create no longer splits the config file into multiple files and folders. Therefore, we only have the configuration file to worry about.

Destruction

lxc-destroy -n $CONTAINER_NAME

This will delete /var/lib/lxc/$CONTAINER_NAME which only contains configuration files. No data will be lost.

Readying the host for virtualization

/etc/inittab

  1. Comment out any gettys that are not required

/etc/rc.sysinit replacement

Since we are running in a virtual environment, a number of steps undertaken by rc.sysinit are superfluous and may even flat out fail or stall. As such, until the initscripts are made virtualization aware, this will take some hack and slash.

For now, simply replace the file:

#!/bin/bash
# Whatever is needed to clean out old daemon/service pids from your container
rm -f $(find /var/run -name '*pid')
rm -f /var/lock/subsys/*
# Configure network settings
## You can either use dhcp here, manually configure your
## interfaces or try to get the rc.d/network script working.
## There have been reports that network failed in this
## environment.
ip route add default via 192.168.10.1
echo "search your-domain" > /etc/resolv.conf
echo "nameserver 192.168.10.1" >> /etc/resolv.conf
# Initially we do not have any container-originated mounts
rm -f /etc/mtab
touch /etc/mtab

/etc/rc.conf cleanup

You may want to remove any and all hardware related daemons from the DAEMONS line. Furthermore, depending on your situation, you may also want to remove the network daemon.

TBC

Known Problems

Using systemd inside a docker container results in a segfault

See the docker GitHub issue: launching /usr/lib/systemd/systemd --system results in a segfault; last tested with systemd 208-10.

Container cannot be shutdown if using systemd

lxc-shutdown should be used for a clean shutdown or reboot of the container, but only reboot works out of the box when using systemd.

Shutdown will be signalled to the container with SIGPWR, but current systemd does not have any services in place to handle sigpwr.target. For the container, we can simply reuse poweroff.target and get exactly what we want:

# ln -s /usr/lib/systemd/system/poweroff.target ${CONTAINER_RFS}/etc/systemd/system/sigpwr.target

See also