Kubernetes (a.k.a. k8s) is an open-source system for automating the deployment, scaling, and management of containerized applications.
A k8s cluster consists of its control-plane components and node components (each representing one or more host machines running a container runtime and
kubelet.service). There are two options for installing Kubernetes: "the real one", described here, or a local install with k3s, kind, or minikube.
Install deployment tools
When bootstrapping a Kubernetes cluster with
kubeadm, install kubeadm and kubelet on each node.
When manually creating a Kubernetes cluster, install etcd (AUR) and the package groups kubernetes-control-plane (for a control-plane node) and kubernetes-node (for a worker node).
To control a kubernetes cluster, install kubectl on the control-plane hosts and any external host that is supposed to be able to interact with the cluster.
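For example, using pacman (kubeadm, kubelet, and kubectl are the package names in the official repositories; adjust to your setup):

# pacman -S kubeadm kubelet kubectl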
Install container runtime
Both control-plane and regular worker nodes require a container runtime for their kubelet instances, which use it to host containers.
Install either containerd or cri-o to meet this dependency.
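For example, to install containerd from the official repositories:

# pacman -S containerd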
Install and configure prerequisites
To set up IPv4 forwarding and let iptables see bridged traffic, execute the instructions below:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
(Optionally) verify that the br_netfilter and overlay modules are loaded by running the following commands:

lsmod | grep br_netfilter
lsmod | grep overlay
(Optionally) verify that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward system variables are set to 1 in your sysctl configuration by running the following command:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
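With the configuration from above applied, the expected output is:

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1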
Refer to the official documentation for more details.
To install a rootless containerd, use nerdctl-full-bin (AUR), the full nerdctl package bundled with containerd, CNI plugins, and RootlessKit:
containerd-rootless-setuptool.sh install
sudo systemctl enable --now containerd.service
Configuring the systemd cgroup driver
To use the systemd cgroup driver, set the following in /etc/containerd/config.toml:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

If /etc/containerd/config.toml does not exist, the default configuration can be generated via:

containerd config default > /etc/containerd/config.toml
Remember to restart
containerd.service to make the change take effect.
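To confirm the driver is set in the active configuration, a quick check is:

grep SystemdCgroup /etc/containerd/config.toml

which should print SystemdCgroup = true.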
See the official documentation for a deeper discussion on whether to keep the cgroupfs driver or use the systemd cgroup driver.
(Optional) Install package manager
helm is a tool for managing pre-configured Kubernetes resources which may be helpful for getting started.
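For example, a typical first use of helm looks like the following (repository and chart names are illustrative):

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/nginx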
All nodes in a cluster (control-plane and worker) require a running instance of kubelet.service.
All provided systemd services accept CLI overrides in environment files; for example, kubelet.service reads the KUBELET_ARGS variable from /etc/kubernetes/kubelet.env.
Disable swap on the host, as
kubelet.service will otherwise fail to start.
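A minimal way to do this (assuming swap is configured via /etc/fstab or a swap unit):

# swapoff -a

Additionally remove swap entries from /etc/fstab or disable the corresponding swap units, so that swap stays disabled across reboots.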
Specify pod CIDR range
The pod CIDR addresses refer to the IP address range that is assigned to pods within a Kubernetes cluster. When pods are scheduled to run on nodes in the cluster, they are assigned IP addresses from this CIDR range.
The pod CIDR range is specified when deploying a Kubernetes cluster and is confined within the cluster network. It should not overlap with other IP ranges used within the cluster, such as the service CIDR range.
The networking setup for the cluster has to be configured for the respective container runtime. This can be done using cni-plugins.
Pass the virtual network's CIDR to kubeadm init with e.g. --pod-network-cidr='10.85.0.0/16'.
Specify container runtime
The container runtime has to be configured and started before
kubelet.service can make use of it.
When using containerd as the container runtime, it is required to provide kubeadm init or kubeadm join with its CRI endpoint. To do so, specify the --cri-socket flag with the path to containerd's socket, /run/containerd/containerd.sock.
For example, a complete
kubeadm init using containerd as CRI endpoint looks like:
kubeadm init --pod-network-cidr=10.85.0.0/16 --cri-socket /run/containerd/containerd.sock
A kubeadm join using containerd as CRI endpoint looks like:
kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash> --cri-socket=/run/containerd/containerd.sock
When using CRI-O as the container runtime, it is required to provide kubeadm init or kubeadm join with its CRI endpoint:

--cri-socket='unix:///run/crio/crio.sock'

Note: CRI-O uses the systemd cgroup driver per default (see
/etc/crio/crio.conf). This is not compatible with kubelet's default (
cgroupfs) when using kubelet < v1.22.
Change kubelet's default by appending
--cgroup-driver='systemd' to the
KUBELET_ARGS environment variable in
/etc/kubernetes/kubelet.env upon first start (i.e. before using kubeadm init or kubeadm join).
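A sketch of the resulting /etc/kubernetes/kubelet.env (any other flags your setup requires go into the same variable):

KUBELET_ARGS=--cgroup-driver='systemd'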
Note that the KUBELET_EXTRA_ARGS variable, used by older versions, is no longer read by the default kubelet.service.
When kubeadm updates from 1.19.x to 1.20.x, it should be possible to use a kubeadm config file (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file) instead of the above, as explained on https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-control-plane-node and in https://github.com/cri-o/cri-o/pull/4440/files. (TBC, untested.)
After the node has been configured, the CLI flag could (but does not have to) be replaced by a configuration entry for kubelet.
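A minimal sketch of such an entry in the kubelet configuration file (assuming the kubeadm default location /var/lib/kubelet/config.yaml):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# equivalent to the --cgroup-driver='systemd' CLI flag
cgroupDriver: systemd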
(Optional) Use cilium's replacement for kube-proxy
When using cilium as the CNI plugin, set the kubeadm init flag
--skip-phases=addon/kube-proxy to provision a Kubernetes cluster without
kube-proxy. Cilium will install a full replacement during its installation phase.
See the Cilium documentation on kube-proxy replacement for details.
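For example, a control-plane initialization without kube-proxy could look like (CIDR and socket taken from the earlier examples):

# kubeadm init --skip-phases=addon/kube-proxy --pod-network-cidr=10.85.0.0/16 --cri-socket=/run/containerd/containerd.sock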
Before creating a new Kubernetes cluster with
kubeadm, start and enable kubelet.service.
Note that kubelet.service will fail (but keep restarting) until configuration for it is present.
When creating a new Kubernetes cluster with
kubeadm, a control-plane has to be created before further worker nodes can join it.
- If the cluster is supposed to be turned into a high availability cluster (a stacked etcd topology) later on,
kubeadm init needs to be provided with
--control-plane-endpoint=<IP or domain> (it is not possible to do this retroactively!).
- It is possible to use a config file for
kubeadm init instead of a set of parameters.
Initialize a control-plane
Use kubeadm init to initialize a control-plane on a host machine:
# kubeadm init --node-name=<name_of_the_node> --pod-network-cidr=<CIDR> --cri-socket=<SOCKET>
If run successfully,
kubeadm init will have generated configurations for the
kubelet and various control-plane components below /etc/kubernetes/ and /var/lib/kubelet/.
Finally, it will output commands ready to be copied and pasted to set up kubectl and make a worker node join the cluster (based on a token, valid for 24 hours).
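If the token expires, a new join command can be generated on the control-plane at any time with:

# kubeadm token create --print-join-command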
To use kubectl with the freshly created control-plane node, set up the configuration (either as root or as a normal user):
$ mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
Installing a Pod network add-on
Pod network add-ons (CNI plugins) implement the Kubernetes network model in various ways, from simple solutions like flannel to more complicated ones like calico.
An increasingly adopted advanced CNI plugin is cilium, which achieves impressive performance with eBPF. To install
cilium as the CNI plugin, use cilium-cli:

cilium install
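Afterwards, the installation status can be checked with:

cilium status --wait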
For more details on pod networks, see the official documentation.
With the token information generated in #Initialize a control-plane it is possible to make a node machine join an existing cluster:
# kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> --node-name=<name_of_the_node> --cri-socket=<SOCKET>
If you are using Cilium and find that the worker node remains
NotReady, check the status of the worker node using:
kubectl describe node <node-id>
If you find the following condition status:

Type                Status  Reason
----                ------  ------
NetworkUnavailable  False   CiliumIsUp
Ready               False   KubeletNotReady  container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
try restarting containerd and kubelet on the worker node:

sudo systemctl restart containerd
sudo systemctl restart kubelet
Tips and tricks
Tear down a cluster
When it is necessary to start from scratch, use kubectl to tear down a cluster.
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
<node name> is the name of the node that should be drained and reset.
Use kubectl get node -A to list all nodes.
Then reset the node:
# kubeadm reset
Operating from Behind a Proxy
kubeadm reads the
https_proxy, http_proxy, and no_proxy environment variables. Kubernetes internal networking should be included in the last one, i.e. both the pod network CIDR and the service network CIDR (10.96.0.0/12 is the default service network CIDR).
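A hypothetical example, combining the pod CIDR used earlier with the default service network CIDR:

export no_proxy="10.85.0.0/16,10.96.0.0/12"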
Failed to get container stats
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
If kubelet.service raises this error, it is necessary to add the following configuration for the kubelet (see the relevant upstream ticket):
systemCgroups: '/systemd/system.slice'
kubeletCgroups: '/systemd/system.slice'
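A sketch of where these entries belong, assuming the kubeadm-managed kubelet configuration file at its default location:

# /var/lib/kubelet/config.yaml (default location used by kubeadm; adjust if different)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemCgroups: '/systemd/system.slice'
kubeletCgroups: '/systemd/system.slice'

Restart kubelet.service afterwards for the change to take effect.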
Pods cannot communicate when using Flannel CNI and systemd-networkd
See upstream bug report.
systemd-networkd assigns a persistent MAC address to every link. This policy is defined in its shipped configuration file
/usr/lib/systemd/network/99-default.link. However, Flannel relies on being able to pick its own MAC address. To override systemd-networkd's behaviour for
flannel* interfaces, create the following configuration file (e.g. /etc/systemd/network/50-flannel.link):
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none
If the cluster is already running, you might need to manually delete the
flannel.1 interface and the
kube-flannel-ds-* pod on each node, including the master. The pods will be recreated immediately and they themselves will recreate the
flannel.1 interfaces.

Delete the flannel.1 interface:

# ip link delete flannel.1

Use the following command to delete all
kube-flannel-ds-* pods on all nodes:

$ kubectl -n kube-system delete pod -l="app=flannel"
CoreDNS Pod pending forever, the control plane node remains tainted
When bootstrapping Kubernetes with
kubeadm init on a single machine, with no other machine joining the cluster via
kubeadm join, the control-plane node is tainted by default. As a result, no workload will be scheduled on the machine.
One can confirm that the control-plane node is tainted with the following command:

kubectl get nodes -o json | jq '.items[].spec.taints'
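On a freshly initialized single-machine cluster, the output typically looks similar to the following (illustrative):

[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]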
To temporarily allow scheduling on the control-plane node, execute:
kubectl taint nodes <your-node-name> node-role.kubernetes.io/control-plane:NoSchedule-
The trailing - removes the taint. Then restart containerd and kubelet to apply the updates:

sudo systemctl restart containerd
sudo systemctl restart kubelet
- Kubernetes Documentation - The upstream documentation
- Kubernetes Cluster with Kubeadm - Upstream documentation on how to setup a Kubernetes cluster using kubeadm
- Kubernetes Glossary - The official glossary explaining all Kubernetes specific terminology
- Kubernetes Addons - A list of third-party addons
- Kubelet Config File - Documentation on the Kubelet configuration file
- Taints and Tolerations - Documentation on node affinities and taints