1. Computer Environment

OS: ubuntu 20.04

2. Package version

kubernetes: 1.26.12

Note: Kubeflow Pipelines v2.0.5 is compatible with Kubernetes up to v1.26.

3. Component introduction

3.1 kubernetes: a container cluster manager (orchestrator).
3.2 kubectl: the command-line interface for Kubernetes.
3.3 kubeadm: the command to bootstrap the cluster.
3.4 kubelet: the agent that runs on every node and manages pods.
3.5 pod: the minimal unit managed by Kubernetes; it groups one or more containers and behaves much like a small virtual computer.
3.6 container: an application packaged with its own binaries, libraries, and filesystem; a running container is an isolated process.
3.7 containerd: a container runtime used by Kubernetes.

4. Steps of installing kubernetes

4.1 Install containerd on all nodes (namely, the machines that will join the cluster)

4.1.1 Illustration

To run containers in Pods, kubernetes uses a container runtime.
By default, kubernetes uses the Container Runtime Interface (CRI) to interface with your chosen container runtime.
If you don't specify a runtime, kubeadm automatically tries to detect an installed container runtime by scanning a list of known endpoints. If multiple container runtimes (or none) are detected, kubeadm throws an error and asks you to specify which one you want to use.
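The endpoint scan described above can be approximated by hand. The socket paths below are the endpoints kubeadm v1.26 commonly probes (containerd, CRI-O, cri-dockerd); treat the exact list as an assumption that varies by version:

```shell
# Check which well-known CRI sockets exist on this node.
# The three paths are the endpoints kubeadm commonly probes (assumption).
for sock in /var/run/containerd/containerd.sock \
            /var/run/crio/crio.sock \
            /var/run/cri-dockerd.sock; do
  if [ -S "$sock" ]; then
    echo "found:   $sock"
  else
    echo "missing: $sock"
  fi
done
```

On a node with only containerd running, exactly one line should read "found:".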

4.1.2 Forwarding IPv4 and letting iptables see bridged traffic on all nodes of the cluster

$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

$ sudo modprobe overlay
$ sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
$ sudo sysctl --system

# Verify that the br_netfilter, overlay modules are loaded by running the following commands:
$ lsmod | grep br_netfilter
$ lsmod | grep overlay

# Verify that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward system variables are set to 1 in your sysctl config by running the following command
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

[1]

4.1.3 Steps of Installing containerd

[2]

Step 1: Installing containerd
Download the containerd-&lt;VERSION&gt;-&lt;OS&gt;-&lt;ARCH&gt;.tar.gz archive from https://github.com/containerd/containerd/releases , verify its sha256sum, and extract it under /usr/local:

$ tar Cxzvf /usr/local containerd-1.6.2-linux-amd64.tar.gz

bin/
bin/containerd-shim-runc-v2
bin/containerd-shim
bin/ctr
bin/containerd-shim-runc-v1
bin/containerd
bin/containerd-stress

The containerd binary is built dynamically for glibc-based Linux distributions such as Ubuntu and Rocky Linux. This binary may not work on musl-based distributions such as Alpine Linux. Users of such distributions may have to install containerd from the source or a third party package.

systemd
If you intend to start containerd via systemd, you should also download the containerd.service unit file from https://raw.githubusercontent.com/containerd/containerd/main/containerd.service into /usr/local/lib/systemd/system/containerd.service, and run the following commands:

$ systemctl daemon-reload
$ systemctl enable --now containerd

Step 2: Installing runc
Download the runc.&lt;ARCH&gt; binary from https://github.com/opencontainers/runc/releases , verify its sha256sum, and install it as /usr/local/sbin/runc.

$ install -m 755 runc.amd64 /usr/local/sbin/runc

The binary is built statically and should work on any Linux distribution.

Step 3: Installing CNI plugins
Download the cni-plugins-&lt;OS&gt;-&lt;ARCH&gt;-&lt;VERSION&gt;.tgz archive from https://github.com/containernetworking/plugins/releases , verify its sha256sum, and extract it under /opt/cni/bin:

$ mkdir -p /opt/cni/bin
$ tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz

./
./macvlan
./static
./vlan
./portmap
./host-local
./vrf
./bridge
./tuning
./firewall
./host-device
./sbr
./loopback
./dhcp
./ptp
./ipvlan
./bandwidth
The binaries are built statically and should work on any Linux distribution.

To start containerd service:

$ sudo systemctl start containerd 

4.1.4 Set systemd as the cgroup driver for containerd

Illustration:
On Linux, control groups are used to constrain resources that are allocated to processes.
Both the kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and to set resources such as CPU/memory requests and limits. To interface with control groups, the kubelet and the container runtime each use a cgroup driver. It's critical that the kubelet and the container runtime use the same cgroup driver and are configured the same.
When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.
Use systemd as the cgroup driver for the kubelet and the container runtime when systemd is the selected init system.
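Before choosing the driver, it is worth confirming that systemd really is PID 1 on the node, and which cgroup hierarchy is mounted:

```shell
# Which init system is PID 1? Expect "systemd" on Ubuntu 20.04.
cat /proc/1/comm

# Which cgroup filesystem is mounted? "cgroup2fs" means cgroup v2,
# "tmpfs" means the cgroup v1 hierarchy.
stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "no /sys/fs/cgroup mount"
```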

Steps of setting systemd as the cgroup driver for containerd:

Step 1: Set in containerd's config file

In containerd's config file /etc/containerd/config.toml (default path after installing containerd), set:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
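If /etc/containerd/config.toml does not exist yet, `containerd config default` prints a full default config that you can redirect into it. The edit itself can be scripted with sed; the sketch below runs against a throwaway copy so it can be tried safely (the path /tmp/config.toml is demo-only):

```shell
# Demo on a throwaway copy; on a real node edit /etc/containerd/config.toml.
cat > /tmp/config.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = false
EOF

# Flip the cgroup driver flag to systemd.
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /tmp/config.toml

# Verify the change.
grep SystemdCgroup /tmp/config.toml
```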

Step 2: Restart the containerd service to make its config file take effect

# check status of containerd service
$ sudo systemctl status containerd  
$ sudo systemctl restart containerd

or

$ sudo systemctl stop containerd
$ sudo systemctl start containerd

4.1.5 Overriding the sandbox (pause) image of containerd

In your containerd config you can override the sandbox image by setting the following option:
# the registry.k8s.io registry is not accessible from China; use registry.aliyuncs.com/google_containers instead

[plugins."io.containerd.grpc.v1.cri"]
#sandbox_image = "registry.k8s.io/pause:3.2"
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"

4.2 Install kubeadm, kubelet, and kubectl

4.2.1 Illustration

kubeadm, kubelet, and kubectl need to be installed on all nodes of the cluster.

4.2.2 Steps of installing kubeadm, kubelet, and kubectl

Update the apt package index and install packages needed to use the Kubernetes apt repository:

$ sudo apt-get update

# apt-transport-https may be a dummy package; if so, you can skip that package
$ sudo apt-get install -y apt-transport-https ca-certificates curl gpg

Download the public signing key for the Kubernetes package repositories. The same signing key is used for all repositories so you can disregard the version in the URL:

$ curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.26/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

Add the appropriate Kubernetes apt repository. Please note that this repository has packages only for Kubernetes 1.26; for other Kubernetes minor versions, change the minor version in the URL to match your desired version (and check that you are reading the documentation for the version of Kubernetes you plan to install).

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
$ echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.26/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:

$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl
$ sudo apt-mark hold kubelet kubeadm kubectl

Attention: In releases older than Debian 12 and Ubuntu 22.04 (including the Ubuntu 20.04 used here), /etc/apt/keyrings does not exist by default; create it before running the curl command above: sudo mkdir -m 755 /etc/apt/keyrings

The kubelet is now restarting every few seconds, as it waits in a crashloop for kubeadm to tell it what to do.

4.2.3 Config systemd as the cgroup driver for kubelet

A minimal example of configuring the field explicitly:

#kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.26.12
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd

Such a configuration file can then be passed to the kubeadm command:

kubeadm init --config kubeadm-config.yaml

Note:
Kubeadm uses the same KubeletConfiguration for all nodes in the cluster. The KubeletConfiguration is stored in a ConfigMap object under the kube-system namespace.

Executing the sub commands init, join and upgrade would result in kubeadm writing the KubeletConfiguration as a file under /var/lib/kubelet/config.yaml and passing it to the local node kubelet.
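After kubeadm init/join has written /var/lib/kubelet/config.yaml, grepping that file confirms which driver was handed to the kubelet. The sketch below runs against a sample file (/tmp/kubelet-config.yaml is demo-only):

```shell
# Demo file mimicking the fields kubeadm writes; on a real node
# inspect /var/lib/kubelet/config.yaml instead.
cat > /tmp/kubelet-config.yaml <<'EOF'
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF

grep cgroupDriver /tmp/kubelet-config.yaml
```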

5. Error & Solution

[ERROR: unit containerd.service has entered the 'failed' state with result 'exit-code']
$ systemctl start containerd
--> The unit containerd.service has entered the 'failed' state with result 'exit-code'
[SOLUTION]
$ gedit /var/log/syslog
Ctrl+F, search for 'containerd'; found:
"containerd: failed to load TOML from /etc/containerd/config.toml: invalid plugin key URI "cri" expect io.containerd.x.vx"
$ sudo gedit /etc/containerd/config.toml
--> changed the plugin key "cri" to "io.containerd.grpc.v1.cri",
then ran $ systemctl start containerd again; still the same error.
Checked /var/log/syslog again and found:
maye-Inspiron-5547 containerd[831750]: containerd: failed to load TOML: /etc/containerd/config.toml: (180, 6): duplicated tables

This means config.toml contains two table headers with the same name, in my case two [plugins."io.containerd.grpc.v1.cri".registry] sections; delete one of them, and the service starts.
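A quick way to locate duplicated table headers without reading the whole file is to grep them out and let uniq flag repeats; the demo below uses a throwaway /tmp/demo.toml:

```shell
# Build a small TOML file with a duplicated table header (demo only).
cat > /tmp/demo.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry]
EOF

# Print any table header that appears more than once.
grep '^\[' /tmp/demo.toml | sort | uniq -d
```

On the real file, run the same pipeline against /etc/containerd/config.toml; every line it prints is a section defined twice.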

[ERROR: Error while dialing: dial unix /var/run/containerd/containerd.sock: connect: no such file or directory]
[SOLUTION]
This happens when the containerd service is not running.

$ systemctl status containerd    # check status of containerd service
$ systemctl enable containerd    # enable containerd to start at boot
$ systemctl start containerd     # start containerd service

[ERROR: kubernetes: when deleting a resource, its status stucks at 'Terminating']

[SOLUTION]
This happens when the resource is still referenced by another resource in the cluster, such as a namespace, PV, or PVC. Patch the resource to remove its finalizers (keys that block deletion until dependent resources are cleaned up); the resource will then be deleted and can be created again. While a resource is stuck in Terminating status it cannot be recreated; trying to do so raises an error.

kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'

Attention:
An existing PV or PVC cannot be removed directly with kubectl delete; after the command it stays in the Terminating state. To guard against data loss from accidental deletion, Kubernetes protects PVs and PVCs with a protection mechanism, so the delete command alone does not remove them.
Run the following commands to lift the protection first, then delete the PV or PVC.
If you have already run kubectl delete on a PV or PVC, it will sit in Terminating; running the patch command below removes it immediately, and there is no need to run kubectl delete again.

# PV
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'
kubectl delete pv <pv-name>

# PVC
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc <pvc-name>

[ERROR: deleted pod automatically created again]

[SOLUTION]
This is because the pod is created by a Deployment: if the pod is deleted manually, the Deployment that owns it recreates it automatically. Delete the Deployment instead; the pod is then deleted together with it.

$ kubectl get deployment --all-namespaces

NAMESPACE                   NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
kubeflow-user-example-com   ml-pipeline-ui-artifact           0/1     1            0           68m
kubeflow-user-example-com   ml-pipeline-visualizationserver   0/1     1            0           68m

$ kubectl delete deployment ml-pipeline-visualizationserver -n kubeflow-user-example-com 

deployment.apps "ml-pipeline-visualizationserver" deleted

References:


  1. https://v1-26.docs.kubernetes.io/docs/setup/production-environment/container-runtimes/

  2. https://github.com/containerd/containerd/blob/main/docs/getting-started.md