1. Check that the service of the container runtime -- containerd -- is running on every machine that will join the Kubernetes cluster.

$ systemctl status containerd
# if containerd is not running, start it
$ systemctl start containerd

● containerd.service - containerd container runtime
Loaded: loaded (/usr/local/lib/systemd/system/containerd.service; enabled;>
Active: active (running) since Fri 2024-02-09 15:10:34 CST; 12min ago
Docs: https://containerd.io
Process: 906 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUC>
Main PID: 915 (containerd)
Tasks: 385
Memory: 221.1M
CPU: 24.201s
CGroup: /system.slice/containerd.service
├─ 915 /usr/local/bin/containerd
├─2593 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─2613 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─2614 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─2615 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─3230 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─3588 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─5795 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─5840 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─5873 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─5919 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─6415 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
├─6969 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -i>
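
If containerd is installed but not set to start at boot, you can enable and start it in one step (a minimal sketch; adjust to how containerd was installed on your system):

# start containerd now and enable it on every reboot
$ sudo systemctl enable --now containerd
# confirm it is active
$ systemctl is-active containerd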

2. Initialize the control-plane node

kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

kubeadm init first runs a series of prechecks to ensure that the machine is ready to run Kubernetes. These prechecks expose warnings and exit on errors. kubeadm init then downloads and installs the cluster control plane components. This may take several minutes. After it finishes you should see:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a Pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

To make kubectl work for your non-root user, run these commands, which are also part of the kubeadm init output:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Make a record of the kubeadm join command that kubeadm init outputs. You need this command to join nodes to your cluster.

The token is used for mutual authentication between the control-plane node and the joining nodes. The token included here is secret. Keep it safe, because anyone with this token can add authenticated nodes to your cluster. These tokens can be listed, created, and deleted with the kubeadm token command.
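
Bootstrap tokens can be managed later with the kubeadm token subcommands (shown here as a sketch; <token> is a placeholder):

# list existing bootstrap tokens
$ kubeadm token list
# create a new bootstrap token
$ kubeadm token create
# delete a token so it can no longer be used to join nodes
$ kubeadm token delete <token>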

After kubeadm init:

(base) maye@maye-Inspiron-5547:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   coredns-5bbd96d687-m65qk                     1/1     Running   0          4m48s
kube-system   coredns-5bbd96d687-wdqlb                     1/1     Running   0          4m48s
kube-system   etcd-maye-inspiron-5547                      1/1     Running   6          5m7s
kube-system   kube-apiserver-maye-inspiron-5547            1/1     Running   0          5m
kube-system   kube-controller-manager-maye-inspiron-5547   1/1     Running   0          5m7s
kube-system   kube-proxy-v78wf                             1/1     Running   0          4m48s
kube-system   kube-scheduler-maye-inspiron-5547            1/1     Running   0          5m4s
(base) maye@maye-Inspiron-5547:~$   

Note:

  1. The default image repository "registry.k8s.io" is not accessible in mainland China, so specify an accessible mirror:
    --image-repository=registry.aliyuncs.com/google_containers
  2. If --pod-network-cidr=10.244.0.0/16 is not specified, the error:

pods of flannel not ready
$ journalctl -fu kubelet --->
"loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"

will be raised when deploying the pod-network addon -- flannel.
  3. "READY 1/2" in the output of kubectl get pod means that the pod has 2 containers and 1 of them is ready.

Attention:
You MUST disable swap if the kubelet is not properly configured to use swap. For example, sudo swapoff -a will disable swapping temporarily. To make this change persistent across reboots, make sure swap is disabled in config files like /etc/fstab or systemd.swap, depending on how it was configured on your system.
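
A minimal sketch of disabling swap temporarily and persistently (the sed pattern assumes the swap entry in /etc/fstab contains the word "swap"; check the file before editing):

# turn swap off for the current boot
$ sudo swapoff -a
# comment out swap entries in /etc/fstab so it stays off after reboot (backup written to /etc/fstab.bak)
$ sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
# verify that no swap devices are active (empty output means none)
$ swapon --show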

Note:
The control-plane node is the machine where the control plane components run, including etcd (the cluster database) and the API Server (which the kubectl command line tool communicates with).

3. Deploy a pod network addon -- flannel

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

If you use a custom podCIDR (not 10.244.0.0/16), you first need to download the above manifest and modify the network to match yours.
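
A sketch of that workflow, assuming the default "10.244.0.0/16" string appears in the net-conf.json section of the manifest and 10.10.0.0/16 is your custom podCIDR:

$ curl -LO https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# change the Network field in net-conf.json to your podCIDR
$ sed -i 's|10.244.0.0/16|10.10.0.0/16|' kube-flannel.yml
$ kubectl apply -f kube-flannel.yml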

After kubeadm init and deploying flannel:

(base) maye@maye-Inspiron-5547:/run/flannel$ kubectl get pods --all-namespaces
NAMESPACE      NAME                                         READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-gtzmm                        1/1     Running   0          81s
kube-system    coredns-66f779496c-rc7nc                     1/1     Running   0          7m53s
kube-system    coredns-66f779496c-zlc5c                     1/1     Running   0          7m52s
kube-system    etcd-maye-inspiron-5547                      1/1     Running   6          8m8s
kube-system    kube-apiserver-maye-inspiron-5547            1/1     Running   7          8m12s
kube-system    kube-controller-manager-maye-inspiron-5547   1/1     Running   0          8m9s
kube-system    kube-proxy-gfp8z                             1/1     Running   0          7m53s
kube-system    kube-scheduler-maye-inspiron-5547            1/1     Running   8          8m8s
(base) maye@maye-Inspiron-5547:/run/flannel$ 

Note:

  1. Flannel is an overlay network provider that can be used with Kubernetes.
  2. Flannel can be added to any existing Kubernetes cluster though it's simplest to add flannel before any pods using the pod network have been started.

4. Control plane node isolation

By default, your cluster will not schedule Pods on the control plane nodes for security reasons. If you want to be able to schedule Pods on the control plane nodes, for example for a single machine Kubernetes cluster, run:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

The output will look something like:

node "test-01" untainted
...

This will remove the node-role.kubernetes.io/control-plane:NoSchedule taint from any nodes that have it, including the control plane nodes, meaning that the scheduler will then be able to schedule Pods everywhere.
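
To confirm the taint is gone (node names come from kubectl get nodes):

# an empty or <none> Taints field means Pods can be scheduled on this node
$ kubectl describe node <node-name> | grep Taints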

5. Joining your nodes

The nodes are where your workloads (containers and Pods, etc) run. To add new nodes to your cluster do the following for each machine:

SSH to the machine

Become root (e.g. sudo su -)

Install and start a container runtime if the machine does not already have one.

Run the command that was output by kubeadm init. For example:

kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>

root@maye-laptop:/home/maye# kubeadm join <control-plane-host>:<control-plane-port> --token u0dcnr.n3u9hypqq5egwt6a --discovery-token-ca-cert-hash sha256:a5043aacbc38dfeef20b49efac3001feced742eb477f5ba75ffb2b045e325660
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:

  • Certificate signing request was sent to apiserver and a response was received.
  • The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

root@maye-laptop:/home/maye#

A few seconds later, you should notice this node in the output from kubectl get nodes when run on the control-plane node.

If you do not have the token, you can get it by running the following command on the control-plane node:

kubeadm token list

The output is similar to this:

TOKEN                    TTL  EXPIRES              USAGES           DESCRIPTION            EXTRA GROUPS
8ewj1p.9r9hcjoqgajrj4gi  23h  2018-06-12T02:51:28Z authentication,  The default bootstrap  system:
                                                   signing          token generated by     bootstrappers:
                                                                    'kubeadm init'.        kubeadm:
                                                                                           default-node-token

By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired, you can create a new token by running the following command on the control-plane node:

kubeadm token create

The output is similar to this:

5didvk.d09sbcov8ph2amjw

If you don't have the value of --discovery-token-ca-cert-hash, you can get it by running the following command chain on the control-plane node:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
   openssl dgst -sha256 -hex | sed 's/^.* //'

The output is similar to:

8cb2de97839780a412b93877f8507ad6c94f73add17d5d7058e91741c9d5ec78
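
Alternatively, instead of assembling the token and hash by hand, you can let kubeadm print a complete join command on the control-plane node (this creates a fresh token):

$ kubeadm token create --print-join-command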

Check the IP addresses on Ubuntu:

(base) maye@maye-Inspiron-5547:~/Documents/kubernetes_install$ ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 34:17:eb:74:ca:12 brd ff:ff:ff:ff:ff:ff
3: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a0:88:69:7b:2e:51 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.104/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp2s0
valid_lft 5391sec preferred_lft 5391sec
inet6 fe80::f882:bcec:8131:d30e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 1a:60:99:38:79:c9 brd ff:ff:ff:ff:ff:ff
inet 10.244.0.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::1860:99ff:fe38:79c9/64 scope link
valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether 56:44:b3:26:f3:3c brd ff:ff:ff:ff:ff:ff
inet 10.244.0.1/24 brd 10.244.0.255 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::5444:b3ff:fe26:f33c/64 scope link
valid_lft forever preferred_lft forever
16: vetha31afc9f@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether f2:c4:a4:87:37:aa brd ff:ff:ff:ff:ff:ff link-netns cni-0d173452-808f-f2f9-e57f-9346f9452705
inet6 fe80::f0c4:a4ff:fe87:37aa/64 scope link
valid_lft forever preferred_lft forever
17: vetha182ad0d@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 46:46:6a:5d:5b:4b brd ff:ff:ff:ff:ff:ff link-netns cni-d135db16-3318-9d24-7e74-19d91b403162
inet6 fe80::4446:6aff:fe5d:5b4b/64 scope link
valid_lft forever preferred_lft forever
(base) maye@maye-Inspiron-5547:~/Documents/kubernetes_install$

After joining one node to the cluster:

(base) maye@maye-Inspiron-5547:~$ kubectl get nodes
NAME                 STATUS   ROLES           AGE     VERSION
maye-inspiron-5547   Ready    control-plane   141m    v1.28.4
maye-laptop          Ready    <none>          6m31s   v1.28.4
(base) maye@maye-Inspiron-5547:~$ kubectl get pods --all-namespaces
NAMESPACE      NAME                                         READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-gtzmm                        1/1     Running   0          138m
kube-flannel   kube-flannel-ds-mvk6h                        1/1     Running   0          10m
kube-system    coredns-66f779496c-rc7nc                     1/1     Running   0          144m
kube-system    coredns-66f779496c-zlc5c                     1/1     Running   0          144m
kube-system    etcd-maye-inspiron-5547                      1/1     Running   6          144m
kube-system    kube-apiserver-maye-inspiron-5547            1/1     Running   7          144m
kube-system    kube-controller-manager-maye-inspiron-5547   1/1     Running   0          144m
kube-system    kube-proxy-gdvhc                             1/1     Running   0          10m
kube-system    kube-proxy-gfp8z                             1/1     Running   0          144m
kube-system    kube-scheduler-maye-inspiron-5547            1/1     Running   8          144m

Note:

  1. As the cluster nodes are usually initialized sequentially, the CoreDNS Pods are likely to all run on the first control-plane node. To provide higher availability, please rebalance the CoreDNS Pods with kubectl -n kube-system rollout restart deployment coredns after at least one new node is joined.

6. Clean up

6.1 Drain the nodes that are not the control plane, i.e. evict the Pods running on them:

kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets

6.2 Reset the state installed by kubeadm on the node:

$ kubeadm reset

(base) maye@maye-Inspiron-5547:~$ sudo kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0209 21:11:53.835482 385063 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
(base) maye@maye-Inspiron-5547:~$

The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If you want to reset the IPVS tables, you must run the following command:

ipvsadm -C

6.3 Now remove the node:

kubectl delete node <node name>

6.4 Clean up the control plane

You can use kubeadm reset on the control plane host to trigger a best-effort clean up.

If you wish to start over, run kubeadm init or kubeadm join with the appropriate arguments.

7. Debug a pod

7.1 Check the details of the pod using `kubectl describe pod <pod-name> -n <namespace>`. Error messages will be shown in the "Events:" section of the output.

(base) maye@maye-Inspiron-5547:~$ kubectl describe pod kube-flannel-ds-mvk6h -n kube-flannel

...
Events:
Type Reason Age From Message


Normal Scheduled 11m default-scheduler Successfully assigned kube-flannel/kube-flannel-ds-mvk6h to maye-laptop
Normal Pulled 11m kubelet Container image "docker.io/flannel/flannel-cni-plugin:v1.2.0" already present on machine
Normal Created 11m kubelet Created container install-cni-plugin
Normal Started 11m kubelet Started container install-cni-plugin
Normal Pulled 11m kubelet Container image "docker.io/flannel/flannel:v0.23.0" already present on machine
Normal Created 11m kubelet Created container install-cni
Normal Started 10m kubelet Started container install-cni
Normal Pulled 10m kubelet Container image "docker.io/flannel/flannel:v0.23.0" already present on machine
Normal Created 10m kubelet Created container kube-flannel
Normal Started 10m kubelet Started container kube-flannel
(base) maye@maye-Inspiron-5547:~$

7.2 Check the logs of the containers running in the pod:

kubectl logs <pod-name> -n <namespace> 
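
If the pod has more than one container, or the container has already restarted, these variants of kubectl logs may help (container names come from kubectl describe pod):

# logs of a specific container in the pod
$ kubectl logs <pod-name> -n <namespace> -c <container-name>
# logs of the previous (crashed) instance of the container
$ kubectl logs <pod-name> -n <namespace> --previous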

7.3 Check the log of kubelet, which manages pods and containers:

$ journalctl -fu kubelet 

journalctl -- view the logs of processes started by systemd
-f -- follow new log entries in real time
-u -- specify which service unit's log to view

or,

$ gedit /var/log/syslog   

Then press Ctrl+F to search for a keyword, e.g. kubelet.
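
Or grep the syslog directly from the terminal (assuming an Ubuntu-style /var/log/syslog):

# show the most recent kubelet-related syslog entries
$ grep kubelet /var/log/syslog | tail -n 50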

8. Error & Solution

[ERROR: running with swap on is not supported.]
[SOLUTION]

 $ sudo gedit /etc/fstab 

comment out the line starting with '/swapfile'.


[ERROR: failed to pull image registry.k8s.io/kube-apiserver:v1.28]
[SOLUTION]
This step downloads the images needed by Kubernetes. The default repository is registry.k8s.io, which is not accessible in mainland China; use the Aliyun mirror -- registry.aliyuncs.com/google_containers -- instead:

$ kubeadm init --image-repository=registry.aliyuncs.com/google_containers

[ERROR kubelet not running: kubelet is not running or not healthy.]
[SOLUTION]
This error is caused by kubelet and containerd using different cgroup drivers; set both to systemd.

Set containerd's cgroup driver:

$ sudo gedit /etc/containerd/config.toml 

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"

If /etc/containerd/config.toml does not exist, create it with:

$ sudo mkdir -p /etc/containerd
$ containerd config default | sudo tee /etc/containerd/config.toml
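
After editing or regenerating the config, restart containerd so the new settings take effect:

$ sudo systemctl restart containerd
$ systemctl status containerd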

Set kubelet's cgroup driver:

# kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.21.0
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd

Then run kubeadm init with this config file:

$ kubeadm init --config kubeadm-config.yaml

[ERROR Port is in use: Port 10250 is in use.]

$ sudo kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

[sudo] password for maye:
I0209 21:11:06.817757 384177 version.go:256] remote version is much newer: v1.29.1; falling back to: stable-1.26
[init] Using Kubernetes version: v1.26.13
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

[SOLUTION]
Reset the node, then initialize it again:

$ sudo systemctl stop kubelet.service
$ sudo kubeadm reset
$ sudo kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

[ERROR File already exists: /etc/kubernetes/manifests/kube-apiserver.yaml already exists]
[SOLUTION]

$ rm -rf /etc/kubernetes/manifests/*
$ rm -rf /var/lib/etcd

[ERROR pod status is stuck at ContainerCreating: failed to pull image "registry.k8s.io/pause:3.8"]

$ kubectl get pods -n kube-system  

kube-proxy-s729z 0/1 ContainerCreating

$ kubectl describe pod kube-proxy-s729z -n kube-system

Event:
failed to get sandbox image "registry.k8s.io/pause:3.8": failed to pull image "registry.k8s.io/pause:3.8"

[SOLUTION]
This error is caused by the image pull from registry.k8s.io failing on some node; replace the sandbox image repository in containerd's config with one that is accessible in China:

$ sudo gedit /etc/containerd/config.toml

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
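
After changing the sandbox image on the affected node, restart containerd there and delete the stuck pod so its DaemonSet recreates it (the pod name is the one from the example above):

# on the affected node
$ sudo systemctl restart containerd
# on the control-plane node; the DaemonSet recreates the pod automatically
$ kubectl delete pod kube-proxy-s729z -n kube-system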


[ERROR: unable to create new content in namespace xxx because it is being terminated.]

kubectl apply -f xxx.yaml

unable to create new content in namespace xxx because it is being terminated.

[SOLUTION]
This error is caused by deleting a namespace before deleting the resources in it via kubectl delete -f xxx.yaml. The solution is:

step 1: export the namespace definition to a file:
$ kubectl get namespace <terminating-namespace> -o json > /home/maye/maye.json

step 2: edit the file:
$ sudo gedit maye.json

Delete the "finalizers" field, e.g.:

# in file maye.json
"finalizers": ["finalizers.knative-serving.io/namespaces"]

Delete the whole key-value entry; the "finalizers" field usually appears under "metadata" or "spec" or both, so delete all "finalizers" fields.
step 3: replace the existing resource (the terminating namespace in this example) with the one defined in the file maye.json:
$ kubectl replace --raw "/api/v1/namespaces/<terminating-namespace>/finalize" -f /home/maye/maye.json
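
The same three steps can be combined into one pipeline, assuming jq is installed (it empties spec.finalizers and submits the result to the finalize endpoint):

$ kubectl get namespace <terminating-namespace> -o json \
    | jq '.spec.finalizers = []' \
    | kubectl replace --raw "/api/v1/namespaces/<terminating-namespace>/finalize" -f -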


[ERROR: pods of flannel not ready: open /run/flannel/subnet.env: no such file or directory]

$ journalctl -fu kubelet

"loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"

[SOLUTION]
step 1: clean up the cluster:

$ kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets
$ kubeadm reset

step 2: init the cluster again with pod-network-cidr specified:

$ kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

step 3: deploy flannel:

(base) maye@maye-Inspiron-5547:~$ kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

[ERROR: port 10250 is in use]

root@maye-laptop:/home/maye# kubeadm join --token ercj5r.8i8sspccfgpx1z3q 192.168.0.104:6443 --discovery-token-ca-cert-hash sha256:75a8426aebf9dc7b52c6b36bf281435aa0fe0064e94947154825eaa87ca11ab0

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

[SOLUTION]

# solution for [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
root@maye-laptop:/home/maye# rm /etc/kubernetes/kubelet.conf
root@maye-laptop:/home/maye# rm /etc/kubernetes/pki/ca.crt
# solution for [ERROR Port-10250]: Port 10250 is in use
# check which process is using the port:
root@maye-laptop:/home/maye# sudo lsof -i:10250
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet 24207 root   23u  IPv6 451262      0t0  TCP *:10250 (LISTEN)
# reset the node
root@maye-laptop:/home/maye# kubeadm reset
# clean up custom iptables on the node
root@maye-laptop:/home/maye# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# join the cluster again
root@maye-laptop:/home/maye# kubeadm join --token ercj5r.8i8sspccfgpx1z3q 192.168.0.104:6443 --discovery-token-ca-cert-hash sha256:75a8426aebf9dc7b52c6b36bf281435aa0fe0064e94947154825eaa87ca11ab0

[ERROR ErrImagePull: dial tcp: lookup registry.aliyuncs.com on 127.0.0.53:53: read udp 127.0.0.1:33252->127.0.0.53:53: i/o timeout]

kubectl get pod --all-namespaces

kube-system kube-proxy-v955q 0/1 ErrImagePull

(base) maye@maye-Inspiron-5547:~$ kubectl describe pod kube-proxy-v955q -n kube-system

Name: kube-proxy-v955q
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: kube-proxy
Node: maye-laptop/192.168.0.102
Start Time: Mon, 08 Jan 2024 17:32:09 +0800
Labels: controller-revision-hash=85b545b64f
k8s-app=kube-proxy
pod-template-generation=1
Annotations:
Status: Running
IP: 192.168.0.102
IPs:
IP: 192.168.0.102
Controlled By: DaemonSet/kube-proxy
Containers:
kube-proxy:
Container ID: containerd://1195654b063b604a9ece797d141b185fc8714c74f7d628b06cbb5a83aca6af9e
Image: registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12
Image ID: registry.aliyuncs.com/google_containers/kube-proxy@sha256:cf83e7ff3ae5565370b6b0e9cfa6233b27eb6113a484a0074146b1bbb0bd54e3
Port:
Host Port:
...
Events:
Type Reason Age From Message


Normal Scheduled 3m11s default-scheduler Successfully assigned kube-system/kube-proxy-v955q to maye-laptop
Warning Failed 2m55s kubelet Failed to pull image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12": rpc error: code = Unknown desc = failed to pull and unpack image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry.aliyuncs.com/v2/google_containers/kube-proxy/manifests/sha256:cf83e7ff3ae5565370b6b0e9cfa6233b27eb6113a484a0074146b1bbb0bd54e3": dial tcp: lookup registry.aliyuncs.com on 127.0.0.53:53: read udp 127.0.0.1:45655->127.0.0.53:53: i/o timeout
Warning Failed 99s (x2 over 2m55s) kubelet Error: ErrImagePull
Warning Failed 99s kubelet Failed to pull image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12": rpc error: code = Unknown desc = failed to pull and unpack image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry.aliyuncs.com/v2/google_containers/kube-proxy/manifests/sha256:cf83e7ff3ae5565370b6b0e9cfa6233b27eb6113a484a0074146b1bbb0bd54e3": dial tcp: lookup registry.aliyuncs.com on 127.0.0.53:53: read udp 127.0.0.1:33252->127.0.0.53:53: i/o timeout
Normal BackOff 84s (x2 over 2m55s) kubelet Back-off pulling image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12"
Warning Failed 84s (x2 over 2m55s) kubelet Error: ImagePullBackOff
Normal Pulling 70s (x3 over 3m8s) kubelet Pulling image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12"
Normal Pulled 46s kubelet Successfully pulled image "registry.aliyuncs.com/google_containers/kube-proxy:v1.26.12" in 23.881696233s (23.881733738s including waiting)
Normal Created 45s kubelet Created container kube-proxy
Normal Started 45s kubelet Started container kube-proxy

[SOLUTION]
Run ping registry.aliyuncs.com on the host. If it succeeds, the failure was just a temporarily poor network connection and does not matter: kubelet will back off and retry the image pull until it succeeds ("Back-off pulling image" followed by "Successfully pulled image" in the events above), so just wait a while.
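
If the name does not resolve at all, a couple of quick DNS checks on the node can narrow it down (resolvectl assumes systemd-resolved, the default on recent Ubuntu):

# does the registry hostname resolve?
$ nslookup registry.aliyuncs.com
# which upstream DNS servers is systemd-resolved using?
$ resolvectl status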


[ERROR: kubeadm init: user is not running as root]

(base) maye@maye-Inspiron-5547:~$ kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

I0209 21:10:22.068710 383339 version.go:256] remote version is much newer: v1.29.1; falling back to: stable-1.26
[init] Using Kubernetes version: v1.26.13
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR IsPrivilegedUser]: user is not running as root
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...

[SOLUTION]

$ sudo kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16