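Docker Basics (19) - Kubernetes (2) | Deploying a K8s Cluster (One Master, One Node)
Kubernetes, also known as K8s or Kube, is an open-source container orchestration platform originally developed by Google, and one of the most widely used tools of its kind. It automates container management and operations, including deployment, scheduling, and scaling across a cluster of nodes.
For a detailed introduction to Kubernetes, see "System Architecture and Design (6) - Kubernetes (K8s)".
This article deploys a K8s cluster with one Master and one Node.
1. Deployment environment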
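Virtual machine: VirtualBox 6.1.30 (Windows version)
Operating system: Linux CentOS 7.9 64-bit
Docker version: 20.10.7
Docker Compose version: 2.6.1
Kubernetes version: 1.23.0
Working directory: /home/k8s
Linux user: a non-root user (any user name; shown here as xxx) that belongs to the docker group
Host list:
Hostname | IP | Role | OS |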
k8s-master | 192.168.0.10 | master | CentOS 7.9 |
k8s-node01 | 192.168.0.11 | node | CentOS 7.9 |
1) Set the hostnames
On the Master host, run:
$ sudo hostnamectl set-hostname k8s-master
On the Node host, run:
$ sudo hostnamectl set-hostname k8s-node01
Edit the /etc/hosts file on both the Master and the Node:
$ sudo vim /etc/hosts
# Add
192.168.0.10 k8s-master
192.168.0.11 k8s-node01
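The manual edit above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name add_host_entry is hypothetical and not part of the original procedure. It appends an entry only when the hostname is not already mapped, so it can be re-run safely.

```shell
# add_host_entry FILE IP NAME (hypothetical helper)
# Append "IP NAME" to FILE unless a non-comment line already maps NAME.
add_host_entry() {
  hosts_file=$1; ip=$2; name=$3
  # Match NAME as a whole whitespace-delimited field.
  if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (as root, on both hosts):
#   add_host_entry /etc/hosts 192.168.0.10 k8s-master
#   add_host_entry /etc/hosts 192.168.0.11 k8s-node01
```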
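2) Disable SELinux
(1) To disable it temporarily, run:
$ sudo setenforce 0
$ getenforce
Permissive
Note: in permissive mode SELinux no longer enforces its policy, so it is effectively off until the next reboot. The command to turn enforcement back on temporarily is:
$ sudo setenforce 1
$ getenforce
Enforcing
(2) To disable it permanently, run:
$ sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
$ getenforce
Enforcing
Note: getenforce still reports Enforcing right after the command, because a permanent change only takes effect after a reboot. After rebooting, run:
$ getenforce
Disabled
The command to re-enable SELinux permanently is:
$ sudo sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
This also requires a reboot to take effect.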
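Because getenforce reports only the running mode, it can be useful to confirm what the config file will apply at the next boot. A small sketch, assuming the standard /etc/selinux/config layout; the function name selinux_configured_mode is hypothetical.

```shell
# selinux_configured_mode [FILE] (hypothetical helper)
# Print the value of the last SELINUX= line in FILE.
# "^SELINUX=" does not match the separate SELINUXTYPE= setting.
selinux_configured_mode() {
  sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}" | tail -n 1
}

# Usage: compare the running mode with the mode configured for the next boot.
#   echo "running:    $(getenforce)"
#   echo "configured: $(selinux_configured_mode)"
```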
3) Disable swap
(1) To disable it temporarily, run:
$ sudo swapoff -a
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1837         132        1588           8         116        1569
Swap:             0           0           0
Note: swap stays off only until the next reboot. The command to turn it back on temporarily is:
$ sudo swapon -a
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1837         133        1587           8         116        1568
Swap:          2047           0        2047
(2) To disable it permanently, run:
$ sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab # comment out the swap entries in /etc/fstab
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1837         133        1587           8         117        1568
Swap:          2047           0        2047
Note: swap is still on right after the command, because a permanent change only takes effect after a reboot. After rebooting, run:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1837         129        1546           8         161        1562
Swap:             0           0           0
To re-enable swap permanently, uncomment the /etc/fstab entries that the sed command above commented out, then reboot.
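The sed command above adds another "#" each time it is run. The sketch below is a variant that skips lines already commented out, together with its inverse. Both function names are hypothetical, and the inverse assumes that every commented line containing " swap " was commented by this procedure.

```shell
# disable_swap_entries FILE (hypothetical helper)
# Comment out active swap lines; lines already starting with "#" are skipped.
disable_swap_entries() {
  sed -i '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# enable_swap_entries FILE (hypothetical helper)
# Remove one leading "#" from commented swap lines.
enable_swap_entries() {
  sed -i '/^#.*[[:space:]]swap[[:space:]]/ s/^#//' "$1"
}

# Usage (as root), then reboot:
#   disable_swap_entries /etc/fstab
```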
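4) Configure the firewall
Stop and disable the firewall:
$ sudo systemctl stop firewalld && sudo systemctl disable firewalld
To turn the firewall back on later:
$ sudo systemctl enable firewalld && sudo systemctl start firewalld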
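If the firewall has to stay on, an alternative to disabling it is to open only the ports the cluster needs. The sketch below prints the firewall-cmd lines for review rather than running them. The port lists are an assumption based on the Kubernetes "Ports and Protocols" documentation plus Flannel's default VXLAN port (8472/udp), and the function names are hypothetical; this article itself only tests the setup with firewalld disabled.

```shell
# k8s_ports ROLE (hypothetical helper): ports to open for "master" or "node".
k8s_ports() {
  case "$1" in
    master) echo "6443/tcp 2379-2380/tcp 10250/tcp 10257/tcp 10259/tcp 8472/udp" ;;
    node)   echo "10250/tcp 30000-32767/tcp 8472/udp" ;;
    *)      echo "usage: k8s_ports master|node" >&2; return 1 ;;
  esac
}

# print_firewall_cmds ROLE (hypothetical helper): dry run, prints commands only.
print_firewall_cmds() {
  ports=$(k8s_ports "$1") || return 1
  for p in $ports; do
    echo "firewall-cmd --permanent --add-port=$p"
  done
  echo "firewall-cmd --reload"
}

# Usage: review the output, then run the printed lines with sudo.
#   print_firewall_cmds master
```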
5) Synchronize time
$ sudo yum -y install ntpdate
$ sudo ntpdate time.windows.com
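6) Install Docker and Compose
For Docker installation and configuration, see "Docker Basics (1) - Docker Architecture, Docker Installation, Docker Registry Mirrors".
For Docker Compose installation and configuration, see "Docker Basics (4) - Docker Compose".
Note: the steps above (permanently disabling SELinux, permanently disabling swap, disabling the firewall, synchronizing time, and installing Docker and Compose) must be carried out on both the Master and the Node.
2. Install Kubernetes on the Master and the Node
1) Configure the YUM repository
$ sudo vim /etc/yum.repos.d/kubernetes.repo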
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
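Since the same repository file is needed on both hosts, it can be written non-interactively instead of through vim. A minimal sketch with the same content as above; the function name write_k8s_repo is hypothetical.

```shell
# write_k8s_repo [DEST] (hypothetical helper)
# Write the Aliyun Kubernetes repo definition to DEST.
write_k8s_repo() {
  cat > "${1:-/etc/yum.repos.d/kubernetes.repo}" <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
}

# Usage (as root, on both hosts):
#   write_k8s_repo
```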
2) Install the Kubernetes components (pinned version)
$ sudo yum install -y kubelet-1.23.0 kubeadm-1.23.0 kubectl-1.23.0
$ kubelet --version
Kubernetes v1.23.0
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:15:11Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
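The connection to the server localhost:8080 was refused - did you specify the right host or port?
Note: the "connection refused" line is expected at this point, because no cluster has been initialized yet and kubectl has no API Server to query.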
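To confirm that all three components are at the same version without reading the long output by eye, the version string can be extracted. A small sketch; extract_version is a hypothetical helper that works on the output formats shown above.

```shell
# extract_version (hypothetical helper)
# Read text on stdin and print the first vMAJOR.MINOR.PATCH string found.
extract_version() {
  grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Usage:
#   kubelet --version | extract_version
#   kubeadm version | extract_version
#   kubectl version --client | extract_version   # --client skips the server query
```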
3. Initialize the cluster on the Master
1) Initialize the Master host
Either of the following two approaches initializes the Master host:
(1) Command-line flags
$ sudo kubeadm init \
--apiserver-advertise-address=192.168.0.10 \
--image-repository=registry.aliyuncs.com/google_containers \
--kubernetes-version=v1.23.0 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--ignore-preflight-errors=all
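Flag reference:
--apiserver-advertise-address: the IP address the API Server on the Master advertises to the rest of the cluster;
--image-repository: the registry from which the control-plane images are pulled;
--kubernetes-version: the K8s version; it must match the version you installed;
--service-cidr: the virtual IP range from which Service (ClusterIP) addresses are allocated;
--pod-network-cidr: the Pod network range; it must match the network configured in the CNI plugin's yaml file;
--ignore-preflight-errors: a list of preflight checks whose errors are shown as warnings; the value 'all' ignores errors from every check, which can hide real problems such as swap still being enabled;
For more kubeadm init flags, see the official documentation: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/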
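The Service and Pod ranges should not contain the hosts' own addresses or overlap each other. The following sketch checks whether an IPv4 address falls inside a CIDR block using POSIX shell arithmetic; both function names are hypothetical, and the check is illustrative rather than something kubeadm requires you to run.

```shell
# ip_to_int A.B.C.D (hypothetical helper): print the address as an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP NET/BITS (hypothetical helper): succeed if IP is inside the block.
ip_in_cidr() {
  net=${2%/*}; bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Usage: the Master's address should be outside both cluster ranges.
#   ip_in_cidr 192.168.0.10 10.96.0.0/12  && echo "conflict with service CIDR"
#   ip_in_cidr 192.168.0.10 10.244.0.0/16 && echo "conflict with pod CIDR"
```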
(2) Configuration file
$ cd /home/k8s
$ kubeadm config print init-defaults > init.default.yaml
$ vim init.default.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.0.10
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.23.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
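Note: the following fields differ from the generated defaults.
imageRepository: the default pulls images from k8s.gcr.io, which is hosted outside China; change it to registry.aliyuncs.com/google_containers.
localAPIEndpoint.advertiseAddress: change it to 192.168.0.10.
networking.podSubnet: add this entry with the value 10.244.0.0/16.
$ sudo kubeadm init --config=init.default.yaml
Note: before running kubeadm init --config, you can pull the images in advance with kubeadm config images pull --config=init.default.yaml.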
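Before running kubeadm init with the file, it may help to confirm that the three edits described above are actually present. A minimal sketch using fixed-string matching; check_init_config is a hypothetical helper and the values are the ones used in this article.

```shell
# check_init_config FILE (hypothetical helper)
# Report whether each of the three edited settings is present in FILE.
check_init_config() {
  rc=0
  for kv in \
    "advertiseAddress: 192.168.0.10" \
    "imageRepository: registry.aliyuncs.com/google_containers" \
    "podSubnet: 10.244.0.0/16"
  do
    if grep -Fq "$kv" "$1"; then
      echo "ok:      $kv"
    else
      echo "missing: $kv"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_init_config /home/k8s/init.default.yaml
```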
2) Result of kubeadm init
(1) Failure
[init] Using Kubernetes version: v1.23.0
...
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Note: when kubeadm init fails, inspect the kubelet log:
$ sudo journalctl -f -u kubelet
...
Nov 16 01:56:05 k8s-master kubelet[8634]: E1116 01:56:05.490013 8634 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Nov 16 01:56:05 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 16 01:56:05 k8s-master systemd[1]: Unit kubelet.service entered failed state.
Nov 16 01:56:05 k8s-master systemd[1]: kubelet.service failed.
The log shows a cgroup driver mismatch: Docker uses cgroupfs while the kubelet is configured for systemd.
Check the Docker driver:
$ docker info | grep Cgroup
Cgroup Driver: cgroupfs
Check the kubelet driver:
$ sudo cat /var/lib/kubelet/config.yaml | grep cgroup
cgroupDriver: systemd
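The two checks above can be combined into a single comparison. The sketch below parses the same two outputs; the function names are hypothetical, and the file path is the one used in this article.

```shell
# docker_cgroup_driver (hypothetical helper): parse "docker info" text on stdin.
docker_cgroup_driver() {
  sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p' | head -n 1
}

# kubelet_cgroup_driver (hypothetical helper): parse kubelet config.yaml on stdin.
kubelet_cgroup_driver() {
  sed -n 's/^[[:space:]]*cgroupDriver:[[:space:]]*//p' | head -n 1
}

# Usage:
#   d=$(docker info 2>/dev/null | docker_cgroup_driver)
#   k=$(sudo cat /var/lib/kubelet/config.yaml | kubelet_cgroup_driver)
#   [ "$d" = "$k" ] && echo "drivers match: $d" || echo "mismatch: docker=$d kubelet=$k"
```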
To change the Docker driver, open /etc/docker/daemon.json (create it if it does not exist) and add the following:
{
...
"exec-opts":["native.cgroupdriver=systemd"]
}
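When /etc/docker/daemon.json does not exist yet, it can be created in one step. The sketch below deliberately refuses to touch an existing file, since merging JSON keys safely is beyond a few lines of shell; write_cgroup_daemon_json is a hypothetical helper.

```shell
# write_cgroup_daemon_json [DEST] (hypothetical helper)
# Create DEST with the systemd cgroup driver setting; never overwrite.
write_cgroup_daemon_json() {
  target=${1:-/etc/docker/daemon.json}
  if [ -e "$target" ]; then
    echo "$target already exists; add the exec-opts key by hand" >&2
    return 1
  fi
  printf '{\n  "exec-opts": ["native.cgroupdriver=systemd"]\n}\n' > "$target"
}

# Usage (as root):
#   write_cgroup_daemon_json
```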
Restart Docker and reset the Master node:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ sudo kubeadm reset
After running the commands above, initialize the Master host again.
(2) Success
...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.10:6443 --token 67lwvx.7nhlvr3y74g7yccg \
    --discovery-token-ca-cert-hash sha256:c62bfdbb2a65c5ad5bdee19596f0130b92c93d12ececb8898deaeb2b54b1e7eb
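Note: the kubeadm join command (two lines) is run on the Node to connect it to the Master. Each successful kubeadm init produces a different token and sha256 hash, so make sure the Node runs the command from the most recent init.
If the command is lost or the token has expired, running the following on the Master creates a new token and prints a fresh join command: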
$ kubeadm token create --print-join-command
3) Create the kubectl credentials file (kubeconfig)
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get nodes # check the node status
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   52s   v1.23.0
Note: the node is NotReady because no network plugin has been installed yet.
4) Add the Flannel network plugin
Flannel is a simple, easy-to-configure Pod network plugin designed for Kubernetes. Among the many open-source CNI (Container Network Interface) plugins, it is relatively easy to deploy and well documented.
https://github.com/flannel-io/flannel
Install the Flannel plugin on the Master host as follows.
(1) Configure the kube-flannel.yml file
$ cd /home/k8s
$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml # download the manifest
$ kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
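The Pod network passed to kubeadm init (10.244.0.0/16 in this article) has to match the network in the Flannel manifest. The sketch below reads that value from the downloaded file; it assumes the manifest carries a "Network" key inside its net-conf.json block, and flannel_network is a hypothetical helper.

```shell
# flannel_network FILE (hypothetical helper)
# Print the value of the first "Network" key found in FILE.
flannel_network() {
  sed -n 's/.*"Network":[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage:
#   [ "$(flannel_network /home/k8s/kube-flannel.yml)" = "10.244.0.0/16" ] \
#     || echo "Pod CIDR and Flannel network differ"
```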
(2) Restart networking
$ sudo systemctl restart network
$ sudo systemctl restart kubelet
$ kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8s-master   Ready    control-plane,master   4m    v1.23.0
Note: the Flannel network may take a moment to come up. To check whether the flannel Pods are in the Running state, run:
$ kubectl get pod -n kube-flannel
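When polling for readiness, it is convenient to list only the nodes that are not Ready yet. A minimal sketch that filters the tabular output of kubectl get nodes; not_ready_nodes is a hypothetical helper, and it treats any STATUS other than exactly "Ready" (for example "Ready,SchedulingDisabled") as not ready.

```shell
# not_ready_nodes (hypothetical helper)
# Read "kubectl get nodes" output on stdin; print names of nodes whose
# STATUS column is not exactly "Ready". The header line is skipped.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
# kubectl can also block until the condition holds:
#   kubectl wait --for=condition=Ready nodes --all --timeout=120s
```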
4. Join the cluster from the Node
1) Create the kubectl credentials file (kubeconfig)
Copy /etc/kubernetes/admin.conf from the Master host to /etc/kubernetes/admin.conf on the Node.
$ sudo scp root@192.168.0.10:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
$ sudo chmod +r /etc/kubernetes/admin.conf
$ echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
$ source ~/.bash_profile
2) Join the Node host to the cluster
On the Node host, run the kubeadm join command that kubeadm init printed on the Master host. It has the following form:
$ sudo kubeadm join 192.168.0.10:6443 --token 67lwvx.7nhlvr3y74g7yccg \
--discovery-token-ca-cert-hash sha256:c62bfdbb2a65c5ad5bdee19596f0130b92c93d12ececb8898deaeb2b54b1e7eb
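Flag reference:
--token: the bootstrap token used to authenticate to the Master's API Server
--discovery-token-ca-cert-hash: verifies that the public key of the cluster's root CA matches this hash
For more join flags, see the official documentation: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-join/#join-workflow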
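A mistyped or truncated token or hash is a common reason for a failed join, so a quick format check before running the command can save a round trip. The sketch below checks only the shape of the two values (a 6.16-character lowercase alphanumeric token and a sha256: prefix followed by 64 hex digits), not whether the cluster accepts them; valid_join_args is a hypothetical helper.

```shell
# valid_join_args TOKEN HASH (hypothetical helper)
# Succeed only if both values have the expected format.
valid_join_args() {
  echo "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' \
    || { echo "bad token format" >&2; return 1; }
  echo "$2" | grep -Eq '^sha256:[0-9a-f]{64}$' \
    || { echo "bad hash format" >&2; return 1; }
}

# Usage:
#   valid_join_args 67lwvx.7nhlvr3y74g7yccg \
#     sha256:c62bfdbb2a65c5ad5bdee19596f0130b92c93d12ececb8898deaeb2b54b1e7eb
```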
3) Result of kubeadm join
(1) Failure
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Note: when kubeadm join fails, inspect the kubelet log:
$ sudo journalctl -f -u kubelet
...
Nov 16 04:49:27 k8s-node01 kubelet[30432]: E1116 04:49:27.087250 30432 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Nov 16 04:49:27 k8s-node01 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 16 04:49:27 k8s-node01 systemd[1]: Unit kubelet.service entered failed state.
Nov 16 04:49:27 k8s-node01 systemd[1]: kubelet.service failed.
The log shows the same cgroup driver mismatch between Docker and the kubelet. Apply the fix described above for the Master node, then run the kubeadm join command again.
(2) Success
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
# List the nodes
$ kubectl get nodes
NAME         STATUS     ROLES                  AGE     VERSION
k8s-master   Ready      control-plane,master   129m    v1.23.0
k8s-node01   NotReady   <none>                 7m15s   v1.23.0
Note: k8s-node01 is NotReady because the network plugin is not yet running on it.
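4) Add the Flannel network plugin
Install the Flannel plugin from the Node host as follows. Flannel is deployed as a DaemonSet (kube-flannel-ds), so the manifest applied on the Master normally schedules a Flannel Pod onto the new node by itself; applying the same manifest again here is harmless.
(1) Configure the kube-flannel.yml file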
$ cd /home/k8s
$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml # download the manifest
$ kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
(2) Restart networking
$ sudo systemctl restart network
$ sudo systemctl restart kubelet
$ kubectl get nodes
NAME         STATUS   ROLES                  AGE    VERSION
k8s-master   Ready    control-plane,master   140m   v1.23.0
k8s-node01   Ready    <none>                 18m    v1.23.0
5. Uninstall Kubernetes (on the Master/Node)
1) Remove the Flannel plugin
$ cd /home/k8s
$ kubectl delete -f kube-flannel.yml
$ sudo systemctl restart kubelet
2) Reset the node
$ sudo kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
...
3) Remove the components
# Remove the three core k8s components
$ sudo yum -y remove kubelet kubeadm kubectl
Loaded plugins: fastestmirror
Resolving Dependencies
--> Running transaction check
---> Package kubeadm.x86_64 0:1.23.0-0 will be erased
---> Package kubectl.x86_64 0:1.23.0-0 will be erased
---> Package kubelet.x86_64 0:1.23.0-0 will be erased
--> Processing Dependency: kubelet for package: kubernetes-cni-0.8.7-0.x86_64
--> Running transaction check
---> Package kubernetes-cni.x86_64 0:0.8.7-0 will be erased
--> Finished Dependency Resolution
...
# Delete the configuration directory
$ rm -rf ~/.kube/
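4) Clean up the k8s-related Docker images
If the host was used only for k8s, the following commands clear all k8s images. Note that they remove every container and every image Docker holds on the host, so do not run them if other workloads share it: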
$ docker rm $(docker ps -a -q)
$ docker rmi $(docker images -q)
Alternatively, use the list printed by docker images and remove the k8s images one by one with docker rmi:
$ sudo docker images
...
$ sudo docker rmi [IMAGE ID]
...
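A middle ground between removing everything and removing images by hand is to filter by registry. The sketch below keeps only image names under the registry used in this article; k8s_images is a hypothetical helper, and images pulled from other registries (the Flannel images, for example) would need their own prefix.

```shell
# k8s_images [PREFIX] (hypothetical helper)
# Read "repository:tag" lines on stdin and keep those under PREFIX.
# "|| true" keeps the exit status zero when nothing matches.
k8s_images() {
  grep -F "${1:-registry.aliyuncs.com/google_containers}/" || true
}

# Usage: list first, then remove once the list looks right.
#   docker images --format '{{.Repository}}:{{.Tag}}' | k8s_images
#   docker images --format '{{.Repository}}:{{.Tag}}' | k8s_images | xargs -r docker rmi
```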
Restart Docker:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker