[Kubernetes Troubleshooting] Summary of Problems Encountered When Deploying with kubeadm
- Introduction
- Problem summary
- The connection to the server localhost:8080 was refused - did you specify the right host or port?
- kubectl get cs reports Unhealthy
- failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker
- Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
- Kubeadm: how to fix kubectl get cs showing scheduler Unhealthy and controller-manager Unhealthy
- Fixing the kubelet "failed to get cgroup" error in Kubernetes
- Fixing k8s "failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24"
- Troubleshooting CoreDNS failing to resolve a domain
- Cluster certificates expired
- Deploying HAProxy for high availability
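Introduction
You are likely to run into a few problems when deploying a cluster with kubeadm. These notes record the ones I hit, so that there is a starting point when troubleshooting later, and so that anyone just getting started has some direction and can avoid a few detours.
Problem summary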
The connection to the server localhost:8080 was refused - did you specify the right host or port?
    # root user
    export KUBECONFIG=/etc/kubernetes/admin.conf

    # non-root user
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
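kubectl falls back to localhost:8080 when it cannot find a kubeconfig, which is why this error shows up until one of the two setups above is in place. The following is a minimal sketch of a check for that; the function names `active_kubeconfig` and `kubeconfig_ready` are hypothetical, and the sketch assumes KUBECONFIG, when set, holds a single path rather than a colon-separated list.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the kubeconfig path kubectl would try first.
# Assumption: KUBECONFIG, when set, holds one path (not a colon-separated list).
active_kubeconfig() {
    if [ -n "${KUBECONFIG:-}" ]; then
        printf '%s\n' "$KUBECONFIG"
    else
        printf '%s\n' "$HOME/.kube/config"
    fi
}

# Return 0 if that file exists and is readable by the current user.
kubeconfig_ready() {
    [ -r "$(active_kubeconfig)" ]
}

# Usage:
#   kubeconfig_ready || echo "no readable kubeconfig at $(active_kubeconfig)"
```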
kubectl get cs reports Unhealthy
Edit /etc/kubernetes/manifests/kube-scheduler.yaml
and /etc/kubernetes/manifests/kube-controller-manager.yaml,
and comment out the line - --port=0 in each file.
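The edit can also be scripted. This is only a sketch under a few assumptions: the function name `comment_out_port_flag` is hypothetical, the flag appears on its own line exactly as - --port=0, and the backup copy is written to a separate directory because the kubelet may try to load any file left inside the manifests directory.

```bash
#!/usr/bin/env bash

# Hypothetical helper: comment out the "- --port=0" argument in a static pod
# manifest. A copy of the original is saved to backup_dir, which should be
# outside the manifests directory.
comment_out_port_flag() {
    local manifest="$1" backup_dir="$2" tmp status=0
    cp -p "$manifest" "$backup_dir/$(basename "$manifest").orig" || return 1
    tmp="$(mktemp)" || return 1
    # Keep the indentation and prefix the argument with "# ".
    sed -E 's/^([[:space:]]*)(- --port=0)[[:space:]]*$/\1# \2/' "$manifest" > "$tmp" \
        && cat "$tmp" > "$manifest" || status=1
    rm -f "$tmp"
    return "$status"
}

# Usage (as root; /root/manifest-backup is an example path):
#   mkdir -p /root/manifest-backup
#   comment_out_port_flag /etc/kubernetes/manifests/kube-scheduler.yaml /root/manifest-backup
#   comment_out_port_flag /etc/kubernetes/manifests/kube-controller-manager.yaml /root/manifest-backup
```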
failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker
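Set Docker's cgroup driver to systemd in /etc/docker/daemon.json so that it matches the kubelet, then restart Docker:
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m"
      },
      "storage-driver": "overlay2",
      "storage-opts": [
        "overlay2.override_kernel_check=true"
      ]
    }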
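To confirm that the two sides agree after the change, the drivers can be compared directly. A minimal sketch follows; the function names are hypothetical, and it assumes the kubelet's driver is set in its config file (kubeadm normally writes /var/lib/kubelet/config.yaml) rather than through a command-line flag.

```bash
#!/usr/bin/env bash

# Print the cgroupDriver value from a kubelet config file.
kubelet_cgroup_driver() {
    awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Print the cgroup driver Docker is currently using.
docker_cgroup_driver() {
    docker info --format '{{.CgroupDriver}}'
}

# Compare the two values; return 1 and report on a mismatch.
drivers_match() {
    local kubelet_driver="$1" docker_driver="$2"
    if [ "$kubelet_driver" = "$docker_driver" ]; then
        echo "ok: both use $kubelet_driver"
    else
        echo "mismatch: kubelet=$kubelet_driver docker=$docker_driver"
        return 1
    fi
}

# Usage:
#   drivers_match "$(kubelet_cgroup_driver /var/lib/kubelet/config.yaml)" "$(docker_cgroup_driver)"
```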
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    kubectl get nodes

    export KUBECONFIG=/etc/kubernetes/kubelet.conf
    kubectl get node
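Kubeadm: how to fix kubectl get cs showing scheduler Unhealthy and controller-manager Unhealthy
As the root user, the configuration files are located at:
/etc/kubernetes/manifests/kube-scheduler.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
For a regular user account:
$HOME/.kube/
Comment out the --port flag in both configuration files.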
Fixing the kubelet "failed to get cgroup" error in Kubernetes
Fixing k8s "failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24"
    ifconfig cni0 down
    ip link delete cni0
- View the kubelet logs:
    journalctl -xeu kubelet
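Before deleting the bridge, it can help to confirm that cni0 really carries a stale address. The sketch below parses the one-line output of ip -o -4 addr show; the function names are hypothetical, and the expected address (10.244.1.1/24 in the error message above) has to be supplied by you.

```bash
#!/usr/bin/env bash

# Read `ip -o -4 addr show dev <bridge>` output on stdin and print the
# first IPv4 address in CIDR form.
bridge_cidr() {
    awk '$3 == "inet" { print $4; exit }'
}

# Succeed when the bridge address on stdin equals the expected CIDR.
bridge_matches() {
    local expected="$1" actual
    actual="$(bridge_cidr)"
    [ "$actual" = "$expected" ]
}

# Usage:
#   ip -o -4 addr show dev cni0 | bridge_matches 10.244.1.1/24 \
#       || echo "cni0 has a different address; delete it so it is recreated"
```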
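Troubleshooting CoreDNS failing to resolve a domain
Approach: test with ping/dig from inside a pod. Because the cluster network is flat and reachable from the cluster nodes, you can also test directly against the CoreDNS IP.
Run dig @<coredns ip> baidu.com +trace
to test.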
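The test above can be sketched as a pair of lookups, one for an in-cluster name and one for an external name, so that the two results can be compared. The function names are hypothetical; the sketch assumes the default cluster domain cluster.local and that the cluster DNS Service is named kube-dns in kube-system, as kubeadm normally creates it.

```bash
#!/usr/bin/env bash

# Build the fully qualified name of a Service. The cluster domain defaults
# to "cluster.local" (an assumption) unless a third argument is given.
service_fqdn() {
    local name="$1" namespace="$2" domain="${3:-cluster.local}"
    printf '%s.%s.svc.%s\n' "$name" "$namespace" "$domain"
}

# Look up the ClusterIP of the cluster DNS Service.
cluster_dns_ip() {
    kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
}

# Query one name against a specific DNS server and print only the answers.
query_dns() {
    local server="$1" name="$2"
    dig "@$server" "$name" +short
}

# Usage:
#   dns_ip="$(cluster_dns_ip)"
#   query_dns "$dns_ip" "$(service_fqdn kubernetes default)"   # in-cluster name
#   query_dns "$dns_ip" baidu.com                              # external name
```

If the in-cluster name resolves but the external one does not, the upstream resolvers that CoreDNS forwards to are a likely place to look next.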
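Cluster certificates expired
Renew the certificates
    kubeadm certs renew all   # note: this does not renew the kubelet certificate
Renew the kubelet certificate. By default it is renewed automatically through certificate rotation; if it has already expired, kubelet.conf has to be regenerated: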
    # Run from /root so that ./kubeadm.conf matches the --config path below
    mv /etc/kubernetes/kubelet.conf .
    mv /var/lib/kubelet/pki/kubelet-client* .
    # Replace NODE with the node name; --config kubeadm.conf may be required
    kubeadm config print init-defaults > kubeadm.conf
    kubeadm kubeconfig user --org system:nodes --client-name system:node:$NODE --config /root/kubeadm.conf > kubelet.conf
    # Edit the apiserver address in kubelet.conf
    cp kubelet.conf /etc/kubernetes/kubelet.conf
    systemctl restart kubelet
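To see how close a certificate is to expiry before or after renewal, kubeadm certs check-expiration covers the control-plane certificates, and openssl can read the kubelet client certificate. The sketch below turns the openssl output into a number of days; the function names are hypothetical and GNU date is assumed.

```bash
#!/usr/bin/env bash

# Convert the "notAfter=..." line printed by `openssl x509 -noout -enddate`
# into whole days remaining relative to now_epoch. Requires GNU date.
days_until_expiry() {
    local not_after="${1#notAfter=}" now_epoch="$2" end_epoch
    end_epoch="$(date -u -d "$not_after" +%s)" || return 1
    echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Print the days remaining for one certificate file.
cert_days_left() {
    days_until_expiry "$(openssl x509 -noout -enddate -in "$1")" "$(date +%s)"
}

# Usage:
#   cert_days_left /var/lib/kubelet/pki/kubelet-client-current.pem
```

A negative result means the certificate has already expired.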
Deploying HAProxy for high availability
HAProxy
<details>
<summary>View the haproxy.cfg configuration file</summary>

``` bash
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    #chroot /usr/share/haproxy
    #user haproxy
    #group haproxy
    #daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend kube-apiserver
    bind *:6444
    mode tcp
    timeout client 1h
    log global
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    server k8s-master01 192.168.4.41:6443 check # the main part to configure
    server k8s-master02 192.168.4.42:6443 check
    server k8s-master03 172.17.0.66:6443 check
```

</details>

``` bash
docker run -d --restart=always --name haproxy -p 6444:6444 \
    -v ~/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg \
    haproxy
```
Keepalived
- ip address associated with VRID 160 not present in MASTER advert : xx.xx.xx.xx
  The virtual_router_id of this Keepalived master probably conflicts with the virtual_router_id of another Keepalived master on the same LAN. Change virtual_router_id in /etc/keepalived/keepalived.conf and restart Keepalived.
- Configuration explained
    #!/bin/bash
    VIRTUAL_IP=192.168.4.200   # virtual IP
    INTERFACE=eth0             # network interface to use
    NETMASK_BIT=24
    CHECK_PORT=6444
    RID=10
    VRID=160                   # virtual router ID, must be unique
    MCAST_GROUP=224.0.0.18
    docker run -itd --restart=always --name=Keepalived-K8S \
        --net=host --cap-add=NET_ADMIN \
        -e VIRTUAL_IP=$VIRTUAL_IP \
        -e INTERFACE=$INTERFACE \
        -e CHECK_PORT=$CHECK_PORT \
        -e RID=$RID \
        -e VRID=$VRID \
        -e NETMASK_BIT=$NETMASK_BIT \
        -e MCAST_GROUP=$MCAST_GROUP \
        wise2c/keepalived-k8s
Run ip addr to see the virtual IP on the configured interface (PS: ifconfig does not show it).
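The same check can be scripted to tell which node currently holds the virtual IP. This is a minimal sketch; the function name `has_address` is hypothetical, and the interface and address in the usage line are the example values from the script above.

```bash
#!/usr/bin/env bash

# Read `ip -o -4 addr show` output on stdin and succeed when one of the
# listed addresses (ignoring the prefix length) equals the given IP.
has_address() {
    local vip="$1"
    awk -v vip="$vip" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   ip -o -4 addr show dev eth0 | has_address 192.168.4.200 \
#       && echo "this node currently holds the virtual IP"
```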
This article comes from cnblogs (博客园). Author: 流年灬似氺. When reposting, please credit the original link: https://www.cnblogs.com/lic0914/p/16193679.html