Environment

Versions:

CentOS		: 7.6.1810 (core)
K8s  		: v1.9.1
kernel		: 3.10.0-1062.9.1.el7.x86_64
iptables	: iptables-1.4.21-28.el7.x86_64

 

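Symptoms

Pods on the affected node cannot reach other services through their Service IPs, and NodePorts on the affected node are unreachable as well.

Example: affected node IP 10.144.10.1, healthy node IP 10.144.10.10
kubectl get pods -owide |grep 10.144.10.1
tomcat-test-xxxx-xxxx  ..........   10.144.10.1 
kubectl exec -it tomcat-test-xxxx-xxxx -- telnet <serviceName> <port>
telnet: <serviceName>: name or service not known....

The name lookup failing is consistent with the symptom: the cluster DNS is normally reached through a Service IP as well.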

 

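Troubleshooting

1. Check the iptables rules
    Log in to the affected host and list the rules with iptables -nvL. None of the Service forwarding rules that kube-proxy normally writes are present, which indicates that kube-proxy is malfunctioning and failing to write them.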
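
The rule check above can be scripted. The following is a minimal sketch, assuming kube-proxy runs in iptables mode, where it creates a KUBE-SERVICES chain in the nat table. The function name has_kube_services_chain is made up for this example.

```shell
# Returns 0 if an iptables-save dump read from stdin declares the
# KUBE-SERVICES chain (a line such as ":KUBE-SERVICES - [0:0]").
has_kube_services_chain() {
    grep -q '^:KUBE-SERVICES '
}

# Usage on a node (needs root):
#   iptables-save -t nat | has_kube_services_chain \
#       && echo "kube-proxy rules present" \
#       || echo "kube-proxy rules missing"
```

Reading the dump from stdin keeps the check independent of how the rules were collected, so the same function works on a saved dump copied from another host.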
     

2. Check the kube-proxy service
    systemctl status kube-proxy shows the following errors:
     

E0221 23:06:48.798718       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:48.798973       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:49.802079       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:49.802941       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:50.810388       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:50.810436       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:51.812714       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:51.812799       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:52.816963       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:52.817650       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:53.818810       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:53.819444       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:54.821242       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:54.821894       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:55.823180       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
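
Since the same two messages repeat every second, it can help to collapse the log into one line per distinct failure. This is a small sketch that assumes the log lines have the "Failed to list <resource>: <reason>" shape shown above; summarize_list_failures is a hypothetical helper name.

```shell
# Reads kube-proxy log lines on stdin and prints each distinct
# "Failed to list <resource>: <reason>" message prefixed by its count.
summarize_list_failures() {
    sed -n 's/.*\(Failed to list [^:]*: .*\)$/\1/p' |
        sort | uniq -c | sed 's/^ *//'
}

# Usage on a systemd host:
#   journalctl -u kube-proxy --no-pager | summarize_list_failures
```

If every entry ends in Unauthorized, the problem is on the authentication side (credentials or certificates) rather than with a particular resource type.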

 
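The errors indicate an authentication failure (Unauthorized), so the first suspects were the context and user settings in kube-proxy.kubeconfig. These were compared against a healthy node:
The affected node and the healthy node both have the settings shown below, so the context and user configuration is not the cause: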
 

cat /etc/kubernetes/kube-proxy.kubeconfig
......omit......
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kube-proxy
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: kube-proxy
  user:
    as-user-extra: {}
......omit......

 
3. Check the kube-proxy client certificate. It turns out to have expired:
 

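# If the certificate is embedded in the kubeconfig, base64-decode the client-certificate-data value and inspect it; if the kubeconfig references the certificate by file path, inspect that file directly
echo 'sadfno1je.....' |base64 -d |openssl x509 -noout -text  |grep -A 2 Validity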
		Validity
			Not Before: Apr 5 04:34:00 2018 GMT
			Not After : Apr 5 04:34:00 2019 GMT
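
Copying the base64 string by hand is error-prone, so the decode step can be wrapped in a helper. This is a sketch under the assumption that the kubeconfig holds a single client-certificate-data entry on one line; extract_client_cert is a name invented for this example.

```shell
# Prints the PEM client certificate embedded in the kubeconfig file
# given as $1. Only the first client-certificate-data entry is used.
extract_client_cert() {
    sed -n 's/^ *client-certificate-data: *//p' "$1" | head -n 1 | base64 -d
}

# Usage: show the validity window of the embedded certificate
#   extract_client_cert /etc/kubernetes/kube-proxy.kubeconfig \
#       | openssl x509 -noout -dates
```

The same pipeline with openssl x509 -noout -checkend 0 in place of -dates exits non-zero when the certificate has already expired, which is convenient for checking many nodes in a loop.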

 
4. The kube-proxy certificate does not contain any node IP, so the kube-proxy.kubeconfig from a healthy node can be copied over the one on the affected node as-is. Restart kube-proxy afterwards:
 

# The location of kube-proxy.kubeconfig depends on the kubeconfig path in kube-proxy's startup configuration
# ps -ef |grep kube-proxy |grep kubeconfig shows which file is in use
scp root@<readyNodeIP>:/etc/kubernetes/kube-proxy.kubeconfig  /etc/kubernetes/kube-proxy.kubeconfig
systemctl restart kube-proxy

 
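If no healthy node is available, a new kube-proxy certificate and private key can be signed with the existing CA certificate (ca.pem) and its key (ca-key.pem):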
 

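# The trailing # notes are annotations only. JSON has no comment syntax, so leave them out of the real file.
cat kube-proxy-csr.json
{
  "CN": "system:kube-proxy",    # user
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",          # keep identical to the original certificate
      "L": "Beijing",           # keep identical to the original certificate
      "O": "k8s",               # keep identical to the original certificate
      "OU": "system"            # keep identical to the original certificate
    }
  ]
}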

cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "87600h"      # annotation only: use a long lifetime so the certificate does not expire again soon
      }
    }
  }
}

# Back up everything under /etc/kubernetes first, then sign the certificate with cfssl
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy

# Base64-encode the resulting kube-proxy.pem and kube-proxy-key.pem, then use them to replace the client-certificate-data and client-key-data values in kube-proxy.kubeconfig

# Restart kube-proxy after the replacement
systemctl restart kube-proxy
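
The base64 replacement described in the comments above can be sketched as a small helper. It assumes each field sits on a single line of the form "<field>: <base64>"; embed_pem is a hypothetical name, and the kubeconfig should be backed up before running anything like this.

```shell
# Replaces the value of "<field>: ..." in a kubeconfig with the
# single-line base64 encoding of a PEM file.
#   $1 = field name, $2 = PEM file, $3 = kubeconfig file
embed_pem() {
    field=$1; pem=$2; kubeconfig=$3
    b64=$(base64 < "$pem" | tr -d '\n')
    tmp=$(mktemp) || return 1
    # Write through a temp file, then overwrite in place so the
    # kubeconfig keeps its original owner and permissions.
    sed "s|^\( *${field}: \).*|\1${b64}|" "$kubeconfig" > "$tmp" &&
        cat "$tmp" > "$kubeconfig"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage:
#   embed_pem client-certificate-data kube-proxy.pem /etc/kubernetes/kube-proxy.kubeconfig
#   embed_pem client-key-data kube-proxy-key.pem /etc/kubernetes/kube-proxy.kubeconfig
```

Another way to reach the same result is kubectl config set-credentials with --client-certificate, --client-key and --embed-certs=true against the same kubeconfig file.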

 
5. Check the kube-proxy status again. The Unauthorized errors are gone, but iptables errors now appear:

systemctl status kube-proxy
......omit......
kube-proxy: E0410 10:13:35.198005    9004 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
kube-proxy: Try `iptables-restore -h' for more information.
kube-proxy: )
kube-proxy: E0410 10:13:35.201017    9004 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
kube-proxy: Try `iptables-restore -h' for more information.
kube-proxy: )

6. This is a compatibility issue between k8s 1.9 and iptables-1.4.21-28.el7.x86_64 (fixed in later k8s releases). Compare the iptables versions on a healthy node and on the affected node:

# Healthy node:
rpm -qa |grep iptables
iptables-1.4.21-16.el7.x86_64

# Affected node:
rpm -qa |grep iptables
iptables-1.4.21-28.el7.x86_64

7. Downgrade iptables on the affected node (the rpm package can be downloaded from the Oracle website):

rpm -Uvh --oldpackage iptables-1.4.21-16.el7.x86_64.rpm

Preparing...					######################## [100%]
Updating / installing...
    1:iptables-1.4.21-16.el7	######################## [100%]
Cleaning up / removing...
    2:iptables-1.4.21-28.el7	######################## [100%]

8. Confirm that the rpm package has been downgraded, then restart kube-proxy. The errors no longer appear and the problem is resolved.
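
To spot other nodes that may hit the same iptables-restore error, the package comparison above can be automated. This sketch only encodes what this write-up observed: release 16 worked and release 28 did not. Which release first introduces the incompatibility is not established here, so the threshold is an assumption, and both function names are invented for the example.

```shell
# Extracts the leading release number from an iptables package name,
# e.g. "iptables-1.4.21-28.el7.x86_64" gives "28". Prints nothing for
# other packages such as iptables-services.
iptables_release() {
    printf '%s\n' "$1" | sed -n 's/^iptables-[0-9.]*-\([0-9][0-9]*\).*/\1/p'
}

# Returns 0 if the release is newer than 16, the release seen on the
# healthy node in this write-up (assumed threshold).
is_newer_than_known_good() {
    rel=$(iptables_release "$1")
    [ -n "$rel" ] && [ "$rel" -gt 16 ]
}

# Usage on a node:
#   is_newer_than_known_good "$(rpm -q iptables)" && echo "check this node"
```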

posted on 2022-07-14 16:00  shelterCJJ