Environment
Versions:
CentOS : 7.6.1810 (Core)
K8s : v1.9.1
kernel : 3.10.0-1062.9.1.el7.x86_64
iptables : iptables-1.4.21-28.el7.x86_64
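Symptoms
Pods on the affected node cannot reach other services through their service IP, and NodePorts on the affected node are unreachable as well.
Example: affected node IP: 10.144.10.1, healthy node IP: 10.144.10.10
# -w -F: match the IP as a whole fixed string, so that 10.144.10.10 is not matched too
kubectl get pods -owide |grep -wF 10.144.10.1
tomcat-test-xxxx-xxxx .......... 10.144.10.1
kubectl exec -it tomcat-test-xxxx-xxxx -- telnet <serviceName> <port>
telnet: <serviceName>: name or service not known....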
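Troubleshooting
1. Check the iptables rules
Log in to the affected host and inspect the rules with iptables -nvL (filter table) and iptables -t nat -nvL (nat table, where kube-proxy writes its service rules). If no kube-proxy rules are present, kube-proxy is failing and has not written its iptables rules.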
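As a quicker check than reading the full listing, the rule dump can be searched for kube-proxy's own chains. The following is a minimal sketch, assuming kube-proxy runs in iptables mode and names its chains with a KUBE- prefix (for example KUBE-SERVICES); the function name count_kube_chains is made up for this illustration.

```shell
# Count the chains declared with a "KUBE-" prefix in an iptables-save dump
# read from stdin. In iptables-save output, chain declarations are the lines
# that start with ":".
count_kube_chains() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function from failing under "set -e".
    grep -c '^:KUBE-' || true
}

# Typical use on the node (needs root):
#   iptables-save -t nat | count_kube_chains
```

A result of 0 on a node that should be serving traffic matches the situation described in this step.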
2. Check the kube-proxy service
systemctl status kube-proxy shows errors:
E0221 23:06:48.798718 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:48.798973 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:49.802079 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:49.802941 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:50.810388 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:50.810436 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:51.812714 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:51.812799 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:52.816963 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:52.817650 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:53.818810 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:53.819444 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:54.821242 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
E0221 23:06:54.821894 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Unauthorized
E0221 23:06:55.823180 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Service: Unauthorized
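The log above repeats the same two failures every second, so it can help to collapse it into one line per failing resource. This is a small sketch; the helper name summarize_list_failures is hypothetical, and the journalctl invocation in the comment assumes kube-proxy runs as a systemd unit named kube-proxy, as it does in this post.

```shell
# Read kube-proxy log lines on stdin and print "<count> <message>" for each
# distinct "Failed to list *<type>: <reason>" message.
summarize_list_failures() {
    grep -o 'Failed to list \*[A-Za-z.]*: [A-Za-z]*' \
        | LC_ALL=C sort \
        | uniq -c \
        | sed 's/^ *//'
}

# Typical use on the node:
#   journalctl -u kube-proxy --no-pager | summarize_list_failures
```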
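The error is Unauthorized, meaning the API server rejected kube-proxy's credentials. The first suspicion was a wrong context or user entry in kube-proxy.kubeconfig, so the file was compared with the one on a healthy node.
Both the affected node and the healthy node have the configuration below, so the context and user entries are fine:
cat /etc/kubernetes/kube-proxy.kubeconfig
......omit......
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kube-proxy
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: kube-proxy
  user:
    as-user-extra: {}
......omit......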
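3. Check the kube-proxy client certificate: it turns out to have expired
# If the certificate is embedded in the kubeconfig, base64-decode the data before inspecting it; if the kubeconfig references the certificate by absolute path, inspect that file directly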
echo 'sadfno1je.....' |base64 -d |openssl x509 -text |grep -A 2 Validity
Validity
Not Before: Apr 5 04:34:00 2018 GMT
Not After : Apr 5 04:34:00 2019 GMT
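Copying the encoded string by hand is error-prone, so the embedded certificate can also be pulled out of the kubeconfig directly. The sketch below assumes the certificate is embedded under the standard client-certificate-data key; the helper name kubeconfig_field is made up for this example.

```shell
# Print the base64-decoded value of the first "key: value" line in a
# kubeconfig file.
#   $1 = kubeconfig path
#   $2 = field name without the trailing colon
kubeconfig_field() {
    awk -v k="$2:" '$1 == k { print $2; exit }' "$1" | base64 -d
}

# Typical use on the node:
#   kubeconfig_field /etc/kubernetes/kube-proxy.kubeconfig client-certificate-data \
#       | openssl x509 -noout -enddate
#
# openssl's -checkend 0 exits non-zero when the certificate has already expired:
#   kubeconfig_field /etc/kubernetes/kube-proxy.kubeconfig client-certificate-data \
#       | openssl x509 -noout -checkend 0 || echo "certificate expired"
```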
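Fix: the kube-proxy certificate does not contain any node IP, so the kube-proxy.kubeconfig from a healthy node can be copied over the one on the affected node as is, followed by a restart of kube-proxy:
# The kube-proxy.kubeconfig path depends on the kubeconfig path in kube-proxy's startup configuration
# ps -ef |grep kube-proxy |grep kubeconfig shows which file is in use
scp root@<readyNodeIP>:/etc/kubernetes/kube-proxy.kubeconfig /etc/kubernetes/kube-proxy.kubeconfig
systemctl restart kube-proxy
If no healthy node is available, a new kube-proxy certificate and private key can be signed with the existing ca.pem: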
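# CN is the user name kube-proxy authenticates as; ST, L, O and OU should match the original certificate.
# JSON has no comment syntax, so these notes must stay out of the file itself.
cat kube-proxy-csr.json
{
    "CN": "system:kube-proxy",
    "hosts": [],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "ST": "Beijing",
            "L": "Beijing",
            "O": "k8s",
            "OU": "system"
        }
    ]
}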
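# The expiry of the "kubernetes" profile is set to a long period (87600h, about 10 years) so that the certificate does not expire again soon.
cat ca-config.json
{
    "signing": {
        "default": {
            "expiry": "87600h"
        },
        "profiles": {
            "kubernetes": {
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ],
                "expiry": "87600h"
            }
        }
    }
}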
# Back up everything under /etc/kubernetes first, then issue the certificate with cfssl
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
# Base64-encode the resulting kube-proxy.pem and kube-proxy-key.pem, then replace the client-certificate-data and client-key-data values in kube-proxy.kubeconfig with the encoded strings
# Restart kube-proxy after the replacement
systemctl restart kube-proxy
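The encode-and-replace step can be scripted so that the value ends up on a single line with its indentation intact. This is a sketch under the assumption that the credentials are embedded under client-certificate-data and client-key-data; the function name set_kubeconfig_data is hypothetical. It keeps a .bak copy of the file it edits.

```shell
# Replace the value of a "key: value" line in a kubeconfig with the
# single-line base64 encoding of a file, preserving the line's indentation.
#   $1 = kubeconfig path
#   $2 = key, e.g. client-certificate-data
#   $3 = file to encode, e.g. kube-proxy.pem
set_kubeconfig_data() {
    # Strip newlines so the encoded value stays on one line.
    encoded=$(base64 < "$3" | tr -d '\n')
    # "|" is a safe sed delimiter: it never occurs in base64 output.
    sed -i.bak "s|^\( *\)$2: .*|\1$2: $encoded|" "$1"
}

# Typical use, run before restarting kube-proxy:
#   set_kubeconfig_data /etc/kubernetes/kube-proxy.kubeconfig client-certificate-data kube-proxy.pem
#   set_kubeconfig_data /etc/kubernetes/kube-proxy.kubeconfig client-key-data kube-proxy-key.pem
```

Where kubectl is available on the node, kubectl config set-credentials with --embed-certs=true is an alternative that performs the same embedding without manual encoding.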
4. Check the kube-proxy status: the Unauthorized errors are gone, but iptables errors now appear:
systemctl status kube-proxy
......omit......
kube-proxy: E0410 10:13:35.198005 9004 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
kube-proxy: Try `iptables-restore -h' for more information.
kube-proxy: )
kube-proxy: E0410 10:13:35.201017 9004 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
kube-proxy: Try `iptables-restore -h' for more information.
kube-proxy: )
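5. This is a compatibility problem between k8s 1.9 and iptables-1.4.21-28.el7.x86_64 (fixed in later k8s releases). Compare the iptables version on the healthy and affected nodes:
# Healthy node:
rpm -qa |grep iptables
iptables-1.4.21-16.el7.x86_64
# Affected node:
rpm -qa |grep iptables
iptables-1.4.21-28.el7.x86_64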
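With more than two nodes, the same comparison can be run across the cluster to find every node whose iptables package differs from the known-good one. A minimal sketch, assuming root SSH access to the nodes; the helper name nodes_with_other_version and the node list in the comment are illustrative only.

```shell
# Read "<node> <package>" pairs on stdin and print the nodes whose package
# string differs from the expected one.
#   $1 = expected package string, e.g. iptables-1.4.21-16.el7.x86_64
nodes_with_other_version() {
    awk -v want="$1" '$2 != want { print $1 }'
}

# Typical use from an admin host:
#   for n in 10.144.10.1 10.144.10.10; do
#       printf '%s %s\n' "$n" "$(ssh "root@$n" rpm -q iptables)"
#   done | nodes_with_other_version iptables-1.4.21-16.el7.x86_64
```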
6. Downgrade iptables on the affected node (the RPM can be downloaded from the Oracle website)
rpm -Uvh --oldpackage iptables-1.4.21-16.el7.x86_64.rpm
Preparing... ######################## [100%]
Updating / installing...
1:iptables-1.4.21-16.el7 ######################## [100%]
Cleaning up / removing...
2:iptables-1.4.21-28.el7 ######################## [100%]
7. Confirm that the package has been downgraded, then restart kube-proxy. The errors no longer appear and the problem is resolved.