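k8s flannel pod reports Init:Error after running for a while
Symptoms
A flannel network deployed with kubeadm ran for some time and then reported an Init:Error status. The pod details and the Docker version on the node are shown below: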
[root@node1 ~]# kubectl describe pod kube-flannel-ds-amd64-cglhm -n kube-system
Name: kube-flannel-ds-amd64-cglhm
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: node1/192.168.1.205
Start Time: Wed, 09 Jan 2019 22:34:28 -0500
Labels: app=flannel
controller-revision-hash=6bbd4cd779
pod-template-generation=1
tier=node
Annotations: <none>
Status: Running
IP: 192.168.1.205
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID:
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID:
Port: <none>
Host Port: <none>
Command:
cp
Args:
-f
/etc/kube-flannel/cni-conf.json
/etc/cni/net.d/10-flannel.conflist
State: Waiting
Reason: RunContainerError
Last State: Terminated
Reason: ContainerCannotRun
Message: OCI runtime create failed: docker-runc did not terminate sucessfully: unknown
Exit Code: 128
Started: Thu, 10 Jan 2019 15:47:59 -0500
Finished: Thu, 10 Jan 2019 15:47:59 -0500
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/cni/net.d from cni (rw)
/etc/kube-flannel/ from flannel-cfg (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-4px5t (ro)
Containers:
kube-flannel:
Container ID: docker://d80792918c91bddb163dccecc563233140dc184db56154aa162898ee0507d98b
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID: docker://sha256:f0fad859c909baef1b038ef8d2f6e76fc252e25a3d9af37b82ce70623fb7cd6f
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Waiting
Reason: RunContainerError
Last State: Terminated
Reason: ContainerCannotRun
Message: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:70: creating device nodes caused \\\"cannot allocate memory\\\"\"": unknown
Exit Code: 128
Started: Thu, 10 Jan 2019 15:47:53 -0500
Finished: Thu, 10 Jan 2019 15:47:53 -0500
Ready: False
Restart Count: 38
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: kube-flannel-ds-amd64-cglhm (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/etc/kube-flannel/ from flannel-cfg (rw)
/run from run (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-4px5t (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
cni:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
flannel-cfg:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-flannel-cfg
Optional: false
flannel-token-4px5t:
Type: Secret (a volume populated by a Secret)
SecretName: flannel-token-4px5t
Optional: false
QoS Class: Guaranteed
Node-Selectors: beta.kubernetes.io/arch=amd64
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 34m (x10524 over 4h23m) kubelet, node1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-cglhm": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: broken pipe\"": unknown
Normal SandboxChanged 4m58s (x12379 over 15h) kubelet, node1 Pod sandbox changed, it will be killed and re-created.
[root@node1 ~]# docker version
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:03 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:25:29 2018
OS/Arch: linux/amd64
Experimental: false
Solution
The default kube-flannel manifest gives the pod 50Mi of memory. Under heavy network load this is not enough, so the pod exits and reports Error:
[root@node1 home]# cat kube-flannel.yml |grep memory
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
memory: "50Mi"
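Raise the memory value in kube-flannel.yml to 100Mi or more, then re-apply the manifest so the change takes effect. After the edit, the file reads: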
[root@node1 ~]# cat kube-flannel.yml |grep memory
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
memory: "100Mi"
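The edit above can be scripted. The following is a minimal sketch, not the exact procedure used here: it runs sed against a small sample fragment written to a temporary file (a stand-in for the real kube-flannel.yml, whose full contents are not shown in this post) and only mentions the kubectl step in a comment. The variable names `manifest`, `updated`, and `remaining` are illustrative.

```shell
#!/bin/sh
# Sketch only: this sample fragment stands in for the real kube-flannel.yml.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
EOF

# Rewrite every 50Mi memory entry to 100Mi, keeping a .bak copy of the original.
sed -i.bak 's/memory: "50Mi"/memory: "100Mi"/' "$manifest"

# Count entries with the new value and any left at the old value.
updated=$(grep -c 'memory: "100Mi"' "$manifest")
remaining=$(grep -c 'memory: "50Mi"' "$manifest" || true)
echo "updated=$updated remaining=$remaining"

# On the real cluster, re-apply the edited manifest (not run in this sketch):
#   kubectl apply -f kube-flannel.yml
```

Against the real file, point sed at kube-flannel.yml instead of the temporary file. Depending on the DaemonSet's update strategy, the existing flannel pods may need to be deleted before they are recreated with the new limit.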