How EndpointController updates endpoints

Endpoints are not updated while kube-controller-manager (kcm) is down

Stop kube-controller-manager

After the coredns Pod is deleted, the endpoint is not updated

kube-proxy does not update the rules for svc kube-dns

After kube-controller-manager is started again, the IP of the failed coredns Pod is removed from the endpoint

pkg/controller/endpoint/endpoints_controller.go
syncService function
Updates the endpoints

kube-proxy removes that Pod IP

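For the node-down scenario, kube-controller-manager checks every 5s whether kubelet has reported a heartbeat. If no heartbeat is received within 40s, it updates the Pod status to NotReady and the Pod is removed from the endpoint so that it no longer receives traffic.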

The node was rebooted at 19:35:07

At 19:35:46, because kubelet had not reported a heartbeat within 40s, NodeLifecycleController updated the node to NotReady and updated the status of the coredns Pod to NotReady.

Every 5s, NodeLifecycleController checks whether kubelet has reported a heartbeat within the last 40s

pkg/controller/nodelifecycle/node_lifecycle_controller.go
Run function
Updates node status asynchronously

pkg/controller/nodelifecycle/node_lifecycle_controller.go
tryUpdateNodeHealth function

gracePeriod is 40s

After the Pod status is updated, EndpointController, which watches for Pod status changes, updates the endpoint

pkg/controller/endpoint/endpoints_controller.go
podChanged function
A change in the Pod's ready state triggers EndpointController to update the endpoint
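
The idea that a change in readiness triggers an endpoint update can be illustrated with a small sketch. It is not the podChanged implementation: the pod struct and the functions endpointsNeedSync and readyAddresses are hypothetical stand-ins for the real API types, and only readiness and IP are compared here.

```go
package main

// pod is a hypothetical, stripped-down stand-in for a Kubernetes Pod,
// keeping only the fields this sketch needs.
type pod struct {
	IP    string
	Ready bool
}

// endpointsNeedSync reports whether the change from oldPod to newPod
// should cause the endpoints of the owning service to be recomputed.
// This sketch assumes only readiness and IP matter.
func endpointsNeedSync(oldPod, newPod pod) bool {
	return oldPod.Ready != newPod.Ready || oldPod.IP != newPod.IP
}

// readyAddresses returns the IPs of the pods that are ready, which is
// the set a recomputed endpoint would expose as serving addresses.
func readyAddresses(pods []pod) []string {
	addrs := make([]string, 0, len(pods))
	for _, p := range pods {
		if p.Ready {
			addrs = append(addrs, p.IP)
		}
	}
	return addrs
}
```

In this sketch, when a Pod flips from ready to not ready, endpointsNeedSync returns true and the recomputed address list no longer contains that Pod's IP.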

At 19:35:46, kube-proxy finished removing the Pod from traffic

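For the scenario where a k8s control-plane node goes down, a business service that needs to reach a healthy kube-apiserver can, in addition to retrying requests to the service, list/watch the kube-apiserver endpoints and iterate over the endpoints list when making calls.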

posted on 2023-05-20 21:33 王景迁