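Kubernetes ControllerManager Source Code Walkthrough
The Kubernetes ControllerManager manages the individual controllers, which also makes it the natural entry point for reading any controller's source. Before looking at the ControllerManager code itself, it helps to skim the list of controllers registered in a recent version.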
    // First add "special" controllers that aren't initialized normally. These controllers cannot be initialized
    // in the main controller loop initialization, so we add them here only for the metadata and duplication detection.
    // app.ControllerDescriptor#RequiresSpecialHandling should return true for such controllers
    // The only known special case is the ServiceAccountTokenController which *must* be started
    // first to ensure that the SA tokens for future controllers will exist. Think very carefully before adding new
    // special controllers.
    register(newServiceAccountTokenControllerDescriptor(nil))

    register(newEndpointsControllerDescriptor())
    register(newEndpointSliceControllerDescriptor())
    register(newEndpointSliceMirroringControllerDescriptor())
    register(newReplicationControllerDescriptor())
    register(newPodGarbageCollectorControllerDescriptor())
    register(newResourceQuotaControllerDescriptor())
    register(newNamespaceControllerDescriptor())
    register(newServiceAccountControllerDescriptor())
    register(newGarbageCollectorControllerDescriptor())
    register(newDaemonSetControllerDescriptor())
    register(newJobControllerDescriptor())
    register(newDeploymentControllerDescriptor())
    register(newReplicaSetControllerDescriptor())
    register(newHorizontalPodAutoscalerControllerDescriptor())
    register(newDisruptionControllerDescriptor())
    register(newStatefulSetControllerDescriptor())
    register(newCronJobControllerDescriptor())
    register(newCertificateSigningRequestSigningControllerDescriptor())
    register(newCertificateSigningRequestApprovingControllerDescriptor())
    register(newCertificateSigningRequestCleanerControllerDescriptor())
    register(newTTLControllerDescriptor())
    register(newBootstrapSignerControllerDescriptor())
    register(newTokenCleanerControllerDescriptor())
    register(newNodeIpamControllerDescriptor())
    register(newNodeLifecycleControllerDescriptor())

    register(newServiceLBControllerDescriptor())          // cloud provider controller
    register(newNodeRouteControllerDescriptor())          // cloud provider controller
    register(newCloudNodeLifecycleControllerDescriptor()) // cloud provider controller
    // TODO: persistent volume controllers into the IncludeCloudLoops only set as a cloud provider controller.

    register(newPersistentVolumeBinderControllerDescriptor())
    register(newPersistentVolumeAttachDetachControllerDescriptor())
    register(newPersistentVolumeExpanderControllerDescriptor())
    register(newClusterRoleAggregrationControllerDescriptor())
    register(newPersistentVolumeClaimProtectionControllerDescriptor())
    register(newPersistentVolumeProtectionControllerDescriptor())
    register(newTTLAfterFinishedControllerDescriptor())
    register(newRootCACertificatePublisherControllerDescriptor())
    register(newEphemeralVolumeControllerDescriptor())

    // feature gated
    register(newStorageVersionGarbageCollectorControllerDescriptor())
    register(newResourceClaimControllerDescriptor())
    register(newLegacyServiceAccountTokenCleanerControllerDescriptor())
    register(newValidatingAdmissionPolicyStatusControllerDescriptor())
    register(newTaintEvictionControllerDescriptor())
    register(newServiceCIDRsControllerDescriptor())
    register(newStorageVersionMigratorControllerDescriptor())
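The blank lines in the source roughly sort these controllers into groups:
- First comes the ServiceAccount token controller. Account credentials are the foundation for every other operation.
- Next are the controllers for core functionality, such as Namespace, ResourceQuota, and Deployment.
- Then come the cloud provider controllers, such as the Service load balancer controller.
- Below those are the storage-related controllers.
- Last are the controllers for newer, feature-gated functionality.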
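If a particular controller interests you, its descriptor constructor in this list is the place to start reading. For an example, see the previous article, Kubernetes DeploymentController Source Code Walkthrough.
Now for the main topic of this article: the ControllerManager source itself.
Starting from the main entry point, the process goes straight into the Run function, shown below: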
    // Run runs the KubeControllerManagerOptions.
    func Run(ctx context.Context, c *config.CompletedConfig) error {
        logger := klog.FromContext(ctx)
        stopCh := ctx.Done()

        // To help debugging, immediately log version
        logger.Info("Starting", "version", version.Get())

        logger.Info("Golang settings", "GOGC", os.Getenv("GOGC"), "GOMAXPROCS", os.Getenv("GOMAXPROCS"), "GOTRACEBACK", os.Getenv("GOTRACEBACK"))

        // Start events processing pipeline.
        c.EventBroadcaster.StartStructuredLogging(0)
        c.EventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: c.Client.CoreV1().Events("")})
        defer c.EventBroadcaster.Shutdown()

        if cfgz, err := configz.New(ConfigzName); err == nil {
            cfgz.Set(c.ComponentConfig)
        } else {
            logger.Error(err, "Unable to register configz")
        }

        // Setup any healthz checks we will want to use.
        var checks []healthz.HealthChecker
        var electionChecker *leaderelection.HealthzAdaptor
        if c.ComponentConfig.Generic.LeaderElection.LeaderElect {
            electionChecker = leaderelection.NewLeaderHealthzAdaptor(time.Second * 20)
            checks = append(checks, electionChecker)
        }
        healthzHandler := controllerhealthz.NewMutableHealthzHandler(checks...)

        // Start the controller manager HTTP server
        // unsecuredMux is the handler for these controller *after* authn/authz filters have been applied
        var unsecuredMux *mux.PathRecorderMux
        if c.SecureServing != nil {
            unsecuredMux = genericcontrollermanager.NewBaseHandler(&c.ComponentConfig.Generic.Debugging, healthzHandler)
            slis.SLIMetricsWithReset{}.Install(unsecuredMux)

            handler := genericcontrollermanager.BuildHandlerChain(unsecuredMux, &c.Authorization, &c.Authentication)
            // TODO: handle stoppedCh and listenerStoppedCh returned by c.SecureServing.Serve
            if _, _, err := c.SecureServing.Serve(handler, 0, stopCh); err != nil {
                return err
            }
        }

        clientBuilder, rootClientBuilder := createClientBuilders(logger, c)

        saTokenControllerDescriptor := newServiceAccountTokenControllerDescriptor(rootClientBuilder)

        run := func(ctx context.Context, controllerDescriptors map[string]*ControllerDescriptor) {
            controllerContext, err := CreateControllerContext(ctx, c, rootClientBuilder, clientBuilder)
            if err != nil {
                logger.Error(err, "Error building controller context")
                klog.FlushAndExit(klog.ExitFlushTimeout, 1)
            }

            if err := StartControllers(ctx, controllerContext, controllerDescriptors, unsecuredMux, healthzHandler); err != nil {
                logger.Error(err, "Error starting controllers")
                klog.FlushAndExit(klog.ExitFlushTimeout, 1)
            }

            controllerContext.InformerFactory.Start(stopCh)
            controllerContext.ObjectOrMetadataInformerFactory.Start(stopCh)
            close(controllerContext.InformersStarted)

            <-ctx.Done()
        }

        // No leader election, run directly
        if !c.ComponentConfig.Generic.LeaderElection.LeaderElect {
            controllerDescriptors := NewControllerDescriptors()
            controllerDescriptors[names.ServiceAccountTokenController] = saTokenControllerDescriptor
            run(ctx, controllerDescriptors)
            return nil
        }

        id, err := os.Hostname()
        if err != nil {
            return err
        }

        // add a uniquifier so that two processes on the same host don't accidentally both become active
        id = id + "_" + string(uuid.NewUUID())

        // leaderMigrator will be non-nil if and only if Leader Migration is enabled.
        var leaderMigrator *leadermigration.LeaderMigrator = nil

        // If leader migration is enabled, create the LeaderMigrator and prepare for migration
        if leadermigration.Enabled(&c.ComponentConfig.Generic) {
            logger.Info("starting leader migration")

            leaderMigrator = leadermigration.NewLeaderMigrator(&c.ComponentConfig.Generic.LeaderMigration,
                "kube-controller-manager")

            // startSATokenControllerInit is the original InitFunc.
            startSATokenControllerInit := saTokenControllerDescriptor.GetInitFunc()

            // Wrap saTokenControllerDescriptor to signal readiness for migration after starting
            // the controller.
            saTokenControllerDescriptor.initFunc = func(ctx context.Context, controllerContext ControllerContext, controllerName string) (controller.Interface, bool, error) {
                defer close(leaderMigrator.MigrationReady)
                return startSATokenControllerInit(ctx, controllerContext, controllerName)
            }
        }

        // Start the main lock
        go leaderElectAndRun(ctx, c, id, electionChecker,
            c.ComponentConfig.Generic.LeaderElection.ResourceLock,
            c.ComponentConfig.Generic.LeaderElection.ResourceName,
            leaderelection.LeaderCallbacks{
                OnStartedLeading: func(ctx context.Context) {
                    controllerDescriptors := NewControllerDescriptors()
                    if leaderMigrator != nil {
                        // If leader migration is enabled, we should start only non-migrated controllers
                        // for the main lock.
                        controllerDescriptors = filteredControllerDescriptors(controllerDescriptors, leaderMigrator.FilterFunc, leadermigration.ControllerNonMigrated)
                        logger.Info("leader migration: starting main controllers.")
                    }
                    controllerDescriptors[names.ServiceAccountTokenController] = saTokenControllerDescriptor
                    run(ctx, controllerDescriptors)
                },
                OnStoppedLeading: func() {
                    logger.Error(nil, "leaderelection lost")
                    klog.FlushAndExit(klog.ExitFlushTimeout, 1)
                },
            })

        // If Leader Migration is enabled, proceed to attempt the migration lock.
        if leaderMigrator != nil {
            // Wait for Service Account Token Controller to start before acquiring the migration lock.
            // At this point, the main lock must have already been acquired, or the KCM process already exited.
            // We wait for the main lock before acquiring the migration lock to prevent the situation
            // where KCM instance A holds the main lock while KCM instance B holds the migration lock.
            <-leaderMigrator.MigrationReady

            // Start the migration lock.
            go leaderElectAndRun(ctx, c, id, electionChecker,
                c.ComponentConfig.Generic.LeaderMigration.ResourceLock,
                c.ComponentConfig.Generic.LeaderMigration.LeaderName,
                leaderelection.LeaderCallbacks{
                    OnStartedLeading: func(ctx context.Context) {
                        logger.Info("leader migration: starting migrated controllers.")
                        controllerDescriptors := NewControllerDescriptors()
                        controllerDescriptors = filteredControllerDescriptors(controllerDescriptors, leaderMigrator.FilterFunc, leadermigration.ControllerMigrated)
                        // DO NOT start saTokenController under migration lock
                        delete(controllerDescriptors, names.ServiceAccountTokenController)
                        run(ctx, controllerDescriptors)
                    },
                    OnStoppedLeading: func() {
                        logger.Error(nil, "migration leaderelection lost")
                        klog.FlushAndExit(klog.ExitFlushTimeout, 1)
                    },
                })
        }

        <-stopCh
        return nil
    }
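Run first logs the current version information and starts the EventBroadcaster. As an aside, almost every component sets up an EventBroadcaster at startup. It reports the component's events to the apiserver, which persists them in etcd.
Next, Run registers HealthChecker instances so the component can expose its own status over HTTP. It then declares the run closure. Every controller is started inside run, and declaring it here lets the later branches reuse the same logic.
The rest of the function handles leader election for the ControllerManager. If leader election is disabled, the process builds all controller descriptors (NewControllerDescriptors) and calls run directly. Otherwise, the instance that wins the election builds the descriptors and calls run from its OnStartedLeading callback. When leader migration is enabled, a second lock is acquired as well, and each lock starts only its own subset of controllers.
Inside run, StartControllers is called to start all the controllers.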
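To make the reuse of the run closure concrete, here is a minimal sketch of the same branch structure using only the standard library. The `callbacks` type and the `fakeElect` function are hypothetical stand-ins for `leaderelection.LeaderCallbacks` and the real election loop; the fake elector wins immediately, whereas the real one blocks until a lock is acquired.

```go
package main

import "fmt"

// callbacks is a hypothetical stand-in for leaderelection.LeaderCallbacks.
type callbacks struct {
	OnStartedLeading func()
	OnStoppedLeading func()
}

// fakeElect is a hypothetical elector that wins immediately.
// The real election blocks until this instance holds the lock.
func fakeElect(cb callbacks) {
	cb.OnStartedLeading()
}

// startManager mirrors the branching in Run: one run closure, called either
// directly or from the OnStartedLeading callback. It returns the names that
// were started so the flow is easy to inspect.
func startManager(leaderElect bool, descriptors []string) []string {
	var started []string
	run := func(names []string) {
		started = append(started, names...)
	}

	// No leader election: run directly.
	if !leaderElect {
		run(descriptors)
		return started
	}

	// Leader election: run only once this instance becomes the leader.
	fakeElect(callbacks{
		OnStartedLeading: func() { run(descriptors) },
		OnStoppedLeading: func() { fmt.Println("leaderelection lost") },
	})
	return started
}

func main() {
	fmt.Println(startManager(false, []string{"deployment", "job"}))
	fmt.Println(startManager(true, []string{"deployment", "job"}))
}
```

Both calls start the same set of controllers. The only difference is who invokes run and when.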
    // StartControllers starts a set of controllers with a specified ControllerContext
    func StartControllers(ctx context.Context, controllerCtx ControllerContext, controllerDescriptors map[string]*ControllerDescriptor,
        unsecuredMux *mux.PathRecorderMux, healthzHandler *controllerhealthz.MutableHealthzHandler) error {
        var controllerChecks []healthz.HealthChecker

        // Always start the SA token controller first using a full-power client, since it needs to mint tokens for the rest
        // If this fails, just return here and fail since other controllers won't be able to get credentials.
        if serviceAccountTokenControllerDescriptor, ok := controllerDescriptors[names.ServiceAccountTokenController]; ok {
            check, err := StartController(ctx, controllerCtx, serviceAccountTokenControllerDescriptor, unsecuredMux)
            if err != nil {
                return err
            }
            if check != nil {
                // HealthChecker should be present when controller has started
                controllerChecks = append(controllerChecks, check)
            }
        }

        // Initialize the cloud provider with a reference to the clientBuilder only after token controller
        // has started in case the cloud provider uses the client builder.
        if controllerCtx.Cloud != nil {
            controllerCtx.Cloud.Initialize(controllerCtx.ClientBuilder, ctx.Done())
        }

        // Each controller is passed a context where the logger has the name of
        // the controller set through WithName. That name then becomes the prefix of
        // of all log messages emitted by that controller.
        //
        // In StartController, an explicit "controller" key is used instead, for two reasons:
        // - while contextual logging is alpha, klog.LoggerWithName is still a no-op,
        //   so we cannot rely on it yet to add the name
        // - it allows distinguishing between log entries emitted by the controller
        //   and those emitted for it - this is a bit debatable and could be revised.
        for _, controllerDesc := range controllerDescriptors {
            if controllerDesc.RequiresSpecialHandling() {
                continue
            }

            check, err := StartController(ctx, controllerCtx, controllerDesc, unsecuredMux)
            if err != nil {
                return err
            }
            if check != nil {
                // HealthChecker should be present when controller has started
                controllerChecks = append(controllerChecks, check)
            }
        }

        healthzHandler.AddHealthChecker(controllerChecks...)

        return nil
    }
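The serviceAccountTokenController must be started first, because every other controller depends on it to obtain credentials. After that, a for range loop over controllerDescriptors starts the remaining controllers, skipping any descriptor that requires special handling (so the token controller is not started twice). Each controller's HealthChecker is collected and registered with healthzHandler.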
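The ordering rule can be sketched in a few lines. The `descriptor` type, the `startOrder` function, and the controller names below are illustrative assumptions rather than the real Kubernetes types; the point is only that the special controller goes first and is skipped in the general loop.

```go
package main

import "fmt"

// descriptor is a hypothetical, trimmed-down ControllerDescriptor.
type descriptor struct {
	name            string
	specialHandling bool
}

// saTokenName is an illustrative key for the token controller.
const saTokenName = "serviceaccount-token-controller"

// startOrder returns the order in which controllers would be started:
// the token controller first, then every descriptor that does not
// require special handling.
func startOrder(descriptors map[string]*descriptor) []string {
	var order []string
	if d, ok := descriptors[saTokenName]; ok {
		order = append(order, d.name)
	}
	for _, d := range descriptors {
		if d.specialHandling {
			continue // already handled above
		}
		order = append(order, d.name)
	}
	return order
}

func main() {
	descriptors := map[string]*descriptor{
		saTokenName:             {name: saTokenName, specialHandling: true},
		"deployment-controller": {name: "deployment-controller"},
		"job-controller":        {name: "job-controller"},
	}
	order := startOrder(descriptors)
	fmt.Println("first:", order[0], "total:", len(order))
}
```

Because Go randomizes map iteration, the order of the remaining controllers is not fixed between runs. Only the position of the token controller is guaranteed.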
    // StartController starts a controller with a specified ControllerContext
    // and performs required pre- and post- checks/actions
    func StartController(ctx context.Context, controllerCtx ControllerContext, controllerDescriptor *ControllerDescriptor,
        unsecuredMux *mux.PathRecorderMux) (healthz.HealthChecker, error) {
        logger := klog.FromContext(ctx)
        controllerName := controllerDescriptor.Name()

        for _, featureGate := range controllerDescriptor.GetRequiredFeatureGates() {
            if !utilfeature.DefaultFeatureGate.Enabled(featureGate) {
                logger.Info("Controller is disabled by a feature gate", "controller", controllerName, "requiredFeatureGates", controllerDescriptor.GetRequiredFeatureGates())
                return nil, nil
            }
        }

        if controllerDescriptor.IsCloudProviderController() && controllerCtx.LoopMode != IncludeCloudLoops {
            logger.Info("Skipping a cloud provider controller", "controller", controllerName, "loopMode", controllerCtx.LoopMode)
            return nil, nil
        }

        if !controllerCtx.IsControllerEnabled(controllerDescriptor) {
            logger.Info("Warning: controller is disabled", "controller", controllerName)
            return nil, nil
        }

        time.Sleep(wait.Jitter(controllerCtx.ComponentConfig.Generic.ControllerStartInterval.Duration, ControllerStartJitter))

        logger.V(1).Info("Starting controller", "controller", controllerName)

        initFunc := controllerDescriptor.GetInitFunc()
        ctrl, started, err := initFunc(klog.NewContext(ctx, klog.LoggerWithName(logger, controllerName)), controllerCtx, controllerName)
        if err != nil {
            logger.Error(err, "Error starting controller", "controller", controllerName)
            return nil, err
        }
        if !started {
            logger.Info("Warning: skipping controller", "controller", controllerName)
            return nil, nil
        }

        check := controllerhealthz.NamedPingChecker(controllerName)
        if ctrl != nil {
            // check if the controller supports and requests a debugHandler
            // and it needs the unsecuredMux to mount the handler onto.
            if debuggable, ok := ctrl.(controller.Debuggable); ok && unsecuredMux != nil {
                if debugHandler := debuggable.DebuggingHandler(); debugHandler != nil {
                    basePath := "/debug/controllers/" + controllerName
                    unsecuredMux.UnlistedHandle(basePath, http.StripPrefix(basePath, debugHandler))
                    unsecuredMux.UnlistedHandlePrefix(basePath+"/", http.StripPrefix(basePath, debugHandler))
                }
            }
            if healthCheckable, ok := ctrl.(controller.HealthCheckable); ok {
                if realCheck := healthCheckable.HealthChecker(); realCheck != nil {
                    check = controllerhealthz.NamedHealthChecker(controllerName, realCheck)
                }
            }
        }

        logger.Info("Started controller", "controller", controllerName)
        return check, nil
    }
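The early returns at the top of this listing can be isolated into a small decision function. This is only a sketch: the `descriptor` and `settings` types and the `skipReason` function are hypothetical, and plain maps stand in for the feature gate registry and the enabled-controller configuration.

```go
package main

import "fmt"

// descriptor is a hypothetical, trimmed-down ControllerDescriptor.
type descriptor struct {
	name          string
	requiredGates []string
	cloudProvider bool
}

// settings is a hypothetical stand-in for the feature gates, the loop mode,
// and the set of disabled controllers.
type settings struct {
	enabledGates      map[string]bool
	includeCloudLoops bool
	disabled          map[string]bool
}

// skipReason returns a non-empty reason if the controller should not be
// started, or "" if startup should proceed.
func skipReason(d descriptor, s settings) string {
	for _, gate := range d.requiredGates {
		if !s.enabledGates[gate] {
			return "disabled by a feature gate"
		}
	}
	if d.cloudProvider && !s.includeCloudLoops {
		return "cloud provider controller skipped"
	}
	if s.disabled[d.name] {
		return "controller is disabled"
	}
	return ""
}

func main() {
	s := settings{
		enabledGates: map[string]bool{"SomeGate": true},
		disabled:     map[string]bool{"ttl": true},
	}
	for _, d := range []descriptor{
		{name: "deployment"},
		{name: "gated", requiredGates: []string{"OtherGate"}},
		{name: "service-lb", cloudProvider: true},
		{name: "ttl"},
	} {
		fmt.Printf("%s: %q\n", d.name, skipReason(d, s))
	}
}
```

Note that a skipped controller is not an error: in the real function each of these cases returns nil, nil, so the caller simply moves on.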
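StartController begins with three checks: whether a required feature gate is not enabled, whether the controller is a cloud provider controller while the loop mode is not IncludeCloudLoops, and whether the controller has been disabled. In each of these cases it returns nil, nil and the controller is skipped. The next line is an interesting one.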
time.Sleep(wait.Jitter(controllerCtx.ComponentConfig.Generic.ControllerStartInterval.Duration, ControllerStartJitter))
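Because StartController is called from the for range loop shown earlier, every controller start is preceded by a sleep of ControllerStartInterval plus a random jitter. The interval defaults to 1 second, which staggers the starts and avoids a burst of load.
After that, the controller is actually started: StartController fetches the descriptor's initFunc and calls it, then wraps the health check mentioned above and returns it to the caller.
Next, let's look at initFunc.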
    type ControllerDescriptor struct {
        name                      string
        initFunc                  InitFunc
        requiredFeatureGates      []featuregate.Feature
        aliases                   []string
        isDisabledByDefault       bool
        isCloudProviderController bool
        requiresSpecialHandling   bool
    }

    func (r *ControllerDescriptor) Name() string {
        return r.name
    }

    func (r *ControllerDescriptor) GetInitFunc() InitFunc {
        return r.initFunc
    }
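So where does initFunc come from? Recall NewControllerDescriptors, which the code around run uses to build the full set of controllers. During that initialization, each controller constructs a ControllerDescriptor and registers itself through register.
Take DeploymentController as an example.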
register(newDeploymentControllerDescriptor())
    func newDeploymentControllerDescriptor() *ControllerDescriptor {
        return &ControllerDescriptor{
            name:     names.DeploymentController,
            aliases:  []string{"deployment"},
            initFunc: startDeploymentController,
        }
    }
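At this point the picture is clear: calling the DeploymentController's initFunc really means calling startDeploymentController, and the same holds for every other controller. For what startDeploymentController does next, see my earlier article, Kubernetes DeploymentController Source Code Walkthrough.
That concludes the startup path of the Kubernetes ControllerManager. What remains is the functionality of each individual controller. Pick the ones that interest you or that you need for work from the list at the top of this article; there is no need to read the source of every controller.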
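The descriptor-plus-registry pattern can be reduced to a short, self-contained sketch. Everything here is hypothetical (`initFunc`, `descriptor`, `buildRegistry`, `startDeployment`), and the real InitFunc signature takes a context and a ControllerContext. Returning an error on a duplicate name is a choice made for this sketch; the source comment only says that registration is used for duplication detection.

```go
package main

import "fmt"

// initFunc is a hypothetical, simplified version of InitFunc.
type initFunc func(name string) (started bool, err error)

// descriptor is a hypothetical, trimmed-down ControllerDescriptor.
type descriptor struct {
	name     string
	initFunc initFunc
}

// buildRegistry indexes descriptors by name and rejects duplicates.
func buildRegistry(ds ...*descriptor) (map[string]*descriptor, error) {
	registry := map[string]*descriptor{}
	for _, d := range ds {
		if _, exists := registry[d.name]; exists {
			return nil, fmt.Errorf("controller %q registered twice", d.name)
		}
		registry[d.name] = d
	}
	return registry, nil
}

// startDeployment stands in for startDeploymentController.
func startDeployment(name string) (bool, error) {
	fmt.Println("starting", name)
	return true, nil
}

func main() {
	registry, err := buildRegistry(
		&descriptor{name: "deployment-controller", initFunc: startDeployment},
	)
	if err != nil {
		panic(err)
	}
	// The caller only knows the descriptor; the concrete start function
	// is whatever was stored in initFunc at registration time.
	d := registry["deployment-controller"]
	started, err := d.initFunc(d.name)
	fmt.Println(started, err)
}
```

Storing a function value in the descriptor keeps the startup loop generic: it never needs to know which controller it is starting.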