Flink Source Code Study (3): Reading the Entry Function

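      1. Note: the previous part covered the Flink startup flow, including how initialize sets up the components. A factory object is then created, and it produces three instances: webMonitor, ResourceManager, and Dispatcher.

        The detailed process is as follows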

        initializeServices

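        1. commonRpcService handles remote procedure calls (RPC) between nodes inside the Flink cluster

          1. commonRpcService is an Akka-based implementation of RpcService. The RPC service starts Akka actors that receive RPC calls made through an RpcGateway
              1. Its main job is to start an ActorSystem
            1. Steps in detail
              1. Build a builder object
              2. Bind the hostname and port
              3. Start the RpcService
                  1. The implementation class is AkkaRpcService
                  2. AkkaRpcServiceUtils initializes the ActorSystem; in the usual case it creates a remote actor system
                1. Start an actor
                  1. Return an AkkaRpcService instance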
                    1. return constructor.apply( actorSystem, AkkaRpcServiceConfiguration.fromConfiguration(configuration), RpcService.class.getClassLoader());
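                    2. Once inside the constructor, the port, IP address, and so on are checked
                    3. A supervisor is started: supervisor = startSupervisorActor();
                        1. Looking at the code
                      1. It starts a single-threaded thread pool
                      2. It returns a supervisor, which is an actor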
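To make the actor idea above concrete, here is a minimal sketch in plain Java, without Akka. It assumes that the essential property of an actor is that all calls are queued onto a single thread. The class MailboxEndpointSketch and its methods are hypothetical names for illustration and are not part of Flink.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for an actor: every call is queued onto one thread,
// so the endpoint's state is never accessed concurrently.
class MailboxEndpointSketch {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int counter = 0; // state only touched by the mailbox thread

    // An "RPC" call: enqueue the work and hand back a future for the result.
    <T> CompletableFuture<T> call(Supplier<T> action) {
        return CompletableFuture.supplyAsync(action, mailbox);
    }

    CompletableFuture<Integer> increment() {
        return call(() -> ++counter);
    }

    void stop() {
        mailbox.shutdown();
    }

    public static void main(String[] args) {
        MailboxEndpointSketch endpoint = new MailboxEndpointSketch();
        for (int i = 0; i < 99; i++) {
            endpoint.increment();
        }
        // The calls run one after another in submission order.
        System.out.println(endpoint.increment().join());
        endpoint.stop();
    }
}
```

Because the executor has exactly one thread, callers never need locks around the endpoint's state, which is roughly the guarantee an actor mailbox gives.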
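        1. ioExecutor: a newFixedThreadPool dedicated to I/O

          1. Initializes an I/O thread pool
          2. The pool size is 4 * (number of CPU cores): if the current node has n CPUs, this ioExecutor has 4n threads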
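The sizing rule above can be sketched in a few lines. This is an illustration only, assuming the rule is exactly "4 threads per core"; IoExecutorSketch, poolSize, and createIoExecutor are hypothetical names, not Flink's own helpers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IoExecutorSketch {
    // 4 threads per CPU core, as described in the notes.
    static int poolSize(int cpuCores) {
        return 4 * cpuCores;
    }

    // Builds a fixed-size pool sized from the cores visible to this JVM.
    static ExecutorService createIoExecutor() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(cores));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService ioExecutor = createIoExecutor();
        System.out.println("cores=" + cores + ", I/O threads=" + poolSize(cores));
        ioExecutor.shutdown();
    }
}
```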
        1. haServices: high-availability services

          haServices = createHaServices(configuration, ioExecutor, rpcSystem);
        The code is shown below; it mainly wraps a ZooKeeper client, using CuratorFrameworkWrapper
        public static HighAvailabilityServices createHighAvailabilityServices(
                Configuration configuration,
                Executor executor,
                AddressResolution addressResolution,
                RpcSystemUtils rpcSystemUtils,
                FatalErrorHandler fatalErrorHandler)
                throws Exception {
            // Normally ZooKeeper is configured in flink-conf
            HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
            
            switch (highAvailabilityMode) {
                case NONE:
                    final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);
        
                    final String resourceManagerRpcUrl =
                            rpcSystemUtils.getRpcUrl(
                                    hostnamePort.f0,
                                    hostnamePort.f1,
                                    RpcServiceUtils.createWildcardName(
                                            ResourceManager.RESOURCE_MANAGER_NAME),
                                    addressResolution,
                                    configuration);
                    final String dispatcherRpcUrl =
                            rpcSystemUtils.getRpcUrl(
                                    hostnamePort.f0,
                                    hostnamePort.f1,
                                    RpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
                                    addressResolution,
                                    configuration);
                    final String webMonitorAddress =
                            getWebMonitorAddress(configuration, addressResolution);
        
                    return new StandaloneHaServices(
                            resourceManagerRpcUrl, dispatcherRpcUrl, webMonitorAddress);
                case ZOOKEEPER:
                    return createZooKeeperHaServices(configuration, executor, fatalErrorHandler);
                case KUBERNETES:
                    return createCustomHAServices(
                            "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory",
                            configuration,
                            executor);
        
                case FACTORY_CLASS:
                    return createCustomHAServices(configuration, executor);
        
                default:
                    throw new Exception("Recovery mode " + highAvailabilityMode + " is not supported.");
            }
        }
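The switch above depends on how a configuration value is turned into a mode. The following standalone sketch shows one plausible mapping. The rule it uses (missing value means NONE, an unknown value is treated as a custom factory class name) is an assumption made for illustration, and HaModeSketch with its methods is hypothetical, not Flink's HighAvailabilityMode.

```java
import java.util.Locale;

class HaModeSketch {
    enum Mode { NONE, ZOOKEEPER, KUBERNETES, FACTORY_CLASS }

    // Assumed rule: no value -> NONE; a known mode name -> that mode;
    // anything else is taken to be the name of a custom factory class.
    static Mode fromConfigValue(String value) {
        if (value == null) {
            return Mode.NONE;
        }
        try {
            return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Mode.FACTORY_CLASS;
        }
    }

    // Mirrors the shape of the switch in the notes, with descriptions
    // in place of the real service constructors.
    static String describe(Mode mode) {
        switch (mode) {
            case NONE:
                return "standalone services with fixed RPC URLs";
            case ZOOKEEPER:
                return "ZooKeeper-backed services";
            case KUBERNETES:
                return "services built by the Kubernetes factory";
            case FACTORY_CLASS:
                return "services built by a user-supplied factory";
            default:
                throw new IllegalStateException("Recovery mode " + mode + " is not supported.");
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {null, "zookeeper", "com.example.MyHaFactory"}) {
            System.out.println(value + " -> " + describe(fromConfigValue(value)));
        }
    }
}
```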

         

        1. blobServices: large-file transfer service

        Mainly manages JAR files and the log files uploaded by TaskManagers
        blobServer =
                BlobUtils.createBlobServer(
                        configuration,
                        Reference.borrowed(workingDirectory.unwrap().getBlobStorageDirectory()),
                        haServices.createBlobStore());
        blobServer.start();

         

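        1. heartbeatServices: heartbeat service

        On the master node, many roles have their own heartbeat service, and all of them are created on top of this one. Its sole purpose is to hand each component a heartbeat service instance; in other words, it is a service that provides heartbeat services.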
        heartbeatServices = createHeartbeatServices(configuration);
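The "service that provides heartbeat services" idea is essentially a factory that shares one interval and timeout across components. A minimal sketch under that assumption follows; HeartbeatServicesSketch and its nested HeartbeatManager are hypothetical names and do not reproduce Flink's actual classes.

```java
// Hypothetical factory: holds the shared heartbeat settings and hands each
// component (ResourceManager, JobMaster, ...) its own manager instance.
class HeartbeatServicesSketch {
    private final long intervalMs;
    private final long timeoutMs;

    HeartbeatServicesSketch(long intervalMs, long timeoutMs) {
        if (intervalMs <= 0 || timeoutMs < intervalMs) {
            throw new IllegalArgumentException("need 0 < interval <= timeout");
        }
        this.intervalMs = intervalMs;
        this.timeoutMs = timeoutMs;
    }

    // Every call returns a new manager that inherits the shared settings.
    HeartbeatManager createHeartbeatManager(String owner) {
        return new HeartbeatManager(owner, intervalMs, timeoutMs);
    }

    static final class HeartbeatManager {
        final String owner;
        final long intervalMs;
        final long timeoutMs;

        HeartbeatManager(String owner, long intervalMs, long timeoutMs) {
            this.owner = owner;
            this.intervalMs = intervalMs;
            this.timeoutMs = timeoutMs;
        }
    }

    public static void main(String[] args) {
        HeartbeatServicesSketch services = new HeartbeatServicesSketch(10_000, 50_000);
        HeartbeatManager rm = services.createHeartbeatManager("ResourceManager");
        HeartbeatManager jm = services.createHeartbeatManager("JobMaster");
        System.out.println(rm.owner + " and " + jm.owner + " share timeout " + rm.timeoutMs + " ms");
    }
}
```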
        1. metricRegistry: metrics service

        System monitoring: tracks performance and keeps track of all registered metrics
        metricRegistry.startQueryService(metricQueryServiceRpcService, null);
        1. executionGraphInfoStore: stores the ExecutionGraph

        executionGraphInfoStore =
                createSerializableExecutionGraphStore(
                        configuration, commonRpcService.getScheduledExecutor());
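        Stores and manages the Execution Graph information of jobs
        It is the service that stores Execution Graphs, and it comes in two forms
        memory: mainly caches the data in memory
        file: persists the data to files; file is the default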
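The two storage forms can be pictured as two implementations of one small interface. The sketch below is a simplification that stores plain strings. GraphStoreSketch, Store, MemoryStore, and FileStore are hypothetical names, and using a temporary directory for the file variant is an assumption made only so that the example runs anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GraphStoreSketch {
    interface Store {
        void put(String jobId, String graphInfo);
        String get(String jobId); // null when the job is unknown
    }

    // "memory": everything lives in a map and is lost on restart.
    static class MemoryStore implements Store {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        public void put(String jobId, String graphInfo) { cache.put(jobId, graphInfo); }
        public String get(String jobId) { return cache.get(jobId); }
    }

    // "file": one file per job inside a directory.
    static class FileStore implements Store {
        private final Path dir;
        FileStore(Path dir) { this.dir = dir; }

        public void put(String jobId, String graphInfo) {
            try {
                Files.write(dir.resolve(jobId), graphInfo.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public String get(String jobId) {
            Path file = dir.resolve(jobId);
            try {
                return Files.exists(file)
                        ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                        : null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Picks the implementation from a type string; anything but "memory" means file.
    static Store create(String type) {
        if ("memory".equalsIgnoreCase(type)) {
            return new MemoryStore();
        }
        try {
            return new FileStore(Files.createTempDirectory("graph-store-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String type : new String[] {"memory", "file"}) {
            Store store = create(type);
            store.put("job-1", "execution graph info");
            System.out.println(type + ": " + store.get("job-1"));
        }
    }
}
```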
         
        1. Create the factory instance clusterComponent

        It is used to create:
        webMonitor
        ResourceManager
        dispatcherResource
        1. webMonitor

        Accepts requests from clients
        An object is created here; it is a subclass of RestServerEndpoint
        public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
                Configuration configuration,
                LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
                LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
                TransientBlobService transientBlobService,
                ScheduledExecutorService executor,
                MetricFetcher metricFetcher,
                LeaderElectionService leaderElectionService,
                FatalErrorHandler fatalErrorHandler)
                throws Exception {
            final RestHandlerConfiguration restHandlerConfiguration =
                    RestHandlerConfiguration.fromConfiguration(configuration);
        
            return new DispatcherRestEndpoint(
                    dispatcherGatewayRetriever,
                    configuration,
                    restHandlerConfiguration,
                    resourceManagerGatewayRetriever,
                    transientBlobService,
                    executor,
                    metricFetcher,
                    leaderElectionService,
                    RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
                    fatalErrorHandler);
        }
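        Note: WebMonitorEndpoint is not a subclass of RpcEndpoint
        It accepts the various requests submitted by clients
        Its parent class is RestServerEndpoint, whose job is to initialize a large number of Handlers
        Looking at WebMonitorEndpoint, it initializes a whole set of Handlers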
        @Override
        protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(
                final CompletableFuture<String> localAddressFuture) {
            ArrayList<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers =
                    new ArrayList<>(30);
        
            final Collection<Tuple2<RestHandlerSpecification, ChannelInboundHandler>>
                    webSubmissionHandlers = initializeWebSubmissionHandlers(localAddressFuture);
            handlers.addAll(webSubmissionHandlers);
            final boolean hasWebSubmissionHandlers = !webSubmissionHandlers.isEmpty();
        
            final Duration asyncOperationStoreDuration =
                    clusterConfiguration.get(RestOptions.ASYNC_OPERATION_STORE_DURATION);
            final Time timeout = restConfiguration.getTimeout();
        
            ClusterOverviewHandler clusterOverviewHandler =
                    new ClusterOverviewHandler(
                            leaderRetriever,
                            timeout,
                            responseHeaders,
                            ClusterOverviewHeaders.getInstance());
        
            DashboardConfigHandler dashboardConfigHandler =
                    new DashboardConfigHandler(
                            leaderRetriever,
                            timeout,
                            responseHeaders,
                            DashboardConfigurationHeaders.getInstance(),
                            restConfiguration.getRefreshInterval(),
                            hasWebSubmissionHandlers,
                            restConfiguration.isWebCancelEnabled());
        
            JobIdsHandler jobIdsHandler =
                    new JobIdsHandler(
                            leaderRetriever,
                            timeout,
                            responseHeaders,
                            JobIdsWithStatusesOverviewHeaders.getInstance());
        
            JobStatusHandler jobStatusHandler =
                    new JobStatusHandler(
                            leaderRetriever,
                            timeout,
                            responseHeaders,
                            JobStatusInfoHeaders.getInstance());
                        
                        ……

         

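        These Handlers implement the REST services behind the Flink web UI; a Handler plays the same role as a Servlet
        Next, look at the parent class RestServerEndpoint
        A leader election always takes place at startup
        The winner has its isLeader method called, which leads to this.grantLeadership(); the loser gets notLeader
        A scheduled task is then started
        This scheduled task checks whether these files have exceeded their lifetime and cleans up the ones that must be deleted
         
        Summary:
        This service starts a Netty server and binds the handlers to it. When a RESTful request arrives from a client, webMonitor receives it and runs the handler that matches the request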
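The "request comes in, matching handler runs" behaviour summarized above can be sketched as a lookup table from method and path to a handler function. This leaves out Netty entirely. RestRouterSketch and the two example routes are illustrative assumptions, not the real handler classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class RestRouterSketch {
    // Key is "METHOD path"; value turns a request body into a response body.
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    void register(String method, String path, Function<String, String> handler) {
        handlers.put(method + " " + path, handler);
    }

    // Looks up the handler for the request and runs it, or reports 404.
    String handle(String method, String path, String body) {
        Function<String, String> handler = handlers.get(method + " " + path);
        return handler == null ? "404 Not Found" : handler.apply(body);
    }

    public static void main(String[] args) {
        RestRouterSketch router = new RestRouterSketch();
        // Illustrative routes only.
        router.register("GET", "/overview", body -> "cluster overview");
        router.register("GET", "/jobs", body -> "job ids");

        System.out.println(router.handle("GET", "/overview", ""));
        System.out.println(router.handle("GET", "/unknown", ""));
    }
}
```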
         
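        1. resourceManager

        First: create the ResourceManager and start the ManagerService
        After the election, the LeaderElectionService calls the isLeader method
        The master node starts and enters HeartbeatMonitor; when a worker node returns a heartbeat, it is added to a HeartbeatMonitor
        This manages all heartbeat target objects
        1. Start the scheduled task checkTaskManagerTimeouts to check the TaskManager heartbeats
        1. Start a second scheduled task that checks slot requests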
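A periodic timeout check like the ones listed above could look roughly as follows. This is a sketch under the assumption that the check simply compares the last heartbeat time against a timeout. TimeoutCheckSketch and its methods are hypothetical, and the time is passed in explicitly so that the logic can be exercised without waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TimeoutCheckSketch {
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMs;

    TimeoutCheckSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a TaskManager reports in.
    void heartbeatFrom(String taskManagerId, long nowMs) {
        lastHeartbeat.put(taskManagerId, nowMs);
    }

    // Removes and returns every TaskManager whose last heartbeat is too old.
    List<String> checkTimeouts(long nowMs) {
        List<String> expired = new ArrayList<>();
        lastHeartbeat.forEach((id, seen) -> {
            if (nowMs - seen > timeoutMs) {
                expired.add(id);
            }
        });
        expired.forEach(lastHeartbeat::remove);
        return expired;
    }

    // Runs the check repeatedly on a single timer thread.
    ScheduledExecutorService startPeriodicCheck(long periodMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(
                () -> checkTimeouts(System.currentTimeMillis()),
                periodMs, periodMs, TimeUnit.MILLISECONDS);
        return timer;
    }

    public static void main(String[] args) {
        TimeoutCheckSketch check = new TimeoutCheckSketch(1_000);
        check.heartbeatFrom("tm-1", 0);
        check.heartbeatFrom("tm-2", 900);
        System.out.println("expired: " + check.checkTimeouts(1_500));

        ScheduledExecutorService timer = check.startPeriodicCheck(100);
        timer.shutdownNow();
    }
}
```

A second periodic task, such as the slot request check, would be scheduled the same way on the same timer with its own check method.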
         
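        1. Dispatcher

        Main responsibility: receive the JobGraph submitted by the user and then start a JobManager
        In ZooKeeperLeaderElectionService, DefaultDispatcherRunner does the following
        1. First, stopDispatcherLeaderProcess stops the existing process
        2. runActionIfRunning starts the dispatcher service via startNewDispatcherLeaderProcess
          1. The onStart method runs and starts the JobGraphStore, a storage component for JobGraphs
        3. SessionDispatcherLeaderProcess then invokes createDispatcherIfRunning
          1. The createDispatcher method
            // calls dispatcherGatewayServiceFactory
            dispatcher = dispatcherFactory.createDispatcher
            dispatcher.start()
                Initialization is invoked
                dispatcherBootstrap.initialize
                All interrupted jobs are resumed. After a restart, jobs that were only partly executed are brought back up; underneath, a function is called to resume the interrupted jobs
                When a job is submitted normally, the dispatcher receives it and carries the submission forward
                When a job is submitted, startJobManagerRunner starts a JobManager
                So one important thing the Dispatcher does at startup is recover the jobs that were lost when a node went down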
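The recovery behaviour described above can be condensed into a small sketch: on start, every job found in the store gets a runner, and a new submission is persisted before it is run. DispatcherSketch is a hypothetical class, a plain map stands in for the JobGraphStore, and a list of job ids stands in for the running JobManagers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DispatcherSketch {
    // Stand-in for the JobGraphStore: job id -> persisted job graph.
    private final Map<String, String> jobGraphStore;
    private final List<String> runningJobs = new ArrayList<>();

    DispatcherSketch(Map<String, String> jobGraphStore) {
        this.jobGraphStore = jobGraphStore;
    }

    // On start, recover every job that was persisted before the failure.
    void start() {
        for (String jobId : jobGraphStore.keySet()) {
            startJobManagerRunner(jobId);
        }
    }

    // A fresh submission is persisted first, then run the same way.
    void submitJob(String jobId, String jobGraph) {
        jobGraphStore.put(jobId, jobGraph);
        startJobManagerRunner(jobId);
    }

    // Stand-in for launching a JobManager for the job.
    private void startJobManagerRunner(String jobId) {
        runningJobs.add(jobId);
    }

    List<String> runningJobs() {
        return new ArrayList<>(runningJobs);
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("job-1", "graph of an interrupted job");

        DispatcherSketch dispatcher = new DispatcherSketch(store);
        dispatcher.start();                           // recovers job-1
        dispatcher.submitJob("job-2", "a new graph"); // normal submission
        System.out.println(dispatcher.runningJobs());
    }
}
```

Persisting before running is what makes the recovery step possible: a replacement dispatcher that starts with the same store sees the job again.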
posted @ 2024-04-03 00:16  Heinrich♣