Flink源码学习（3）入口函数阅读

注意：上一节学习到了Flink启动流程，包括initialize初始化组件。然后创建工厂对象，生产三个实例，也就是webMonitor，ResourceManager和Dispatcher三个对象。

具体过程如下

initializeServices

commonRpcService负责处理 Flink 集群内部节点之间的远程过程调用（RPC）
1. commonRpcService基于Akka的RpcService实现。RPC服务启动Akka参与者来接受从RpcGateway调用RPC
  1. 具体步骤
    1. 构建一个builder对象构造器
    2. 绑定主机名和端口号
    3. 启动rpcservice
      
      实现类akkaRpcService
      
      akkaRpcServiceUtils初始化actorsystem，一般情况创建remote actor system
      
      启动一个actor
      
      返回一个AkkaRpcService类
      
      return constructor.apply( actorSystem, AkkaRpcServiceConfiguration.fromConfiguration(configuration), RpcService.class.getClassLoader());
      
      进入之后检查端口号IP地址等操作
      
      启动一个supervisor = startSupervisorActor();
      
      查看代码
      
      启动一个单线程的线程池
      
      返回一个supervisor，是一个actor

newFixedThreadPool专门负责做io
1. 初始化一个io线程池
2. 线程池的数量是cpu核数*4，如果当前节点有n个cpu那么当前这个ioExecutor线程数量为4n

hasServices:高可用服务

haServices = createHaServices(configuration, ioExecutor, rpcSystem);

具体的代码，主要是包装一个zookeeper对象，用的curatorFrameworkWrapper

public static HighAvailabilityServices createHighAvailabilityServices(
        Configuration configuration,
        Executor executor,
        AddressResolution addressResolution,
        RpcSystemUtils rpcSystemUtils,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    //在flink-conf中正常来说配置zookeeper
    HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
    
    switch (highAvailabilityMode) {
        case NONE:
            final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);

            final String resourceManagerRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(
                                    ResourceManager.RESOURCE_MANAGER_NAME),
                            addressResolution,
                            configuration);
            final String dispatcherRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
                            addressResolution,
                            configuration);
            final String webMonitorAddress =
                    getWebMonitorAddress(configuration, addressResolution);

            return new StandaloneHaServices(
                    resourceManagerRpcUrl, dispatcherRpcUrl, webMonitorAddress);
        case ZOOKEEPER:
            return createZooKeeperHaServices(configuration, executor, fatalErrorHandler);
        case KUBERNETES:
            return createCustomHAServices(
                    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory",
                    configuration,
                    executor);

        case FACTORY_CLASS:
            return createCustomHAServices(configuration, executor);

        default:
            throw new Exception("Recovery mode " + highAvailabilityMode + " is not supported.");
    }
}

blobServices:大文件传输服务

主要管理jar包，TM上传log文件

blobServer =
        BlobUtils.createBlobServer(
                configuration,
                Reference.borrowed(workingDirectory.unwrap().getBlobStorageDirectory()),
                haServices.createBlobStore());
blobServer.start();

heartbeatServices:心跳服务

在主节点中，有很多角色都有心跳服务，这些角色的心跳服务都是在这个服务的基础之上创建，这个是专门给各个组件提供一个心跳服务的实例的。这个服务是提供心跳服务的服务。

heartbeatServices = createHeartbeatServices(configuration);

metricRegistry:监控服务

系统监控，跟踪性能监控服务，跟踪所有已注册的metric

metricRegistry.startQueryService(metricQueryServiceRpcService, null);

executionGraphInfoStore :存储ExecutreJobGraph

executionGraphInfoStore =
        createSerializableExecutionGraphStore(
                configuration, commonRpcService.getScheduledExecutor());

负责存储和管理作业的执行图（Execution Graph）信息

目的是用来存储execution Graph的服务，有两种形式

memory主要是在内存中缓存

file会持久化到文件中，默认是文件

创建工厂实例clusterComponent

用于创建：

webMonitor
ResourceManager
dispatcherResource

webMonotor

用于接受客户端的请求

创建一个对象，是一个RpcEndpoint子类

public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
        Configuration configuration,
        LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
        LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElectionService leaderElectionService,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    final RestHandlerConfiguration restHandlerConfiguration =
            RestHandlerConfiguration.fromConfiguration(configuration);

    return new DispatcherRestEndpoint(
            dispatcherGatewayRetriever,
            configuration,
            restHandlerConfiguration,
            resourceManagerGatewayRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElectionService,
            RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
            fatalErrorHandler);
}

注意，webMonitorEndpoint并不是RpcEndpoint的子类

接受客户端各种提交的请求

父类是RestServerEndpoint，作用是初始化很多Handler

观察webMonitorEndpoint，初始化了一堆Handler

@Override
protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(
        final CompletableFuture<String> localAddressFuture) {
    ArrayList<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers =
            new ArrayList<>(30);

    final Collection<Tuple2<RestHandlerSpecification, ChannelInboundHandler>>
            webSubmissionHandlers = initializeWebSubmissionHandlers(localAddressFuture);
    handlers.addAll(webSubmissionHandlers);
    final boolean hasWebSubmissionHandlers = !webSubmissionHandlers.isEmpty();

    final Duration asyncOperationStoreDuration =
            clusterConfiguration.get(RestOptions.ASYNC_OPERATION_STORE_DURATION);
    final Time timeout = restConfiguration.getTimeout();

    ClusterOverviewHandler clusterOverviewHandler =
            new ClusterOverviewHandler(
                    leaderRetriever,
                    timeout,
                    responseHeaders,
                    ClusterOverviewHeaders.getInstance());

    DashboardConfigHandler dashboardConfigHandler =
            new DashboardConfigHandler(
                    leaderRetriever,
                    timeout,
                    responseHeaders,
                    DashboardConfigurationHeaders.getInstance(),
                    restConfiguration.getRefreshInterval(),
                    hasWebSubmissionHandlers,
                    restConfiguration.isWebCancelEnabled());

    JobIdsHandler jobIdsHandler =
            new JobIdsHandler(
                    leaderRetriever,
                    timeout,
                    responseHeaders,
                    JobIdsWithStatusesOverviewHeaders.getInstance());

    JobStatusHandler jobStatusHandler =
            new JobStatusHandler(
                    leaderRetriever,
                    timeout,
                    responseHeaders,
                    JobStatusInfoHeaders.getInstance());
                
                ……

这些Handler的作用就是flink web业务的rest服务，Handler==Servlet

然后看父类的RestEndpoint

启动的时候都会进行选举

获胜的角色会调用isLeader方法=》this.grantLeaderShip()，失败notLeader

然后进行一个定时任务

这个定时任务专门用来cleanup这些文件是否超过生命周期，要被删除

总结：

这个服务是用来启动native服务端，绑定后处理器，如果接到客户端restful请求，由webmonitor接受处理，看对应请求执行对应handler

resourceManager

首先：创建ResourceManager，启动ManagerService

leader Election Service选举后调用isLeader方法

主节点启动，进入HeartbeatMonitor，如果从节点返回心跳，会被加入heartbeatMonitor

管理所有的心跳目标对象

开启定时任务checkTaskManageTimeouts检查TaskManager的心跳

开启第二个定时任务，检查slotRequest

Dispatcher

主要职责：负责接收用户提交的JobGraph然后启动一个JobManager

在ZooKeeperLeaderElectionService中defaultDispatcherRunner

先stopDispatcherLeaderProcess停掉现有的
runActionIfRunning开启dispatcher服务，startNewDispatcherLeaderProcess
1. 执行onStart方法，启动JobGraphStore，一个用来存储JobGraph的存储组件
在SessionDispatcherLeaderProcess中启动了createDispatcherIfRunning
1. createDispatcher方法
```
//调用dispatcherGatewayServiceFactory
Dispatcher = dispatcherFactory.CREATEdISPATCHER
dispatcher.start()
```
      调用初始化
  
      dispatcherBootstraph.initialize
  
      把所有中断的job恢复执行。如果重新启动的话把之前的执行一半的程序恢复起来，底层调用函数恢复中断任务的执行
  
      正常提交一个job的时候，由dispatcher接收到来继续提交执行
  
      提交一个job的时候，startJobManagerRunner来启动一个jobManager
  
      因此Dispatcher启动的时候有一件重要的事情，就是恢复节点宕机的任务

posted @ 2024-04-03 00:16 Heinrich♣ 阅读(24) 评论(0) 编辑收藏举报

刷新页面返回顶部

ak918xp

努力，努力，天行健同功！

Flink源码学习（3）入口函数阅读

注意：上一节学习到了Flink启动流程，包括initialize初始化组件。然后创建工厂对象，生产三个实例，也就是webMonitor，ResourceManager和Dispatcher三个对象。

具体过程如下

initializeServices

commonRpcService负责处理 Flink 集群内部节点之间的远程过程调用（RPC）

newFixedThreadPool专门负责做io

hasServices:高可用服务

blobServices:大文件传输服务

heartbeatServices:心跳服务

metricRegistry:监控服务

executionGraphInfoStore :存储ExecutreJobGraph

创建工厂实例clusterComponent

webMonotor

resourceManager

Dispatcher

公告

ak918xp

努力，努力，天行健同功！

Flink源码学习（3） 入口函数阅读

注意：上一节学习到了Flink启动流程，包括initialize初始化组件。然后创建工厂对象，生产三个实例，也就是webMonitor，ResourceManager和Dispatcher三个对象。

具体过程如下

initializeServices

commonRpcService负责处理 Flink 集群内部节点之间的远程过程调用（RPC）

newFixedThreadPool专门负责做io

hasServices:高可用服务

blobServices:大文件传输服务

heartbeatServices:心跳服务

metricRegistry:监控服务

executionGraphInfoStore :存储ExecutreJobGraph

创建工厂实例clusterComponent

webMonotor

resourceManager

Dispatcher

公告

Flink源码学习（3）入口函数阅读