Flink Source Code Study (5): The Execution Flow of a Flink Program

Flink's high-level API is simple, and programs follow a fairly fixed pattern:
Execution environment: ExecutionEnvironment
Data abstraction: DataSet/DataStream
Logical operators: Source, Transform, Sink
Let's take the AdaptiveSchedulerITCase shipped with Flink as an example:
@Test
public void testGlobalFailoverCanRecoverState() throws Exception {

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(PARALLELISM);

    env.enableCheckpointing(20L, CheckpointingMode.EXACTLY_ONCE);
    final DataStreamSource<Integer> input = env.addSource(new SimpleSource());

    input.addSink(new DiscardingSink<>());

    env.execute();
}
Env -> DataSource -> AddSink -> Execute
In a Flink application every operation is a StreamOperator; operators fall into Source, Sink, and Stream operators.
Built-in optimization: operators that meet certain conditions are fused into a chain.
 
Three related concepts:
Function (the argument passed to a high-level operator) -> wrapped into an Operator -> represented internally as a Transformation
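As a quick illustration, here is a minimal hand-written sketch (not Flink source) of the three levels for a map(); the OneInputTransformation call is left as a comment because its exact overloads vary across Flink versions:
// Function: supplied by the user.
MapFunction<String, Integer> fn = String::length;
// Operator: Flink wraps the function, here into a StreamMap.
StreamMap<String, Integer> operator = new StreamMap<>(fn);
// Transformation: the operator plus its input, name, output type, and parallelism, e.g.
// new OneInputTransformation<>(input, "Map", SimpleOperatorFactory.of(operator), Types.INT, parallelism);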
 
Flink's built-in introductory program:
public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

        // the host and the port to connect to
        final String hostname;
        final int port;
        try {
            final ParameterTool params = ParameterTool.fromArgs(args);
            hostname = params.has("hostname") ? params.get("hostname") : "localhost";
            port = params.getInt("port");
        } catch (Exception e) {
            System.err.println(
                    "No port specified. Please run 'SocketWindowWordCount "
                            + "--hostname <hostname> --port <port>', where hostname (localhost by default) "
                            + "and port is the address of the text server");
            System.err.println(
                    "To start a simple text server, run 'netcat -l <port>' and "
                            + "type the input text into the command line");
            return;
        }

        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream(hostname, port, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts =
                text.flatMap(
                                (FlatMapFunction<String, WordWithCount>)
                                        (value, out) -> {
                                            for (String word : value.split("\\s")) {
                                                out.collect(new WordWithCount(word, 1L));
                                            }
                                        },
                                Types.POJO(WordWithCount.class))
                        .keyBy(value -> value.word)
                        .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                        .reduce((a, b) -> new WordWithCount(a.word, a.count + b.count))
                        .returns(WordWithCount.class);

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // ------------------------------------------------------------------------

    /** Data type for words with count. */
    public static class WordWithCount {

        public String word;
        public long count;

        @SuppressWarnings("unused")
        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
After the program is written, package it into a jar:
mvn clean package -DskipTests
and submit it from the command line.
The flink shell script starts a JVM via the java command; inside that JVM the CliFrontend class builds and submits the job:
flink run xxx.jar class arg1 arg2
In bin/flink of flink-bin, submitting a jar to Flink comes down to:
exec "${JAVA_RUN}" $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"
This command takes us into the main method of the CliFrontend class, which delegates to mainInternal:
static int mainInternal(final String[] args) {
    EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);

    // 1. find the configuration directory
    final String configurationDirectory = getConfigurationDirectoryFromEnv();

    // 2. load the global configuration
    final Configuration configuration =
            GlobalConfiguration.loadConfiguration(configurationDirectory);

    // 3. load the custom command lines
    final List<CustomCommandLine> customCommandLines =
            loadCustomCommandLines(configuration, configurationDirectory);

    int retCode = INITIAL_RET_CODE;
    try {
        final CliFrontend cli = new CliFrontend(configuration, customCommandLines);
        CommandLine commandLine =
                cli.getCommandLine(
                        new Options(),
                        Arrays.copyOfRange(args, min(args.length, 1), args.length),
                        true);
        Configuration securityConfig = new Configuration(cli.configuration);
        DynamicPropertiesUtil.encodeDynamicProperties(commandLine, securityConfig);
        SecurityUtils.install(new SecurityConfiguration(securityConfig));
        retCode = SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(args));
    } catch (Throwable t) {
        final Throwable strippedThrowable =
                ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
        LOG.error("Fatal error while running command line interface.", strippedThrowable);
        strippedThrowable.printStackTrace();
    }
    return retCode;
}
It logs some environment information,
locates the conf directory via FLINK_CONF_DIR and parses the configuration file,
and loads the custom command lines (FlinkYarnSessionCli and DefaultCLI).
 
cli.parseAndRun dispatches to the run method, which parses the jar and extracts the main class, arguments, dependencies, and so on.
getPackagedProgram assembles the jarFile, userClassPaths, entryPointClassName, configuration, savepointRestoreSettings, and args:
try (PackagedProgram program = getPackagedProgram(programOptions, effectiveConfiguration)) {
    executeProgram(effectiveConfiguration, program);
}
executeProgram is then called to run it:
protected void executeProgram(final Configuration configuration, final PackagedProgram program)
        throws ProgramInvocationException {
    ClientUtils.executeProgram(
            new DefaultExecutorServiceLoader(), configuration, program, false, false);
}
Inside executeProgram the user program is invoked via reflection: mainMethod.invoke(null, (Object) args).
Reflection is a mechanism for inspecting and manipulating classes at runtime. It allows us to:
  • obtain a class's name, methods, fields, and so on;
  • create instances of a class;
  • invoke methods and access fields;
  • modify class structure dynamically.
To use a class or invoke its methods reflectively, you first need the Class object of that class:
  1. Via the static .class literal:
    1. Class<TestClass> clazz = TestClass.class;
  2. Via Object's getClass() method, when you already have an instance:
    1. Class<? extends TestClass> clazz = tc.getClass();
  3. Via the class's string name, the most flexible way, using the static Class.forName method:
    1. Class<?> clazz = Class.forName("TestClass");
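A compilable version of the three approaches (TestClass here is a hypothetical nested class used only for illustration):
public class ClassObjectSketch {
    static class TestClass {}

    public static void main(String[] args) throws ClassNotFoundException {
        // 1. static .class literal
        Class<TestClass> c1 = TestClass.class;
        // 2. getClass() on an existing instance
        TestClass tc = new TestClass();
        Class<? extends TestClass> c2 = tc.getClass();
        // 3. by (binary) class name, known only at runtime
        Class<?> c3 = Class.forName("ClassObjectSketch$TestClass");
        System.out.println(c1 + " / " + c2 + " / " + c3);
    }
}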
Differences between Java reflection and new:
  1. Reflection is dynamic: the instance is obtained only at runtime. Spring, for example, is built on reflection. new is static: the class must be known and linked in at compile time.
  2. An object created with new gives no access to its private members, whereas through reflection you can call setAccessible(true) to reach private fields and methods.
  3. new requires the class name at compile time; with reflection the class name may be known only at runtime.
import java.lang.reflect.*;

public class ReflectionExample {
    public static void main(String[] args) throws ClassNotFoundException {
        // 获取类的Class对象
        Class<?> myClass = Class.forName("com.example.MyClass");

        // 获取类的名称
        String className = myClass.getName();
        System.out.println("Class Name: " + className);

        // 获取类的方法
        Method[] methods = myClass.getMethods();
        for (Method method : methods) {
            System.out.println("Method: " + method.getName());
        }

        // 获取类的字段
        Field[] fields = myClass.getDeclaredFields();
        for (Field field : fields) {
            System.out.println("Field: " + field.getName());
        }
    }
}

 

In ClientUtils:
try {
    program.invokeInteractiveModeForExecution();
} finally {
    ContextEnvironment.unsetAsContext();
    StreamContextEnvironment.unsetAsContext();
}
which finally calls the main method of the user-written class:
public void invokeInteractiveModeForExecution() throws ProgramInvocationException {
    FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
    try {
        callMainMethod(mainClass, args);
    } finally {
        FlinkSecurityManager.unmonitorUserSystemExitForCurrentThread();
    }
}
In callMainMethod, the main method of the entry class (the user's application) is looked up:
try {
    mainMethod = entryClass.getMethod("main", String[].class);
} catch (NoSuchMethodException e) {
    throw new ProgramInvocationException(
            "The class " + entryClass.getName() + " has no main(String[]) method.");
} catch (Throwable t) {
    throw new ProgramInvocationException(
            "Could not look up the main(String[]) method from the class "
                    + entryClass.getName()
                    + ": "
                    + t.getMessage(),
            t);
}
Execution finally enters the main method of the user's own business program.
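Boiled down, the reflective invocation looks like the following sketch (com.example.UserJob is a hypothetical entry class, not something from the Flink codebase):
import java.lang.reflect.Method;

public class InvokeMainSketch {
    public static void main(String[] args) throws Exception {
        // look up the user's entry class and its static main(String[]) method
        Class<?> entryClass = Class.forName("com.example.UserJob"); // hypothetical class
        Method mainMethod = entryClass.getMethod("main", String[].class);
        // the cast to Object passes the array as one vararg argument, not as four separate ones
        mainMethod.invoke(null, (Object) new String[] {"--hostname", "localhost", "--port", "9999"});
    }
}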
 
ExecutionEnvironment source code analysis
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
The first step of a Flink program is creating the runtime environment, the StreamExecutionEnvironment,
obtained via getExecutionEnvironment().
It is the execution entry point of a Flink program and provides the key machinery:
  1. readTextFile(), socketTextStream(), createInput(), and addSource() for creating sources;
  2. setParallelism() for setting the parallelism;
  3. management of the ExecutionConfig object, which governs the job's execution behavior;
  4. a List<Transformation<?>> member variable that stores the transformations produced by the job's operators; stitched together they become the StreamGraph;
  5. an execute() method for submitting the job, which takes the StreamGraph as its argument.
StreamExecutionEnvironment is the execution context of a Flink application.
 

The job submission code path

1. DataStream<String> text = env.socketTextStream(String hostname, int port, String delimiter)

Looking at its code, it simply delegates to addSource:
public DataStreamSource<String> socketTextStream(
        String hostname, int port, String delimiter, long maxRetry) {
    return addSource(
            new SocketTextStreamFunction(hostname, port, delimiter, maxRetry), "Socket Stream");
}
Stepping into addSource:
private <OUT> DataStreamSource<OUT> addSource(
        final SourceFunction<OUT> function,
        final String sourceName,
        @Nullable final TypeInformation<OUT> typeInfo,
        final Boundedness boundedness) {
    checkNotNull(function);
    checkNotNull(sourceName);
    checkNotNull(boundedness);

    TypeInformation<OUT> resolvedTypeInfo =
            getTypeInfo(function, sourceName, SourceFunction.class, typeInfo);

    boolean isParallel = function instanceof ParallelSourceFunction;

    clean(function);

    final StreamSource<OUT, ?> sourceOperator = new StreamSource<>(function);
    return new DataStreamSource<>(
            this, resolvedTypeInfo, sourceOperator, isParallel, sourceName, boundedness);
}
It returns a DataStreamSource. The stream abstractions are DataStream, KeyedStream, DataStreamSource, and DataStreamSink:
  • DataStream is the base abstraction; it defines the basic operations and properties of a data stream.
  • DataStreamSource is the head of a stream, the DataStream produced by a source.
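A short hand-written sketch of where each abstraction shows up in a pipeline (an illustration, not from the post's WordCount):
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamAbstractionsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<Integer> src = env.fromElements(1, 2, 3);   // DataStreamSource: head of the stream
        KeyedStream<Integer, Integer> keyed = src.keyBy(v -> v % 2); // KeyedStream: partitioned by key
        DataStream<Integer> reduced = keyed.reduce(Integer::sum);    // back to a plain DataStream
        DataStreamSink<Integer> sink = reduced.print();              // DataStreamSink: tail of the stream
        env.execute("abstractions");
    }
}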

 

2. FlatMap

DataStream<WordWithCount> windowCounts =
                text.flatMap(
                                (FlatMapFunction<String, WordWithCount>)
                                        (value, out) -> {
                                            for (String word : value.split("\\s")) {
                                                out.collect(new WordWithCount(word, 1L));
                                            }
                                        },
                                Types.POJO(WordWithCount.class))
                        .keyBy(value -> value.word)
                        .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                        .reduce((a, b) -> new WordWithCount(a.word, a.count + b.count))
                        .returns(WordWithCount.class);
Under the hood this method delegates to transform:
public <R> SingleOutputStreamOperator<R> flatMap(
        FlatMapFunction<T, R> flatMapper, TypeInformation<R> outputType) {
    return transform("Flat Map", outputType, new StreamFlatMap<>(clean(flatMapper)));
}
The flatMapper function is wrapped into the StreamFlatMap operator:
Function -> Operator -> Transformation.
transform then converts this StreamOperator into a Transformation and appends it to the StreamExecutionEnvironment's List<Transformation<?>>.
Inside transform, the operator is registered with the execution environment in the form of a transformation; when the generate method later runs, these transformations produce the StreamGraph structure:
protected <R> SingleOutputStreamOperator<R> doTransform(
        String operatorName,
        TypeInformation<R> outTypeInfo,
        StreamOperatorFactory<R> operatorFactory) {

    // read the output type of the input Transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();

    OneInputTransformation<T, R> resultTransform =
            new OneInputTransformation<>(
                    this.transformation,
                    operatorName,
                    operatorFactory,
                    outTypeInfo,
                    environment.getParallelism(),
                    false);

    @SuppressWarnings({"unchecked", "rawtypes"})
    SingleOutputStreamOperator<R> returnStream =
            new SingleOutputStreamOperator(environment, resultTransform);

    getExecutionEnvironment().addOperator(resultTransform);

    return returnStream;
}
Note that what addOperator adds is in fact a transformation, which shows how closely the two concepts correspond.
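For reference, addOperator is essentially a one-liner (paraphrased from StreamExecutionEnvironment; treat the exact body as an approximation):
@Internal
public void addOperator(Transformation<?> transformation) {
    Preconditions.checkNotNull(transformation, "transformation must not be null.");
    // despite the name, what gets stored is a Transformation
    this.transformations.add(transformation);
}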
 

 

In other words, every high-level operator is turned into a Transformation and appended to the list.
Finally the job is submitted:
env.execute("Socket Window WordCount");


The execute flow
public JobExecutionResult execute(String jobName) throws Exception {
    final List<Transformation<?>> originalTransformations = new ArrayList<>(transformations);
    StreamGraph streamGraph = getStreamGraph();
    if (jobName != null) {
        streamGraph.setJobName(jobName);
    }

    try {
        return execute(streamGraph);
    } catch (Throwable t) {
        Optional<ClusterDatasetCorruptedException> clusterDatasetCorruptedException =
                ExceptionUtils.findThrowable(t, ClusterDatasetCorruptedException.class);
        if (!clusterDatasetCorruptedException.isPresent()) {
            throw t;
        }

        // Retry without cache if it is caused by corrupted cluster dataset.
        invalidateCacheTransformations(originalTransformations);
        streamGraph = getStreamGraph(originalTransformations);
        return execute(streamGraph);
    }
}
getStreamGraph() builds the StreamGraph;
execute(streamGraph) submits it.
 
StreamGraph
A Flink job must be compiled into an efficient DAG execution graph for distributed parallel execution.
  1. Adjacent upstream and downstream operators are optimized: wherever two operators can be linked they are merged into one operatorChain, saving serialization overhead.
  2. The StreamTasks of two adjacent stages are related: the graph records which upstream StreamTasks produce the data consumed by which downstream StreamTasks.
From client submission to final execution, a Flink streaming job passes through four graph forms: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution graph.
  1. The client generates the StreamGraph from user code, builds it into a JobGraph, and submits that to the cluster's master node.
  2. The JobMaster receives the JobGraph, translates it into an ExecutionGraph, and schedules StreamTasks for execution.
  3. The StreamTasks run in parallel across the Flink cluster; that is the final physical execution graph.
StreamGraph: the initial graph generated from code written against the Stream API; it represents the program's topology.
JobGraph: produced by optimizing the StreamGraph; the data structure submitted to the JobManager. The main optimization is chaining eligible nodes into one node to cut serialization overhead.
ExecutionGraph: generated by the JobManager from the JobGraph; the parallel version of the JobGraph. Still a data structure.
Physical execution graph: the actual graph formed after the JobManager schedules the job from the ExecutionGraph and deploys the tasks onto the TaskManagers; not a data structure.
 
 

Details of the submission flow

1. generate
private StreamGraphGenerator getStreamGraphGenerator(List<Transformation<?>> transformations) {
    if (transformations.size() <= 0) {
        throw new IllegalStateException(
                "No operators defined in streaming topology. Cannot execute.");
    }

    // We copy the transformation so that newly added transformations cannot intervene with the
    // stream graph generation.
    return new StreamGraphGenerator(
                    new ArrayList<>(transformations), config, checkpointCfg, configuration)
            .setStateBackend(defaultStateBackend)
            .setChangelogStateBackendEnabled(changelogStateBackendEnabled)
            .setSavepointDir(defaultSavepointDirectory)
            .setChaining(isChainingEnabled)
            .setUserArtifacts(cacheFile)
            .setTimeCharacteristic(timeCharacteristic)
            .setDefaultBufferTimeout(bufferTimeout)
            .setSlotSharingGroupResource(slotSharingGroupResources);
}
The generator stores the transformations it has already converted.
It pulls the transformations out of the environment object and converts them into StreamNodes:
Function -> Operator -> Transformation -> StreamNode
 
final List<Collection<Integer>> allInputIds = getParentInputIds(transform.getInputs());
The input transformations are handled recursively: a transform can only be processed after its upstream transform has been processed.
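A toy model of this recursive walk (hand-written; the real logic lives in StreamGraphGenerator#transform and also memoizes results so that each transformation is translated exactly once):
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransformWalkSketch {
    static class Node {
        final String name;
        final List<Node> inputs;
        Node(String name, Node... inputs) { this.name = name; this.inputs = List.of(inputs); }
    }

    static final Map<Node, Integer> alreadyTransformed = new HashMap<>();
    static int nextId = 1;

    static int transform(Node t) {
        if (alreadyTransformed.containsKey(t)) {
            return alreadyTransformed.get(t);       // memoized: handled once
        }
        for (Node input : t.inputs) {
            transform(input);                       // translate upstream first
        }
        int id = nextId++;                          // stands in for "create StreamNode"
        alreadyTransformed.put(t, id);
        System.out.println("StreamNode " + id + " <- " + t.name);
        return id;
    }

    public static void main(String[] args) {
        Node source = new Node("source");
        Node flatMap = new Node("flatMap", source);
        Node sink = new Node("sink", flatMap);
        transform(sink);  // prints source, flatMap, sink in dependency order
    }
}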
 
 
2. Building the StreamGraph
Many StreamNodes get added to the StreamGraph:
protected StreamNode addNode(
        Integer vertexID,
        @Nullable String slotSharingGroup,
        @Nullable String coLocationGroup,
        Class<? extends TaskInvokable> vertexClass,
        StreamOperatorFactory<?> operatorFactory,
        String operatorName) {

    if (streamNodes.containsKey(vertexID)) {
        throw new RuntimeException("Duplicate vertexID " + vertexID);
    }

    StreamNode vertex =
            new StreamNode(
                    vertexID,
                    slotSharingGroup,
                    coLocationGroup,
                    operatorFactory,
                    operatorName,
                    vertexClass);

    streamNodes.put(vertexID, vertex);

    return vertex;
}
The StreamGraph consists of StreamNodes and StreamEdges. Generating an edge means wiring a StreamEdge between the current StreamNode and each of its upstream StreamNodes:
for (Transformation<F> feedbackEdge : coIterate.getFeedbackEdges()) {
    Collection<Integer> feedbackIds = transform(feedbackEdge);
    allFeedbackIds.addAll(feedbackIds);
    for (Integer feedbackId : feedbackIds) {
        streamGraph.addEdge(feedbackId, itSink.getId(), 0);
    }
}
The concrete method takes an upstream and a downstream stream vertex ID (which can equally be read as StreamNode IDs):
public void addEdge(Integer upStreamVertexID, Integer downStreamVertexID, int typeNumber) {
    addEdge(upStreamVertexID, downStreamVertexID, typeNumber, null);
}
Let's look at how the edge is actually built:
private void addEdgeInternal(
        Integer upStreamVertexID,
        Integer downStreamVertexID,
        int typeNumber,
        StreamPartitioner<?> partitioner,
        List<String> outputNames,
        OutputTag outputTag,
        StreamExchangeMode exchangeMode,
        IntermediateDataSetID intermediateDataSetId) {

    if (virtualSideOutputNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualSideOutputNodes.get(virtualId).f0;
        if (outputTag == null) {
            outputTag = virtualSideOutputNodes.get(virtualId).f1;
        }
        addEdgeInternal(
                upStreamVertexID,
                downStreamVertexID,
                typeNumber,
                partitioner,
                null,
                outputTag,
                exchangeMode,
                intermediateDataSetId);
    } else if (virtualPartitionNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualPartitionNodes.get(virtualId).f0;
        if (partitioner == null) {
            partitioner = virtualPartitionNodes.get(virtualId).f1;
        }
        exchangeMode = virtualPartitionNodes.get(virtualId).f2;
        addEdgeInternal(
                upStreamVertexID,
                downStreamVertexID,
                typeNumber,
                partitioner,
                outputNames,
                outputTag,
                exchangeMode,
                intermediateDataSetId);
    } else {
        createActualEdge(
                upStreamVertexID,
                downStreamVertexID,
                typeNumber,
                partitioner,
                outputTag,
                exchangeMode,
                intermediateDataSetId);
    }
}

 

addEdgeInternal builds edges recursively: each branch handles one specific condition and then calls the method again with an updated argument list, the classic shape of recursion. The recursion ends when upStreamVertexID appears in neither virtualSideOutputNodes nor virtualPartitionNodes; the else branch then runs and createActualEdge creates the actual edge, with no further recursive call.

 

Every transformation is processed once in this way and converted into a StreamNode.
 
With the StreamGraph in hand, the next step launches the streaming job asynchronously, reports back to the registered listeners once the job is submitted, and handles any exception along the way. A CompletableFuture carries the asynchronous execution:
public JobClient executeAsync(StreamGraph streamGraph) throws Exception {
    checkNotNull(streamGraph, "StreamGraph cannot be null.");
    final PipelineExecutor executor = getPipelineExecutor();

    CompletableFuture<JobClient> jobClientFuture =
            executor.execute(streamGraph, configuration, userClassloader);

    try {
        JobClient jobClient = jobClientFuture.get();
        jobListeners.forEach(jobListener -> jobListener.onJobSubmitted(jobClient, null));
        collectIterators.forEach(iterator -> iterator.setJobClient(jobClient));
        collectIterators.clear();
        return jobClient;
    } catch (ExecutionException executionException) {
        final Throwable strippedException =
                ExceptionUtils.stripExecutionException(executionException);
        jobListeners.forEach(
                jobListener -> jobListener.onJobSubmitted(null, strippedException));

        throw new FlinkException(
                String.format("Failed to execute job '%s'.", streamGraph.getJobName()),
                strippedException);
    }
}
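The JobClient handed back can be used right away; a hedged usage fragment (to be placed inside a main method such as the WordCount above):
// inspect the asynchronously submitted job via the returned JobClient
JobClient jobClient = env.executeAsync("Socket Window WordCount");
System.out.println("job id: " + jobClient.getJobID());
jobClient.getJobStatus()
        .thenAccept(status -> System.out.println("status: " + status));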

3. Converting to a JobGraph and handing it to the RestClusterClient

After submission, getJobClient() returns a JobClient object that gives access to the job's execution result, including its status and statistics.
MiniClusterClient executes locally;
RestClusterClient submits to the Flink REST endpoint.
When a JobGraph is submitted to the Flink cluster, what is actually transferred is the jobGraphFile.
JobSubmitHandler accepts the request and processes it.
The first thing it does is deserialize the transferred file back into a JobGraph object;
the jobGraphFile is also uploaded to HDFS.
sendRetriableRequest then submits the request (a JobSubmitRequestBody), retrying if necessary;
the RestClient sends the request to the WebMonitorEndpoint, where JobSubmitHandler ultimately handles it.
The whole submission goes over HTTP REST.
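For illustration, a hedged sketch of driving the same path programmatically with RestClusterClient (the address/port values and the jobGraph argument are assumptions for the example):
import org.apache.flink.api.common.JobID;
import org.apache.flink.client.deployment.StandaloneClusterId;
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;

public class RestSubmitSketch {
    public static void submit(JobGraph jobGraph) throws Exception {
        Configuration conf = new Configuration();
        conf.setString(RestOptions.ADDRESS, "localhost"); // assumed REST endpoint
        conf.setInteger(RestOptions.PORT, 8081);
        try (RestClusterClient<StandaloneClusterId> client =
                new RestClusterClient<>(conf, StandaloneClusterId.getInstance())) {
            // serializes the JobGraph to a file and POSTs it to the cluster
            JobID jobId = client.submitJob(jobGraph).get();
            System.out.println("submitted: " + jobId + ", status: " + client.getJobStatus(jobId).get());
        }
    }
}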
 
The overall flow, therefore:
+------------------------------------------------+
| Create the execution environment               |
| StreamExecutionEnvironment env =               |
|   StreamExecutionEnvironment                   |
|       .getExecutionEnvironment();              |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Add a source                                   |
| DataStream<T> source =                         |
|   env.addSource(sourceFunction);               |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Transformations (e.g. map, flatMap)            |
| DataStream<R> transformed =                    |
|   source.map(transformFunction);               |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Set the parallelism                            |
| env.setParallelism(parallelism);               |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Add a sink (output)                            |
| transformed.addSink(sinkFunction);             |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Execute the job                                |
| env.execute(jobName);                          |
+------------------------------------------------+
                    |
                    v
+------------------------------------------------+
| Job execution result                           |
| JobClient jobClient = getJobClient();          |
+------------------------------------------------+

 

The submission flow in brief:
Build the StreamNodes and StreamEdges.
Submit the StreamGraph, generate the JobGraph, and submit the JobGraph through the RestClusterClient as an HTTP request.
The JobSubmitHandler inside the WebMonitorEndpoint then handles the request.
 
The JobGraph is obtained through the FlinkPipelineTranslator:
final JobGraph jobGraph =
        FlinkPipelineTranslationUtil.getJobGraph(
                userClassloader,
                pipeline,
                configuration,
                executionConfigAccessor.getParallelism());
This ultimately calls StreamGraph.getJobGraph.
 
StreamingJobGraphGenerator.createJobGraph performs the StreamGraph-to-JobGraph conversion.
The main step between StreamGraph and JobGraph is chaining, done in the setChaining method:
StreamNodes that can be chained are linked together,
which cuts thread switches, message serialization, buffer exchanges, and latency.
A single StreamNode can also be viewed as a (trivial) chain.
A StreamEdge links an upstream and a downstream StreamNode.
List<StreamEdge> chainableOutputs = new ArrayList<>()
For each StreamNode, getOutEdges is inspected; isChainable decides whether an edge can be chained: if so it goes into chainableOutputs, otherwise into nonChainableOutputs.
The check only ever looks at the current StreamNode and its direct downstream StreamNode.
createChain is then called for every chainable edge.
 
How chainability is decided (the variant shown below is the one for Python operators, but the core conditions are the same: equal parallelism, equal max parallelism, the same slot sharing group, compatible operators, and a single downstream consumer):
private static boolean isChainable(
        Transformation<?> upTransform,
        Transformation<?> downTransform,
        Map<Transformation<?>, Set<Transformation<?>>> outputMap) {
    return upTransform.getParallelism() == downTransform.getParallelism()
            && upTransform.getMaxParallelism() == downTransform.getMaxParallelism()
            && upTransform.getSlotSharingGroup().equals(downTransform.getSlotSharingGroup())
            && areOperatorsChainable(upTransform, downTransform)
            && outputMap.get(upTransform).size() == 1;
}


private static boolean areOperatorsChainable(
        Transformation<?> upTransform, Transformation<?> downTransform) {
    if (!areOperatorsChainableByChainingStrategy(upTransform, downTransform)) {
        return false;
    }

    if (upTransform instanceof PythonBroadcastStateTransformation
            || upTransform instanceof PythonKeyedBroadcastStateTransformation) {
        return false;
    }

    DataStreamPythonFunctionOperator<?> upOperator =
            (DataStreamPythonFunctionOperator<?>)
                    ((SimpleOperatorFactory<?>) getOperatorFactory(upTransform)).getOperator();

    DataStreamPythonFunctionOperator<?> downOperator =
            (DataStreamPythonFunctionOperator<?>)
                    ((SimpleOperatorFactory<?>) getOperatorFactory(downTransform))
                            .getOperator();

    if (!arePythonOperatorsInSameExecutionEnvironment(upOperator, downOperator)) {
        return false;
    }

    return (downOperator instanceof ExternalPythonProcessOperator
                    && (upOperator instanceof ExternalPythonKeyedProcessOperator
                            || upOperator instanceof ExternalPythonKeyedCoProcessOperator
                            || upOperator instanceof ExternalPythonProcessOperator
                            || upOperator instanceof ExternalPythonCoProcessOperator))
            || (downOperator instanceof EmbeddedPythonProcessOperator
                    && (upOperator instanceof EmbeddedPythonKeyedProcessOperator
                            || upOperator instanceof EmbeddedPythonKeyedCoProcessOperator
                            || upOperator instanceof EmbeddedPythonProcessOperator
                            || upOperator instanceof EmbeddedPythonCoProcessOperator
                            || upOperator instanceof EmbeddedPythonWindowOperator));
}
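Chaining can also be influenced from the DataStream API; a small hand-written usage sketch (the operators are arbitrary):
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingHintsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining();               // would turn chaining off globally
        env.fromElements("a", "b")
                .map(String::toUpperCase).returns(Types.STRING)
                .startNewChain()                        // force a new chain to start here
                .map(String::toLowerCase).returns(Types.STRING)
                .disableChaining()                      // keep this operator out of any chain
                .print();
        env.execute("chaining-hints");
    }
}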

4. The role of JobSubmitHandler

Resource management: ResourceManager + TaskExecutor
Task execution: JobMaster + StreamTask
  1. The client submits the job: the JobGraph is handed to the JobManager.
  2. handleRequest on the server side accepts the JobGraph.
  3. Loading the JobGraph: the uploaded files are taken from the request, the request body is read, and loadJobGraph restores the JobGraph from it.
  4. The client waits asynchronously for the result.
  5. WebMonitor listening: as soon as the WebMonitor receives the request it calls JobSubmitHandler's handleRequest; after the jars are uploaded, the data is forwarded to the gateway, i.e. the Dispatcher.
  6. Dispatcher handling: when a client submits a job, the cluster master's Dispatcher takes over and assigns the job to a JobManager instance.
    1. createJobManagerRunner: creates the JM instance; here the JobGraph becomes the ExecutionGraph.
      1. The JobGraph is stored into an ExecutionGraph member variable as a JSON string.
         The biggest change is that a single JobVertex becomes several ExecutionVertices, a one-to-many relationship: the JobVertex's configured parallelism decides how many ExecutionVertices are generated. One ExecutionVertex corresponds to one actually running StreamTask, and normally one StreamTask occupies one slot.
      2. A JobVertex can specify its parallelism; the ExecutionGraph splits each JobVertex into that many ExecutionVertices, each representing one parallel instance of the operator.
      3. Slots are allocated.
    2. startJobManagerRunner: starts the JM object.
  7. Job execution: after scheduling, the JM distributes the job to the TMs.
    1. A JobMaster is now up for the job:
      1. the JobMaster reports heartbeats to the ResourceManager;
      2. the JobMaster maintains heartbeats with the TaskManagers (it must track on which TMs its tasks run);
      3. a heartbeat request is sent to every registered heartbeat target, and the scheduled task re-fires the heartbeat every 10 s;
      4. the JobMaster starts the SlotPool, which centrally manages all slots it has acquired;
      5. reconnectToResourceManager connects to the ResourceManager;
      6. it watches for ResourceManager changes, so the JobMaster senses RM updates in real time;
      7. if the master node fails over, the JobMaster connection is re-established;
      8. the JobMaster registers itself with the RM.
  8. WebMonitor update: the job's status information is updated so progress can be monitored in real time.
  9. Result feedback: the client fetches the result status through the WebMonitor's REST API.

In short:
the client prepares the job -> the client submits it -> the JobManager is contacted -> handleRequest on the server side -> the JobGraph is loaded -> the job is scheduled -> the client waits asynchronously -> the WebMonitor listens -> the Dispatcher (gateway) handles it -> the job executes -> the WebMonitor updates -> the result is fed back.
5. Slot management: request and release

The JobMaster is the server-side entry point:
it restores the JobGraph,
then creates the JobManagerRunner and the JobMaster; during creation the JobGraph is turned into the ExecutionGraph, which also determines how many resources the whole job needs.
Internally it mainly does two things: startJobMasterService and jobMasterService.
The JM sends a request to apply for slots;
the RM receives it and executes the slot request handling;
the TM processes the slot request forwarded by the RM.
     
1. The ResourceManager manages all TaskManagers (TaskExecutors).
2. The TaskExecutor manages its resources through the slot abstraction.
  1. Slot state is tracked by a dedicated component, SlotManagerImpl.
3. The JobMaster manages the slots it has acquired through the SlotPool.
  1. Slot sharing: because slots can be shared, a task first tries to obtain a slot from the SlotPool; only if that fails does it ask the ResourceManager.
4. TaskManager slot handling:
  the JobMaster sends the slot request;
  the ResourceManager accepts and executes the request;
  the TaskManager processes the slot request sent by the ResourceManager;
  the JobMaster receives the slot allocation result from the TaskManager.
Client ----> JobSubmitHandler ----> JobMaster (SlotPool)
                                        |  requestSlot
                                        v
                                 ResourceManager
                                        |  requestSlot
                                        v
                                  TaskManager(s)
                                        |  offerSlots
                                        v
                                 JobMaster (SlotPool)

     

allocateSlotsAndDeploy:
initialize a container that stores the slotExecutionVertexAssignments;
iterate over the ExecutionVertices waiting for slots and request a slot for each in turn;
process the result: what is ultimately obtained is a LogicalSlot. Slots may be shared: several tasks can share one physical slot, which looks like several slots logically but is a single slot physically.
On the very first slot request the SlotPool is necessarily empty, so the request goes to the ResourceManager for a slot.
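A toy model of that pool-first, RM-fallback behaviour (hand-written, not Flink source; the real SlotPool is asynchronous and far richer):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

public class SlotPoolSketch {
    static class Slot { final int id; Slot(int id) { this.id = id; } }

    private final Deque<Slot> freeSlots = new ArrayDeque<>();
    private int nextId = 0;

    Slot allocateSlot() {
        // try the local pool first; on a miss, fall back to the "ResourceManager"
        Optional<Slot> pooled = Optional.ofNullable(freeSlots.poll());
        return pooled.orElseGet(this::requestSlotFromResourceManager);
    }

    Slot requestSlotFromResourceManager() {
        return new Slot(nextId++);  // stands in for the RM/TaskExecutor round trip
    }

    void releaseSlot(Slot slot) { freeSlots.push(slot); } // returned slots become free again

    public static void main(String[] args) {
        SlotPoolSketch pool = new SlotPoolSketch();
        Slot s = pool.allocateSlot();               // pool empty -> "RM" provides slot 0
        pool.releaseSlot(s);
        System.out.println(pool.allocateSlot().id); // reuses slot 0 from the pool
    }
}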
     
1. If a slot is requested and none is left on a standalone cluster,
the request can only wait.
2. If a slot is requested and none is left on a YARN cluster,
slot management is dynamic: after the job is handed to YARN, if there are too few TaskExecutors a new container is requested and started, and each TaskExecutor contributes a fixed number of slots for running tasks.
     
Requests are stored so that after a network outage they can be re-sent once the connection recovers.
If a slot request eventually obtains a slot, the slot is tagged with an allocationID to prevent duplicate requests.
The RM's proxy object is called to request the slot.
The RM's processing flow:
it checks whether the job is already registered and whether the JM requesting the slot matches the registered job's master address,
which prevents a JM migrated by failover from doubling the slot request and wasting resources;
it then picks a matching slot from the registered slots in the Free state;
once found, the slot is removed from the freeSlots collection.
Whenever a TaskExecutor registers and comes online, it necessarily reports its slots;
slots released after a task completes go back into freeSlots.
If the cluster has no slot in the Free state, allocateResource is called to request resources:
on a YARN cluster a new TaskExecutor is brought up via startNewWorker;
in standalone mode startNewWorker is not possible. As mentioned, the ResourceManager calls
SlotManagerImpl's findMatchingSlot to compute the most suitable slot in the Free state.
If a free slot can be obtained, the method's result necessarily contains the SlotID and the TaskExecutorID.
State check: a TaskManagerSlot is either allocated, pending, or free. The slot moves to pending when the ResourceManager has handed it out but the TaskManager does not know yet and has not confirmed success; hence "pending".
When a slot on some TaskExecutor is assigned to a task of some job, that TaskExecutor wraps it into a TaskSlot for management:
    taskSlot =
            new TaskSlot<>(
                    index,
                    effectiveResourceProfile,
                    memoryPageSize,
                    jobId,
                    allocationId,
                    memoryVerificationExecutor);
    taskSlots.put(index, taskSlot);
    
    // update the allocation id to task slot map
    allocatedSlots.put(allocationId, taskSlot);
The number of slots the job holds on this node is updated, maintaining the mapping between slots and allocationIDs:
    if (slots == null) {
        slots = new HashSet<>(4);
        slotsPerJob.put(jobId, slots);
    }
    
    slots.add(allocationId);
Each slot is wrapped into a SlotOffer, and the TaskExecutor offers its slots to the JobManager.
The allocation feedback is then processed:
on timeout the offer is retried; on error the slot is released; on success the slot's state is updated in the taskSlotTable.
The collection of accepted SlotOffers is returned.
After this whole sequence the RM has assigned a slot on some TaskExecutor to the current JobMaster, and the JM manages that slot through its SlotPool.
slotExecutionVertexAssignment returns the collection of slot assignments.
     
Once every Execution has its slot and thus its resources, each one gets a handler,
and a task is started and deployed through that handler:
the constructed deploymentHandlers deploy the slots in parallel;
waitForAllSlotsAndDeploy;
the matching slotAssigned = slotExecutionVertexAssignment is looked up;
slotAssigned.handle is deployed;
the deployOrHandleError method works from the slotAssigned and the deploymentHandle;
deployTaskSafe resolves the ExecutionVertex from the ExecutionVertexID. When the JobMaster is initialized, a Scheduler (DefaultScheduler) is created alongside it, and at creation time it turns the JobGraph into the ExecutionGraph.
A first failed execution may be retried, but both attempts run the same ExecutionVertex, so how are they told apart?
Every run of an ExecutionVertex is wrapped in its own Execution object.
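A toy illustration of that wrapping (hand-written; in Flink the Execution additionally carries a globally unique ExecutionAttemptID):
import java.util.ArrayList;
import java.util.List;

public class ExecutionAttemptSketch {
    static class Execution {
        final int attemptNumber;
        Execution(int attemptNumber) { this.attemptNumber = attemptNumber; }
    }

    static class ExecutionVertex {
        final List<Execution> attempts = new ArrayList<>();
        Execution newAttempt() {                 // called on every (re)deployment
            Execution e = new Execution(attempts.size());
            attempts.add(e);
            return e;
        }
    }

    public static void main(String[] args) {
        ExecutionVertex vertex = new ExecutionVertex();
        Execution first = vertex.newAttempt();   // attemptNumber == 0
        Execution retry = vertex.newAttempt();   // attemptNumber == 1 after a failure
        System.out.println(first.attemptNumber + " vs " + retry.attemptNumber);
    }
}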