|NO.Z.00048|——————————|BigDataEnd|——|Hadoop&Flink.V03|——|Flink.v03|Flink Connector|kafka|Source Code Walkthrough|Source Notes.V1|

一、Source code extraction notes
### --- How does flink-kafka consume, and how are partitions assigned?

~~~     We start from the source of the open() method:
### --- Specifying the offset commit mode

~~~     OffsetCommitMode: represents the behavior of how offsets are externally
~~~     committed back to Kafka brokers / Zookeeper. Its exact value is determined
~~~     at runtime in the consumer subtasks.
~~~     # Source excerpt: OffsetCommitMode.java
~~~     # Lines 29~39
~~~     DISABLED: completely disable offset committing.
~~~     ON_CHECKPOINTS: commit offsets back to Kafka only when checkpoints complete.
~~~     KAFKA_PERIODIC: commit offsets back to Kafka periodically, using the auto-commit functionality of the internal Kafka client.

/**
 * The offset commit mode represents the behaviour of how offsets are externally committed
 * back to Kafka brokers / Zookeeper.
 *
 * <p>The exact value of this is determined at runtime in the consumer subtasks.
 */
@Internal
public enum OffsetCommitMode {

    /** Completely disable offset committing. */
    DISABLED,

    /** Commit offsets back to Kafka only when checkpoints are completed. */
    ON_CHECKPOINTS,

    /** Commit offsets periodically back to Kafka, using the auto commit functionality of internal Kafka clients. */
    KAFKA_PERIODIC;
}
~~~     # Source excerpt: OffsetCommitModes.java
~~~     # Lines 37~50

    public static OffsetCommitMode fromConfiguration(
            boolean enableAutoCommit,
            boolean enableCommitOnCheckpoint,
            boolean enableCheckpointing) {

        if (enableCheckpointing) {
            // if checkpointing is enabled, the mode depends only on whether committing on checkpoints is enabled
            return (enableCommitOnCheckpoint) ? OffsetCommitMode.ON_CHECKPOINTS : OffsetCommitMode.DISABLED;
        } else {
            // else, the mode depends only on whether auto committing is enabled in the provided Kafka properties
            return (enableAutoCommit) ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
        }
    }
~~~     Several configuration values together determine the offset commit mode:
~~~     If checkpointing is enabled and committing offsets on checkpoint completion is enabled, return ON_CHECKPOINTS.
~~~     If checkpointing is not enabled but auto-commit is enabled, return KAFKA_PERIODIC.
~~~     In every other case, return DISABLED.
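~~~     A quick sketch of how the three flags map to a mode (hypothetical demo class;
~~~     assumes the flink-connector-kafka dependency, whose OffsetCommitModes helper is
~~~     marked @Internal, is on the classpath):

import org.apache.flink.streaming.connectors.kafka.config.OffsetCommitModes;

public class OffsetCommitModeDemo {
    public static void main(String[] args) {
        // checkpointing on, commit-on-checkpoint on  -> ON_CHECKPOINTS
        System.out.println(OffsetCommitModes.fromConfiguration(true, true, true));
        // checkpointing on, commit-on-checkpoint off -> DISABLED (auto-commit is ignored)
        System.out.println(OffsetCommitModes.fromConfiguration(true, false, true));
        // checkpointing off, enable.auto.commit=true -> KAFKA_PERIODIC
        System.out.println(OffsetCommitModes.fromConfiguration(true, true, false));
    }
}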
### --- Next, create and start the partition discoverer

~~~     # Source excerpt: FlinkKafkaConsumerBase.java
~~~     # Lines 1040~1052
    /**
     * Creates the partition discoverer that is used to find new partitions for this subtask.
     *
     * @param topicsDescriptor Descriptor that describes whether we are discovering partitions for fixed topics or a topic pattern.
     * @param indexOfThisSubtask The index of this consumer subtask.
     * @param numParallelSubtasks The total number of parallel consumer subtasks.
     *
     * @return The instantiated partition discoverer
     */
    protected abstract AbstractPartitionDiscoverer createPartitionDiscoverer(
            KafkaTopicsDescriptor topicsDescriptor,
            int indexOfThisSubtask,
            int numParallelSubtasks);
~~~     Creates the partition discoverer used to find new partitions for this subtask.
~~~     Parameter 1: topicsDescriptor: describes whether partitions are discovered for
~~~     fixed topics or for a topic pattern, i.e. a wrapper around fixedTopics and
~~~     topicPattern. fixedTopics names the topics explicitly (fixed topics), while
~~~     topicPattern is a regular expression matched against topic names, used for
~~~     partition discovery.
~~~     Parameter 2: indexOfThisSubtask: the index of this consumer subtask.
~~~     Parameter 3: numParallelSubtasks: the total number of parallel consumer subtasks.
~~~     The method returns a partition discoverer instance.
~~~     # Source excerpt: KafkaTopicsDescriptor.java
~~~     # Lines 31~36

/**
 * A Kafka Topics Descriptor describes how the consumer subscribes to Kafka topics -
 * either a fixed list of topics, or a topic pattern.
 */
@Internal
public class KafkaTopicsDescriptor implements Serializable {
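~~~     For reference, the concrete override in FlinkKafkaConsumer just instantiates a
~~~     KafkaPartitionDiscoverer. This is a sketch from memory, so treat the exact
~~~     constructor arguments as an assumption:

    @Override
    protected AbstractPartitionDiscoverer createPartitionDiscoverer(
            KafkaTopicsDescriptor topicsDescriptor,
            int indexOfThisSubtask,
            int numParallelSubtasks) {
        // Pass the consumer properties through so the discoverer can build its own KafkaConsumer.
        return new KafkaPartitionDiscoverer(
                topicsDescriptor, indexOfThisSubtask, numParallelSubtasks, properties);
    }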
### --- Open the partition discoverer and initialize all required Kafka connections
~~~     Note: this is not thread-safe.

~~~     # Source excerpt: AbstractPartitionDiscoverer.java
~~~     # Lines 87~95
    /**
     * Opens the partition discoverer, initializing all required Kafka connections.
     *
     * <p>NOTE: thread-safety is not guaranteed.
     */
    public void open() throws Exception {
        closed = false;
        initializeConnections();
    }
### --- Source for initializing all required Kafka connections:

~~~     # Source excerpt: AbstractPartitionDiscoverer.java
~~~     # Lines 210~211

    /** Establish the required connections in order to fetch topics and partitions metadata. */
    protected abstract void initializeConnections() throws Exception;
### --- KafkaPartitionDiscoverer:
~~~     Creates the KafkaConsumer object.

~~~     # Source excerpt: KafkaPartitionDiscoverer.java
    @Override
    protected void initializeConnections() {
        this.kafkaConsumer = new KafkaConsumer<>(kafkaProperties);
    }
### --- subscribedPartitionsToStartOffsets = new HashMap<>();

~~~     The set of subscribed partitions, initialized here:
~~~     private Map<KafkaTopicPartition, Long> subscribedPartitionsToStartOffsets;
~~~     It holds the topic partitions this subtask will read, together with the initial offsets at which to start reading.
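~~~     A minimal sketch of how this map is seeded on a fresh start (no restored state)
~~~     under the default GROUP_OFFSETS startup mode; allPartitions is assumed to come
~~~     from partitionDiscoverer.discoverPartitions():

// Seed every discovered partition with the "start from the consumer group's offset" sentinel.
Map<KafkaTopicPartition, Long> subscribedPartitionsToStartOffsets = new HashMap<>();
for (KafkaTopicPartition partition : allPartitions) {
    subscribedPartitionsToStartOffsets.put(
            partition, KafkaTopicPartitionStateSentinel.GROUP_OFFSET);
}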
### --- Used to fetch all partitions of every fixed topic and of every topic matching topicPattern

~~~     # Source excerpt: AbstractPartitionDiscoverer.java
~~~     # Lines 118~124
    /**
     * Execute a partition discovery attempt for this subtask.
     * This method lets the partition discoverer update what partitions it has discovered so far.
     *
     * @return List of discovered new partitions that this subtask should subscribe to.
     */
    public List<KafkaTopicPartition> discoverPartitions() throws WakeupException, ClosedException {
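~~~     In FlinkKafkaConsumerBase#run, this method is driven by a periodic discovery
~~~     loop. The sketch below is simplified (shutdown coordination and error handling
~~~     omitted; discoveryIntervalMillis is an assumed local variable):

// Simplified sketch of the periodic partition-discovery loop.
while (running) {
    List<KafkaTopicPartition> discovered = partitionDiscoverer.discoverPartitions();
    if (running && !discovered.isEmpty()) {
        // Hand newly discovered partitions to the fetcher so they start being consumed.
        kafkaFetcher.addDiscoveredPartitions(discovered);
    }
    Thread.sleep(discoveryIntervalMillis);
}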
### --- If the consumer restores state from a checkpoint, restoredState holds the offsets to restore; a TreeMap is chosen so the entries stay ordered

~~~     # Source excerpt: FlinkKafkaConsumerBase.java
~~~     # Lines 182~190
    /**
     * The offsets to restore to, if the consumer restores state from a checkpoint.
     *
     * <p>This map will be populated by the {@link #initializeState(FunctionInitializationContext)} method.
     *
     * <p>Using a sorted map as the ordering is important when using restored state
     * to seed the partition discoverer.
     */
    private transient volatile TreeMap<KafkaTopicPartition, Long> restoredState;
### --- Populated in the initializeState method:

~~~     # Source excerpt: FlinkKafkaConsumerBase.java
~~~     # Lines 892~912
    @Override
    public final void initializeState(FunctionInitializationContext context) throws Exception {

        OperatorStateStore stateStore = context.getOperatorStateStore();

        this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(OFFSETS_STATE_NAME,
            createStateSerializer(getRuntimeContext().getExecutionConfig())));

        if (context.isRestored()) {
            restoredState = new TreeMap<>(new KafkaTopicPartition.Comparator());

            // populate actual holder for restored state
            for (Tuple2<KafkaTopicPartition, Long> kafkaOffset : unionOffsetStates.get()) {
                restoredState.put(kafkaOffset.f0, kafkaOffset.f1);
            }

            LOG.info("Consumer subtask {} restored state: {}.", getRuntimeContext().getIndexOfThisSubtask(), restoredState);
        } else {
            LOG.info("Consumer subtask {} has no restore state.", getRuntimeContext().getIndexOfThisSubtask());
        }
    }
### --- Recap: context.isRestored() returns true when state was restored from a previous execution's snapshot (e.g. after a failure); it is always false for stateless tasks

~~~     # Source excerpt: ManagedInitializationContext.java
~~~     # Lines 36~42
public interface ManagedInitializationContext {

    /**
     * Returns true, if state was restored from the snapshot of a previous execution. This returns always false for
     * stateless tasks.
     */
    boolean isRestored();
}

~~~     In open(), the consumer then branches on whether state was restored:

if (restoredState != null) {
    // restore-from-snapshot logic...
} else {
    // fresh-start logic...
}
### --- If restoredState holds no state for some partition, that partition must be consumed from the beginning

~~~     # Source excerpt: FlinkKafkaConsumerBase.java
~~~     # Lines 553~569
        final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
        if (restoredState != null) {
            for (KafkaTopicPartition partition : allPartitions) {
                if (!restoredState.containsKey(partition)) {
                    restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);
                }
            }

            for (Map.Entry<KafkaTopicPartition, Long> restoredStateEntry : restoredState.entrySet()) {
                // seed the partition discoverer with the union state while filtering out
                // restored partitions that should not be subscribed by this subtask
                if (KafkaTopicPartitionAssigner.assign(
                    restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())
                        == getRuntimeContext().getIndexOfThisSubtask()){
                    subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());
                }
            }
### --- Filter out partitions that this subtask is not responsible for

~~~     The assign method returns the index of the target subtask that a given Kafka partition should be assigned to.
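~~~     For intuition, KafkaTopicPartitionAssigner picks a start index from the topic
~~~     name's hash and then distributes that topic's partitions round-robin. The
~~~     following is a re-sketch of that rule, not a verbatim copy:

// Sketch: stable subtask assignment for (topic, partition) across numParallelSubtasks.
public static int assign(String topic, int partition, int numParallelSubtasks) {
    // Start index derived from the topic name, kept non-negative.
    int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
    // Partitions of the same topic then spread round-robin from that start index.
    return (startIndex + partition) % numParallelSubtasks;
}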
~~~     subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());

~~~     This saves the topic partitions recorded in restoredState, together with the
~~~     starting offsets to read from, into subscribedPartitionsToStartOffsets.
~~~     restoredStateEntry.getKey() is one partition of some topic, and
~~~     restoredStateEntry.getValue() is the starting offset at which to begin reading
~~~     that partition. Finally, partitions whose topic names no longer match the
~~~     topicsDescriptor's topicPattern are filtered out:
~~~     # Source excerpt: FlinkKafkaConsumerBase.java
~~~     # Lines 571~581

            if (filterRestoredPartitionsWithCurrentTopicsDescriptor) {
                subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -> {
                    if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) {
                        LOG.warn(
                            "{} is removed from subscribed partitions since it is no longer associated with topics descriptor of current execution.",
                            entry.getKey());
                        return true;
                    }
                    return false;
                });
            }
