|NO.Z.00099|——————————|BigDataEnd|——|Hadoop&kafka.V06|——|kafka.v06|Kafka Source Code Analysis|Producer Consumer Flow.v02|

1. Subscribing to a Topic

### --- Subscribing to a Topic

~~~     Let's start with the logic inside the subscribe method.

public void subscribe(Collection<String> topics, ConsumerRebalanceListener listener) {
    // lightweight lock: detects concurrent use by more than one thread
    acquireAndEnsureOpen();
    try {
        if (topics == null) {
            throw new IllegalArgumentException("Topic collection to subscribe to cannot be null");
        } else if (topics.isEmpty()) {
            // an empty collection is treated as a request to unsubscribe
            this.unsubscribe();
        } else {
            // validate the topics: a null or blank topic name throws immediately
            for (String topic : topics) {
                if (topic == null || topic.trim().isEmpty())
                    throw new IllegalArgumentException("Topic collection to subscribe to cannot contain null or empty topic");
            }
            // throw if no partition assignor is configured
            throwIfNoAssignorsConfigured();
            log.debug("Subscribed to topic(s): {}", Utils.join(topics, ", "));
            // record the subscription
            this.subscriptions.subscribe(new HashSet<>(topics), listener);
            // update the metadata; if it does not yet cover all of these topics, flag it for a forced update
            metadata.setTopics(subscriptions.groupSubscription());
        }
    } finally {
        release();
    }
}

public void subscribe(Set<String> topics, ConsumerRebalanceListener listener) {
    if (listener == null)
        throw new IllegalArgumentException("RebalanceListener cannot be null");
    
    // subscribe by topic name; partitions are assigned automatically
    setSubscriptionType(SubscriptionType.AUTO_TOPICS);
    // rebalance listener
    this.listener = listener;
    // update the subscription
    changeSubscription(topics);
}

private void changeSubscription(Set<String> topicsToSubscribe) {
    if (!this.subscription.equals(topicsToSubscribe)) {
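        // in AUTO_TOPICS or AUTO_PATTERN mode, this set records all subscribed topics
        this.subscription = topicsToSubscribe;
        // a consumer group elects a leader; the leader uses this set to record the topics subscribed to by every consumer in the group, while followers keep only their own subscribed topics in it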
        this.groupSubscription.addAll(topicsToSubscribe);
    }
}

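~~~     KafkaConsumer is not thread-safe, so subscribe first takes the lightweight lock. A null topics argument throws an exception; an empty collection triggers unsubscribe.
~~~     It then checks every topic name for illegal values (null or blank), verifies that at least one partition assignor is configured, and records the subscription.
~~~     When no listener is supplied, it defaults to NoOpConsumerRebalanceListener, which does nothing.
~~~     # Lightweight lock:
~~~     The consumer keeps the id of the thread currently using it and a re-entrancy count.
~~~     KafkaConsumer's acquire() and release() methods implement this "lightweight lock". It is not a real lock;
~~~     it only detects whether multiple threads are operating on the KafkaConsumer concurrently.
~~~     Every KafkaConsumer instance owns a SubscriptionState object.
~~~     KafkaConsumer.subscribe calls SubscriptionState.subscribe, which records the subscription in the SubscriptionState;
~~~     subscribing again overwrites the previous subscription.
~~~     Finally the metadata is updated: if it does not already contain the current groupSubscription,
~~~     it is flagged for an update (the update logic comes later). Topics on the consumer side do not expire.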
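
~~~     The lightweight lock described above can be sketched as follows. This is a minimal stand-alone illustration, not the KafkaConsumer source:
~~~     the class name LightweightLock and the holdCount() helper are hypothetical; only the idea of a thread id plus a re-entrancy count comes from the text.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-alone class for illustration; not part of the Kafka API.
final class LightweightLock {
    private static final long NO_CURRENT_THREAD = -1L;
    // id of the thread currently using the guarded object
    private final AtomicLong currentThread = new AtomicLong(NO_CURRENT_THREAD);
    // how many times that thread has entered without leaving
    private final AtomicInteger refCount = new AtomicInteger(0);

    void acquire() {
        long threadId = Thread.currentThread().getId();
        // Re-entry by the owner is fine; any other thread fails fast instead of waiting.
        if (threadId != currentThread.get()
                && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId)) {
            throw new ConcurrentModificationException("not safe for multi-threaded access");
        }
        refCount.incrementAndGet();
    }

    void release() {
        // The last release hands ownership back.
        if (refCount.decrementAndGet() == 0) {
            currentThread.set(NO_CURRENT_THREAD);
        }
    }

    int holdCount() {
        return refCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LightweightLock lock = new LightweightLock();
        lock.acquire();
        lock.acquire(); // re-entrant call from the same thread
        Thread other = new Thread(() -> {
            try {
                lock.acquire();
                System.out.println("second thread got in");
            } catch (ConcurrentModificationException e) {
                System.out.println("second thread rejected");
            }
        });
        other.start();
        other.join();
        lock.release();
        lock.release();
        System.out.println("hold count: " + lock.holdCount());
    }
}
```

~~~     A second thread that calls acquire() while the first still holds the lock gets an exception rather than waiting, which is why this behaves as a misuse detector and not as a mutex.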

2. The Message Consumption Process

### --- The Message Consumption Process

~~~     Next, how does poll, KafkaConsumer's core method, fetch messages? Start with the code below:

### --- poll

    public ConsumerRecords<K, V> poll(long timeout) {
        // use the lightweight lock to check that no other thread is using this KafkaConsumer
        acquireAndEnsureOpen();
        try {
            // a negative timeout throws
            if (timeout < 0)
                throw new IllegalArgumentException("Timeout must not be negative");

            // throw if the subscription type is NONE, i.e. the consumer has neither subscribed to a topic nor been assigned partitions
            if (this.subscriptions.hasNoSubscriptionOrUserAssignment())
                throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");

            // poll for new data until the timeout expires
            long start = time.milliseconds();
            long remaining = timeout;
            do {
                // core method: fetch messages
                Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollOnce(remaining);
                if (!records.isEmpty()) {
                    // before returning the fetched records, we can send off the next round of fetches
                    // and avoid block waiting for their responses to enable pipelining while the user
                    // is handling the fetched records.
                    //
                    // NOTE: since the consumed position has already been updated, we must not allow
                    // wakeups or any other errors to be triggered prior to returning the fetched records.
                    
                    // records were fetched: send the next fetch request without blocking and without allowing wakeups,
                    // so that the user's next poll does not have to block waiting for data
                    if (fetcher.sendFetches() > 0 || client.hasPendingRequests())
                        client.pollNoWakeup();

                    // pass the records through the interceptors, if any, before returning
                    if (this.interceptors == null)
                        return new ConsumerRecords<>(records);
                    else
                        return this.interceptors.onConsume(new ConsumerRecords<>(records));
                }

                long elapsed = time.milliseconds() - start;
                // stop once the timeout has elapsed
                remaining = timeout - elapsed;
            } while (remaining > 0);

            return ConsumerRecords.empty();
        } finally {
            release();
        }
    }

### --- As the code shows, the real work of poll is done in pollOnce

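~~~     poll obtains available data through pollOnce:
~~~     it uses the lightweight lock to check that no other thread is using the KafkaConsumer;
~~~     it checks the timeout and throws if it is negative, which stops consumption;
~~~     it checks that the consumer has subscribed to topics or been assigned topic-partitions;
~~~     it calls pollOnce() to get the records;
~~~     before returning the records it sends the next fetch request,
~~~     so that the user's next call does not block inside pollOnce();
~~~     if no records become available within the given time (timeout), it returns an empty result.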
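
~~~     The timeout handling in poll boils down to a do-while loop that recomputes the remaining time after each attempt.
~~~     A minimal sketch, assuming a stand-in for pollOnce: the PollLoopSketch class, the String record type and the LongFunction parameter are illustrative assumptions, not Kafka APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical sketch of the retry-until-timeout structure of poll().
final class PollLoopSketch {
    // pollOnce is a stand-in: given the remaining time in ms, it returns whatever records are ready.
    static List<String> poll(long timeoutMs, LongFunction<List<String>> pollOnce) {
        if (timeoutMs < 0)
            throw new IllegalArgumentException("Timeout must not be negative");

        long start = System.currentTimeMillis();
        long remaining = timeoutMs;
        do {
            List<String> records = pollOnce.apply(remaining);
            if (!records.isEmpty())
                return records; // return as soon as anything is available
            // nothing yet: shrink the budget by the time already spent
            remaining = timeoutMs - (System.currentTimeMillis() - start);
        } while (remaining > 0);

        return Collections.emptyList(); // budget exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // the stub returns nothing on the first two attempts, then two records
        List<String> records = poll(1000, remaining ->
                ++calls[0] < 3 ? Collections.<String>emptyList() : Arrays.asList("record-1", "record-2"));
        System.out.println(records + " after " + calls[0] + " attempts");
    }
}
```

~~~     Because it is a do-while loop, pollOnce runs at least once even with a timeout of 0.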

### --- pollOnce

// besides fetching new data, pollOnce also performs any necessary offset-commit and offset-reset work
    private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollOnce(long timeout) {
        client.maybeTriggerWakeup();
        
        // 1. find the GroupCoordinator and connect to it, join the group, sync the group, auto-commit; the group rebalances during join and sync
        coordinator.poll(time.milliseconds(), timeout);

        // 2. update the offsets of the subscribed topic-partitions (when some of them have no valid offset)
        // fetch positions if we have partitions we're subscribed to that we
        // don't know the offset for
        if (!subscriptions.hasAllFetchPositions())
            updateFetchPositions(this.subscriptions.missingFetchPositions());

        // 3. return data the fetcher has already retrieved
        // if data is available already, return it immediately
        Map<TopicPartition, List<ConsumerRecord<K, V>>> records = fetcher.fetchedRecords();
        if (!records.isEmpty())
            return records;

        // 4. send fetch requests to multiple topic-partitions (those with no request still in flight)
        // send any new fetches (won't resend pending fetches)
        fetcher.sendFetches();

        long now = time.milliseconds();
        long pollTimeout = Math.min(coordinator.timeToNextPoll(now), timeout);

        // 5. call poll, the low-level network interface, to actually send the requests
        client.poll(pollTimeout, now, new PollCondition() {
            @Override
            public boolean shouldBlock() {
                // since a fetch might be completed by the background thread, we need this poll condition
                // to ensure that we do not block unnecessarily in poll()
                return !fetcher.hasCompletedFetches();
            }
        });

        // 6. if the group needs to rebalance, return empty data so the group can stabilize faster
        // after the long poll, we should check whether the group needs to rebalance
        // prior to returning data so that the group can stabilize faster
        if (coordinator.needRejoin())
            return Collections.emptyMap();

        // return the results of the fetch
        return fetcher.fetchedRecords();
    }

### --- pollOnce can be read as six steps, each described below:
### --- coordinator.poll()

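~~~     Finds the GroupCoordinator's address, opens a TCP connection to it, and sends join-group and sync-group requests.
~~~     Only then has the consumer really joined a group, at which point it obtains the list of topic-partitions it should consume.
~~~     If auto-commit is enabled, the commit also happens in this step.
~~~     In short, for a newly created group the state moves through Empty -> PreparingRebalance -> AwaitingSync -> Stable.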

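~~~     Find the GroupCoordinator's address and open a TCP connection to it;
~~~     send a join-group request, after which the group rebalances;
~~~     send a sync-group request; only after this has the consumer really joined the group, and the response carries the topic-partition list it should consume;
~~~     if auto-commit is enabled, the offset commit also happens in this step.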

### --- updateFetchPositions()

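~~~     This method updates the fetch-offset information for the topic-partitions this consumer instance subscribes to.
~~~     Its purpose is to obtain the position of every subscribed topic-partition,
~~~     so that the Fetcher knows from which offset to start fetching each topic-partition's data.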

~~~     # In SubscriptionState, every topic-partition assigned to this consumer instance has a corresponding TopicPartitionState object.

    private void updateFetchPositions(Set<TopicPartition> partitions) {
        // first reset the partitions on which seekToBeginning or seekToEnd was called, setting their fetch position
        // lookup any positions for partitions which are awaiting reset (which may be the
        // case if the user called seekToBeginning or seekToEnd. We do this check first to
        // avoid an unnecessary lookup of committed offsets (which typically occurs when
        // the user is manually assigning partitions and managing their own offsets).
        fetcher.resetOffsetsIfNeeded(partitions);

        if (!subscriptions.hasAllFetchPositions(partitions)) {
            // if we still don't have offsets for the given partitions, then we should either
            // seek to the last committed position or reset using the auto reset policy

            // fetch the committed offset of every assigned partition and store it in TopicPartitionState.committed
            // first refresh commits for all assigned partitions
            coordinator.refreshCommittedOffsetsIfNeeded();

            // where the fetch position is still invalid, set it to the committed offset obtained in the previous step
            // then do any offset lookups in case some positions are not known
            fetcher.updateFetchPositions(partitions);
        }
    }

### --- This object records the following:

private static class TopicPartitionState {
    // the offset the Fetcher will fetch from next; the Fetcher needs this value for every fetch
    private Long position; // last consumed position
    // the high watermark returned by the most recent fetch
    private Long highWatermark; // the high watermark from last fetch
    private Long lastStableOffset;
    // the offset of the latest message the consumer has finished processing; updated when the consumer commits offsets
    private OffsetAndMetadata committed; // last committed position
    // whether the partition is paused
    private boolean paused; // whether this partition has been paused by the user
    // the offset reset strategy for this topic-partition; set back to null after the reset so it is not applied twice
    private OffsetResetStrategy resetStrategy; // the strategy to use if the offset needs resetting
}
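
~~~     How a missing fetch position gets resolved can be sketched with a simplified model of the fields above.
~~~     This is an illustration under stated assumptions: PartitionState, resolve() and the logStart/logEnd parameters are hypothetical stand-ins,
~~~     and the real client looks offsets up from the broker instead of taking them as arguments.

```java
// Hypothetical model of how a fetch position is chosen; not the Kafka implementation.
final class FetchPositionSketch {
    enum ResetStrategy { EARLIEST, LATEST }

    // Simplified stand-in for TopicPartitionState: only the fields this sketch needs.
    static final class PartitionState {
        Long position;              // next offset to fetch; null means unknown
        Long committed;             // last committed offset; null means none
        ResetStrategy pendingReset; // set by a seek-to-beginning/end request; cleared after use
    }

    static long resolve(PartitionState state, ResetStrategy defaultPolicy, long logStart, long logEnd) {
        if (state.position != null)
            return state.position; // already known, nothing to do

        ResetStrategy reset = state.pendingReset;
        if (reset == null && state.committed == null)
            reset = defaultPolicy; // no committed offset: fall back to the configured policy

        if (reset != null) {
            state.position = (reset == ResetStrategy.EARLIEST) ? logStart : logEnd;
            state.pendingReset = null; // cleared so the reset is not applied twice
        } else {
            state.position = state.committed; // resume from the committed offset
        }
        return state.position;
    }

    public static void main(String[] args) {
        PartitionState resumed = new PartitionState();
        resumed.committed = 42L;
        System.out.println("with committed offset: " + resolve(resumed, ResetStrategy.LATEST, 0L, 100L));

        PartitionState fresh = new PartitionState();
        System.out.println("without committed offset: " + resolve(fresh, ResetStrategy.LATEST, 0L, 100L));
    }
}
```

~~~     The order of the checks mirrors the method above: an explicit reset request is honored first, then the committed offset, and only then the default reset policy.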

### --- fetcher.fetchedRecords()
~~~     Returns the fetched records and updates the fetch position.
~~~     The committed offset is only updated on an offset commit (with auto-commit, that happens in step 1).

    public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
        Map<TopicPartition, List<ConsumerRecord<K, V>>> fetched = new HashMap<>();
        // max.poll.records caps the number of records returned by a single poll
        int recordsRemaining = maxPollRecords;

        try {
            while (recordsRemaining > 0) {
                if (nextInLineRecords == null || nextInLineRecords.isFetched) {
                    // peek at the head of the queue without removing it; returns null if the queue is empty
                    CompletedFetch completedFetch = completedFetches.peek();
                    if (completedFetch == null) break;

                    // parse it into the next nextInLineRecords to process
                    nextInLineRecords = parseCompletedFetch(completedFetch);
                    completedFetches.poll();
                } else {
                    // take records from nextInLineRecords and update the position
                    List<ConsumerRecord<K, V>> records = fetchRecords(nextInLineRecords, recordsRemaining);
                    TopicPartition partition = nextInLineRecords.partition;
                    if (!records.isEmpty()) {
                        List<ConsumerRecord<K, V>> currentRecords = fetched.get(partition);
                        if (currentRecords == null) {
                            fetched.put(partition, records);
                        } else {
                            // this case shouldn't usually happen because we only send one fetch at a time per partition,
                            // but it might conceivably happen in some rare cases (such as partition leader changes).
                            // we have to copy to a new list because the old one may be immutable
                            List<ConsumerRecord<K, V>> newRecords = new ArrayList<>(records.size() + currentRecords.size());
                            newRecords.addAll(currentRecords);
                            newRecords.addAll(records);
                            fetched.put(partition, newRecords);
                        }
                        recordsRemaining -= records.size();
                    }
                }
            }
        } catch (KafkaException e) {
            if (fetched.isEmpty())
                throw e;
        }
        return fetched;
    }

    private List<ConsumerRecord<K, V>> fetchRecords(PartitionRecords partitionRecords, int maxRecords) {
        if (!subscriptions.isAssigned(partitionRecords.partition)) {
            // this can happen when a rebalance happened before fetched records are returned to the consumer's poll call
            log.debug("Not returning fetched records for partition {} since it is no longer assigned",
                    partitionRecords.partition);
        } else {
            // note that the consumed position should always be available as long as the partition is still assigned
            long position = subscriptions.position(partitionRecords.partition);
            // the partition can no longer be consumed, for example because pause() was called on it
            if (!subscriptions.isFetchable(partitionRecords.partition)) {
                // this can happen when a partition is paused before fetched records are returned to the consumer's poll call
                log.debug("Not returning fetched records for assigned partition {} since it is no longer fetchable",
                        partitionRecords.partition);
            } else if (partitionRecords.nextFetchOffset == position) {
                // take this partition's records and advance partitionRecords' nextFetchOffset (used to check that records are in sequence)
                List<ConsumerRecord<K, V>> partRecords = partitionRecords.fetchRecords(maxRecords);

                long nextOffset = partitionRecords.nextFetchOffset;
                log.trace("Returning fetched records at offset {} for assigned partition {} and update " +
                        "position to {}", position, partitionRecords.partition, nextOffset);
                // update the consumed offset (the fetch position)
                subscriptions.position(partitionRecords.partition, nextOffset);

                // compute the lag (the gap between position and the high watermark); null is returned only when the high watermark is null
                Long partitionLag = subscriptions.partitionLag(partitionRecords.partition, isolationLevel);
                if (partitionLag != null)
                    this.sensors.recordPartitionLag(partitionRecords.partition, partitionLag);

                return partRecords;
            } else {
                // these records aren't next in line based on the last consumed position, ignore them
                // they must be from an obsolete request
                log.debug("Ignoring fetched records for {} at offset {} since the current position is {}",
                        partitionRecords.partition, partitionRecords.nextFetchOffset, position);
            }
        }

        partitionRecords.drain();
        return emptyList();
    }
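
~~~     The effect of the max.poll.records cap is that one completed fetch may be handed out over several calls.
~~~     A minimal sketch of that draining behavior, assuming plain String records and ignoring partitions, positions and error handling;
~~~     FetchBufferSketch and addCompletedFetch() are hypothetical names, not Fetcher members.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of draining buffered fetches under a per-call cap.
final class FetchBufferSketch {
    private final Deque<Deque<String>> completedFetches = new ArrayDeque<>();
    private Deque<String> nextInLine; // the fetch currently being handed out, possibly half consumed
    private final int maxPollRecords;

    FetchBufferSketch(int maxPollRecords) {
        this.maxPollRecords = maxPollRecords;
    }

    // Called when a fetch response arrives.
    void addCompletedFetch(List<String> records) {
        completedFetches.add(new ArrayDeque<>(records));
    }

    // Returns at most maxPollRecords records; leftovers stay buffered for the next call.
    List<String> fetchedRecords() {
        List<String> fetched = new ArrayList<>();
        int remaining = maxPollRecords;
        while (remaining > 0) {
            if (nextInLine == null || nextInLine.isEmpty()) {
                nextInLine = completedFetches.poll(); // move on to the next buffered fetch
                if (nextInLine == null)
                    break; // nothing buffered at all
            } else {
                while (remaining > 0 && !nextInLine.isEmpty()) {
                    fetched.add(nextInLine.poll());
                    remaining--;
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        FetchBufferSketch buffer = new FetchBufferSketch(3);
        buffer.addCompletedFetch(Arrays.asList("a", "b", "c", "d"));
        buffer.addCompletedFetch(Arrays.asList("e"));
        System.out.println(buffer.fetchedRecords());
        System.out.println(buffer.fetchedRecords());
        System.out.println(buffer.fetchedRecords());
    }
}
```

~~~     With a cap of 3, the five buffered records come back as a batch of three, then a batch of two, then an empty batch.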

### --- fetcher.sendFetches()

~~~     For every subscribed topic-partition that has no outstanding fetch request,
~~~     a fetch request is prepared.
~~~     The actual sending is still done per node:
~~~     topic-partitions whose leader is the same node are combined into a single request.

// send fetch requests to the leaders of all subscribed partitions (as long as the leader has no fetch request in flight)
    public int sendFetches() {
        // 1. build the fetch requests
        Map<Node, FetchRequest.Builder> fetchRequestMap = createFetchRequests();
        for (Map.Entry<Node, FetchRequest.Builder> fetchEntry : fetchRequestMap.entrySet()) {
            final FetchRequest.Builder request = fetchEntry.getValue();
            final Node fetchTarget = fetchEntry.getKey();

            log.debug("Sending {} fetch for partitions {} to broker {}", isolationLevel, request.fetchData().keySet(),
                    fetchTarget);
            // 2. send each fetch request
            client.send(fetchTarget, request)
                    .addListener(new RequestFutureListener<ClientResponse>() {
                        @Override
                        public void onSuccess(ClientResponse resp) {
                            FetchResponse response = (FetchResponse) resp.responseBody();
                            if (!matchesRequestedPartitions(request, response)) {
                                // obviously we expect the broker to always send us valid responses, so this check
                                // is mainly for test cases where mock fetch responses must be manually crafted.
                                log.warn("Ignoring fetch response containing partitions {} since it does not match " +
                                        "the requested partitions {}", response.responseData().keySet(),
                                        request.fetchData().keySet());
                                return;
                            }

                            Set<TopicPartition> partitions = new HashSet<>(response.responseData().keySet());
                            FetchResponseMetricAggregator metricAggregator = new FetchResponseMetricAggregator(sensors, partitions);

                            for (Map.Entry<TopicPartition, FetchResponse.PartitionData> entry : response.responseData().entrySet()) {
                                TopicPartition partition = entry.getKey();
                                long fetchOffset = request.fetchData().get(partition).fetchOffset;
                                FetchResponse.PartitionData fetchData = entry.getValue();

                                log.debug("Fetch {} at offset {} for partition {} returned fetch data {}",
                                        isolationLevel, fetchOffset, partition, fetchData);
                                completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator,
                                        resp.requestHeader().apiVersion()));
                            }

                            sensors.fetchLatency.record(resp.requestLatencyMs());
                        }

                        @Override
                        public void onFailure(RuntimeException e) {
                            log.debug("Fetch request {} to {} failed", request.fetchData(), fetchTarget, e);
                        }
                    });
        }
        return fetchRequestMap.size();
    }

~~~     # createFetchRequests():
~~~     Creates fetch requests for all subscribed topic-partitions (skipping any topic-partition that still has a request in progress).
~~~     The requests are still created per node.
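
~~~     Grouping by node can be sketched as below. This is only an illustration of the idea: FetchGroupingSketch, groupByLeader() and the
~~~     String-based partition and broker names are hypothetical, and the real method builds FetchRequest.Builder objects keyed by Node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: one request per leader node, covering all of its eligible partitions.
final class FetchGroupingSketch {
    static Map<String, List<String>> groupByLeader(Map<String, String> leaderByPartition,
                                                   Set<String> busyPartitions) {
        Map<String, List<String>> partitionsByNode = new TreeMap<>();
        for (Map.Entry<String, String> entry : leaderByPartition.entrySet()) {
            // skip partitions that still have a fetch in progress
            if (busyPartitions.contains(entry.getKey()))
                continue;
            partitionsByNode.computeIfAbsent(entry.getValue(), node -> new ArrayList<>())
                    .add(entry.getKey());
        }
        return partitionsByNode;
    }

    public static void main(String[] args) {
        Map<String, String> leaders = new TreeMap<>();
        leaders.put("orders-0", "broker-1");
        leaders.put("orders-1", "broker-2");
        leaders.put("payments-0", "broker-1");
        // orders-1 is treated as busy, so broker-2 receives no request in this round
        System.out.println(groupByLeader(leaders, Collections.singleton("orders-1")));
    }
}
```

~~~     Batching per node keeps the number of network requests proportional to the number of brokers instead of the number of partitions.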

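~~~     # client.send():
~~~     Sends the fetch request and registers a listener. When the request succeeds,
~~~     the response is added to completedFetches. Entries are added to completedFetches
~~~     per topic-partition, which simplifies later processing.
~~~     As this shows, every send issues fetch requests for all topic-partitions that are currently fetchable.
~~~     The data retrieved by one call to fetcher.sendFetches may take several pollOnce iterations to consume.
~~~     Because fetches can complete in the background, the user's processing thread is blocked as little as possible;
~~~     if the Fetcher has no data ready to process, the user's thread does block inside poll.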

### --- client.poll()

~~~     Calls the interface provided by the underlying NetworkClient to send the pending requests.

### --- coordinator.needRejoin()

~~~     If the list of topic-partitions assigned to this instance has changed,
~~~     the consumer group needs to rebalance.
