Java client operations on Kafka, Spring Boot integration with Kafka, Kafka partitions, and Kafka client source code

1. Testing Kafka with the Java client

1. Configure Kafka to allow remote access

Edit the config/kraft/server.properties file and change the advertised address to the server's public IP. The default value is:

advertised.listeners=PLAINTEXT://localhost:9092
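
For example, if the server's public IP were 203.0.113.10 (a placeholder documentation address, substitute your own), the changed line would look like:

advertised.listeners=PLAINTEXT://203.0.113.10:9092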

Then restart Kafka.

2. Testing AdminClient for topic and other metadata management

Test class and results:

package cn.qz.cloud.kafka.client;

import com.google.common.collect.Sets;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.admin.*;

import java.util.*;
import java.util.concurrent.ExecutionException;

/**
 * CRUD operations on topics
 */
@Slf4j
public class KafkaAdminTest {

    public static Properties props = new Properties();

    static {
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER);
        props.put("request.timeout.ms", 60000);
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        createTopic();
        describeTopic();
    }

    public static void createTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            /**
             * 2 is the number of partitions
             * 1 is the replication factor
             */
            NewTopic newTopic = new NewTopic(topicName, 2, (short) 1);
            CreateTopicsResult topics = adminClient.createTopics(Collections.singletonList(newTopic));
            log.info("{}", topics.all().get());
        }
    }

    public static void listTopic() throws ExecutionException, InterruptedException {
        ListTopicsOptions listTopicsOptions = new ListTopicsOptions();
        listTopicsOptions.listInternal(true);
        try (AdminClient adminClient = AdminClient.create(props)) {
            ListTopicsResult listTopicsResult = adminClient.listTopics(listTopicsOptions);
            Collection<TopicListing> topicListings = listTopicsResult.listings().get();
            log.info("{}", topicListings);
            /**
             * [(name=quickstart-events, topicId=rPIXse70QvK3Rri24a-bNg, internal=false), (name=myTopic1, topicId=E6i1TbWXTz-11yKI207ZLA, internal=false), (name=__consumer_offsets, topicId=38T6UsJSRn2BL6tnfj5Wfg, internal=true)]
             */
        }
    }

    public static void deleteTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Sets.newHashSet(topicName));
            log.info("{}", deleteTopicsResult);
        }
    }

    public static void describeTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            DescribeTopicsResult topicsResult = adminClient.describeTopics(Arrays.asList(topicName));
            Map<String, TopicDescription> topicDescription = topicsResult.all().get();
            log.info("{}", topicDescription);
            /**
             * {myTopic1=(name=myTopic1, internal=false, partitions=(partition=0, leader=x.x.x.x:9092 (id: 1 rack: null), replicas=x.x.x.x:9092 (id: 1 rack: null), isr=x.x.x.x:9092 (id: 1 rack: null)),(partition=1, leader=x.x.x.x:9092 (id: 1 rack: null), replicas=x.x.x.x:9092 (id: 1 rack: null), isr=x.x.x.x:9092 (id: 1 rack: null)), authorizedOperations=null)}
             */
        }
    }
}

3. Message producer

Next, recreate myTopic1 with 6 partitions and a replication factor of 1.
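
A sketch of the corresponding console command (assuming the broker is reachable locally on port 9092, as in the rest of this section; delete the old topic first if it still exists):

bin/kafka-topics.sh --create --topic myTopic1 --partitions 6 --replication-factor 1 --bootstrap-server localhost:9092

Then start a console consumer to listen during the tests: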

bin/kafka-console-consumer.sh --topic myTopic1 --from-beginning --bootstrap-server localhost:9092

1. Introduction to ProducerRecord

When sending to a topic, each message is one ProducerRecord. The relevant source is:

public class ProducerRecord<K, V> {

    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;

    /**
     * Creates a record with a specified timestamp to be sent to a specified topic and partition
     * 
     * @param topic The topic the record will be appended to
     * @param partition The partition to which the record should be sent
     * @param timestamp The timestamp of the record, in milliseconds since epoch. If null, the producer will assign
     *                  the timestamp using System.currentTimeMillis().
     * @param key The key that will be included in the record
     * @param value The record contents
     * @param headers the headers that will be included in the record
     */
    public ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable<Header> headers) {
        if (topic == null)
            throw new IllegalArgumentException("Topic cannot be null.");
        if (timestamp != null && timestamp < 0)
            throw new IllegalArgumentException(
                    String.format("Invalid timestamp: %d. Timestamp should always be non-negative or null.", timestamp));
        if (partition != null && partition < 0)
            throw new IllegalArgumentException(
                    String.format("Invalid partition: %d. Partition number should always be non-negative or null.", partition));
        this.topic = topic;
        this.partition = partition;
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
        this.headers = new RecordHeaders(headers);
    }

As the constructor shows, partition, key, value and headers can all be specified; only topic and value are required. The partition selection logic is as follows (a constructor sketch follows the list):

  1. If a partition ID is specified, the record is sent to that partition.
  2. If no partition ID is specified but a key is, the record is sent to the partition determined by hash(key).
  3. If neither a partition ID nor a key is specified, the record is sent to the partitions in a round-robin fashion.
  4. If both a partition ID and a key are specified, the record is sent only to the specified partition (the key has no effect on routing; the code path decides this).
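
A quick sketch of the constructor variants that correspond to these rules (key and value types assume String serializers; the topic name is the one used in this section):

// no key, no partition: distributed in a round-robin-like way
ProducerRecord<String, String> r1 = new ProducerRecord<>("myTopic1", "value");
// key only: partition = hash(key) % numPartitions
ProducerRecord<String, String> r2 = new ProducerRecord<>("myTopic1", "key", "value");
// explicit partition: always goes to partition 0; the key is stored in the record but not used for routing
ProducerRecord<String, String> r3 = new ProducerRecord<>("myTopic1", 0, "key", "value");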

For example, sending a message with a key and a header:

Header header = new RecordHeader("testHeader", "testHeaderValue".getBytes());
ProducerRecord producerRecord = new ProducerRecord(topic, null, null, "TEST_KEY", msg, Sets.newHashSet(header));

The consumer receives the following (i.e. the consumer can also read the header):

topic: myTopic1, partition: 2, offset: 0, key: TEST_KEY, value: testMsg
key: testHeader, value: testHeaderValue

The producer and consumer examples below simply send string messages, without specifying a key, a partition, or headers.

2. Sending messages

The code below demonstrates synchronous sending, asynchronous sending, idempotent sending, and transactional sending.

package cn.qz.cloud.kafka.client;

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

@Slf4j
public class Producer {

    private Properties properties = new Properties();

    private KafkaProducer kafkaProducer;

    public Producer() {
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER);
        /**
         * client.id identifies this producer instance in broker-side logs, metrics and quotas
         */
//        properties.put(ProducerConfig.CLIENT_ID_CONFIG, "client1");
        /**
         * Key and value serializers
         */
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384"); // DEFAULT 16384 = 16K
        /**
         * acks=0: the message is considered sent as soon as it leaves the client, regardless of whether the partition leader persisted it.
         * acks=1: the message is considered sent once the partition leader has written it to its local log, without waiting for any follower to replicate it.
         * acks=all: the message is considered sent only after the partition leader has confirmed that all in-sync replicas (ISR) have replicated it.
         */
        properties.put(ProducerConfig.ACKS_CONFIG, "all"); // default 1
        properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "3000"); // default 30000 ms = 30 s; lowered to 3 s here
        // see CommonClientConfigs for more defaults
    }

    /**
     * Simple message sending
     */
    public void produce(SendTypeEnum sendTypeEnum, String msg) {
        String topic = KafkaConstants.TOPIC_NAME;
        try {
            kafkaProducer = new KafkaProducer(properties);
            long startTime = System.currentTimeMillis();
            // asynchronous send
            if (sendTypeEnum == SendTypeEnum.ASYNC) {
                kafkaProducer.send(new ProducerRecord(topic, msg), new ProducerCallBack(startTime, msg));
            }
            // fire-and-forget: do not care about the result
            // send() returns a Future; as long as get() is not called, it does not block
            if (SendTypeEnum.WITHOUT_RESULT == sendTypeEnum) {
                kafkaProducer.send(new ProducerRecord(topic, msg));
            }
            // synchronous: org.apache.kafka.clients.producer.KafkaProducer.send(org.apache.kafka.clients.producer.ProducerRecord<K,V>)
            // send() returns a Future; calling get() blocks until the result is available
            if (SendTypeEnum.SYNC_WITH_RESULT == sendTypeEnum) {
                RecordMetadata rm = (RecordMetadata) kafkaProducer.send(new ProducerRecord(topic, msg)).get();
                log.info("rm: {}", ToStringBuilder.reflectionToString(rm, ToStringStyle.NO_CLASS_NAME_STYLE));
            }
        } catch (Exception e) {
            log.error("produce error", e);
        } finally {
            kafkaProducer.close();
        }
    }

    /**
     * Enable idempotence
     *
     * @param msg
     */
    public void produceIdempotence(String msg) {
        // with idempotence enabled, retries defaults to Integer.MAX_VALUE and acks is forced to all
        /**
         * Producer ID (PID) and sequence number:
         * PID: each new producer is assigned a unique PID during initialization; the PID is invisible to the user.
         * Sequence number: for each PID, every <Topic, Partition> the producer writes to has a sequence number that increases monotonically from 0. The broker caches the sequence number and accepts an incoming message only if its sequence number is exactly one greater than the cached one; otherwise it discards the message. This prevents duplicate writes.
         * Idempotence is only guaranteed per partition: an idempotent producer avoids duplicates within a single partition of a topic, not across partitions. It also only holds within a single producer session, not across sessions.
         */
        properties.put("enable.idempotence", "true");//开启幂等性
        try {
            kafkaProducer = new KafkaProducer(properties);
            long startTime = System.currentTimeMillis();
            kafkaProducer.send(new ProducerRecord(KafkaConstants.TOPIC_NAME, msg, msg), new ProducerCallBack(startTime, msg));
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaProducer.close();
        }
    }

    /**
     * Enable transactions.
     * Transactions are built on top of the PID.
     * transactional.id and producerId form a one-to-one mapping in the transaction coordinator: transactional.id is the key and producerId is the value.
     * When a producer recovers, it uses the user-specified transactional.id to look up its producerId from the transaction coordinator, which is how idempotent delivery is preserved across sessions.
     */
    public void produceInTransaction() {
        properties.put("transactional.id", "myTx");
        kafkaProducer = new KafkaProducer(properties);
        kafkaProducer.initTransactions();
        try {
            long startTime = System.currentTimeMillis();
            try {
                kafkaProducer.beginTransaction();
                for (int i = 0; i < 100; i++) {
                    String messageStr = "message_" + i;
                    if (i == 99) {
                        throw new RuntimeException("XXX");
                    }
                    kafkaProducer.send(new ProducerRecord(KafkaConstants.TOPIC_NAME, messageStr, messageStr),
                            new ProducerCallBack(startTime, messageStr));
                }
                kafkaProducer.commitTransaction();
            } catch (ProducerFencedException e) {
                kafkaProducer.close();
                log.error("", e);
            } catch (OutOfOrderSequenceException e) {
                kafkaProducer.close();
                log.error("", e);
            } catch (Exception e) {
                kafkaProducer.abortTransaction();
                log.warn("", e);
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaProducer.close();
        }
    }

    @Slf4j
    private static class ProducerCallBack implements Callback {

        private final long startTime;

        private final String message;

        public ProducerCallBack(long startTime, String message) {
            this.startTime = startTime;
            this.message = message;
        }

        /**
         * Called once the ack from the Kafka broker is received
         *
         * @param metadata metadata of the produced record; null if the send failed with an exception
         * @param e        exception raised during the send; null if the send succeeded
         */
        public void onCompletion(RecordMetadata metadata, Exception e) {
            long elapsedTime = System.currentTimeMillis() - startTime;
            if (metadata != null) {
                log.info("send success! partition:{}, offset:{}, messgage:{}, elapsedTimeMs:{}", metadata.partition(), metadata.offset(), message, elapsedTime);
            } else {
                log.error("", e);
            }
        }
    }

    public enum SendTypeEnum {

        /**
         * Async
         */
        ASYNC,

        /**
         * Fire-and-forget; do not care about the result
         */
        WITHOUT_RESULT,

        /**
         * Synchronous send
         */
        SYNC_WITH_RESULT;
    }

    public static void main(String[] args) {
        Producer producer = new Producer();
        for (int i = 0; i < 10; i++) {
            producer.produce(SendTypeEnum.ASYNC, "testMsg" + i);
        }
    }
}

4. Message consumer

Offsets can be committed automatically or manually. With manual commit you call commit yourself to record the offset; with automatic commit the client commits the offset for you.

1. Automatic commit:

package cn.qz.cloud.kafka.client;

import cn.hutool.core.collection.CollectionUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

@Slf4j
public class Consumer {

    private static Properties properties = new Properties();

    static {
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER); //required
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, KafkaConstants.Concumer.GROUP_ID);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");//default 300000
        properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");//default 500
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true"); // auto-commit: offsets are recorded automatically, no manual ack is needed
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        properties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "4194304"); // fetch at most 4 MB of data per partition per request
    }

    private KafkaConsumer kafkaConsumer;

    public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);
        kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println(1);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println(2);
            }
        });

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

    public static void main(String[] args) throws Exception {
        Consumer consumerDemo = new Consumer();
        consumerDemo.consume();
    }

}

2. Manual commit

package cn.qz.cloud.kafka.client;

import cn.hutool.core.collection.CollectionUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

@Slf4j
public class Consumer {

    private static Properties properties = new Properties();

    static {
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER); //required
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, KafkaConstants.Concumer.GROUP_ID);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");//default 300000
        properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");//default 500
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // auto-commit disabled: offsets must be committed manually
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        properties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "4194304"); // fetch at most 4 MB of data per partition per request
    }

    private KafkaConsumer kafkaConsumer;

    public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);
        kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println(1);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println(2);
            }
        });

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
                // commit the offsets
                kafkaConsumer.commitAsync();
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

    public static void main(String[] args) throws Exception {
        Consumer consumerDemo = new Consumer();
        consumerDemo.consume();
    }

}

3. Starting a consumer and setting offsets manually (assign + seek)

kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() { // subscribe() uses automatic partition assignment, and the rebalance callbacks only fire during poll(). assign() is manual partition assignment; the two APIs cannot be used together. With manual assignment the offset can be reset before reading.

public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);

        // automatic partition assignment; the rebalance listener callbacks are only invoked during poll()
        /*kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                log.info("onPartitionsRevoked partitions: {}", partitions);
                System.out.println(String.format("onPartitionsRevoked partitions: %s", partitions));
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                log.info("onPartitionsAssigned partitions: {}", partitions);
                System.out.println(String.format("onPartitionsAssigned partitions: %s", partitions));
            }
        });*/

        // specify offsets explicitly
        // assign() is manual partition assignment; it cannot be used together with subscribe()
        // two partitions are hard-coded here; in practice the partition count can be looked up dynamically (only messages from these two partitions will be received)
        TopicPartition partition0 = new TopicPartition(KafkaConstants.TOPIC_NAME, 0);
        TopicPartition partition1 = new TopicPartition(KafkaConstants.TOPIC_NAME, 1);
        kafkaConsumer.assign(Arrays.asList(partition0, partition1));
        Set assignment = kafkaConsumer.assignment();
        System.out.println(assignment.size());
        kafkaConsumer.assignment().stream().forEach(System.out::println);
        kafkaConsumer.seek(partition0, 10L); // seek partition 0 to offset 10
        kafkaConsumer.seek(partition1, 20L); // seek partition 1 to offset 20

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
                // commit the offsets
                kafkaConsumer.commitAsync();
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

4. Consuming with multiple threads

Unlike RocketMQ, the Kafka client has no setting for the number of consumer threads; you have to create multiple threads (each with its own consumer) and pull messages yourself.

import org.apache.kafka.clients.consumer.ConsumerRecord;  
import org.apache.kafka.clients.consumer.ConsumerRecords;  
import org.apache.kafka.clients.consumer.KafkaConsumer;  
import org.apache.kafka.common.serialization.StringDeserializer;  
  
import java.time.Duration;  
import java.util.Arrays;  
import java.util.Properties;  
import java.util.concurrent.ExecutorService;  
import java.util.concurrent.Executors;  
  
public class MultiThreadedConsumer {  
  
    public static void main(String[] args) {  
        Properties props = new Properties();  
        props.put("bootstrap.servers", "localhost:9092");  
        props.put("group.id", "test-group");  
        props.put("key.deserializer", StringDeserializer.class.getName());  
        props.put("value.deserializer", StringDeserializer.class.getName());  
  
        int numThreads = 4; // suppose we want 4 threads consuming messages
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);  
  
        for (int i = 0; i < numThreads; i++) {  
            final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);  
            consumer.subscribe(Arrays.asList("my-topic"));  
  
            executor.submit(() -> {  
                while (true) {  
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));  
                    for (ConsumerRecord<String, String> record : records) {  
                        System.out.printf("Thread %d, offset = %d, key = %s, value = %s%n",  
                                Thread.currentThread().getId(), record.offset(), record.key(), record.value());  
                    }  
                }  
            });  
        }  
  
        // Note: in a real application you need a mechanism to shut down the thread pool and the consumers gracefully.
        // This is just a simple example without shutdown logic; a sketch of one approach follows the class.
    }  
}
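
A minimal sketch of one way to add that shutdown logic, using KafkaConsumer#wakeup() (the only consumer method that is safe to call from another thread) together with a JVM shutdown hook; the class name and topic are illustrative:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulMultiThreadedConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        int numThreads = 4;
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        List<KafkaConsumer<String, String>> consumers = new CopyOnWriteArrayList<>();

        for (int i = 0; i < numThreads; i++) {
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumers.add(consumer);
            consumer.subscribe(Arrays.asList("my-topic"));
            executor.submit(() -> {
                try {
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                        records.forEach(r -> System.out.printf("offset = %d, value = %s%n", r.offset(), r.value()));
                    }
                } catch (WakeupException e) {
                    // expected: poll() throws this after wakeup() is called during shutdown
                } finally {
                    consumer.close(); // leaves the group cleanly
                }
            });
        }

        // on JVM shutdown, wake every consumer out of poll() and stop the pool
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumers.forEach(KafkaConsumer::wakeup);
            executor.shutdown();
            try {
                executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }));
    }
}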

5. Manually modifying offsets

Note that this requires bumping the kafka-clients version to 3.3.2:

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>3.3.2</version>
        </dependency>

Fetching and resetting offsets manually:

	public static void getAndResetOffset() throws ExecutionException, InterruptedException {
        try (AdminClient adminClient = AdminClient.create(props)) {
            // fetch the consumer group's currently committed offset for each partition
            ListConsumerGroupOffsetsResult listConsumerGroupOffsetsResult = adminClient.listConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID);
            KafkaFuture<Map<TopicPartition, OffsetAndMetadata>> mapKafkaFuture = listConsumerGroupOffsetsResult.partitionsToOffsetAndMetadata();
            Map<TopicPartition, OffsetAndMetadata> partitionOffsets = mapKafkaFuture.get();
            for (Map.Entry<TopicPartition, OffsetAndMetadata> entry : partitionOffsets.entrySet()) {
                TopicPartition topicPartition = entry.getKey();
                OffsetAndMetadata value = entry.getValue();
                log.info("Latest Offset for " + topicPartition + ": " + value.offset());
            }

            /**
             * Preconditions for resetting offsets:
             * all consumers in the group must be closed (no longer connected to Kafka and consuming)
             * all producers must be closed (no longer connected to Kafka and producing)
             * the partition being reset (topic plus partition number) must actually exist
             * do not set the client.id property on the admin client
             */
            Map<TopicPartition, OffsetAndMetadata> updateMap = new HashMap<>();
            updateMap.put(new TopicPartition(KafkaConstants.TOPIC_NAME, 0), new OffsetAndMetadata(2));
            // alter the consumer group's offset for this partition
            AlterConsumerGroupOffsetsResult alterConsumerGroupOffsetsResult = adminClient.alterConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID, updateMap);
            KafkaFuture<Void> all = alterConsumerGroupOffsetsResult.all();
            Void unused = all.get();
            System.out.println("====end");
        }
    }
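
To reset a group to the beginning or end of the log rather than to a fixed number, AdminClient#listOffsets can look up those positions first. A minimal sketch reusing the same props, log and KafkaConstants as above (OffsetSpec and ListOffsetsResult live in org.apache.kafka.clients.admin):

	public static void resetToEarliest() throws ExecutionException, InterruptedException {
        try (AdminClient adminClient = AdminClient.create(props)) {
            TopicPartition tp = new TopicPartition(KafkaConstants.TOPIC_NAME, 0);

            // look up the earliest available offset of the partition (use OffsetSpec.latest() for the log end)
            Map<TopicPartition, OffsetSpec> request = Collections.singletonMap(tp, OffsetSpec.earliest());
            ListOffsetsResult.ListOffsetsResultInfo info = adminClient.listOffsets(request).partitionResult(tp).get();

            // point the consumer group at that offset (same preconditions as above: no live consumers in the group)
            Map<TopicPartition, OffsetAndMetadata> update =
                    Collections.singletonMap(tp, new OffsetAndMetadata(info.offset()));
            adminClient.alterConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID, update).all().get();
            log.info("reset partition {} to offset {}", tp, info.offset());
        }
    }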

6. Getting the message lag (backlog) of a consumer group

package cn.qz.cloud.kafka.client;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.KafkaAdminClient;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class KafkaLagChecker {

    public static void main(String[] args) throws Exception {
        getLag(KafkaConstants.BOOTSTRAP_SERVER, KafkaConstants.CONSUMER_GROUP_ID);
    }

    public static void getLag(String servers, String groupId) {
        Properties properties = new Properties();
        properties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        try (AdminClient adminClient = KafkaAdminClient.create(properties)) {
            Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = adminClient.listConsumerGroupOffsets(groupId)
                    .partitionsToOffsetAndMetadata().get(10, TimeUnit.SECONDS);
            properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(offsetAndMetadataMap.keySet());
                offsetAndMetadataMap.forEach((key, value) ->
                        System.out.printf("topic: [%s] partition:[%d] lag:[%d]", key.topic(), key.partition(), endOffsets.get(key) - value.offset()));
            }
        } catch (ExecutionException | InterruptedException | TimeoutException exception) {
            exception.printStackTrace();
        }
    }

}

2. Testing Kafka in a Spring Boot project

  1. Add the Kafka dependency to the pom:
        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
        </dependency>
  2. Add the Kafka-related configuration:
server:
  port: 8080

spring:
  #kafka configuration
  kafka:
    #change this to the ip and port of your kafka server
    bootstrap-servers: xxx:9092
    #=============== producer  =======================
    producer:
      #if greater than zero, failed sends are retried this many times
      retries: 0
      #when multiple records are sent to the same partition, the producer batches them into fewer requests; default 16384 (bytes)
      batch-size: 16384
      #total bytes of memory the producer may use to buffer records waiting to be sent to the server; default 33554432
      buffer-memory: 33554432
      #Serializer class for keys, implementing org.apache.kafka.common.serialization.Serializer
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      #Serializer class for values, implementing org.apache.kafka.common.serialization.Serializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    #=============== consumer  =======================
    consumer:
      #unique string identifying the consumer group this consumer belongs to
      group-id: test-consumer-group
      #what to do when Kafka has no initial offset or the current offset no longer exists on the server; default latest (reset to the latest offset)
      #allowed values: latest, earliest, none
      auto-offset-reset: earliest
      #consumer offsets are committed periodically in the background; default true
      enable-auto-commit: true
      #if 'enable-auto-commit' is true, the frequency in milliseconds at which offsets are auto-committed to Kafka; default 5000
      auto-commit-interval: 100
      #Deserializer class for keys, implementing org.apache.kafka.common.serialization.Deserializer
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      #Deserializer class for values, implementing org.apache.kafka.common.serialization.Deserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
  3. Add the classes: producer and consumer
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import com.google.common.collect.Lists;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaAdmin;

import javax.annotation.PostConstruct;
import java.util.ArrayList;
import java.util.List;

@Configuration
public class kafkaConfig {

    @Autowired
    private KafkaAdmin kafkaAdmin;

    @PostConstruct
    public void init() {
        /**
         * init topic
         */
        AdminClient adminClient = AdminClient.create(kafkaAdmin.getConfig());
        adminClient.deleteTopics(Lists.newArrayList(KafkaConstants.TOPIC_NAME));
        List<NewTopic> topics = new ArrayList<>();
        topics.add(new NewTopic(KafkaConstants.TOPIC_NAME, 3, (short) 1));
        adminClient.createTopics(topics);
        System.out.println("创建topic成功");
    }
}
===
  
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping
public class Producer {

    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;

    @GetMapping("/index")
    public String index() {
        return "index";
    }

    @GetMapping("/send-msg")
    public String send(@RequestParam String msg) {
        // produce the message
        ListenableFuture<SendResult<String, Object>> listenableFuture = kafkaTemplate.send(KafkaConstants.TOPIC_NAME, msg, msg);
        listenableFuture.addCallback(new ListenableFutureCallback<SendResult<String, Object>>() {
            @Override
            public void onFailure(Throwable throwable) {
                throwable.printStackTrace();
            }

            @Override
            public void onSuccess(SendResult<String, Object> stringObjectSendResult) {
                System.out.println(stringObjectSendResult);
            }
        });
        return msg;
    }

}

===
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class Consumer {

    /**
     * org.springframework.kafka.annotation.KafkaListener can also specify partitions, a groupId and other attributes (see the sketch below)
     *
     * @param record
     */
    @KafkaListener(topics = {KafkaConstants.TOPIC_NAME})
    public void handMessage(ConsumerRecord<String, String> record) {
        String topic = record.topic();
        String msg = record.value();
        System.out.println("消费者接受消息:topic-->" + topic + ",msg->>" + msg);
    }
}  
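
A minimal sketch of a listener that pins specific partitions and an explicit group id; the attributes are standard spring-kafka, while the class name, group id and partition numbers here are illustrative:

package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.TopicPartition;
import org.springframework.stereotype.Component;

@Component
public class PartitionPinnedConsumer {

    /**
     * Only consumes partitions 0 and 1 of the topic, with its own group id.
     */
    @KafkaListener(
            groupId = "pinned-group",
            topicPartitions = @TopicPartition(topic = KafkaConstants.TOPIC_NAME, partitions = {"0", "1"}))
    public void handMessage(ConsumerRecord<String, String> record) {
        System.out.println("partition-->" + record.partition() + ", msg->>" + record.value());
    }
}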

For the full set of configuration options, see:

org.springframework.boot.autoconfigure.kafka.KafkaProperties

3. About Kafka partitions

1. The number of partitions of a Kafka topic can be changed:

[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --describe --topic myTopic1 --bootstrap-server localhost:9092
Topic: myTopic1	TopicId: 9LsqbI1dRVelPxx-3FJ9lw	PartitionCount: 3	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: myTopic1	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --alter --topic myTopic1 --bootstrap-server localhost:9092 --partitions 12
[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --describe --topic myTopic1 --bootstrap-server localhost:9092
Topic: myTopic1	TopicId: 9LsqbI1dRVelPxx-3FJ9lw	PartitionCount: 12	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: myTopic1	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 3	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 4	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 5	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 6	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 7	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 8	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 9	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 10	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 11	Leader: 1	Replicas: 1	Isr: 1

2. Kafka's partitioning strategy

With kafka-clients, the default partitioner implementation is org.apache.kafka.clients.producer.internals.DefaultPartitioner:

package org.apache.kafka.clients.producer.internals;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class DefaultPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap();

    public DefaultPartitioner() {
    }

    public void configure(Map<String, ?> configs) {
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = this.nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return ((PartitionInfo)availablePartitions.get(part)).partition();
            } else {
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }

        return counter.getAndIncrement();
    }

    public void close() {
    }
}

As this code shows, when a key is present it is hashed (murmur2), converted to a non-negative integer and taken modulo the number of partitions; when no key is passed, records are spread over the partitions in a round-robin-like fashion (a per-topic counter seeded with a random value). A small sketch follows.
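
A minimal sketch that reproduces the keyed branch of this logic outside the producer, assuming a topic with 6 partitions and using the same public Utils helpers; it can be handy for predicting which partition a given key will land on:

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {

    public static void main(String[] args) {
        int numPartitions = 6; // assumed partition count of the topic
        String key = "TEST_KEY";

        // serialize the key the same way the producer does (StringSerializer -> UTF-8 bytes)
        byte[] keyBytes = new StringSerializer().serialize("myTopic1", key);

        // same computation as DefaultPartitioner for a non-null key
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key " + key + " -> partition " + partition);
    }
}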

The partitioner is invoked from:

org.apache.kafka.clients.producer.KafkaProducer#send(org.apache.kafka.clients.producer.ProducerRecord<K,V>)
->
org.apache.kafka.clients.producer.KafkaProducer#doSend
->
org.apache.kafka.clients.producer.KafkaProducer#partition, whose source is:
    private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
        Integer partition = record.partition();
        return partition != null ? partition : this.partitioner.partition(record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
    }

3. Defining a custom partitioning strategy

  1. Create an implementation class that always routes to partition 0
package cn.qz.cloud.kafka.client;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

public class CustomPartitioner implements Partitioner {

    @Override
    public int partition(String s, Object o, byte[] bytes, Object o1, byte[] bytes1, Cluster cluster) {
        return 0;
    }

    @Override
    public void close() {

    }

    @Override
    public void configure(Map<String, ?> map) {

    }
}
  2. Configure the producer to use the custom partitioner
properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "cn.qz.cloud.kafka.client.CustomPartitioner");

4. Difference from ES shards

Why ES cannot change its shard count: https://blog.csdn.net/w1014074794/article/details/119802550

1. Kafka can easily add new partitions through its admin tooling. Doing so only affects messages that specify a key (their key-to-partition mapping changes), and even that impact is small, because consumers can still consume all of the messages.
2. By contrast, ES does not support adding shards. The reason lies in its query flow (query phase, then fetch phase): the fetch phase retrieves documents by id, so if the shard count changed, many ids would be routed to the wrong shard and their documents would not be found even though they exist on another shard. That is why ES does not support increasing the number of shards online.

Explanation:

1. Think of ES as a database first, then picture this scenario: your application implements its own sharding rule, sharding by uid, where uids ending in 0 go to database 0, uids ending in 1 go to database 1, and so on, for 10 databases in total.

OK, now you add an 11th database. From the moment the rule changes, data has to be migrated, and doing that migration smoothly is very hard even when done by hand.

2. Kafka, by contrast, works by subscribing to a topic, and a group coordinator assigns machine A to consume partition 1, machine B to consume partition 2, and so on.

Because consumption is inherently per partition, scaling the number of partitions up or down does not cause this problem.

4. Kafka client internals

When configuring the client we only supply the Kafka cluster nodes; let's look at how the client establishes connections and sends data.

The Kafka client implements its own Netty-like NIO framework for communicating with the Kafka brokers.

1. Producer logic

The overall flow:

  1. Create the KafkaProducer; the framework parses the cluster addresses and the related parameters and initializes itself.
  2. The application calls send(); internally, send() picks a partition, assembles the internal record objects, appends them to a local buffer, and notifies the NIO layer that there is data to write.
  3. The framework performs the write via NIO (on the first write it establishes socket connections to the cluster nodes and then picks a node to send to).

1. Creating the KafkaProducer: parsing the cluster addresses and initializing

new KafkaProducer(properties) performs the initialization and parses the cluster information. The socket connections are only established on the first send, using Kafka's own NIO layer underneath.

private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
        try {
            Map<String, Object> userProvidedConfigs = config.originals();
            this.producerConfig = config;
            this.time = Time.SYSTEM;
            String clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
            if (clientId.length() <= 0)
                clientId = "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();
            this.clientId = clientId;

            String transactionalId = userProvidedConfigs.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG) ?
                    (String) userProvidedConfigs.get(ProducerConfig.TRANSACTIONAL_ID_CONFIG) : null;
            LogContext logContext;
            if (transactionalId == null)
                logContext = new LogContext(String.format("[Producer clientId=%s] ", clientId));
            else
                logContext = new LogContext(String.format("[Producer clientId=%s, transactionalId=%s] ", clientId, transactionalId));
            log = logContext.logger(KafkaProducer.class);
            log.trace("Starting the Kafka producer");

            Map<String, String> metricTags = Collections.singletonMap("client-id", clientId);
            MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
                    .timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG), TimeUnit.MILLISECONDS)
                    .recordLevel(Sensor.RecordingLevel.forName(config.getString(ProducerConfig.METRICS_RECORDING_LEVEL_CONFIG)))
                    .tags(metricTags);
            List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
                    MetricsReporter.class);
            reporters.add(new JmxReporter(JMX_PREFIX));
            this.metrics = new Metrics(metricConfig, reporters, time);
            ProducerMetrics metricsRegistry = new ProducerMetrics(this.metrics);
            this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
            long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
            if (keySerializer == null) {
                this.keySerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                                                                                         Serializer.class));
                this.keySerializer.configure(config.originals(), true);
            } else {
                config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
                this.keySerializer = ensureExtended(keySerializer);
            }
            if (valueSerializer == null) {
                this.valueSerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                                                                                           Serializer.class));
                this.valueSerializer.configure(config.originals(), false);
            } else {
                config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
                this.valueSerializer = ensureExtended(valueSerializer);
            }

            // load interceptors and make sure they get clientId
            userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
            List<ProducerInterceptor<K, V>> interceptorList = (List) (new ProducerConfig(userProvidedConfigs, false)).getConfiguredInstances(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
                    ProducerInterceptor.class);
            this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
            ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer, valueSerializer, interceptorList, reporters);
            this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
                    true, true, clusterResourceListeners);
            this.maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);
            this.totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);
            this.compressionType = CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));

            this.maxBlockTimeMs = config.getLong(ProducerConfig.MAX_BLOCK_MS_CONFIG);
            this.requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
            this.transactionManager = configureTransactionState(config, logContext, log);
            int retries = configureRetries(config, transactionManager != null, log);
            int maxInflightRequests = configureInflightRequests(config, transactionManager != null);
            short acks = configureAcks(config, transactionManager != null, log);

            this.apiVersions = new ApiVersions();
            this.accumulator = new RecordAccumulator(logContext,
                    config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                    this.totalMemorySize,
                    this.compressionType,
                    config.getLong(ProducerConfig.LINGER_MS_CONFIG),
                    retryBackoffMs,
                    metrics,
                    time,
                    apiVersions,
                    transactionManager);
            List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
            this.metadata.update(Cluster.bootstrap(addresses), Collections.<String>emptySet(), time.milliseconds());
            ChannelBuilder channelBuilder = ClientUtils.createChannelBuilder(config);
            Sensor throttleTimeSensor = Sender.throttleTimeSensor(metricsRegistry.senderMetrics);
            NetworkClient client = new NetworkClient(
                    new Selector(config.getLong(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG),
                            this.metrics, time, "producer", channelBuilder, logContext),
                    this.metadata,
                    clientId,
                    maxInflightRequests,
                    config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
                    config.getLong(ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
                    config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
                    config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG),
                    this.requestTimeoutMs,
                    time,
                    true,
                    apiVersions,
                    throttleTimeSensor,
                    logContext);
            this.sender = new Sender(logContext,
                    client,
                    this.metadata,
                    this.accumulator,
                    maxInflightRequests == 1,
                    config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                    acks,
                    retries,
                    metricsRegistry.senderMetrics,
                    Time.SYSTEM,
                    this.requestTimeoutMs,
                    config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG),
                    this.transactionManager,
                    apiVersions);
            String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
            this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
            this.ioThread.start();
            this.errors = this.metrics.sensor("errors");
            config.logUnused();
            AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics);
            log.debug("Kafka producer started");
        } catch (Throwable t) {
            // call close methods if internal objects are already constructed this is to prevent resource leak. see KAFKA-2121
            close(0, TimeUnit.MILLISECONDS, true);
            // now propagate the exception
            throw new KafkaException("Failed to construct kafka producer", t);
        }
    }
  1. Parse the configuration; if no clientId is given, one is generated from an internal incrementing sequence.
  2. Parse the cluster addresses and convert the cluster information into a Cluster object stored in the metadata field:
1. Parsing the addresses: ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG))
  
public static List<InetSocketAddress> parseAndValidateAddresses(List<String> urls) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String url : urls) {
            if (url != null && !url.isEmpty()) {
                try {
                    String host = getHost(url);
                    Integer port = getPort(url);
                    if (host == null || port == null)
                        throw new ConfigException("Invalid url in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG + ": " + url);

                    InetSocketAddress address = new InetSocketAddress(host, port);

                    if (address.isUnresolved()) {
                        log.warn("Removing server {} from {} as DNS resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, host);
                    } else {
                        addresses.add(address);
                    }
                } catch (IllegalArgumentException e) {
                    throw new ConfigException("Invalid port in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG + ": " + url);
                }
            }
        }
        if (addresses.isEmpty())
            throw new ConfigException("No resolvable bootstrap urls given in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG);
        return addresses;
    }

2. Converting them into a Cluster object:
    public static Cluster bootstrap(List<InetSocketAddress> addresses) {
        List<Node> nodes = new ArrayList<>();
        int nodeId = -1;
        for (InetSocketAddress address : addresses)
            nodes.add(new Node(nodeId--, address.getHostString(), address.getPort()));
        return new Cluster(null, true, nodes, new ArrayList<PartitionInfo>(0), Collections.<String>emptySet(), Collections.<String>emptySet(), null);
    }  
  3. Create a NetworkClient object, which holds the metadata and other internal state.

  4. Create a Sender object that wraps the NetworkClient above.

  5. Create an ioThread that wraps the Sender; starting this thread ends up invoking Sender#run.

2. The application sends data, which is first staged in memory

The application calls send(); the send method picks a partition, prepares the data, and appends it to the local buffer (this is the method the business code invokes).

1. It eventually reaches org.apache.kafka.clients.producer.KafkaProducer#doSend:
  private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        // Append callback takes care of the following:
        //  - call interceptors and user callback on completion
        //  - remember partition that is calculated in RecordAccumulator.append
        AppendCallbacks<K, V> appendCallbacks = new AppendCallbacks<K, V>(callback, this.interceptors, record);

        try {
            throwIfProducerClosed();
            // first make sure the metadata for the topic is available
            long nowMs = time.milliseconds();
            ClusterAndWaitTime clusterAndWaitTime;
            try {
                clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
            } catch (KafkaException e) {
                if (metadata.isClosed())
                    throw new KafkaException("Producer closed while send in progress", e);
                throw e;
            }
            nowMs += clusterAndWaitTime.waitedOnMetadataMs;
            long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
            Cluster cluster = clusterAndWaitTime.cluster;
            byte[] serializedKey;
            try {
                serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer", cce);
            }
            byte[] serializedValue;
            try {
                serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in value.serializer", cce);
            }

            // Try to calculate partition, but note that after this call it can be RecordMetadata.UNKNOWN_PARTITION,
            // which means that the RecordAccumulator would pick a partition using built-in logic (which may
            // take into account broker load, the amount of data produced to each partition, etc.).
            int partition = partition(record, serializedKey, serializedValue, cluster);

            setReadOnly(record.headers());
            Header[] headers = record.headers().toArray();

            int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
                    compressionType, serializedKey, serializedValue, headers);
            ensureValidRecordSize(serializedSize);
            long timestamp = record.timestamp() == null ? nowMs : record.timestamp();

            // A custom partitioner may take advantage on the onNewBatch callback.
            boolean abortOnNewBatch = partitioner != null;

            // Append the record to the accumulator.  Note, that the actual partition may be
            // calculated there and can be accessed via appendCallbacks.topicPartition.
            RecordAccumulator.RecordAppendResult result = accumulator.append(record.topic(), partition, timestamp, serializedKey,
                    serializedValue, headers, appendCallbacks, remainingWaitMs, abortOnNewBatch, nowMs, cluster);
            assert appendCallbacks.getPartition() != RecordMetadata.UNKNOWN_PARTITION;

            if (result.abortForNewBatch) {
                int prevPartition = partition;
                onNewBatch(record.topic(), cluster, prevPartition);
                partition = partition(record, serializedKey, serializedValue, cluster);
                if (log.isTraceEnabled()) {
                    log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
                }
                result = accumulator.append(record.topic(), partition, timestamp, serializedKey,
                    serializedValue, headers, appendCallbacks, remainingWaitMs, false, nowMs, cluster);
            }

            // Add the partition to the transaction (if in progress) after it has been successfully
            // appended to the accumulator. We cannot do it before because the partition may be
            // unknown or the initially selected partition may be changed when the batch is closed
            // (as indicated by `abortForNewBatch`). Note that the `Sender` will refuse to dequeue
            // batches from the accumulator until they have been added to the transaction.
            if (transactionManager != null) {
                transactionManager.maybeAddPartition(appendCallbacks.topicPartition());
            }

            if (result.batchIsFull || result.newBatchCreated) {
                log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), appendCallbacks.getPartition());
                this.sender.wakeup();
            }
            return result.future;
            // handling exceptions and record the errors;
            // for API exceptions return them in the future,
            // for other exceptions throw directly
        } catch (ApiException e) {
            log.debug("Exception occurred during message send:", e);
            if (callback != null) {
                TopicPartition tp = appendCallbacks.topicPartition();
                RecordMetadata nullMetadata = new RecordMetadata(tp, -1, -1, RecordBatch.NO_TIMESTAMP, -1, -1);
                callback.onCompletion(nullMetadata, e);
            }
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            if (transactionManager != null) {
                transactionManager.maybeTransitionToErrorState(e);
            }
            return new FutureFailure(e);
        } catch (InterruptedException e) {
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw new InterruptException(e);
        } catch (KafkaException e) {
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw e;
        } catch (Exception e) {
            // we notify interceptor about all exceptions, since onSend is called before anything else in this method
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw e;
        }
    }

2. What this logic does:
  1. Fetch the cluster metadata (Cluster).
  2. Serialize the key and value (a record may carry both).
  3. Choose the partition.
  4. Call org.apache.kafka.clients.producer.internals.RecordAccumulator#append to buffer the record locally.

3. Sender#run(): a while loop that sends data and pulls results

	public void run() {
        log.debug("Starting Kafka producer I/O thread.");

        // main loop, runs until close is called
        while (running) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

        // okay we stopped accepting requests but there may still be
        // requests in the transaction manager, accumulator or waiting for acknowledgment,
        // wait until these are completed.
        while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
        while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
            if (!transactionManager.isCompleting()) {
                log.info("Aborting incomplete transaction due to shutdown");
                transactionManager.beginAbort();
            }
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        if (forceClose) {
            // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
            // the futures.
            if (transactionManager != null) {
                log.debug("Aborting incomplete transactional requests due to forced shutdown");
                transactionManager.close();
            }
            log.debug("Aborting incomplete batches due to forced shutdown");
            this.accumulator.abortIncompleteBatches();
        }
        try {
            this.client.close();
        } catch (Exception e) {
            log.error("Failed to close network client", e);
        }

        log.debug("Shutdown of Kafka producer I/O thread has completed.");
    }

Records are first buffered in memory (in the RecordAccumulator); this while loop then keeps draining that local buffer, choosing the node each batch has to go to, and sending it.

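How aggressively the accumulator batches records before the Sender drains them is controlled on the producer side. The sketch below only illustrates the relevant properties; the bootstrap address and the values are placeholders, not recommendations.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerBatchingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // total memory the RecordAccumulator may use for buffering (bytes)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);
        // target size of one batch per partition (bytes); a full batch wakes the Sender
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);
        // how long to wait for a batch to fill before the Sender drains it anyway (ms)
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("myTopic1", "key", "value")); // buffered, sent by the Sender thread
        } // close() flushes whatever the accumulator still holds
    }
}
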
  1. If the metadata needs updating, the client first prepares the node and establishes the socket connection (only the first send needs this; once the connections to the nodes exist they are simply reused).
1. The connection is established in org.apache.kafka.common.network.Selector#connect (a standalone NIO sketch of the same pattern follows the call stack below):

	public void connect(String id, InetSocketAddress address, int sendBufferSize, int receiveBufferSize) throws IOException {
        if (this.channels.containsKey(id))
            throw new IllegalStateException("There is already a connection for id " + id);
        if (this.closingChannels.containsKey(id))
            throw new IllegalStateException("There is already a connection for id " + id + " that is still being closed");

        SocketChannel socketChannel = SocketChannel.open();
        socketChannel.configureBlocking(false);
        Socket socket = socketChannel.socket();
        socket.setKeepAlive(true);
        if (sendBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
            socket.setSendBufferSize(sendBufferSize);
        if (receiveBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
            socket.setReceiveBufferSize(receiveBufferSize);
        socket.setTcpNoDelay(true);
        boolean connected;
        try {
            connected = socketChannel.connect(address);
        } catch (UnresolvedAddressException e) {
            socketChannel.close();
            throw new IOException("Can't resolve address: " + address, e);
        } catch (IOException e) {
            socketChannel.close();
            throw e;
        }
        SelectionKey key = socketChannel.register(nioSelector, SelectionKey.OP_CONNECT);
        KafkaChannel channel = buildChannel(socketChannel, id, key);

        if (connected) {
            // OP_CONNECT won't trigger for immediately connected channels
            log.debug("Immediately connected to node {}", channel.id());
            immediatelyConnectedKeys.add(key);
            key.interestOps(0);
        }
    }  
2. Call stack:
connect:204, Selector (org.apache.kafka.common.network)
initiateConnect:793, NetworkClient (org.apache.kafka.clients)
access$700:62, NetworkClient (org.apache.kafka.clients)
maybeUpdate:944, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
maybeUpdate:848, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
poll:458, NetworkClient (org.apache.kafka.clients)
run:239, Sender (org.apache.kafka.clients.producer.internals)
run:163, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang)  
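
For readers less familiar with Java NIO, here is a standalone sketch of the same non-blocking connect pattern that Selector#connect uses above. It is not Kafka code; the address is a placeholder and error handling is omitted.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class NonBlockingConnectSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        // for a non-blocking channel connect() usually returns false and completes later via OP_CONNECT
        boolean connected = channel.connect(new InetSocketAddress("localhost", 9092)); // placeholder address
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
        if (connected) {
            // same corner case Selector#connect handles: OP_CONNECT never fires for an immediate connect
            key.interestOps(SelectionKey.OP_READ);
        }
        while (selector.select(1000) > 0) {
            for (SelectionKey k : selector.selectedKeys()) {
                if (k.isConnectable() && ((SocketChannel) k.channel()).finishConnect()) {
                    System.out.println("Connected to " + channel.getRemoteAddress());
                    k.interestOps(SelectionKey.OP_READ); // connection established, switch to read interest
                }
            }
            selector.selectedKeys().clear();
        }
        channel.close();
        selector.close();
    }
}
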
  2. The framework picks a node and sends the data - it prepares the request and enables the write event on the selector (a minimal NIO sketch of this step follows the call stack)
1. This goes through org.apache.kafka.clients.NetworkClient#doSend(org.apache.kafka.clients.ClientRequest, boolean, long, org.apache.kafka.common.requests.AbstractRequest)

2. Call stack:
doSend:533, NetworkClient (org.apache.kafka.clients)
doSend:500, NetworkClient (org.apache.kafka.clients)
sendInternalMetadataRequest:466, NetworkClient (org.apache.kafka.clients)
maybeUpdate:1146, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
maybeUpdate:1051, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
poll:558, NetworkClient (org.apache.kafka.clients)
awaitReady:73, NetworkClientUtils (org.apache.kafka.clients)
awaitNodeReady:534, Sender (org.apache.kafka.clients.producer.internals)
maybeSendAndPollTransactionalRequest:455, Sender (org.apache.kafka.clients.producer.internals)
runOnce:316, Sender (org.apache.kafka.clients.producer.internals)
run:243, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang)
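
Again only as orientation, this is the generic NIO idiom behind "enable the write event": queue the outgoing bytes and add OP_WRITE to the key's interest set so that the selector loop performs the actual write later. It is a sketch, not Kafka's implementation.

import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.util.ArrayDeque;
import java.util.Queue;

public class WriteInterestSketch {

    private final Queue<ByteBuffer> outbox = new ArrayDeque<>();

    /** Queue a payload for the connection behind the given key and ask the selector to report writability. */
    public void send(SelectionKey key, ByteBuffer payload) {
        outbox.add(payload);
        // keep the existing interests (e.g. OP_READ) and add OP_WRITE
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }
}

The selector loop then sees key.isWritable() and drains the queue, which is the next step.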


  3. The framework writes the data - the actual socket write (a sketch of handling partial writes follows the call stack)
1. Method:
org.apache.kafka.common.network.ByteBufferSend#writeTo
	public long writeTo(TransferableChannel channel) throws IOException {
        long written = channel.write(buffers);
        if (written < 0)
            throw new EOFException("Wrote negative bytes to channel. This shouldn't happen.");
        remaining -= written;
        pending = channel.hasPendingWrites();
        return written;
    }

2. Call stack:
writeTo:58, ByteBufferSend (org.apache.kafka.common.network)
writeTo:41, NetworkSend (org.apache.kafka.common.network)
write:430, KafkaChannel (org.apache.kafka.common.network)
write:644, Selector (org.apache.kafka.common.network)
attemptWrite:637, Selector (org.apache.kafka.common.network)
pollSelectionKeys:593, Selector (org.apache.kafka.common.network)
poll:481, Selector (org.apache.kafka.common.network)
poll:560, NetworkClient (org.apache.kafka.clients)
awaitReady:73, NetworkClientUtils (org.apache.kafka.clients)
awaitNodeReady:534, Sender (org.apache.kafka.clients.producer.internals)
maybeSendAndPollTransactionalRequest:455, Sender (org.apache.kafka.clients.producer.internals)
runOnce:316, Sender (org.apache.kafka.clients.producer.internals)
run:243, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang) 
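
The gathering write above may send fewer bytes than requested, which is why writeTo keeps track of remaining. The sketch below, not Kafka code, shows the usual way to cope with that from a selector loop: write what the socket accepts and clear OP_WRITE only once everything has been flushed.

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.SelectionKey;

public class PartialWriteSketch {

    private long remaining;          // bytes of the current send that are still unwritten
    private ByteBuffer[] buffers;    // the buffers of the current send

    /** Called from the selector loop when key.isWritable(). */
    public void onWritable(SelectionKey key) throws IOException {
        GatheringByteChannel channel = (GatheringByteChannel) key.channel(); // a SocketChannel in practice
        long written = channel.write(buffers); // gathering write, may be partial
        if (written < 0)
            throw new EOFException("Channel reported a negative write count");
        remaining -= written;
        if (remaining == 0) {
            // everything flushed: stop asking the selector for write readiness
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}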

2. Consumer logic

The consumer follows the same pattern as the producer: at startup it also resolves the cluster metadata and keeps it in its internal fields. For details see org.apache.kafka.clients.consumer.KafkaConsumer#KafkaConsumer(org.apache.kafka.clients.consumer.ConsumerConfig, org.apache.kafka.common.serialization.Deserializer, org.apache.kafka.common.serialization.Deserializer). A minimal poll-loop sketch follows.
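
In this sketch the bootstrap address and group id are placeholders; the topic name is the myTopic1 created earlier in this post.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "myGroup");                 // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("myTopic1"));
            for (int i = 0; i < 10; i++) { // bounded loop for the sketch; real code normally loops forever
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}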
