Java client operations on Kafka, Spring Boot integration with Kafka, Kafka partitions, and Kafka client source code

1. Testing Kafka with the Java client

1. Configure Kafka to allow remote access

Edit the config/kraft/server.properties file and change the advertised address to the server's public IP. The default value is:

advertised.listeners=PLAINTEXT://localhost:9092
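
For example, if the server's public IP were 203.0.113.10 (a placeholder documentation address, substitute your own), the changed line would look like:

advertised.listeners=PLAINTEXT://203.0.113.10:9092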

Then restart Kafka.

2. Testing AdminClient for topic and other metadata management

Test class and results:

package cn.qz.cloud.kafka.client;

import com.google.common.collect.Sets;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.admin.*;

import java.util.*;
import java.util.concurrent.ExecutionException;

/**
 * CRUD operations on topics
 */
@Slf4j
public class KafkaAdminTest {

    public static Properties props = new Properties();

    static {
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER);
        props.put("request.timeout.ms", 60000);
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        createTopic();
        describeTopic();
    }

    public static void createTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            /**
             * 2 is the number of partitions
             * 1 is the replication factor
             */
            NewTopic newTopic = new NewTopic(topicName, 2, (short) 1);
            CreateTopicsResult topics = adminClient.createTopics(Collections.singletonList(newTopic));
            log.info("{}", topics.all().get());
        }
    }

    public static void listTopic() throws ExecutionException, InterruptedException {
        ListTopicsOptions listTopicsOptions = new ListTopicsOptions();
        listTopicsOptions.listInternal(true);
        try (AdminClient adminClient = AdminClient.create(props)) {
            ListTopicsResult listTopicsResult = adminClient.listTopics(listTopicsOptions);
            Collection<TopicListing> topicListings = listTopicsResult.listings().get();
            log.info("{}", topicListings);
            /**
             * [(name=quickstart-events, topicId=rPIXse70QvK3Rri24a-bNg, internal=false), (name=myTopic1, topicId=E6i1TbWXTz-11yKI207ZLA, internal=false), (name=__consumer_offsets, topicId=38T6UsJSRn2BL6tnfj5Wfg, internal=true)]
             */
        }
    }

    public static void deleteTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Sets.newHashSet(topicName));
            log.info("{}", deleteTopicsResult);
        }
    }

    public static void describeTopic() throws ExecutionException, InterruptedException {
        String topicName = KafkaConstants.TOPIC_NAME;
        try (AdminClient adminClient = AdminClient.create(props)) {
            DescribeTopicsResult topicsResult = adminClient.describeTopics(Arrays.asList(topicName));
            Map<String, TopicDescription> topicDescription = topicsResult.all().get();
            log.info("{}", topicDescription);
            /**
             * {myTopic1=(name=myTopic1, internal=false, partitions=(partition=0, leader=x.x.x.x:9092 (id: 1 rack: null), replicas=x.x.x.x:9092 (id: 1 rack: null), isr=x.x.x.x:9092 (id: 1 rack: null)),(partition=1, leader=x.x.x.x:9092 (id: 1 rack: null), replicas=x.x.x.x:9092 (id: 1 rack: null), isr=x.x.x.x:9092 (id: 1 rack: null)), authorizedOperations=null)}
             */
        }
    }
}

3. Message producer

Next, recreate myTopic1 with 6 partitions and a replication factor of 1.
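
A sketch of the corresponding console command (assuming the broker is reachable locally on port 9092, as in the rest of this section; delete the old topic first if it still exists):

bin/kafka-topics.sh --create --topic myTopic1 --partitions 6 --replication-factor 1 --bootstrap-server localhost:9092

Then start a console consumer to listen during the tests: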

bin/kafka-console-consumer.sh --topic myTopic1 --from-beginning --bootstrap-server localhost:9092

1. Introduction to ProducerRecord

When sending to a topic, each message is one ProducerRecord. The relevant source is:

public class ProducerRecord<K, V> {

    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;

    /**
     * Creates a record with a specified timestamp to be sent to a specified topic and partition
     * 
     * @param topic The topic the record will be appended to
     * @param partition The partition to which the record should be sent
     * @param timestamp The timestamp of the record, in milliseconds since epoch. If null, the producer will assign
     *                  the timestamp using System.currentTimeMillis().
     * @param key The key that will be included in the record
     * @param value The record contents
     * @param headers the headers that will be included in the record
     */
    public ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable<Header> headers) {
        if (topic == null)
            throw new IllegalArgumentException("Topic cannot be null.");
        if (timestamp != null && timestamp < 0)
            throw new IllegalArgumentException(
                    String.format("Invalid timestamp: %d. Timestamp should always be non-negative or null.", timestamp));
        if (partition != null && partition < 0)
            throw new IllegalArgumentException(
                    String.format("Invalid partition: %d. Partition number should always be non-negative or null.", partition));
        this.topic = topic;
        this.partition = partition;
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
        this.headers = new RecordHeaders(headers);
    }

As the constructor shows, partition, key, value and headers can all be specified; only topic and value are required. The partition selection logic is as follows (a constructor sketch follows the list):

  1. If a partition ID is specified, the record is sent to that partition.
  2. If no partition ID is specified but a key is, the record is sent to the partition determined by hash(key).
  3. If neither a partition ID nor a key is specified, the record is sent to the partitions in a round-robin fashion.
  4. If both a partition ID and a key are specified, the record is sent only to the specified partition (the key has no effect on routing; the code path decides this).
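
A quick sketch of the constructor variants that correspond to these rules (key and value types assume String serializers; the topic name is the one used in this section):

// no key, no partition: distributed in a round-robin-like way
ProducerRecord<String, String> r1 = new ProducerRecord<>("myTopic1", "value");
// key only: partition = hash(key) % numPartitions
ProducerRecord<String, String> r2 = new ProducerRecord<>("myTopic1", "key", "value");
// explicit partition: always goes to partition 0; the key is stored in the record but not used for routing
ProducerRecord<String, String> r3 = new ProducerRecord<>("myTopic1", 0, "key", "value");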

For example, sending a message with a key and a header:

Header header = new RecordHeader("testHeader", "testHeaderValue".getBytes());
ProducerRecord producerRecord = new ProducerRecord(topic, null, null, "TEST_KEY", msg, Sets.newHashSet(header));

The consumer receives the following (i.e. the consumer can also read the header):

topic: myTopic1, partition: 2, offset: 0, key: TEST_KEY, value: testMsg
key: testHeader, value: testHeaderValue

The producer and consumer examples below simply send string messages, without specifying a key, a partition, or headers.

2. Sending messages

The code below demonstrates synchronous sending, asynchronous sending, idempotent sending, and transactional sending.

package cn.qz.cloud.kafka.client;

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

@Slf4j
public class Producer {

    private Properties properties = new Properties();

    private KafkaProducer kafkaProducer;

    public Producer() {
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER);
        /**
         * client.id identifies this producer instance in broker-side logs, metrics and quotas
         */
//        properties.put(ProducerConfig.CLIENT_ID_CONFIG, "client1");
        /**
         * Key and value serializers
         */
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384"); // DEFAULT 16384 = 16K
        /**
         * acks=0: the message is considered sent as soon as it leaves the client, regardless of whether the partition leader persisted it.
         * acks=1: the message is considered sent once the partition leader has written it to its local log, without waiting for any follower to replicate it.
         * acks=all: the message is considered sent only after the partition leader has confirmed that all in-sync replicas (ISR) have replicated it.
         */
        properties.put(ProducerConfig.ACKS_CONFIG, "all"); // default 1
        properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "3000"); // default 30000 ms = 30 s; lowered to 3 s here
        // see CommonClientConfigs for more defaults
    }

    /**
     * Simple message sending
     */
    public void produce(SendTypeEnum sendTypeEnum, String msg) {
        String topic = KafkaConstants.TOPIC_NAME;
        try {
            kafkaProducer = new KafkaProducer(properties);
            long startTime = System.currentTimeMillis();
            // asynchronous send
            if (sendTypeEnum == SendTypeEnum.ASYNC) {
                kafkaProducer.send(new ProducerRecord(topic, msg), new ProducerCallBack(startTime, msg));
            }
            // fire-and-forget: do not care about the result
            // send() returns a Future; as long as get() is not called, it does not block
            if (SendTypeEnum.WITHOUT_RESULT == sendTypeEnum) {
                kafkaProducer.send(new ProducerRecord(topic, msg));
            }
            // synchronous: org.apache.kafka.clients.producer.KafkaProducer.send(org.apache.kafka.clients.producer.ProducerRecord<K,V>)
            // send() returns a Future; calling get() blocks until the result is available
            if (SendTypeEnum.SYNC_WITH_RESULT == sendTypeEnum) {
                RecordMetadata rm = (RecordMetadata) kafkaProducer.send(new ProducerRecord(topic, msg)).get();
                log.info("rm: {}", ToStringBuilder.reflectionToString(rm, ToStringStyle.NO_CLASS_NAME_STYLE));
            }
        } catch (Exception e) {
            log.error("produce error", e);
        } finally {
            kafkaProducer.close();
        }
    }

    /**
     * Enable idempotence
     *
     * @param msg
     */
    public void produceIdempotence(String msg) {
        // with idempotence enabled, retries defaults to Integer.MAX_VALUE and acks is forced to all
        /**
         * Producer ID (PID) and sequence number:
         * PID: each new producer is assigned a unique PID during initialization; the PID is invisible to the user.
         * Sequence number: for each PID, every <Topic, Partition> the producer writes to has a sequence number that increases monotonically from 0. The broker caches the sequence number and accepts an incoming message only if its sequence number is exactly one greater than the cached one; otherwise it discards the message. This prevents duplicate writes.
         * Idempotence is only guaranteed per partition: an idempotent producer avoids duplicates within a single partition of a topic, not across partitions. It also only holds within a single producer session, not across sessions.
         */
        properties.put("enable.idempotence", "true");//开启幂等性
        try {
            kafkaProducer = new KafkaProducer(properties);
            long startTime = System.currentTimeMillis();
            kafkaProducer.send(new ProducerRecord(KafkaConstants.TOPIC_NAME, msg, msg), new ProducerCallBack(startTime, msg));
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaProducer.close();
        }
    }

    /**
     * Enable transactions.
     * Transactions are built on top of the PID.
     * transactional.id and producerId form a one-to-one mapping in the transaction coordinator: transactional.id is the key and producerId is the value.
     * When a producer recovers, it uses the user-specified transactional.id to look up its producerId from the transaction coordinator, which is how idempotent delivery is preserved across sessions.
     */
    public void produceInTransaction() {
        properties.put("transactional.id", "myTx");
        kafkaProducer = new KafkaProducer(properties);
        kafkaProducer.initTransactions();
        try {
            long startTime = System.currentTimeMillis();
            try {
                kafkaProducer.beginTransaction();
                for (int i = 0; i < 100; i++) {
                    String messageStr = "message_" + i;
                    if (i == 99) {
                        throw new RuntimeException("XXX");
                    }
                    kafkaProducer.send(new ProducerRecord(KafkaConstants.TOPIC_NAME, messageStr, messageStr),
                            new ProducerCallBack(startTime, messageStr));
                }
                kafkaProducer.commitTransaction();
            } catch (ProducerFencedException e) {
                kafkaProducer.close();
                log.error("", e);
            } catch (OutOfOrderSequenceException e) {
                kafkaProducer.close();
                log.error("", e);
            } catch (Exception e) {
                kafkaProducer.abortTransaction();
                log.warn("", e);
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaProducer.close();
        }
    }

    @Slf4j
    private static class ProducerCallBack implements Callback {

        private final long startTime;

        private final String message;

        public ProducerCallBack(long startTime, String message) {
            this.startTime = startTime;
            this.message = message;
        }

        /**
         * Called once the ack from the Kafka broker is received
         *
         * @param metadata metadata of the produced record; null if the send failed with an exception
         * @param e        exception raised during the send; null if the send succeeded
         */
        public void onCompletion(RecordMetadata metadata, Exception e) {
            long elapsedTime = System.currentTimeMillis() - startTime;
            if (metadata != null) {
                log.info("send success! partition:{}, offset:{}, messgage:{}, elapsedTimeMs:{}", metadata.partition(), metadata.offset(), message, elapsedTime);
            } else {
                log.error("", e);
            }
        }
    }

    public enum SendTypeEnum {

        /**
         * Async
         */
        ASYNC,

        /**
         * Fire-and-forget; do not care about the result
         */
        WITHOUT_RESULT,

        /**
         * Synchronous send
         */
        SYNC_WITH_RESULT;
    }

    public static void main(String[] args) {
        Producer producer = new Producer();
        for (int i = 0; i < 10; i++) {
            producer.produce(SendTypeEnum.ASYNC, "testMsg" + i);
        }
    }
}

4. Message consumer

Offsets can be committed automatically or manually. With manual commit you call commit yourself to record the offset; with automatic commit the client commits the offset for you.

1. Automatic commit:

package cn.qz.cloud.kafka.client;

import cn.hutool.core.collection.CollectionUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

@Slf4j
public class Consumer {

    private static Properties properties = new Properties();

    static {
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER); //required
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, KafkaConstants.Concumer.GROUP_ID);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");//default 300000
        properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");//default 500
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true"); // auto-commit: offsets are recorded automatically, no manual ack is needed
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        properties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "4194304"); // fetch at most 4 MB of data per partition per request
    }

    private KafkaConsumer kafkaConsumer;

    public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);
        kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println(1);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println(2);
            }
        });

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

    public static void main(String[] args) throws Exception {
        Consumer consumerDemo = new Consumer();
        consumerDemo.consume();
    }

}

2. Manual commit

package cn.qz.cloud.kafka.client;

import cn.hutool.core.collection.CollectionUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

@Slf4j
public class Consumer {

    private static Properties properties = new Properties();

    static {
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.BOOTSTRAP_SERVER); //required
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, KafkaConstants.Concumer.GROUP_ID);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");//default 300000
        properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");//default 500
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // auto-commit disabled: offsets must be committed manually
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        properties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "4194304"); // fetch at most 4 MB of data per partition per request
    }

    private KafkaConsumer kafkaConsumer;

    public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);
        kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println(1);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println(2);
            }
        });

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
                // commit the offsets
                kafkaConsumer.commitAsync();
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

    public static void main(String[] args) throws Exception {
        Consumer consumerDemo = new Consumer();
        consumerDemo.consume();
    }

}

3. Starting a consumer and setting offsets manually (assign + seek)

kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() { // subscribe() uses automatic partition assignment, and the rebalance callbacks only fire during poll(). assign() is manual partition assignment; the two APIs cannot be used together. With manual assignment the offset can be reset before reading.

public void consume() {
        kafkaConsumer = new KafkaConsumer(properties);

        // automatic partition assignment; the rebalance listener callbacks are only invoked during poll()
        /*kafkaConsumer.subscribe(Arrays.asList(KafkaConstants.TOPIC_NAME), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                log.info("onPartitionsRevoked partitions: {}", partitions);
                System.out.println(String.format("onPartitionsRevoked partitions: %s", partitions));
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                log.info("onPartitionsAssigned partitions: {}", partitions);
                System.out.println(String.format("onPartitionsAssigned partitions: %s", partitions));
            }
        });*/

        // specify offsets explicitly
        // assign() is manual partition assignment; it cannot be used together with subscribe()
        // two partitions are hard-coded here; in practice the partition count can be looked up dynamically (only messages from these two partitions will be received)
        TopicPartition partition0 = new TopicPartition(KafkaConstants.TOPIC_NAME, 0);
        TopicPartition partition1 = new TopicPartition(KafkaConstants.TOPIC_NAME, 1);
        kafkaConsumer.assign(Arrays.asList(partition0, partition1));
        Set assignment = kafkaConsumer.assignment();
        System.out.println(assignment.size());
        kafkaConsumer.assignment().stream().forEach(System.out::println);
        kafkaConsumer.seek(partition0, 10L); // seek partition 0 to offset 10
        kafkaConsumer.seek(partition1, 20L); // seek partition 1 to offset 20

        try {
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(20));
                for (ConsumerRecord<String, String> record : records) {
                    log.info("topic: {}, partition: {}, offset: {}, key: {}, value: {}",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    /**
                     * If the producer attached headers, the consumer can read them
                     */
                    Headers headers = record.headers();
                    if (CollectionUtil.isNotEmpty(headers)) {
                        headers.forEach(h -> {
                            log.info("key: {}, value: {}", h.key(), new String(h.value()));
                        });
                    }
                }
                // commit the offsets
                kafkaConsumer.commitAsync();
            }
        } catch (Exception e) {
            log.error("", e);
        } finally {
            kafkaConsumer.close();
        }

    }

4. Consuming with multiple threads

Unlike RocketMQ, the Kafka client has no setting for the number of consumer threads; you have to create multiple threads (each with its own consumer) and pull messages yourself.

import org.apache.kafka.clients.consumer.ConsumerRecord;  
import org.apache.kafka.clients.consumer.ConsumerRecords;  
import org.apache.kafka.clients.consumer.KafkaConsumer;  
import org.apache.kafka.common.serialization.StringDeserializer;  
  
import java.time.Duration;  
import java.util.Arrays;  
import java.util.Properties;  
import java.util.concurrent.ExecutorService;  
import java.util.concurrent.Executors;  
  
public class MultiThreadedConsumer {  
  
    public static void main(String[] args) {  
        Properties props = new Properties();  
        props.put("bootstrap.servers", "localhost:9092");  
        props.put("group.id", "test-group");  
        props.put("key.deserializer", StringDeserializer.class.getName());  
        props.put("value.deserializer", StringDeserializer.class.getName());  
  
        int numThreads = 4; // suppose we want 4 threads consuming messages
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);  
  
        for (int i = 0; i < numThreads; i++) {  
            final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);  
            consumer.subscribe(Arrays.asList("my-topic"));  
  
            executor.submit(() -> {  
                while (true) {  
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));  
                    for (ConsumerRecord<String, String> record : records) {  
                        System.out.printf("Thread %d, offset = %d, key = %s, value = %s%n",  
                                Thread.currentThread().getId(), record.offset(), record.key(), record.value());  
                    }  
                }  
            });  
        }  
  
        // Note: in a real application you need a mechanism to shut down the thread pool and the consumers gracefully.
        // This is just a simple example without shutdown logic; a sketch of one approach follows the class.
    }  
}
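
A minimal sketch of one way to add that shutdown logic, using KafkaConsumer#wakeup() (the only consumer method that is safe to call from another thread) together with a JVM shutdown hook; the class name and topic are illustrative:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulMultiThreadedConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        int numThreads = 4;
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        List<KafkaConsumer<String, String>> consumers = new CopyOnWriteArrayList<>();

        for (int i = 0; i < numThreads; i++) {
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumers.add(consumer);
            consumer.subscribe(Arrays.asList("my-topic"));
            executor.submit(() -> {
                try {
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                        records.forEach(r -> System.out.printf("offset = %d, value = %s%n", r.offset(), r.value()));
                    }
                } catch (WakeupException e) {
                    // expected: poll() throws this after wakeup() is called during shutdown
                } finally {
                    consumer.close(); // leaves the group cleanly
                }
            });
        }

        // on JVM shutdown, wake every consumer out of poll() and stop the pool
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumers.forEach(KafkaConsumer::wakeup);
            executor.shutdown();
            try {
                executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }));
    }
}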

5. Manually modifying offsets

Note that this requires bumping the kafka-clients version to 3.3.2:

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>3.3.2</version>
        </dependency>

Fetching and resetting offsets manually:

	public static void getAndResetOffset() throws ExecutionException, InterruptedException {
        try (AdminClient adminClient = AdminClient.create(props)) {
            // fetch the consumer group's currently committed offset for each partition
            ListConsumerGroupOffsetsResult listConsumerGroupOffsetsResult = adminClient.listConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID);
            KafkaFuture<Map<TopicPartition, OffsetAndMetadata>> mapKafkaFuture = listConsumerGroupOffsetsResult.partitionsToOffsetAndMetadata();
            Map<TopicPartition, OffsetAndMetadata> partitionOffsets = mapKafkaFuture.get();
            for (Map.Entry<TopicPartition, OffsetAndMetadata> entry : partitionOffsets.entrySet()) {
                TopicPartition topicPartition = entry.getKey();
                OffsetAndMetadata value = entry.getValue();
                log.info("Latest Offset for " + topicPartition + ": " + value.offset());
            }

            /**
             * Preconditions for resetting offsets:
             * all consumers in the group must be closed (no longer connected to Kafka and consuming)
             * all producers must be closed (no longer connected to Kafka and producing)
             * the partition being reset (topic plus partition number) must actually exist
             * do not set the client.id property on the admin client
             */
            Map<TopicPartition, OffsetAndMetadata> updateMap = new HashMap<>();
            updateMap.put(new TopicPartition(KafkaConstants.TOPIC_NAME, 0), new OffsetAndMetadata(2));
            // alter the consumer group's offset for this partition
            AlterConsumerGroupOffsetsResult alterConsumerGroupOffsetsResult = adminClient.alterConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID, updateMap);
            KafkaFuture<Void> all = alterConsumerGroupOffsetsResult.all();
            Void unused = all.get();
            System.out.println("====end");
        }
    }
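
To reset a group to the beginning or end of the log rather than to a fixed number, AdminClient#listOffsets can look up those positions first. A minimal sketch reusing the same props, log and KafkaConstants as above (OffsetSpec and ListOffsetsResult live in org.apache.kafka.clients.admin):

	public static void resetToEarliest() throws ExecutionException, InterruptedException {
        try (AdminClient adminClient = AdminClient.create(props)) {
            TopicPartition tp = new TopicPartition(KafkaConstants.TOPIC_NAME, 0);

            // look up the earliest available offset of the partition (use OffsetSpec.latest() for the log end)
            Map<TopicPartition, OffsetSpec> request = Collections.singletonMap(tp, OffsetSpec.earliest());
            ListOffsetsResult.ListOffsetsResultInfo info = adminClient.listOffsets(request).partitionResult(tp).get();

            // point the consumer group at that offset (same preconditions as above: no live consumers in the group)
            Map<TopicPartition, OffsetAndMetadata> update =
                    Collections.singletonMap(tp, new OffsetAndMetadata(info.offset()));
            adminClient.alterConsumerGroupOffsets(KafkaConstants.CONSUMER_GROUP_ID, update).all().get();
            log.info("reset partition {} to offset {}", tp, info.offset());
        }
    }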

6. Getting the message lag (backlog) of a consumer group

package cn.qz.cloud.kafka.client;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.KafkaAdminClient;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class KafkaLagChecker {

    public static void main(String[] args) throws Exception {
        getLag(KafkaConstants.BOOTSTRAP_SERVER, KafkaConstants.CONSUMER_GROUP_ID);
    }

    public static void getLag(String servers, String groupId) {
        Properties properties = new Properties();
        properties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        try (AdminClient adminClient = KafkaAdminClient.create(properties)) {
            Map<TopicPartition, OffsetAndMetadata> offsetAndMetadataMap = adminClient.listConsumerGroupOffsets(groupId)
                    .partitionsToOffsetAndMetadata().get(10, TimeUnit.SECONDS);
            properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(offsetAndMetadataMap.keySet());
                offsetAndMetadataMap.forEach((key, value) ->
                        System.out.printf("topic: [%s] partition:[%d] lag:[%d]", key.topic(), key.partition(), endOffsets.get(key) - value.offset()));
            }
        } catch (ExecutionException | InterruptedException | TimeoutException exception) {
            exception.printStackTrace();
        }
    }

}

2. Testing Kafka in a Spring Boot project

  1. Add the Kafka dependency to the pom:
        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
        </dependency>
  2. Add the Kafka-related configuration:
server:
  port: 8080

spring:
  #kafka configuration
  kafka:
    #change this to the ip and port of your kafka server
    bootstrap-servers: xxx:9092
    #=============== producer  =======================
    producer:
      #if greater than zero, failed sends are retried this many times
      retries: 0
      #when multiple records are sent to the same partition, the producer batches them into fewer requests; default 16384 (bytes)
      batch-size: 16384
      #total bytes of memory the producer may use to buffer records waiting to be sent to the server; default 33554432
      buffer-memory: 33554432
      #Serializer class for keys, implementing org.apache.kafka.common.serialization.Serializer
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      #Serializer class for values, implementing org.apache.kafka.common.serialization.Serializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    #=============== consumer  =======================
    consumer:
      #unique string identifying the consumer group this consumer belongs to
      group-id: test-consumer-group
      #what to do when Kafka has no initial offset or the current offset no longer exists on the server; default latest (reset to the latest offset)
      #allowed values: latest, earliest, none
      auto-offset-reset: earliest
      #consumer offsets are committed periodically in the background; default true
      enable-auto-commit: true
      #if 'enable-auto-commit' is true, the frequency in milliseconds at which offsets are auto-committed to Kafka; default 5000
      auto-commit-interval: 100
      #Deserializer class for keys, implementing org.apache.kafka.common.serialization.Deserializer
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      #Deserializer class for values, implementing org.apache.kafka.common.serialization.Deserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
  3. Add the classes: producer and consumer
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import com.google.common.collect.Lists;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaAdmin;

import javax.annotation.PostConstruct;
import java.util.ArrayList;
import java.util.List;

@Configuration
public class kafkaConfig {

    @Autowired
    private KafkaAdmin kafkaAdmin;

    @PostConstruct
    public void init() {
        /**
         * init topic
         */
        AdminClient adminClient = AdminClient.create(kafkaAdmin.getConfig());
        adminClient.deleteTopics(Lists.newArrayList(KafkaConstants.TOPIC_NAME));
        List<NewTopic> topics = new ArrayList<>();
        topics.add(new NewTopic(KafkaConstants.TOPIC_NAME, 3, (short) 1));
        adminClient.createTopics(topics);
        System.out.println("创建topic成功");
    }
}
===
  
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping
public class Producer {

    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;

    @GetMapping("/index")
    public String index() {
        return "index";
    }

    @GetMapping("/send-msg")
    public String send(@RequestParam String msg) {
        // produce the message
        ListenableFuture<SendResult<String, Object>> listenableFuture = kafkaTemplate.send(KafkaConstants.TOPIC_NAME, msg, msg);
        listenableFuture.addCallback(new ListenableFutureCallback<SendResult<String, Object>>() {
            @Override
            public void onFailure(Throwable throwable) {
                throwable.printStackTrace();
            }

            @Override
            public void onSuccess(SendResult<String, Object> stringObjectSendResult) {
                System.out.println(stringObjectSendResult);
            }
        });
        return msg;
    }

}

===
package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class Consumer {

    /**
     * org.springframework.kafka.annotation.KafkaListener can also specify partitions, a groupId and other attributes (see the sketch below)
     *
     * @param record
     */
    @KafkaListener(topics = {KafkaConstants.TOPIC_NAME})
    public void handMessage(ConsumerRecord<String, String> record) {
        String topic = record.topic();
        String msg = record.value();
        System.out.println("消费者接受消息:topic-->" + topic + ",msg->>" + msg);
    }
}  
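
A minimal sketch of a listener that pins specific partitions and an explicit group id; the attributes are standard spring-kafka, while the class name, group id and partition numbers here are illustrative:

package cn.qz.cloud.kafka.springboot.springboot;

import cn.qz.cloud.kafka.client.KafkaConstants;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.TopicPartition;
import org.springframework.stereotype.Component;

@Component
public class PartitionPinnedConsumer {

    /**
     * Only consumes partitions 0 and 1 of the topic, with its own group id.
     */
    @KafkaListener(
            groupId = "pinned-group",
            topicPartitions = @TopicPartition(topic = KafkaConstants.TOPIC_NAME, partitions = {"0", "1"}))
    public void handMessage(ConsumerRecord<String, String> record) {
        System.out.println("partition-->" + record.partition() + ", msg->>" + record.value());
    }
}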

For the full set of configuration options, see:

org.springframework.boot.autoconfigure.kafka.KafkaProperties

3. About Kafka partitions

1. The number of partitions of a Kafka topic can be changed:

[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --describe --topic myTopic1 --bootstrap-server localhost:9092
Topic: myTopic1	TopicId: 9LsqbI1dRVelPxx-3FJ9lw	PartitionCount: 3	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: myTopic1	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --alter --topic myTopic1 --bootstrap-server localhost:9092 --partitions 12
[root@VM-8-16-centos kafka_2.13-3.3.1]# bin/kafka-topics.sh --describe --topic myTopic1 --bootstrap-server localhost:9092
Topic: myTopic1	TopicId: 9LsqbI1dRVelPxx-3FJ9lw	PartitionCount: 12	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: myTopic1	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 3	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 4	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 5	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 6	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 7	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 8	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 9	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 10	Leader: 1	Replicas: 1	Isr: 1
	Topic: myTopic1	Partition: 11	Leader: 1	Replicas: 1	Isr: 1

2. Kafka's partitioning strategy

With kafka-clients, the default partitioner implementation is org.apache.kafka.clients.producer.internals.DefaultPartitioner:

package org.apache.kafka.clients.producer.internals;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class DefaultPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap();

    public DefaultPartitioner() {
    }

    public void configure(Map<String, ?> configs) {
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = this.nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return ((PartitionInfo)availablePartitions.get(part)).partition();
            } else {
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }

        return counter.getAndIncrement();
    }

    public void close() {
    }
}

As this code shows, when a key is present it is hashed (murmur2), converted to a non-negative integer and taken modulo the number of partitions; when no key is passed, records are spread over the partitions in a round-robin-like fashion (a per-topic counter seeded with a random value). A small sketch follows.
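
A minimal sketch that reproduces the keyed branch of this logic outside the producer, assuming a topic with 6 partitions and using the same public Utils helpers; it can be handy for predicting which partition a given key will land on:

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {

    public static void main(String[] args) {
        int numPartitions = 6; // assumed partition count of the topic
        String key = "TEST_KEY";

        // serialize the key the same way the producer does (StringSerializer -> UTF-8 bytes)
        byte[] keyBytes = new StringSerializer().serialize("myTopic1", key);

        // same computation as DefaultPartitioner for a non-null key
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key " + key + " -> partition " + partition);
    }
}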

The partitioner is invoked from:

org.apache.kafka.clients.producer.KafkaProducer#send(org.apache.kafka.clients.producer.ProducerRecord<K,V>)
->
org.apache.kafka.clients.producer.KafkaProducer#doSend
->
org.apache.kafka.clients.producer.KafkaProducer#partition, whose source is:
    private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
        Integer partition = record.partition();
        return partition != null ? partition : this.partitioner.partition(record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
    }

3. Defining a custom partitioning strategy

  1. Create an implementation class that always routes to partition 0
package cn.qz.cloud.kafka.client;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

public class CustomPartitioner implements Partitioner {

    @Override
    public int partition(String s, Object o, byte[] bytes, Object o1, byte[] bytes1, Cluster cluster) {
        return 0;
    }

    @Override
    public void close() {

    }

    @Override
    public void configure(Map<String, ?> map) {

    }
}
  2. Configure the producer to use the custom partitioner
properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "cn.qz.cloud.kafka.client.CustomPartitioner");

4. Difference from ES shards

Why ES cannot change its shard count: https://blog.csdn.net/w1014074794/article/details/119802550

1. Kafka can easily add new partitions through its admin tooling. Doing so only affects messages that specify a key (their key-to-partition mapping changes), and even that impact is small, because consumers can still consume all of the messages.
2. By contrast, ES does not support adding shards. The reason lies in its query flow (query phase, then fetch phase): the fetch phase retrieves documents by id, so if the shard count changed, many ids would be routed to the wrong shard and their documents would not be found even though they exist on another shard. That is why ES does not support increasing the number of shards online.

Explanation:

1. Think of ES as a database first, then picture this scenario: your application implements its own sharding rule, sharding by uid, where uids ending in 0 go to database 0, uids ending in 1 go to database 1, and so on, for 10 databases in total.

OK, now you add an 11th database. From the moment the rule changes, data has to be migrated, and doing that migration smoothly is very hard even when done by hand.

2. Kafka, by contrast, works by subscribing to a topic, and a group coordinator assigns machine A to consume partition 1, machine B to consume partition 2, and so on.

Because consumption is inherently per partition, scaling the number of partitions up or down does not cause this problem.

4. Kafka client internals

When configuring the client we only supply the Kafka cluster nodes; let's look at how the client establishes connections and sends data.

The Kafka client implements its own Netty-like NIO framework for communicating with the Kafka brokers.

1. Producer logic

The overall flow:

  1. Create the KafkaProducer; the framework parses the cluster addresses and the related parameters and initializes itself.
  2. The application calls send(); internally, send() picks a partition, assembles the internal record objects, appends them to a local buffer, and notifies the NIO layer that there is data to write.
  3. The framework performs the write via NIO (on the first write it establishes socket connections to the cluster nodes and then picks a node to send to).

1. Creating the KafkaProducer: parsing the cluster addresses and initializing

new KafkaProducer(properties) performs the initialization and parses the cluster information. The socket connections are only established on the first send, using Kafka's own NIO layer underneath.

private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
        try {
            Map<String, Object> userProvidedConfigs = config.originals();
            this.producerConfig = config;
            this.time = Time.SYSTEM;
            String clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
            if (clientId.length() <= 0)
                clientId = "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();
            this.clientId = clientId;

            String transactionalId = userProvidedConfigs.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG) ?
                    (String) userProvidedConfigs.get(ProducerConfig.TRANSACTIONAL_ID_CONFIG) : null;
            LogContext logContext;
            if (transactionalId == null)
                logContext = new LogContext(String.format("[Producer clientId=%s] ", clientId));
            else
                logContext = new LogContext(String.format("[Producer clientId=%s, transactionalId=%s] ", clientId, transactionalId));
            log = logContext.logger(KafkaProducer.class);
            log.trace("Starting the Kafka producer");

            Map<String, String> metricTags = Collections.singletonMap("client-id", clientId);
            MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
                    .timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG), TimeUnit.MILLISECONDS)
                    .recordLevel(Sensor.RecordingLevel.forName(config.getString(ProducerConfig.METRICS_RECORDING_LEVEL_CONFIG)))
                    .tags(metricTags);
            List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
                    MetricsReporter.class);
            reporters.add(new JmxReporter(JMX_PREFIX));
            this.metrics = new Metrics(metricConfig, reporters, time);
            ProducerMetrics metricsRegistry = new ProducerMetrics(this.metrics);
            this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
            long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
            if (keySerializer == null) {
                this.keySerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                                                                                         Serializer.class));
                this.keySerializer.configure(config.originals(), true);
            } else {
                config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
                this.keySerializer = ensureExtended(keySerializer);
            }
            if (valueSerializer == null) {
                this.valueSerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                                                                                           Serializer.class));
                this.valueSerializer.configure(config.originals(), false);
            } else {
                config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
                this.valueSerializer = ensureExtended(valueSerializer);
            }

            // load interceptors and make sure they get clientId
            userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
            List<ProducerInterceptor<K, V>> interceptorList = (List) (new ProducerConfig(userProvidedConfigs, false)).getConfiguredInstances(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
                    ProducerInterceptor.class);
            this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
            ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer, valueSerializer, interceptorList, reporters);
            this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
                    true, true, clusterResourceListeners);
            this.maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);
            this.totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);
            this.compressionType = CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));

            this.maxBlockTimeMs = config.getLong(ProducerConfig.MAX_BLOCK_MS_CONFIG);
            this.requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
            this.transactionManager = configureTransactionState(config, logContext, log);
            int retries = configureRetries(config, transactionManager != null, log);
            int maxInflightRequests = configureInflightRequests(config, transactionManager != null);
            short acks = configureAcks(config, transactionManager != null, log);

            this.apiVersions = new ApiVersions();
            this.accumulator = new RecordAccumulator(logContext,
                    config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                    this.totalMemorySize,
                    this.compressionType,
                    config.getLong(ProducerConfig.LINGER_MS_CONFIG),
                    retryBackoffMs,
                    metrics,
                    time,
                    apiVersions,
                    transactionManager);
            List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
            this.metadata.update(Cluster.bootstrap(addresses), Collections.<String>emptySet(), time.milliseconds());
            ChannelBuilder channelBuilder = ClientUtils.createChannelBuilder(config);
            Sensor throttleTimeSensor = Sender.throttleTimeSensor(metricsRegistry.senderMetrics);
            NetworkClient client = new NetworkClient(
                    new Selector(config.getLong(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG),
                            this.metrics, time, "producer", channelBuilder, logContext),
                    this.metadata,
                    clientId,
                    maxInflightRequests,
                    config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
                    config.getLong(ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
                    config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
                    config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG),
                    this.requestTimeoutMs,
                    time,
                    true,
                    apiVersions,
                    throttleTimeSensor,
                    logContext);
            this.sender = new Sender(logContext,
                    client,
                    this.metadata,
                    this.accumulator,
                    maxInflightRequests == 1,
                    config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                    acks,
                    retries,
                    metricsRegistry.senderMetrics,
                    Time.SYSTEM,
                    this.requestTimeoutMs,
                    config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG),
                    this.transactionManager,
                    apiVersions);
            String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
            this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
            this.ioThread.start();
            this.errors = this.metrics.sensor("errors");
            config.logUnused();
            AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics);
            log.debug("Kafka producer started");
        } catch (Throwable t) {
            // call close methods if internal objects are already constructed this is to prevent resource leak. see KAFKA-2121
            close(0, TimeUnit.MILLISECONDS, true);
            // now propagate the exception
            throw new KafkaException("Failed to construct kafka producer", t);
        }
    }
  1. Parse the configuration; if no clientId is given, one is generated from an internal incrementing sequence.
  2. Parse the cluster addresses and convert the cluster information into a Cluster object stored in the metadata field:
1. Parsing the addresses: ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG))
  
public static List<InetSocketAddress> parseAndValidateAddresses(List<String> urls) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String url : urls) {
            if (url != null && !url.isEmpty()) {
                try {
                    String host = getHost(url);
                    Integer port = getPort(url);
                    if (host == null || port == null)
                        throw new ConfigException("Invalid url in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG + ": " + url);

                    InetSocketAddress address = new InetSocketAddress(host, port);

                    if (address.isUnresolved()) {
                        log.warn("Removing server {} from {} as DNS resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, host);
                    } else {
                        addresses.add(address);
                    }
                } catch (IllegalArgumentException e) {
                    throw new ConfigException("Invalid port in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG + ": " + url);
                }
            }
        }
        if (addresses.isEmpty())
            throw new ConfigException("No resolvable bootstrap urls given in " + CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG);
        return addresses;
    }

2. Converting them into a Cluster object:
    public static Cluster bootstrap(List<InetSocketAddress> addresses) {
        List<Node> nodes = new ArrayList<>();
        int nodeId = -1;
        for (InetSocketAddress address : addresses)
            nodes.add(new Node(nodeId--, address.getHostString(), address.getPort()));
        return new Cluster(null, true, nodes, new ArrayList<PartitionInfo>(0), Collections.<String>emptySet(), Collections.<String>emptySet(), null);
    }  
  3. Create a NetworkClient object, which holds the metadata and other internal state.

  4. Create a Sender object that wraps the NetworkClient above.

  5. Create an ioThread that wraps the Sender; starting this thread ends up invoking Sender#run.

2. The application sends data, which is first staged in memory

The application calls send(); the send method picks a partition, prepares the data, and appends it to the local buffer (this is the method the business code invokes).

1. It eventually reaches org.apache.kafka.clients.producer.KafkaProducer#doSend:
  private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        // Append callback takes care of the following:
        //  - call interceptors and user callback on completion
        //  - remember partition that is calculated in RecordAccumulator.append
        AppendCallbacks<K, V> appendCallbacks = new AppendCallbacks<K, V>(callback, this.interceptors, record);

        try {
            throwIfProducerClosed();
            // first make sure the metadata for the topic is available
            long nowMs = time.milliseconds();
            ClusterAndWaitTime clusterAndWaitTime;
            try {
                clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
            } catch (KafkaException e) {
                if (metadata.isClosed())
                    throw new KafkaException("Producer closed while send in progress", e);
                throw e;
            }
            nowMs += clusterAndWaitTime.waitedOnMetadataMs;
            long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
            Cluster cluster = clusterAndWaitTime.cluster;
            byte[] serializedKey;
            try {
                serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer", cce);
            }
            byte[] serializedValue;
            try {
                serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in value.serializer", cce);
            }

            // Try to calculate partition, but note that after this call it can be RecordMetadata.UNKNOWN_PARTITION,
            // which means that the RecordAccumulator would pick a partition using built-in logic (which may
            // take into account broker load, the amount of data produced to each partition, etc.).
            int partition = partition(record, serializedKey, serializedValue, cluster);

            setReadOnly(record.headers());
            Header[] headers = record.headers().toArray();

            int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
                    compressionType, serializedKey, serializedValue, headers);
            ensureValidRecordSize(serializedSize);
            long timestamp = record.timestamp() == null ? nowMs : record.timestamp();

            // A custom partitioner may take advantage on the onNewBatch callback.
            boolean abortOnNewBatch = partitioner != null;

            // Append the record to the accumulator.  Note, that the actual partition may be
            // calculated there and can be accessed via appendCallbacks.topicPartition.
            RecordAccumulator.RecordAppendResult result = accumulator.append(record.topic(), partition, timestamp, serializedKey,
                    serializedValue, headers, appendCallbacks, remainingWaitMs, abortOnNewBatch, nowMs, cluster);
            assert appendCallbacks.getPartition() != RecordMetadata.UNKNOWN_PARTITION;

            if (result.abortForNewBatch) {
                int prevPartition = partition;
                onNewBatch(record.topic(), cluster, prevPartition);
                partition = partition(record, serializedKey, serializedValue, cluster);
                if (log.isTraceEnabled()) {
                    log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
                }
                result = accumulator.append(record.topic(), partition, timestamp, serializedKey,
                    serializedValue, headers, appendCallbacks, remainingWaitMs, false, nowMs, cluster);
            }

            // Add the partition to the transaction (if in progress) after it has been successfully
            // appended to the accumulator. We cannot do it before because the partition may be
            // unknown or the initially selected partition may be changed when the batch is closed
            // (as indicated by `abortForNewBatch`). Note that the `Sender` will refuse to dequeue
            // batches from the accumulator until they have been added to the transaction.
            if (transactionManager != null) {
                transactionManager.maybeAddPartition(appendCallbacks.topicPartition());
            }

            if (result.batchIsFull || result.newBatchCreated) {
                log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), appendCallbacks.getPartition());
                this.sender.wakeup();
            }
            return result.future;
            // handling exceptions and record the errors;
            // for API exceptions return them in the future,
            // for other exceptions throw directly
        } catch (ApiException e) {
            log.debug("Exception occurred during message send:", e);
            if (callback != null) {
                TopicPartition tp = appendCallbacks.topicPartition();
                RecordMetadata nullMetadata = new RecordMetadata(tp, -1, -1, RecordBatch.NO_TIMESTAMP, -1, -1);
                callback.onCompletion(nullMetadata, e);
            }
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            if (transactionManager != null) {
                transactionManager.maybeTransitionToErrorState(e);
            }
            return new FutureFailure(e);
        } catch (InterruptedException e) {
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw new InterruptException(e);
        } catch (KafkaException e) {
            this.errors.record();
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw e;
        } catch (Exception e) {
            // we notify interceptor about all exceptions, since onSend is called before anything else in this method
            this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);
            throw e;
        }
    }

2. What this logic does:
  1. Fetch the cluster metadata (Cluster).
  2. Serialize the key and value (a record may carry both).
  3. Choose the partition.
  4. Call org.apache.kafka.clients.producer.internals.RecordAccumulator#append to buffer the record locally.

3. Sender#run(): a while loop that sends data and pulls results

	public void run() {
        log.debug("Starting Kafka producer I/O thread.");

        // main loop, runs until close is called
        while (running) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

        // okay we stopped accepting requests but there may still be
        // requests in the transaction manager, accumulator or waiting for acknowledgment,
        // wait until these are completed.
        while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
        while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
            if (!transactionManager.isCompleting()) {
                log.info("Aborting incomplete transaction due to shutdown");
                transactionManager.beginAbort();
            }
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        if (forceClose) {
            // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
            // the futures.
            if (transactionManager != null) {
                log.debug("Aborting incomplete transactional requests due to forced shutdown");
                transactionManager.close();
            }
            log.debug("Aborting incomplete batches due to forced shutdown");
            this.accumulator.abortIncompleteBatches();
        }
        try {
            this.client.close();
        } catch (Exception e) {
            log.error("Failed to close network client", e);
        }

        log.debug("Shutdown of Kafka producer I/O thread has completed.");
    }

Records are first buffered in memory (in the RecordAccumulator); this while loop then keeps draining that local buffer, choosing the node each batch has to go to, and sending it.

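How aggressively the accumulator batches records before the Sender drains them is controlled on the producer side. The sketch below only illustrates the relevant properties; the bootstrap address and the values are placeholders, not recommendations.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerBatchingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // total memory the RecordAccumulator may use for buffering (bytes)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);
        // target size of one batch per partition (bytes); a full batch wakes the Sender
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);
        // how long to wait for a batch to fill before the Sender drains it anyway (ms)
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("myTopic1", "key", "value")); // buffered, sent by the Sender thread
        } // close() flushes whatever the accumulator still holds
    }
}
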
  1. If the metadata needs updating, the client first prepares the node and establishes the socket connection (only the first send needs this; once the connections to the nodes exist they are simply reused).
1. The connection is established in org.apache.kafka.common.network.Selector#connect (a standalone NIO sketch of the same pattern follows the call stack below):

	public void connect(String id, InetSocketAddress address, int sendBufferSize, int receiveBufferSize) throws IOException {
        if (this.channels.containsKey(id))
            throw new IllegalStateException("There is already a connection for id " + id);
        if (this.closingChannels.containsKey(id))
            throw new IllegalStateException("There is already a connection for id " + id + " that is still being closed");

        SocketChannel socketChannel = SocketChannel.open();
        socketChannel.configureBlocking(false);
        Socket socket = socketChannel.socket();
        socket.setKeepAlive(true);
        if (sendBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
            socket.setSendBufferSize(sendBufferSize);
        if (receiveBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
            socket.setReceiveBufferSize(receiveBufferSize);
        socket.setTcpNoDelay(true);
        boolean connected;
        try {
            connected = socketChannel.connect(address);
        } catch (UnresolvedAddressException e) {
            socketChannel.close();
            throw new IOException("Can't resolve address: " + address, e);
        } catch (IOException e) {
            socketChannel.close();
            throw e;
        }
        SelectionKey key = socketChannel.register(nioSelector, SelectionKey.OP_CONNECT);
        KafkaChannel channel = buildChannel(socketChannel, id, key);

        if (connected) {
            // OP_CONNECT won't trigger for immediately connected channels
            log.debug("Immediately connected to node {}", channel.id());
            immediatelyConnectedKeys.add(key);
            key.interestOps(0);
        }
    }  
2. Call stack:
connect:204, Selector (org.apache.kafka.common.network)
initiateConnect:793, NetworkClient (org.apache.kafka.clients)
access$700:62, NetworkClient (org.apache.kafka.clients)
maybeUpdate:944, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
maybeUpdate:848, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
poll:458, NetworkClient (org.apache.kafka.clients)
run:239, Sender (org.apache.kafka.clients.producer.internals)
run:163, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang)  
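
For readers less familiar with Java NIO, here is a standalone sketch of the same non-blocking connect pattern that Selector#connect uses above. It is not Kafka code; the address is a placeholder and error handling is omitted.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class NonBlockingConnectSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        // for a non-blocking channel connect() usually returns false and completes later via OP_CONNECT
        boolean connected = channel.connect(new InetSocketAddress("localhost", 9092)); // placeholder address
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
        if (connected) {
            // same corner case Selector#connect handles: OP_CONNECT never fires for an immediate connect
            key.interestOps(SelectionKey.OP_READ);
        }
        while (selector.select(1000) > 0) {
            for (SelectionKey k : selector.selectedKeys()) {
                if (k.isConnectable() && ((SocketChannel) k.channel()).finishConnect()) {
                    System.out.println("Connected to " + channel.getRemoteAddress());
                    k.interestOps(SelectionKey.OP_READ); // connection established, switch to read interest
                }
            }
            selector.selectedKeys().clear();
        }
        channel.close();
        selector.close();
    }
}
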
  2. The framework picks a node and sends the data - it prepares the request and enables the write event on the selector (a minimal NIO sketch of this step follows the call stack)
1. This goes through org.apache.kafka.clients.NetworkClient#doSend(org.apache.kafka.clients.ClientRequest, boolean, long, org.apache.kafka.common.requests.AbstractRequest)

2. Call stack:
doSend:533, NetworkClient (org.apache.kafka.clients)
doSend:500, NetworkClient (org.apache.kafka.clients)
sendInternalMetadataRequest:466, NetworkClient (org.apache.kafka.clients)
maybeUpdate:1146, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
maybeUpdate:1051, NetworkClient$DefaultMetadataUpdater (org.apache.kafka.clients)
poll:558, NetworkClient (org.apache.kafka.clients)
awaitReady:73, NetworkClientUtils (org.apache.kafka.clients)
awaitNodeReady:534, Sender (org.apache.kafka.clients.producer.internals)
maybeSendAndPollTransactionalRequest:455, Sender (org.apache.kafka.clients.producer.internals)
runOnce:316, Sender (org.apache.kafka.clients.producer.internals)
run:243, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang)
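
Again only as orientation, this is the generic NIO idiom behind "enable the write event": queue the outgoing bytes and add OP_WRITE to the key's interest set so that the selector loop performs the actual write later. It is a sketch, not Kafka's implementation.

import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.util.ArrayDeque;
import java.util.Queue;

public class WriteInterestSketch {

    private final Queue<ByteBuffer> outbox = new ArrayDeque<>();

    /** Queue a payload for the connection behind the given key and ask the selector to report writability. */
    public void send(SelectionKey key, ByteBuffer payload) {
        outbox.add(payload);
        // keep the existing interests (e.g. OP_READ) and add OP_WRITE
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }
}

The selector loop then sees key.isWritable() and drains the queue, which is the next step.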


  3. The framework writes the data - the actual socket write (a sketch of handling partial writes follows the call stack)
1. Method:
org.apache.kafka.common.network.ByteBufferSend#writeTo
	public long writeTo(TransferableChannel channel) throws IOException {
        long written = channel.write(buffers);
        if (written < 0)
            throw new EOFException("Wrote negative bytes to channel. This shouldn't happen.");
        remaining -= written;
        pending = channel.hasPendingWrites();
        return written;
    }

2. Call stack:
writeTo:58, ByteBufferSend (org.apache.kafka.common.network)
writeTo:41, NetworkSend (org.apache.kafka.common.network)
write:430, KafkaChannel (org.apache.kafka.common.network)
write:644, Selector (org.apache.kafka.common.network)
attemptWrite:637, Selector (org.apache.kafka.common.network)
pollSelectionKeys:593, Selector (org.apache.kafka.common.network)
poll:481, Selector (org.apache.kafka.common.network)
poll:560, NetworkClient (org.apache.kafka.clients)
awaitReady:73, NetworkClientUtils (org.apache.kafka.clients)
awaitNodeReady:534, Sender (org.apache.kafka.clients.producer.internals)
maybeSendAndPollTransactionalRequest:455, Sender (org.apache.kafka.clients.producer.internals)
runOnce:316, Sender (org.apache.kafka.clients.producer.internals)
run:243, Sender (org.apache.kafka.clients.producer.internals)
run:750, Thread (java.lang) 
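
The gathering write above may send fewer bytes than requested, which is why writeTo keeps track of remaining. The sketch below, not Kafka code, shows the usual way to cope with that from a selector loop: write what the socket accepts and clear OP_WRITE only once everything has been flushed.

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.SelectionKey;

public class PartialWriteSketch {

    private long remaining;          // bytes of the current send that are still unwritten
    private ByteBuffer[] buffers;    // the buffers of the current send

    /** Called from the selector loop when key.isWritable(). */
    public void onWritable(SelectionKey key) throws IOException {
        GatheringByteChannel channel = (GatheringByteChannel) key.channel(); // a SocketChannel in practice
        long written = channel.write(buffers); // gathering write, may be partial
        if (written < 0)
            throw new EOFException("Channel reported a negative write count");
        remaining -= written;
        if (remaining == 0) {
            // everything flushed: stop asking the selector for write readiness
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}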

2. Consumer logic

The consumer follows the same pattern as the producer: at startup it also resolves the cluster metadata and keeps it in its internal fields. For details see org.apache.kafka.clients.consumer.KafkaConsumer#KafkaConsumer(org.apache.kafka.clients.consumer.ConsumerConfig, org.apache.kafka.common.serialization.Deserializer, org.apache.kafka.common.serialization.Deserializer). A minimal poll-loop sketch follows.
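
In this sketch the bootstrap address and group id are placeholders; the topic name is the myTopic1 created earlier in this post.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "myGroup");                 // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("myTopic1"));
            for (int i = 0; i < 10; i++) { // bounded loop for the sketch; real code normally loops forever
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}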
