Kafka Consumers (Part 1)

1. Kafka Consumption Modes

  1. Pull mode: the consumer actively pulls data from the broker. This is the approach Kafka uses.
  2. Push mode: Kafka does not use this approach, because the broker would dictate the delivery rate, which is hard to match to every consumer's consumption rate. For example, with consumers processing at 10 MB/s, 20 MB/s, and 50 MB/s, a push rate of 50 MB/s would leave Consumer1 and Consumer2 unable to keep up.

The drawback of the pull mode is that if Kafka has no data, the consumer may spin in a loop, repeatedly getting back empty results, as the sketch below illustrates.
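A minimal sketch of that loop, assuming an already-configured KafkaConsumer (the full setup appears in section 3.1 below); poll() blocks for at most the given timeout and returns an empty batch when nothing arrived:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    // Illustrative helper, not from the original post.
    static void consume(KafkaConsumer<String, String> kafkaConsumer) {
        while (true) {
            // With no data, poll() waits up to 1 second and returns an empty batch
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(1));
            if (records.isEmpty()) {
                continue; // the "loop of empty results" described above
            }
            records.forEach(System.out::println);
        }
    }
}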

2. Kafka Consumer Workflow

2.1. Overall Consumer Workflow

2.2. Consumer Group Fundamentals

Consumer Group (CG): a consumer group is made up of multiple consumers. The condition for forming a consumer group is that all consumers share the same groupid.

  1. Each consumer in a group consumes data from different partitions; a partition can only be consumed by one consumer within the group.
  2. Consumer groups do not affect one another. Every consumer belongs to some consumer group, i.e. the consumer group is the logical subscriber.

Consumer group initialization workflow

coordinator: assists with consumer group initialization and partition assignment. Coordinator selection: coordinator node = (groupid's hashcode) % 50 (the partition count of __consumer_offsets). For example, if the groupid's hashcode is 1, then 1 % 50 = 1, so whichever broker hosts partition 1 of the __consumer_offsets topic provides the coordinator that leads this consumer group. All consumers in the group also commit their offsets to that same partition, as sketched below.
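A sketch of that selection rule (the 50 is the default partition count of __consumer_offsets, controlled by offsets.topic.num.partitions; the sign-bit mask is an assumption mirroring how Kafka keeps the hash non-negative):

public class CoordinatorPartitionSketch {
    public static void main(String[] args) {
        String groupId = "test";          // example consumer group id
        int offsetsTopicPartitions = 50;  // default partition count of __consumer_offsets
        // Hash the group id, force it non-negative, and take the modulus; the broker
        // leading this __consumer_offsets partition serves as the group's coordinator.
        int partition = (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
        System.out.println("group '" + groupId + "' -> __consumer_offsets partition " + partition);
    }
}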

2.3. Consumer Parameters

3. Consumer API

3.1. Standalone Consumer Example (Subscribing to a Topic)

Create a standalone consumer that consumes data from the lhc topic.

Note: the consumer group id must be configured in consumer API code. If a consumer is started from the command line without a group id, a random group id is filled in automatically.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Properties;

public class CustomeConsumer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        // Connect to the Kafka cluster
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092,hadoop105:9092");
        // Key/value deserializers
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Configure the consumer group id (required in API code)
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
        // Create the consumer
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
        // Subscribe to the topic
        ArrayList<String> topics = new ArrayList<>();
        topics.add("lhc");
        kafkaConsumer.subscribe(topics);
        // Consume data
        while (true) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }
    }
}

Run the producer code from earlier:

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerCallBack {
    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        // Connect to the Kafka cluster (bootstrap servers, not ZooKeeper)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092,hadoop105:9092");
        // Specify the key/value serializer types
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 1. Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // 2. Send data asynchronously with a callback
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("lhc", i + "  hello wdh01"), new Callback() {
                // Callback invoked when the send completes
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception == null) {
                        System.out.println("topic " + metadata.topic() + " partition " + metadata.partition());
                    }
                }
            });
            Thread.sleep(1);
        }
        // 3. Release resources
        kafkaProducer.close();
    }
}
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2

Consumption result:

ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 0, CreateTime = 1649064284368, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 0  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 1, CreateTime = 1649064284483, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 1  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 2, CreateTime = 1649064284485, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 2  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 3, CreateTime = 1649064284488, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 3  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 4, CreateTime = 1649064284490, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 4  hello wdh01)

3.2. Standalone Consumer Example (Subscribing to Partitions)

Create a standalone consumer that consumes data from partition 0 of the lhc topic.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Properties;

public class CustomeConsumerPartition {
    public static void main(String[] args) {
        Properties properties = new Properties();
        // Connect to the Kafka cluster
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092,hadoop105:9092");
        // Key/value deserializers
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Configure the consumer group id
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
        // Create the consumer
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
        // Assign the topic and the specific partition to consume
        ArrayList<TopicPartition> topicPartitions = new ArrayList<>();
        topicPartitions.add(new TopicPartition("lhc", 0));
        kafkaConsumer.assign(topicPartitions);

        while (true) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }
    }
}

Run the producer:

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerCallBackPartitions {
    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        // Connect to the Kafka cluster
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092");
        // Specify the key/value serializer types
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 1. Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // 2. Send data asynchronously, targeting partition 0 explicitly
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("lhc", 0, "", i + "  hello wdh01 a "), new Callback() {
                // Callback invoked when the send completes
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception == null) {
                        System.out.println(" a topic  " + metadata.topic() + " partition " + metadata.partition());
                    }
                }
            });
        }
        // 3. Release resources
        kafkaProducer.close();
    }
}

Observe the consumption result:

ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 0, CreateTime = 1649064682468, serialized key size = 0, serialized value size = 17, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = 0  hello wdh01 a )
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 1, CreateTime = 1649064682503, serialized key size = 0, serialized value size = 17, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = 1  hello wdh01 a )
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 2, CreateTime = 1649064682503, serialized key size = 0, serialized value size = 17, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = 2  hello wdh01 a )
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 3, CreateTime = 1649064682503, serialized key size = 0, serialized value size = 17, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = 3  hello wdh01 a )
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 4, CreateTime = 1649064682503, serialized key size = 0, serialized value size = 17, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = 4  hello wdh01 a )

3.3. Consumer Group Example

Verify that the data of a partition in a topic can only be consumed by a single consumer within a consumer group.

Make two copies of the basic consumer code and start all of them in IDEA at the same time; this launches 3 consumers in the same consumer group.

Run the producer:

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerCallBack {
    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        // Connect to the Kafka cluster
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092,hadoop105:9092");
        // Specify the key/value serializer types
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 1. Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // 2. Send data asynchronously with a callback
        for (int i = 0; i < 50; i++) {
            kafkaProducer.send(new ProducerRecord<>("lhc", i + "  hello wdh01"), new Callback() {
                // Callback invoked when the send completes
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception == null) {
                        System.out.println("topic " + metadata.topic() + " partition " + metadata.partition());
                    }
                }
            });
            Thread.sleep(1);
        }
        // 3. Release resources
        kafkaProducer.close();
    }
}

The producer sent data to all three partitions:

topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 2
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 1
topic lhc partition 2
topic lhc partition 1
topic lhc partition 2
topic lhc partition 0
topic lhc partition 1
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0
topic lhc partition 0

Consumer consuming partition 0:

ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 5, CreateTime = 1649064880655, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 20  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 6, CreateTime = 1649064880662, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 23  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 7, CreateTime = 1649064880669, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 25  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 8, CreateTime = 1649064880694, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 26  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 9, CreateTime = 1649064880696, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 27  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 10, CreateTime = 1649064880704, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 28  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 11, CreateTime = 1649064880706, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 29  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 12, CreateTime = 1649064880708, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 30  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 13, CreateTime = 1649064880710, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 31  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 14, CreateTime = 1649064880712, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 32  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 15, CreateTime = 1649064880714, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 33  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 16, CreateTime = 1649064880716, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 34  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 17, CreateTime = 1649064880719, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 35  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 18, CreateTime = 1649064880722, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 36  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 19, CreateTime = 1649064880724, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 37  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 20, CreateTime = 1649064880726, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 38  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 21, CreateTime = 1649064880728, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 39  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 22, CreateTime = 1649064880730, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 40  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 23, CreateTime = 1649064880732, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 41  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 24, CreateTime = 1649064880734, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 42  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 25, CreateTime = 1649064880736, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 43  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 26, CreateTime = 1649064880737, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 44  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 27, CreateTime = 1649064880738, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 45  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 28, CreateTime = 1649064880739, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 46  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 29, CreateTime = 1649064880741, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 47  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 30, CreateTime = 1649064880743, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 48  hello wdh01)
ConsumerRecord(topic = lhc, partition = 0, leaderEpoch = 1, offset = 31, CreateTime = 1649064880744, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 49  hello wdh01)

Consumer consuming partition 1:

ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 0, CreateTime = 1649064880519, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 0  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 1, CreateTime = 1649064880565, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 1  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 2, CreateTime = 1649064880567, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 2  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 3, CreateTime = 1649064880570, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 3  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 4, CreateTime = 1649064880572, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 4  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 5, CreateTime = 1649064880577, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 5  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 6, CreateTime = 1649064880579, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 6  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 7, CreateTime = 1649064880585, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 7  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 8, CreateTime = 1649064880592, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 8  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 9, CreateTime = 1649064880593, serialized key size = -1, serialized value size = 14, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 9  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 10, CreateTime = 1649064880595, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 10  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 11, CreateTime = 1649064880597, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 11  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 12, CreateTime = 1649064880652, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 19  hello wdh01)
ConsumerRecord(topic = lhc, partition = 1, leaderEpoch = 2, offset = 13, CreateTime = 1649064880657, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 21  hello wdh01)

Consumer consuming partition 2:

ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 5, CreateTime = 1649064880601, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 12  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 6, CreateTime = 1649064880609, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 13  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 7, CreateTime = 1649064880613, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 14  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 8, CreateTime = 1649064880646, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 15  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 9, CreateTime = 1649064880648, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 16  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 10, CreateTime = 1649064880649, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 17  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 11, CreateTime = 1649064880650, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 18  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 12, CreateTime = 1649064880660, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 22  hello wdh01)
ConsumerRecord(topic = lhc, partition = 2, leaderEpoch = 3, offset = 13, CreateTime = 1649064880667, serialized key size = -1, serialized value size = 15, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 24  hello wdh01)

4. Partition Assignment and Rebalancing

  • A consumer group consists of multiple consumers, and a topic consists of multiple partitions. The question is: which consumer consumes which partition's data?
  • Kafka has four mainstream partition assignment strategies: Range, RoundRobin, Sticky, and CooperativeSticky. The strategy can be changed via the partition.assignment.strategy configuration parameter. The default is Range + CooperativeSticky. Kafka can use multiple assignment strategies at the same time, as the sketch below shows.
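A sketch of configuring the strategy, extending the consumer Properties from section 3.1 (the class names are the stock assignors shipped with the Java client; the property takes a comma-separated list, and listing several lets all group members negotiate a common strategy):

// partition.assignment.strategy: the default pairing of Range + CooperativeSticky
properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        "org.apache.kafka.clients.consumer.RangeAssignor,"
                + "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");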

 

4.1. Range and Rebalancing

Range applies per topic. First, the partitions of a topic are sorted by number, and the consumers are sorted alphabetically. Suppose there are 7 partitions and 3 consumers: the sorted partitions are 0, 1, 2, 3, 4, 5, 6, and the sorted consumers are C0, C1, C2.

How many partitions each consumer should consume is determined by partitions / consumers. If it does not divide evenly, the first few consumers each take one extra partition. For example, 7 / 3 = 2 remainder 1, so consumer C0 consumes 1 extra partition; 8 / 3 = 2 remainder 2, so C0 and C1 each consume one extra. The arithmetic is sketched below.

Note: for a single topic, C0 consuming one extra partition has little impact. But with N topics, C0 takes one extra partition for every topic; the more topics there are, the more noticeably C0 consumes N more partitions than the other consumers. This easily causes data skew!
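A small sketch of that arithmetic (illustrative only, not the actual RangeAssignor source):

public class RangeMathSketch {
    public static void main(String[] args) {
        int partitions = 7, consumers = 3;   // the example from the text
        int base = partitions / consumers;   // 7 / 3 = 2 partitions per consumer
        int extra = partitions % consumers;  // 7 % 3 = 1 consumer takes one more
        int start = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            System.out.println("C" + c + " -> partitions " + start + " to " + (start + count - 1));
            start += count;
        }
        // Prints C0 -> partitions 0 to 2, C1 -> 3 to 4, C2 -> 5 to 6,
        // matching the observed assignment below.
    }
}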

Range assignment strategy example

Alter the lhc topic to 7 partitions:

[hui@hadoop103 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop103:9092 --alter --topic lhc --partitions 7 
[hui@hadoop103 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop103:9092 --describe --topic lhc
Topic: lhc      TopicId: rf7bjcn0RfqOXhqSCwKqZw PartitionCount: 7       ReplicationFactor: 2    Configs: segment.bytes=1073741824
        Topic: lhc      Partition: 0    Leader: 2       Replicas: 2,0   Isr: 2,0
        Topic: lhc      Partition: 1    Leader: 2       Replicas: 1,2   Isr: 2,1
        Topic: lhc      Partition: 2    Leader: 1       Replicas: 0,1   Isr: 1,0
        Topic: lhc      Partition: 3    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: lhc      Partition: 4    Leader: 0       Replicas: 0,2   Isr: 0,2
        Topic: lhc      Partition: 5    Leader: 1       Replicas: 1,0   Isr: 1,0
        Topic: lhc      Partition: 6    Leader: 2       Replicas: 2,0   Isr: 2,0

Start three consumers and one producer that sends 500 messages, then check the consumption results.

Producer logic:

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerCallBack {
    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        // Connect to the Kafka cluster
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop103:9092,hadoop104:9092,hadoop105:9092");
        // Specify the key/value serializer types
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 1. Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // 2. Send data asynchronously with a callback
        for (int i = 0; i < 500; i++) {
            kafkaProducer.send(new ProducerRecord<>("lhc", i + "  hello wdh01"), new Callback() {
                // Callback invoked when the send completes
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception == null) {
                        System.out.println("topic " + metadata.topic() + " partition " + metadata.partition());
                    }
                }
            });
            Thread.sleep(1);
        }
        // 3. Release resources
        kafkaProducer.close();
    }
}

Consumer 0 consumed partitions 0, 1, and 2.

Consumer 1 consumed partitions 5 and 6.

Consumer 2 consumed partitions 3 and 4.

Now stop consumer 0 and, within 45 s, produce another batch of messages, then observe what consumers 1 and 2 consume (consumer 0 is stopped).

Consumer 1 still consumed only partitions 5 and 6.

All the remaining partitions were consumed by consumer 2.

Run the producer once more (this time after 45 s) and observe consumers 1 and 2:

Consumer 1 consumed partitions 4, 5, and 6.

Consumer 2 consumed partitions 0, 1, 2, and 3.

Range assignment rebalance example

(1) Stop consumer 0 and quickly resend messages (within 45 s, the faster the better).
Consumer 1: consumed data from partitions 5 and 6.
Consumer 2: consumed data from partitions 0 through 4; consumer 0's task is handed over as a whole to consumer 1 or consumer 2.
Explanation: after consumer 0 dies, the consumer group relies on the 45 s session timeout to decide whether it has left, so there is a wait; once the 45 s are up and it is judged to have truly left, its task is assigned to the remaining consumers.
(2) Resend messages again (after 45 s).
Consumer 1: consumed data from partitions 4, 5, and 6.
Consumer 2: consumed data from partitions 0, 1, 2, and 3.
Explanation: consumer 0 has been evicted from the consumer group, so partitions are reassigned by the Range rule. The 45 s window itself is a consumer setting, sketched below.
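A hedged sketch of the settings behind that window, added to the consumer Properties from section 3.1 (45000 ms and 3000 ms are the defaults in recent client versions, stated here as an assumption):

// How long the group coordinator waits without heartbeats before evicting a consumer
properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 45000);
// How often the consumer sends heartbeats to the coordinator
properties.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);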

4.2. RoundRobin and Rebalancing

RoundRobin applies to all topics in the cluster. The RoundRobin strategy lists all partitions and all consumers, sorts them by hashcode, and then distributes partitions to the consumers via a round-robin algorithm.

Change the assignment strategy in the consumer configuration, as sketched below.
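For example (a sketch; RoundRobinAssignor is the round-robin assignor class that ships with the Java client, set on the same consumer Properties as in section 3.1):

properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        "org.apache.kafka.clients.consumer.RoundRobinAssignor");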

RoundRobin rebalance example:
(1) Stop consumer 0 and quickly resend messages (within 45 s, the faster the better).
Consumer 1: consumed data from partitions 2 and 5.
Consumer 2: consumed data from partitions 4 and 1.
Consumer 0's partitions (0, 6, and 3) are redistributed round-robin between consumer 1 and consumer 2.
Explanation: after consumer 0 dies, the consumer group relies on the 45 s session timeout to decide whether it has left; once the 45 s are up and it is judged to have truly left, its task is assigned to the remaining consumers.
(2) Resend messages again (after 45 s).
Consumer 1: consumed data from partitions 0, 2, 4, and 6.
Consumer 2: consumed data from partitions 1, 3, and 5.
Explanation: consumer 0 has been evicted from the consumer group, so partitions are reassigned by the RoundRobin rule.

4.3. Sticky and Rebalancing

Sticky partition assignment: the result can be understood as "sticky", i.e. before performing a new assignment, the previous assignment is taken into account and the changes are kept as small as possible, which saves a lot of overhead. Kafka introduced this strategy in version 0.11.x. It first places partitions on consumers as evenly as possible; when a consumer in the group fails, it tries to keep the remaining consumers' existing assignments unchanged.

Change the strategy:

properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        "org.apache.kafka.clients.consumer.StickyAssignor");

Sticky rebalance example:
(1) Stop consumer 0 and quickly resend messages (within 45 s, the faster the better).
Consumer 1: consumed data from partitions 2, 5, and 3.
Consumer 2: consumed data from partitions 4 and 6.
Consumer 0's partitions (0 and 1) are redistributed by the sticky rule, as evenly as possible, between consumer 1 and consumer 2.
Explanation: after consumer 0 dies, the consumer group relies on the 45 s session timeout to decide whether it has left; once the 45 s are up and it is judged to have truly left, its task is assigned to the remaining consumers.
(2) Resend messages again (after 45 s).
Consumer 1: consumed data from partitions 2, 3, and 5.
Consumer 2: consumed data from partitions 0, 1, 4, and 6.
Explanation: consumer 0 has been evicted from the consumer group, so partitions are reassigned by the sticky rule.

posted @ 2022-04-04 20:55 晓枫的春天