Using the Kafka Java Client and Integrating Kafka with Spring Boot
1. Basic producer implementation
1.1 Adding the dependency
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <!-- Match the version to your Kafka installation, e.g. kafka_2.12-2.0.0 -->
    <version>2.0.0</version>
</dependency>
1.2 Implementation
Sending messages synchronously
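    import java.util.Date;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    import com.alibaba.fastjson.JSON; // assumed: JSON.toJSONString comes from fastjson
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    // The class name is illustrative; User is an application POJO that is not shown here.
    public class MySimpleProducer {
        private final static String TOPIC_NAME = "my-replicated-topic";

        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            // 1. Set the connection parameters
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.21.107:9092,192.168.21.108:9092,192.168.21.109:9092");
            // Serialize the record key from a String to a byte array
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Serialize the record value from a String to a byte array
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Create the producer client
            Producer<String, String> producer = new KafkaProducer<>(props);
            User user = new User();
            user.setErpId(500000L);
            user.setErpName("张三三");
            user.setRealName("张三三");
            user.setCreateTime(new Date());
            user.setUpdateTime(new Date());
            // No partition is specified explicitly: the key decides which partition
            // the record is sent to, and the value is the message payload
            ProducerRecord<String, String> producerRecord = new ProducerRecord<>(TOPIC_NAME,
                    user.getErpId().toString(), JSON.toJSONString(user));
            // Send the record, block until the result arrives, and print the returned metadata
            RecordMetadata metadata = producer.send(producerRecord).get();
            System.out.println("Synchronous send result: topic-" + metadata.topic()
                    + "|partition-" + metadata.partition() + "|offset-" + metadata.offset());
            producer.close();
        }
    }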
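The producer code notes that the record key decides the target partition. A minimal sketch of that idea follows. Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count; here `Arrays.hashCode` is used as a stand-in for murmur2, so the partition numbers will not match what a real cluster assigns. The class name and the partition count of 5 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified illustration of key-based partitioning.
// Assumption: Arrays.hashCode stands in for the murmur2 hash used by Kafka's default partitioner.
class KeyPartitionSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 5; // hypothetical partition count
        for (String key : new String[] {"500000", "500001", "500000"}) {
            System.out.println("key=" + key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

The property that matters is that equal keys always land in the same partition, which is what preserves per-key ordering.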
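With synchronous sending, get() blocks the calling thread until the broker returns an ack or the send fails. In this walkthrough's setup the producer waits up to 3 s for the ack and then retries, up to 3 times. In general, the wait per attempt is governed by request.timeout.ms, and the retry count and interval by retries and retry.backoff.ms.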
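Because `producer.send(...)` returns a `java.util.concurrent.Future`, the blocking wait can also be bounded on the caller's side with `get(timeout, unit)`. The sketch below shows that pattern using only JDK types, with `CompletableFuture` standing in for the future returned by the producer. The helper name `awaitResult` and the sample values are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Bounded wait on a Future, as one might apply to the Future returned by producer.send(...).
class BoundedWaitSketch {

    // Waits at most timeoutMs for the result; returns null if the wait times out.
    static <T> T awaitResult(Future<T> future, long timeoutMs) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // no ack within the allowed time
        } catch (ExecutionException e) {
            throw new IllegalStateException("send failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for an acknowledged send and a send that never completes.
        Future<String> acked = CompletableFuture.completedFuture("partition-0|offset-42");
        Future<String> neverAcked = new CompletableFuture<>();
        System.out.println(awaitResult(acked, 3000));     // prints partition-0|offset-42
        System.out.println(awaitResult(neverAcked, 100)); // prints null
    }
}
```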
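Sending messages asynchronously
    // Send asynchronously; the callback runs once the send completes or fails
    producer.send(producerRecord, new Callback() {
        @Override
        public void onCompletion(RecordMetadata metadata, Exception e) {
            if (e != null) {
                System.out.println("Failed to send message: " + e.getMessage());
            } else if (metadata != null) {
                System.out.println("Asynchronous send result: topic-" + metadata.topic()
                        + "|partition-" + metadata.partition() + "|offset-" + metadata.offset());
            }
        }
    });
    // After an asynchronous send the main thread may exit before onCompletion is invoked,
    // so block it for a while to see the result
    // (producer.flush() or producer.close() would also wait for pending sends)
    Thread.sleep(100000L);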
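1.3 Configuring acks on the producer
With synchronous sending, the producer blocks until the cluster returns an ack. The acks setting has three possible values:
- acks = 0: the producer does not wait for any broker to confirm receipt and treats the send as complete immediately. This is the fastest option and also the one most likely to lose messages.
- acks = 1 (the default in the kafka-clients 2.0.0 version used here; from Kafka 3.0 the default is all): the ack is returned once the partition leader has received the message and written it to its local log. This balances performance and safety.
- acks = -1 / all: the leader waits for the in-sync replicas before returning the ack. This works together with the broker/topic setting min.insync.replicas (default 1; a value of 2 or more is recommended). With min.insync.replicas=2, the leader and at least one follower must have the message before the ack is returned, so at least 2 brokers hold the data. This is the safest option and the slowest.
Configuration for acks and retries (the producer retries when no ack is received):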
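    // acks and retries
    props.put(ProducerConfig.ACKS_CONFIG, "1");
    /**
     * Failed sends are retried; the default retry interval is 100 ms. Retries make delivery more reliable,
     * but they can also cause duplicate messages (for example after a network glitch),
     * so the receiving side should process messages idempotently.
     */
    props.put(ProducerConfig.RETRIES_CONFIG, 3);
    // Interval between retries
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);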
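The comment above recommends idempotent handling on the receiving side, since retries can deliver the same message twice. One common way to do that is to remember the IDs of messages already handled and skip repeats. The sketch below is a minimal in-memory version, assuming each message carries a unique business ID (a hypothetical field, not something the original code defines). A real service would more likely keep the IDs in a database or cache so they survive restarts.

```java
import java.util.HashSet;
import java.util.Set;

// In-memory deduplication of messages by a unique ID (hypothetical field).
class IdempotentHandlerSketch {
    private final Set<String> processedIds = new HashSet<>();
    private int applied = 0;

    // Returns true if the message was applied, false if it was a duplicate and skipped.
    boolean handle(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // this ID was already handled
        }
        applied++; // stand-in for the real business logic
        return true;
    }

    int appliedCount() {
        return applied;
    }

    public static void main(String[] args) {
        IdempotentHandlerSketch handler = new IdempotentHandlerSketch();
        System.out.println(handler.handle("msg-1", "{\"erpId\":500000}")); // prints true
        System.out.println(handler.handle("msg-1", "{\"erpId\":500000}")); // prints false
        System.out.println(handler.appliedCount());                        // prints 1
    }
}
```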
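1.4 Configuring the producer's send buffer
- By default the producer creates a buffer that holds messages waiting to be sent; its size is 32 MB.
    props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
- A local sender thread takes up to 16 KB of data from the buffer at a time and sends it to the broker as a batch.
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
- If the batch has not reached 16 KB, whatever has accumulated is still sent after 10 ms (linger.ms defaults to 0; this example sets it to 10).
    props.put(ProducerConfig.LINGER_MS_CONFIG, 10);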
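The interaction between batch.size and linger.ms described above boils down to "send when the batch is full, or when it has waited long enough". The sketch below models only that decision. It is a simplification written for illustration, not the producer's actual internal logic, and the class and method names are assumptions.

```java
// Simplified model of when a producer batch becomes ready to send.
class BatchingSketch {

    // A batch is ready when it has reached batchSize bytes or has waited at least lingerMs.
    static boolean readyToSend(int batchBytes, int batchSize, long waitedMs, long lingerMs) {
        return batchBytes >= batchSize || waitedMs >= lingerMs;
    }

    public static void main(String[] args) {
        int batchSize = 16384; // batch.size from the configuration above
        long lingerMs = 10;    // linger.ms from the configuration above
        System.out.println(readyToSend(16384, batchSize, 0, lingerMs)); // full batch: prints true
        System.out.println(readyToSend(1000, batchSize, 5, lingerMs));  // small and young: prints false
        System.out.println(readyToSend(1000, batchSize, 10, lingerMs)); // linger elapsed: prints true
    }
}
```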
2. Consumer implementation
2.1 Basic consumer implementation
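    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    // The class name is illustrative.
    public class MySimpleConsumer {
        private static final String TOPIC_NAME = "my-replicated-topic";
        private static final String CONSUMER_GROUP_NAME = "testGroup";

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.21.107:9092,192.168.21.108:9092,192.168.21.109:9092");
            // Consumer group name
            props.put(ConsumerConfig.GROUP_ID_CONFIG, CONSUMER_GROUP_NAME);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // 1. Create the consumer client
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            // 2. Subscribe to the list of topics
            consumer.subscribe(Arrays.asList(TOPIC_NAME));
            while (true) {
                // 3. poll() is a long poll that fetches messages
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    // 4. Print the message
                    System.out.printf("Received message: partition = %d, offset = %d, key = %s, value = %s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }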
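2.2 Committing offsets automatically and manually
Whether offsets are committed automatically or manually, the consumer submits its consumer group, the topic, the partition, and the consumed offset to the cluster's internal __consumer_offsets topic.
Automatic commit
With auto-commit enabled, the consumer commits the offsets of the messages it has polled automatically, during later poll() calls, once the configured interval has elapsed.
    // Whether to commit offsets automatically; the default is true
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
    // Interval between automatic offset commits
    props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
Automatic commit can lose messages: the commit does not check whether the application has finished processing, so an offset may be committed before its messages are fully processed (for example when processing is handed to another thread). If the consumer crashes at that point, those messages are skipped after a restart.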
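The message-loss scenario is easier to see in a small simulation. The sketch below contains no Kafka code; it models one polled batch, a crash partway through processing, and a restart that resumes from the committed offset. It compares committing before processing with committing after it. The class and method names and the batch of 5 messages are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates a crash during processing and reports which messages are never processed.
class CommitOrderSketch {

    // batchSize messages (offsets 0..batchSize-1) are polled; the consumer crashes
    // just before handling offset crashAt, then restarts from the committed offset.
    static List<Integer> unprocessedAfterRestart(boolean commitBeforeProcessing, int batchSize, int crashAt) {
        boolean[] processed = new boolean[batchSize];
        int committed = 0;
        if (commitBeforeProcessing) {
            committed = batchSize; // offsets committed as soon as the batch is polled
        }
        for (int i = 0; i < crashAt; i++) {
            processed[i] = true; // handled before the crash
        }
        // Crash: a commit-after-processing never happens, so committed stays at 0 in that case.
        // Restart: resume from the committed offset.
        for (int i = committed; i < batchSize; i++) {
            processed[i] = true;
        }
        List<Integer> lost = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            if (!processed[i]) {
                lost.add(i);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println(unprocessedAfterRestart(true, 5, 2));  // prints [2, 3, 4]
        System.out.println(unprocessedAfterRestart(false, 5, 2)); // prints []
    }
}
```

Committing after processing loses nothing in this model, but offsets 0 and 1 are processed twice, which is why idempotent handling still matters.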
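Manual commit
Auto-commit must be switched off first:
    // Whether to commit offsets automatically; the default is true
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
Manual synchronous commit: after the messages have been processed, call the synchronous commit method. The call blocks until the cluster returns an ack, which means the commit has completed, and only then does the following logic run.
    while (true) {
        // 3. poll() is a long poll that fetches messages
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            // 4. Print the message
            System.out.printf("Received message: partition = %d, offset = %d, key = %s, value = %s%n",
                    record.partition(), record.offset(), record.key(), record.value());
        }
        // All polled messages have been processed
        if (records.count() > 0) {
            // Commit offsets synchronously; the current thread blocks until the commit succeeds
            consumer.commitSync();
        }
    }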
Manual asynchronous commit: commit after the messages have been processed without waiting for the cluster's ack, and continue with the following logic straight away. A callback can be supplied; it is invoked when the commit completes.
    while (true) {
        // 3. poll() is a long poll that fetches messages
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            // 4. Print the message
            System.out.printf("Received message: partition = %d, offset = %d, key = %s, value = %s%n",
                    record.partition(), record.offset(), record.key(), record.value());
        }
        // All polled messages have been processed
        if (records.count() > 0) {
            // Commit offsets asynchronously; the current thread does not block and can carry on
            consumer.commitAsync(new OffsetCommitCallback() {
                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                    if (exception != null) {
                        System.err.println("Commit failed for " + offsets);
                        // Print the full stack trace (concatenating getStackTrace() would only print an array reference)
                        exception.printStackTrace();
                    }
                }
            });
        }
    }
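2.3 Long polling with poll()
- By default, a single poll() returns at most 500 messages.
    // Maximum number of records returned by one poll(); tune it to how fast the consumer processes messages
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);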
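When tuning max.poll.records to the consumer's speed, a related setting is max.poll.interval.ms (default 300000 ms, i.e. 5 minutes): if the gap between two poll() calls exceeds it, the consumer is considered failed and its partitions are reassigned. A rough upper bound for max.poll.records is therefore that interval divided by the per-record processing time. The sketch below does only this arithmetic; the per-record time of 1000 ms is an assumed figure, and the helper name is hypothetical.

```java
// Back-of-the-envelope sizing of max.poll.records against max.poll.interval.ms.
class PollBudgetSketch {

    // Largest batch that can be processed within maxPollIntervalMs
    // when each record takes about perRecordMs to handle.
    static int maxRecordsWithinInterval(long maxPollIntervalMs, long perRecordMs) {
        if (perRecordMs <= 0) {
            throw new IllegalArgumentException("perRecordMs must be positive");
        }
        return (int) Math.min(Integer.MAX_VALUE, maxPollIntervalMs / perRecordMs);
    }

    public static void main(String[] args) {
        long maxPollIntervalMs = 300_000; // Kafka's default max.poll.interval.ms
        long perRecordMs = 1_000;         // assumed processing time per record
        // prints 300: with these numbers the default of 500 records per poll would be too many
        System.out.println(maxRecordsWithinInterval(maxPollIntervalMs, perRecordMs));
    }
}
```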
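2.4 Consumer health checks
With the settings below, the consumer sends a heartbeat to the Kafka cluster every 1 s. If the cluster receives no heartbeat from a consumer for more than 10 s, it removes that consumer from the consumer group and triggers a rebalance of the group, handing the consumer's partitions to the other consumers in the group.
    // Interval at which the consumer sends heartbeats to the broker
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 1000);
    // If Kafka receives no heartbeat from the consumer for more than 10 s, it removes the consumer
    // from the group and rebalances, assigning its partitions to other consumers
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10 * 1000);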
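The Kafka documentation advises that heartbeat.interval.ms be lower than session.timeout.ms and typically no higher than one third of it, so that a few heartbeats can be missed before the session expires. The sketch below checks a pair of values against that rule of thumb. It is a standalone helper with hypothetical names, not part of the Kafka client API.

```java
// Sanity check for heartbeat.interval.ms against session.timeout.ms.
class HeartbeatConfigSketch {

    // True when the heartbeat interval is positive and at most one third of the session timeout.
    static boolean isReasonable(long heartbeatIntervalMs, long sessionTimeoutMs) {
        return heartbeatIntervalMs > 0 && heartbeatIntervalMs * 3 <= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(isReasonable(1_000, 10_000)); // values used above: prints true
        System.out.println(isReasonable(5_000, 10_000)); // heartbeat too infrequent: prints false
    }
}
```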
2.5 Consuming by partition, offset, and time
- Consuming from a specific partition
    // Consume from a specific partition
    consumer.assign(Arrays.asList(new TopicPartition(TOPIC_NAME, 0)));
- Replaying messages from the beginning
    consumer.assign(Arrays.asList(new TopicPartition(TOPIC_NAME, 0)));
    consumer.seekToBeginning(Arrays.asList(new TopicPartition(TOPIC_NAME, 0)));
Received message: partition = 3, offset = 0, key = 500000, value = {"createTime":1636702968492,"erpId":500000,"erpName":"张三三","realName":"张三三","updateTime":1636702968492}
Received message: partition = 3, offset = 1, key = 500005, value = {"createTime":1636957199355,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636957199355}
Received message: partition = 3, offset = 2, key = 500005, value = {"createTime":1636957211271,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636957211271}
Received message: partition = 3, offset = 3, key = 500005, value = {"createTime":1636958064780,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636958064780}
- Consuming from a specific offset
    consumer.assign(Arrays.asList(new TopicPartition(TOPIC_NAME, 0)));
    consumer.seek(new TopicPartition(TOPIC_NAME, 0), 10);
Received message: partition = 3, offset = 2, key = 500005, value = {"createTime":1636957211271,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636957211271}
Received message: partition = 3, offset = 3, key = 500005, value = {"createTime":1636958064780,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636958064780}
- Consuming from a specific time
Use the timestamp to look up the corresponding offset in every partition, then start consuming each partition from that offset onward.
    List<PartitionInfo> topicPartitions = consumer.partitionsFor(TOPIC_NAME);
    // Start consuming from one hour ago
    long fetchDataTime = new Date().getTime() - 1000 * 60 * 60;
    Map<TopicPartition, Long> map = new HashMap<>();
    for (PartitionInfo par : topicPartitions) {
        map.put(new TopicPartition(TOPIC_NAME, par.partition()), fetchDataTime);
    }
    // Assign all partitions once; calling assign() inside the loop would replace the previous assignment each time
    consumer.assign(map.keySet());
    Map<TopicPartition, OffsetAndTimestamp> parMap = consumer.offsetsForTimes(map);
    for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : parMap.entrySet()) {
        TopicPartition key = entry.getKey();
        OffsetAndTimestamp value = entry.getValue();
        // value is null when the partition has no message at or after the timestamp
        if (key == null || value == null) continue;
        long offset = value.offset();
        System.out.println("partition-" + key.partition() + "|offset-" + offset);
        // Seek to the offset that corresponds to the timestamp
        consumer.seek(key, offset);
    }
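The "one hour ago" timestamp above is computed with raw millisecond arithmetic. As an alternative, the same value can be derived with `java.time`, which makes the lookback period explicit. The sketch below is a small standalone helper with hypothetical names; the fixed instant in `main` is taken from one of the sample timestamps so that the result is reproducible.

```java
import java.time.Duration;
import java.time.Instant;

// Computes the epoch-millisecond timestamp to pass to offsetsForTimes for a given lookback.
class LookbackTimestampSketch {

    // Returns the epoch milliseconds of (now - lookback).
    static long epochMillisBefore(Instant now, Duration lookback) {
        return now.minus(lookback).toEpochMilli();
    }

    public static void main(String[] args) {
        Instant now = Instant.ofEpochMilli(1636958064780L); // fixed instant for a reproducible result
        System.out.println(epochMillisBefore(now, Duration.ofHours(1))); // prints 1636954464780
    }
}
```

In real code, `Instant.now()` would replace the fixed instant.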
2.6 Starting offset for a new consumer group
When a consumer in a new consumer group starts, by default it begins after the last message currently in the partition, so it only sees new messages. The setting below makes a new group start from the beginning the first time; after that it continues from its last consumed position (last consumed offset + 1).
- latest: the default; consume only new messages
- earliest: start from the beginning the first time, then continue from the last consumed position (last consumed offset + 1)
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Received message: partition = 4, offset = 0, key = 500004, value = {"createTime":1636948165301,"erpId":500004,"erpName":"张三三","realName":"张三三","updateTime":1636948165301}
Received message: partition = 1, offset = 0, key = 500001, value = {"createTime":1636703013717,"erpId":500001,"erpName":"李丝丝","realName":"李丝丝","updateTime":1636703013717}
Received message: partition = 3, offset = 0, key = 500000, value = {"createTime":1636702968492,"erpId":500000,"erpName":"张三三","realName":"张三三","updateTime":1636702968492}
Received message: partition = 3, offset = 1, key = 500005, value = {"createTime":1636957199355,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636957199355}
Received message: partition = 3, offset = 2, key = 500005, value = {"createTime":1636957211271,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636957211271}
Received message: partition = 3, offset = 3, key = 500005, value = {"createTime":1636958064780,"erpId":500005,"erpName":"张三三","realName":"张三三","updateTime":1636958064780}
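The rule for where a consumer starts can be summed up as: use the group's committed offset if one exists, otherwise fall back to auto.offset.reset. The sketch below models that decision as a pure function. It is an illustration of the rule, not the client's implementation, and the class name, method name, and sample offsets are assumptions.

```java
// Models how the starting offset is chosen for a partition.
class StartOffsetSketch {

    // committedOffset is null when the group has no committed offset for the partition.
    static long startOffset(Long committedOffset, String autoOffsetReset,
                            long beginningOffset, long endOffset) {
        if (committedOffset != null) {
            return committedOffset; // an existing group resumes where it left off
        }
        switch (autoOffsetReset) {
            case "earliest":
                return beginningOffset; // new group: replay from the start
            case "latest":
                return endOffset;       // new group: only messages produced from now on
            default:
                throw new IllegalArgumentException("unsupported auto.offset.reset: " + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        System.out.println(startOffset(null, "earliest", 0, 4)); // prints 0
        System.out.println(startOffset(null, "latest", 0, 4));   // prints 4
        System.out.println(startOffset(2L, "earliest", 0, 4));   // prints 2
    }
}
```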
3. Integrating Kafka with Spring Boot
3.1 Adding the spring-kafka dependency
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
3.2 Configuration in application.yml
spring:
  kafka:
    bootstrap-servers: 192.168.21.107:9092,192.168.21.108:9092,192.168.21.109:9092
    # Producer
    producer:
      # With a value greater than 0, the client resends records whose send failed
      retries: 3
      batch-size: 16384
      buffer-memory: 33554432
      acks: 1
      # Serializers for the message key and the message body
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: default-group
      enable-auto-commit: false
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      max-poll-records: 500
    listener:
      # Commit immediately when Acknowledgment.acknowledge() is called manually; this is the usual choice
      ack-mode: manual_immediate
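3.3 Writing the message producer
    // The controller class and its name are illustrative
    @RestController
    public class KafkaController {
        private final static String TOPIC_NAME = "my-replicated-topic";

        @Autowired
        private KafkaTemplate<String, String> kafkaTemplate;

        @RequestMapping(value = "/send")
        public String sendMessage() {
            // send() is asynchronous, so the response below does not confirm delivery
            kafkaTemplate.send(TOPIC_NAME, "key", "this is a test kafka message");
            return "send message success";
        }
    }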
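3.4 Writing the message consumer
    // The component class and its name are illustrative
    @Component
    public class MyConsumer {
        // No groupId is given here, so the group-id from application.yml applies
        @KafkaListener(topics = "my-replicated-topic")
        public void listenGroup(ConsumerRecord<String, String> record,
                                Acknowledgment ack) {
            String value = record.value();
            System.out.println(value);
            System.out.println(record);
            // Commit the offset manually
            ack.acknowledge();
        }
    }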
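3.5 Configuring topics, partitions, and offsets on the consumer
    // @TopicPartition here is org.springframework.kafka.annotation.TopicPartition.
    // topic1: partitions 0 and 1; topic2: partition 0, plus partition 1 starting at offset 100.
    // concurrency is the number of consumers in the same group, i.e. the level of concurrent
    // consumption; it should be less than or equal to the total number of partitions.
    @KafkaListener(groupId = "testGroup", topicPartitions = {
            @TopicPartition(topic = "topic1", partitions = {"0", "1"}),
            @TopicPartition(topic = "topic2", partitions = "0",
                    partitionOffsets = @PartitionOffset(partition = "1",
                            initialOffset = "100"))
    }, concurrency = "3")
    public void listenGroupPro(ConsumerRecord<String, String> record,
                               Acknowledgment ack) {
        String value = record.value();
        System.out.println(value);
        System.out.println(record);
        // Commit the offset manually
        ack.acknowledge();
    }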