Kafka: From Beginner to Expert
1.Kafka Overview
1.1.What Is Kafka
#Kafka official website:
http://kafka.apache.org/
(Chinese community: https://www.orchome.com/kafka/index)
#Kafka is a messaging system, i.e. message middleware: a high-throughput, distributed publish-subscribe messaging system.
#What is message middleware?
1.A messaging system has two ends: one side produces messages (the producer) and the other side consumes messages (the consumer)
2.The normal case:
Suppose the producer produces eggs and the consumer eats one egg for every egg produced. Supply and demand match perfectly.
3.Other cases:
The consumer chokes on an egg and has not finished it, but the producer has already produced the next one. What now?
The producer produces 10 eggs per minute while the consumer eats only 2, leaving 8 eggs over every minute. What now?
Someone suggests a basket: put the surplus eggs in the basket and let the consumer eat at its own pace. That basket is the message middleware, and that is exactly what Kafka is.
(Figure: architecture diagram from the official website)
(Figure: message middleware diagram)
1.2.Kafka Roles
Role | Description |
---|---|
Broker | A Kafka cluster consists of one or more servers; each server is a broker |
Topic | Every message published to a Kafka cluster belongs to a category; that category is the topic |
Message | A message consists of a fixed-length header and a variable-length body |
Partition | A partition is a physical concept; each topic contains one or more partitions |
Producer | A message producer publishes messages to Kafka brokers |
Consumer | A message consumer is a client that reads messages from Kafka brokers |
Consumer Group | Each consumer belongs to a specific consumer group; if no group name is specified, it belongs to the default group |
Group Coordinator | For each consumer group, one broker is chosen as the group coordinator |
2.Kafka Cluster Setup
2.1.Cluster Host Planning
No. | Host (login) | IP / MAC address | Hardware | Installed services |
---|---|---|---|---|
1 | cdh1 root/server123 | 192.168.80.100, 00:50:56:2B:5B:EF | CPU: 2 cores, RAM: 2.5 GB, disk: 20 GB, NIC: gigabit | jdk, zookeeper, kafka |
2 | cdh2 root/server123 | 192.168.80.101, 00:50:56:39:23:67 | CPU: 2 cores, RAM: 2.5 GB, disk: 20 GB, NIC: gigabit | jdk, zookeeper, kafka |
3 | cdh3 root/server123 | 192.168.80.102, 00:50:56:3E:3A:0B | CPU: 2 cores, RAM: 2.5 GB, disk: 20 GB, NIC: gigabit | jdk, zookeeper, kafka |
**Note:** passwordless SSH login must be configured between the cluster hosts
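A minimal sketch of the passwordless SSH setup (assuming the node01/node02/node03 hostnames used in the rest of this document resolve to the hosts above):
#on each node, generate a key pair (accept the default prompts)
ssh-keygen -t rsa
#copy the public key to every node, including the local one
ssh-copy-id node01
ssh-copy-id node02
ssh-copy-id node03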
2.2.Install the ZooKeeper Cluster
2.2.1.Download ZooKeeper
#ZooKeeper official website:
http://zookeeper.apache.org/
(Figure: screenshot from the official website)
2.2.2.Upload and Extract
Upload
Extract: tar -zxvf zookeeper.... -C <target directory>
2.2.3.Configuration
#Change into the conf directory:
cd /export/servers/zookeeper-3.4.5-cdh5.14.0/conf
#Rename zoo_sample.cfg to zoo.cfg
#Edit the file:
vi zoo.cfg
# The number of milliseconds of each tick
#Basic time unit (ms); all other time settings in ZooKeeper are configured as integer multiples of this unit
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
#Maximum time (in ticks) a follower may take to sync the latest data from the leader during startup; increase this for larger clusters
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
#Maximum time (in ticks) for heartbeat checks between the leader and the other servers; a follower that does not respond within this time is considered offline
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#Directory for snapshot and transaction log data
dataDir=/export/servers/zookeeper-3.4.5-cdh5.14.0/zkdatas
# the port at which the clients will connect
#Client connection port
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#Number of snapshots to retain; the default is 3
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#How often (in hours) to automatically purge old transaction logs and snapshots; here, every hour
autopurge.purgeInterval=1
#Cluster server list; the numbers 1/2/3 must match each server's myid file. Of the two ports on the right, 2888 is used for data synchronization and communication, 3888 for leader election
server.1=node01.hadoop.com:2888:3888
server.2=node02.hadoop.com:2888:3888
server.3=node03.hadoop.com:2888:3888
2.2.4.Create the Data Directory and myid
#Create the data directory:
mkdir -p /export/servers/zookeeper-3.4.5-cdh5.14.0/zkdatas
#Create myid:
cd /export/servers/zookeeper-3.4.5-cdh5.14.0/zkdatas
#Write 1 into myid
echo 1 > myid
2.2.5.Distribute to the Other Nodes
#Distribute to node02 and change its myid content to 2 (node02 is the hostname of the second node):
scp -r zookeeper-3.4.5-cdh5.14.0/ node02:$PWD
#Distribute to node03 and change its myid content to 3:
scp -r zookeeper-3.4.5-cdh5.14.0/ node03:$PWD
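The myid values on node02 and node03 can also be set remotely; a minimal sketch, assuming passwordless SSH is already configured:
ssh node02 "echo 2 > /export/servers/zookeeper-3.4.5-cdh5.14.0/zkdatas/myid"
ssh node03 "echo 3 > /export/servers/zookeeper-3.4.5-cdh5.14.0/zkdatas/myid"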
2.2.6.Start the ZooKeeper Cluster
#Start/stop on node01/node02/node03 respectively:
/export/servers/zookeeper-3.4.5-cdh5.14.0/bin/zkServer.sh start/stop
#Check the cluster status:
/export/servers/zookeeper-3.4.5-cdh5.14.0/bin/zkServer.sh status
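A minimal sketch (assuming passwordless SSH) that starts ZooKeeper on all three nodes and then checks each node's role; one node is expected to report Mode: leader and the other two Mode: follower:
for node in node01 node02 node03; do
  ssh $node "/export/servers/zookeeper-3.4.5-cdh5.14.0/bin/zkServer.sh start"
done
for node in node01 node02 node03; do
  ssh $node "/export/servers/zookeeper-3.4.5-cdh5.14.0/bin/zkServer.sh status"
done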
2.3.Install the Kafka Cluster
2.3.1.Download Kafka
#Kafka official website:
http://kafka.apache.org/
http://kafka.apache.org/downloads
2.3.2.Upload and Extract
Upload
Extract
#Extract to the target directory:
tar -zxvf kafka_2.11-1.0.0.tgz -C ../servers/
2.3.3.Configuration
#Change into the config directory:
cd /export/servers/kafka_2.11-1.0.0/config
#Edit the file:
vi server.properties
------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
#Unique id of this broker within the cluster; it must not be duplicated
broker.id=0
#Port the broker listens on
port=9092
#Broker host name
host.name=node01
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
#Number of threads the broker uses to handle network requests
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
#Number of threads the broker uses for disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
#Socket send buffer size
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
#Socket receive buffer size
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
#Maximum size of a socket request
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
#Directories where Kafka stores its data; separate multiple paths with commas
log.dirs=/export/servers/kafka_2.11-1.0.0/kfk-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
#Default number of partitions per topic
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
#Number of recovery threads per data directory
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
#Replication factor for the internal offsets/transaction topics
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
#Maximum retention time for message logs; here, 7 days (168 hours)
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
#Maximum size of each log segment file; here, 1 GB
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
#Interval at which log segments are checked against the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
#ZooKeeper cluster connection string
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Timeout in ms for connecting to zookeeper
#ZooKeeper connection timeout
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
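As a quick sanity check (a minimal sketch), the effective, non-comment settings in the edited file can be listed with:
grep -v '^#' /export/servers/kafka_2.11-1.0.0/config/server.properties | grep -v '^$'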
2.3.4.Create the Data Directory
#Create the data directory:
mkdir -p /export/servers/kafka_2.11-1.0.0/kfk-logs
2.3.5.Distribute to the Other Nodes
#Distribute to node02:
scp -r kafka_2.11-1.0.0/ node02:$PWD
#Distribute to node03:
scp -r kafka_2.11-1.0.0/ node03:$PWD
2.3.6.Modify the Configuration on the Other Nodes
#On node02
cd /export/servers/kafka_2.11-1.0.0/config
vi server.properties
----------------------------------------------------
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
port=9092
host.name=node02
#On node03
cd /export/servers/kafka_2.11-1.0.0/config
vi server.properties
----------------------------------------------------
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2
port=9092
host.name=node03
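Instead of editing each file by hand, the two per-node changes can also be applied remotely; a minimal sketch, assuming passwordless SSH and GNU sed:
ssh node02 "sed -i 's/^broker.id=0/broker.id=1/;s/^host.name=node01/host.name=node02/' /export/servers/kafka_2.11-1.0.0/config/server.properties"
ssh node03 "sed -i 's/^broker.id=0/broker.id=2/;s/^host.name=node01/host.name=node03/' /export/servers/kafka_2.11-1.0.0/config/server.properties"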
2.3.7.Start the Kafka Cluster
#Run on each of the three nodes: node01/node02/node03
##Start the Kafka cluster
/export/servers/kafka_2.11-1.0.0/bin/kafka-server-start.sh -daemon /export/servers/kafka_2.11-1.0.0/config/server.properties
## Stop the Kafka cluster
/export/servers/kafka_2.11-1.0.0/bin/kafka-server-stop.sh
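A minimal sketch (assuming passwordless SSH and that the JDK's jps tool is on each node's PATH) that starts the brokers on all three nodes and checks that the Kafka process is up:
for node in node01 node02 node03; do
  ssh $node "/export/servers/kafka_2.11-1.0.0/bin/kafka-server-start.sh -daemon /export/servers/kafka_2.11-1.0.0/config/server.properties"
done
for node in node01 node02 node03; do
  ssh $node "jps | grep Kafka"
done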
3.Basic Kafka Usage
3.1.Command-Line Usage
Viewing topics
#List topics:
/export/servers/kafka_2.11-1.0.0/bin/kafka-topics.sh --list --zookeeper node01:2181,node02:2181,node03:2181
#Describe a specific topic:
/export/servers/kafka_2.11-1.0.0/bin/kafka-topics.sh --describe --zookeeper node01:2181,node02:2181,node03:2181 --topic itcast_topic
#Create a topic
# --create: create a topic
# --zookeeper: the ZooKeeper cluster nodes
# --replication-factor 1: replication factor
# --partitions 1: number of partitions
# --topic oc_itheima_topic: the topic name
/export/servers/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --zookeeper node01:2181,node02:2181,node03:2181 --replication-factor 1 --partitions 1 --topic oc_itheima_topic
#Delete a topic
/export/servers/kafka_2.11-1.0.0/bin/kafka-topics.sh --delete --zookeeper node01:2181,node02:2181,node03:2181 --topic itheima_topic
Console producer
#Start a console producer and produce messages:
/export/servers/kafka_2.11-1.0.0/bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic oc_itheima_topic
Console consumer
#Start a console consumer and consume messages:
/export/servers/kafka_2.11-1.0.0/bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic oc_itheima_topic --consumer-property group.id=my-consumer-g --partition 0 --offset 0
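A minimal end-to-end smoke test sketch combining the commands above (the topic name my_test_topic is only an example):
#create a test topic
/export/servers/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --zookeeper node01:2181,node02:2181,node03:2181 --replication-factor 2 --partitions 3 --topic my_test_topic
#pipe two test messages into the console producer
echo -e "hello\nkafka" | /export/servers/kafka_2.11-1.0.0/bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic my_test_topic
#read from the beginning and exit after two messages
/export/servers/kafka_2.11-1.0.0/bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic my_test_topic --from-beginning --max-messages 2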
3.2.Using the Java API
Create the project
Configure pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.itheima</groupId>
<artifactId>kafka-demo</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<kafka.version>1.0.0</kafka.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>${kafka.version}</version>
</dependency>
</dependencies>
</project>
Producer
package com.itheima.producer;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Properties;
import java.util.concurrent.Future;
/**
* Kafka client: producer
*/
public class MyKafkaProducer {
public static void main(String[] args) throws Exception{
// 1. Configuration
Properties props = new Properties();
// Kafka bootstrap server list; it does not need to list every broker
props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
// Number of acknowledgements the leader must receive before a request is considered complete (-1 = all in-sync replicas)
props.put("acks", "-1");
// Number of retries on send failure
props.put("retries", 1);
// Batch size, in bytes, used when the producer batches records; here 16 KB
props.put("batch.size", 16384);
// Wait up to 1 ms before sending, so more records can be batched together
props.put("linger.ms", 1);
// Total memory, in bytes, the producer may use for buffering; here 32 MB
props.put("buffer.memory", 33554432);
// Key serializer class
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Value serializer class
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// 2. Create the producer
KafkaProducer<String,String> producer = new KafkaProducer<String, String>(props);
// 3. Produce data
/**
* The three ways to send messages:
* 1. Synchronous (blocking) send
* Use when: high throughput is not required, message order matters, and send failures are not acceptable
* 2. Asynchronous send (fire and forget)
* Use when: only throughput matters, ordering does not, and occasional send failures are acceptable
* 3. Asynchronous send (with a callback)
* Use when: the business needs to know whether the send succeeded but does not care about ordering
*/
// 1. Synchronous (blocking) send
// Create the message
/* System.out.println("-------------------synchronous send......start-----------------------");
ProducerRecord<String,String> record = new ProducerRecord<String, String>("itheima_topic",0,"key-sync","message sent synchronously");
Future<RecordMetadata> send = producer.send(record);
RecordMetadata recordMetadata = send.get();
System.out.println(recordMetadata);
System.out.println("-------------------synchronous send......end-----------------------");*/
// 2. Asynchronous send (fire and forget)
// Create the message
/*System.out.println("-------------------async send (fire and forget)......start-----------------------");
ProducerRecord<String,String> record = new ProducerRecord<String, String>("itheima_topic",0,"key-async1","message sent asynchronously, fire and forget");
// Send and forget
producer.send(record);
System.out.println("-------------------async send (fire and forget)......end-----------------------");
// Flush the buffered records
producer.flush();*/
// 3. Asynchronous send (with a callback)
// Create the message
System.out.println("-------------------async send (with callback)......start-----------------------");
ProducerRecord<String,String> record = new ProducerRecord<String, String>("itheima_topic",0,"key-async2","message sent asynchronously (with callback)");
// Send; the callback handles the result
producer.send(record, new Callback() {
// Callback business logic
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
System.out.println("async send result metadata: "+recordMetadata);
System.out.println("exception (null on success): "+e);
}
});
System.out.println("-------------------async send (with callback)......end-----------------------");
// Flush the buffered records
producer.flush();
// Close the producer to release its resources
producer.close();
}
}
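A quick way to verify the records written by this producer, using the console consumer from section 3.1 (a sketch; itheima_topic is the topic used in the code above):
/export/servers/kafka_2.11-1.0.0/bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic itheima_topic --from-beginning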
Consumer
package com.itheima.consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.Arrays;
import java.util.Properties;
/**
* Kafka client: consumer
*/
public class MyKafkaConsumer {
public static void main(String[] args) throws Exception{
// 1. Configuration
Properties props = new Properties();
// Kafka bootstrap server list; it does not need to list every broker
props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
// Consumer group id
props.put("group.id", "itheima");
// Whether offsets are committed automatically
props.put("enable.auto.commit", "true");
// Auto-commit interval
props.put("auto.commit.interval.ms", "1000");
// Key deserializer class
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Value deserializer class
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// 2. Create the consumer
KafkaConsumer<String,String> consumer = new KafkaConsumer<String, String>(props);
// 3. Consume messages
// Consume from a specific partition
TopicPartition partition = new TopicPartition("itheima_topic",0);
// Look up the last committed offset for this partition
long offset = 0L;
OffsetAndMetadata offsetAndMetadata = consumer.committed(partition);
if(offsetAndMetadata !=null){
offset = offsetAndMetadata.offset();
}
System.out.println("current consumed offset: "+offset);
// Assign the partition and seek to the committed offset
consumer.assign(Arrays.asList(partition));
consumer.seek(partition,offset);
// Poll for data in a loop
while (true){
// Fetch records
ConsumerRecords<String, String> records = consumer.poll(1000);
// Print each record
for (ConsumerRecord<String, String> record : records) {
System.out.println("consumed record: " + record.value());
}
}
}
}
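A sketch for checking the committed offsets of the consumer group used above ("itheima") with the consumer-groups tool shipped with Kafka:
/export/servers/kafka_2.11-1.0.0/bin/kafka-consumer-groups.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --describe --group itheima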
4.Kafka Internals
4.1.Message Storage and Lookup
4.1.1.Message Storage
topic
#About topics:
1.Every message published to a Kafka cluster has a category; that category is the topic
partition
#About partitions:
1.A partition is a physical concept; each topic contains one or more partitions
2.Each partition is an ordered, immutable sequence of messages, i.e. an ordered queue
3.On disk each partition is a directory, named ${topicName}-${partitionId}, for example itheima_topic-0
4.The partition directory stores that partition's log segments; each segment consists of one data file and two index files
5.Each message is appended to its partition, i.e. written to disk sequentially, which is an important reason for Kafka's high throughput
6.Kafka ordering is per-partition only: messages are ordered within a partition, but there is no global order across partitions
logSegment
#About log segments (logSegment):
1.The log is rolled into one or more segments by size or by time
Segment size defaults to 1 GB, configured by log.segment.bytes
The time-based roll is configured by log.roll.ms or log.roll.hours
2.A Kafka log segment consists of
one data file:
00000000000000000000.log
and two index files:
00000000000000000000.index
00000000000000000000.timeindex
3.Data file:
The data file has the .log suffix and stores the actual message data
Naming rule: the offset of the first message in the file (the base offset, BaseOffset), left-padded with zeros to 20 digits
The base offset equals the previous data file's LEO + 1 (LEO = Log End Offset)
4.Offset index file:
Same name as its data file, with the .index suffix; used to quickly locate a message's position by offset
5.Timestamp index file:
Same name as its data file, with the .timeindex suffix; used to quickly locate a message's position by timestamp
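A minimal sketch for inspecting these files with the DumpLogSegments tool shipped with Kafka (the partition directory below is an example path under the log.dirs configured earlier):
#dump the messages in a data file
/export/servers/kafka_2.11-1.0.0/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /export/servers/kafka_2.11-1.0.0/kfk-logs/itheima_topic-0/00000000000000000000.log --print-data-log
#dump the (offset, position) entries in the matching offset index
/export/servers/kafka_2.11-1.0.0/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /export/servers/kafka_2.11-1.0.0/kfk-logs/itheima_topic-0/00000000000000000000.index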
4.1.2.Message Lookup
#Reading the message at offset=368776 takes two steps:
1.Find the segment file
2.Find the message within that segment file
Finding the segment file
#File base offsets:
1.The first file, 00000000000000000000.index, has a starting offset of 0
2.00000000000000368769.index starts at offset 368770 = 368769 + 1
3.00000000000000737337.index starts at offset 737338 = 737337 + 1
4.And so on for subsequent files
#Lookup procedure:
1.The files are named by their starting offset, so the file list is ordered.
2.A binary search over the file names quickly locates the file that covers the target offset. For example:
for offset=368776 the search lands on 00000000000000368769.index and its corresponding .log file
Finding the message within the segment file
#How offsets are stored in the .index file:
Kafka does not write an index entry for every message. It uses sparse indexing: one entry is written for every fixed number of bytes of data, controlled by index.interval.bytes
#Lookup procedure:
1.Using the target offset, binary-search the index for the largest indexed offset that is less than or equal to the target
2.Starting from that entry's file position, scan the data file sequentially until the message whose offset equals the target offset is found
4.2.Producer Partitioning Strategy and Send Modes
4.2.1.Partitioning Strategy
#About the producer partitioning strategy:
1.A topic in Kafka has one or more partitions, so when a producer writes data to Kafka, how is the target partition decided?
2.There are three cases:
2.1.The producer sends the data to an explicitly specified partition:
/**
* Create a record addressed to a specific partition
* Parameters:
* topic: the topic
* partition: the partition to write to (e.g. 0, 1, 2)
* key: the record key
* value: the record value
*/
public ProducerRecord(String topic, Integer partition, K key, V value) {
this(topic, partition, (Long)null, key, value, (Iterable)null);
}
2.2.If no partition is specified:
2.2.1.If the record has no key, a round-robin strategy is used. For example, with three partitions 0, 1, 2, records are written to partitions 0, 1, 2, 0, 1, 2, ... in turn
2.2.2.If the record has a key, the key is hashed and (hash % number of partitions) decides which partition the record is written to (see the command-line sketch after the source code below)
Source code reference:
// The default partitioner:
public class DefaultPartitioner implements Partitioner {
// Compute the target partition
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
// All partitions of the topic
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
// Total number of partitions
int numPartitions = partitions.size();
// If the record has no key, fall back to the round-robin strategy
if (keyBytes == null) {
int nextValue = this.nextValue(topic);
List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
if (availablePartitions.size() > 0) {
int part = Utils.toPositive(nextValue) % availablePartitions.size();
return ((PartitionInfo)availablePartitions.get(part)).partition();
} else {
return Utils.toPositive(nextValue) % numPartitions;
}
} else {
// If the record has a key, hash the key and take it modulo the partition count to choose the partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
}
// Round-robin: get the next counter value for this topic
private int nextValue(String topic) {
AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
if (null == counter) {
counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
if (currentCounter != null) {
counter = currentCounter;
}
}
return counter.getAndIncrement();
}
}
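A minimal command-line sketch of key-based partitioning (the topic name and keys are only examples): records that share a key are hashed to the same partition, which can be checked by consuming individual partitions as in section 3.1:
#produce keyed records; parse.key and key.separator are console-producer reader properties
echo -e "user1:hello\nuser1:world" | /export/servers/kafka_2.11-1.0.0/bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic oc_itheima_topic --property parse.key=true --property key.separator=:
#both records carry the key user1, so they are expected to land in the same partition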
4.2.2.Send Modes
#About the ways to send data:
1.Synchronous (blocking) send:
Use when:
high throughput is not required, message order matters, and send failures are not acceptable
Reference code: see the synchronous send in the producer example in section 3.2
2.Asynchronous send (fire and forget):
Use when:
only throughput matters, ordering does not, and occasional send failures are acceptable
Reference code: see the fire-and-forget send in the producer example in section 3.2
3.Asynchronous send (with a callback):
Use when:
the business needs to know whether the send succeeded but does not care about ordering
Reference code: see the callback send in the producer example in section 3.2