Partition Leader Re-election
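Background
Producer and consumer clients both read and write through the partition Leader, so Leader nodes carry relatively more load. During operations, a Leader may end up on a host with poor performance, which degrades Kafka's performance. In that case the Leader has to be re-elected so that it lands on the intended host.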
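Symptoms
The demo environment is a 3-node Kafka cluster with a topic that has 4 partitions, so one node necessarily holds two Leaders. This walkthrough moves one of those Leaders to another node.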
$ kafka-topics.sh --bootstrap-server 192.168.32.188:9092 --describe --topic test02
Topic: test02 TopicId: Rpix53M5R8Krby5Wa6ansA PartitionCount: 4 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: test02 Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: test02 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: test02 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: test02 Partition: 3 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
- The Leaders of partitions 0 and 3 are on the same node (brokerID=0).
- The goal is to re-elect the Leader of partition 3 to brokerID=1.
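The leader distribution described above can also be checked mechanically. The following is a minimal Python sketch, not part of the original procedure: it assumes the `--describe` output has been captured as text, and the function name `leaders_per_broker` is hypothetical.

```python
import re
from collections import Counter

# Per-partition lines copied from the `kafka-topics.sh --describe` output above.
DESCRIBE_OUTPUT = """\
Topic: test02 Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: test02 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: test02 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: test02 Partition: 3 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
"""


def leaders_per_broker(describe_output: str) -> Counter:
    """Count how many partition leaders each broker currently holds."""
    counts = Counter()
    for line in describe_output.splitlines():
        # Only per-partition lines carry "Partition: N ... Leader: M";
        # lines without a numeric leader are skipped.
        match = re.search(r"Partition:\s*\d+\s+Leader:\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for broker, count in sorted(leaders_per_broker(DESCRIBE_OUTPUT).items()):
        print(f"broker {broker}: {count} leader(s)")
```

A broker whose count is higher than the others is a candidate for giving up a Leader.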
Resolution
The rule for re-electing a Partition Leader:
- If the first broker in Replicas is also in the Isr list, that broker becomes the Partition Leader (the preferred replica mechanism).
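The rule above can be written down as a small function. This is an illustrative Python sketch of the rule as stated here, not Kafka's actual implementation; the name `preferred_leader` is hypothetical.

```python
from typing import Optional, Sequence


def preferred_leader(replicas: Sequence[int], isr: Sequence[int]) -> Optional[int]:
    """Return the broker a preferred-replica election would pick, or None.

    The preferred replica is the first entry of the replica list (AR).
    It is only eligible when it is also in the in-sync replica set (Isr).
    """
    if not replicas:
        return None
    first = replicas[0]
    return first if first in isr else None


if __name__ == "__main__":
    # Partition 3 after reordering: broker 1 is first and in sync.
    print(preferred_leader([1, 0, 2], [0, 2, 1]))
    # Broker 1 is first but has fallen out of the Isr: no eligible preferred leader.
    print(preferred_leader([1, 0, 2], [0, 2]))
```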
With the election rule understood, the operation takes two steps:
- Change the Replicas order (AR). Skip this step if the intended leader is already first in Replicas.
- Manually trigger a leader election.
Changing the Replicas order (AR)
Partition replica reassignment (low-cost approach)
- Generate Kafka's proposed replica reassignment plan
$ cat > topic-json-file.json <<-EOF
{"topics":[{"topic":"test02"}],"version":1}
EOF
$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --broker-list "0,1,2" --topics-to-move-json-file topic-json-file.json --generate
Current partition replica assignment
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[0,2,1],"log_dirs":["any","any","any"]}]}
Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[2,1,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[0,2,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,0,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[2,0,1],"log_dirs":["any","any","any"]}]}
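- The first JSON block is the current assignment.
- The second is the proposed assignment. If you need a specific order, edit the replicas order.
- Reassign the replicas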
$ cat reassignment-file.json
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[1,0,2],"log_dirs":["any","any","any"]}]}
$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --reassignment-json-file reassignment-file.json --execute
Current partition replica assignment
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[2,1,0],"log_dirs":["any","any","any"]}]}
Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignments for test02-0,test02-1,test02-2,test02-3
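For reassignment-file.json, copying the proposed plan is enough. Because I only want to change the Leader of partition 3 here, I copied the current assignment and changed only the replica order of partition 3.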
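Hand-editing the JSON is where mistakes tend to creep in, so the edit can be scripted. The following Python sketch assumes the current assignment has been loaded as a dict in the format printed by `--generate`; the helper name `move_to_front` is hypothetical. It only reorders the replicas of one partition and leaves everything else untouched.

```python
import copy
import json

# Current assignment for test02, as printed by `--generate` above.
CURRENT_ASSIGNMENT = {
    "version": 1,
    "partitions": [
        {"topic": "test02", "partition": 0, "replicas": [0, 1, 2], "log_dirs": ["any", "any", "any"]},
        {"topic": "test02", "partition": 1, "replicas": [2, 0, 1], "log_dirs": ["any", "any", "any"]},
        {"topic": "test02", "partition": 2, "replicas": [1, 2, 0], "log_dirs": ["any", "any", "any"]},
        {"topic": "test02", "partition": 3, "replicas": [0, 2, 1], "log_dirs": ["any", "any", "any"]},
    ],
}


def move_to_front(assignment: dict, topic: str, partition: int, broker: int) -> dict:
    """Return a copy of the assignment with `broker` first in one partition's replicas."""
    result = copy.deepcopy(assignment)  # never mutate the caller's data
    for entry in result["partitions"]:
        if entry["topic"] == topic and entry["partition"] == partition:
            if broker not in entry["replicas"]:
                raise ValueError(f"broker {broker} is not a replica of {topic}-{partition}")
            # Keep the relative order of the other replicas.
            others = [b for b in entry["replicas"] if b != broker]
            entry["replicas"] = [broker] + others
            return result
    raise KeyError(f"{topic}-{partition} not found in assignment")


if __name__ == "__main__":
    plan = move_to_front(CURRENT_ASSIGNMENT, "test02", 3, 1)
    # Compact output, suitable for saving as reassignment-file.json.
    print(json.dumps(plan, separators=(",", ":")))
```

Refusing a broker that is not already a replica keeps the change a pure reorder, so no partition data has to be copied between brokers.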
- Verify
$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --reassignment-json-file reassignment-file.json --verify
Status of partition reassignment:
Reassignment of partition test02-0 is complete.
Reassignment of partition test02-1 is complete.
Reassignment of partition test02-2 is complete.
Reassignment of partition test02-3 is complete.
Clearing broker-level throttles on brokers 0,1,2
Clearing topic-level throttles on topic test02
$ kafka-topics.sh --bootstrap-server 192.168.32.187:9092 --describe --topic test02
Topic: test02 TopicId: Rpix53M5R8Krby5Wa6ansA PartitionCount: 4 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: test02 Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: test02 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: test02 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: test02 Partition: 3 Leader: 0 Replicas: 1,0,2 Isr: 0,2,1
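In the output above, partition 3 now lists broker 1 first in Replicas, but its Leader is still broker 0, because no election has been triggered yet. A minimal Python sketch for spotting such partitions follows; it assumes the `--describe` output is available as text, and `non_preferred_partitions` is a hypothetical helper name.

```python
import re

# Per-partition lines copied from the `--describe` output above.
DESCRIBE_OUTPUT = """\
Topic: test02 Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: test02 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: test02 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: test02 Partition: 3 Leader: 0 Replicas: 1,0,2 Isr: 0,2,1
"""

LINE_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\d+)\s+Replicas:\s*([\d,]+)"
)


def non_preferred_partitions(describe_output: str) -> list:
    """List (topic, partition, leader, preferred) where the leader is not the first replica."""
    pending = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # topic summary line or unrelated text
        topic = match.group(1)
        partition = int(match.group(2))
        leader = int(match.group(3))
        preferred = int(match.group(4).split(",")[0])
        if leader != preferred:
            pending.append((topic, partition, leader, preferred))
    return pending


if __name__ == "__main__":
    for topic, partition, leader, preferred in non_preferred_partitions(DESCRIBE_OUTPUT):
        print(f"{topic}-{partition}: leader={leader}, preferred={preferred}")
```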
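Summary
- Pros: meets the requirement with no source changes and no extra development work.
- Cons: the procedure is fairly involved and error-prone. You must first fetch the current partition assignment and then hand-edit the JSON file, and a mistake there can have a large impact, although validation checks can guard against this. Most importantly, by default only one replica reassignment task can be in progress at a time: if a replica reassignment is already running, this one cannot be executed.
Manually changing the AR order (high-cost approach)
- Fetch the data of the /brokers/topics/{topic name} node from ZooKeeper (this cluster uses the /kafka chroot)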
$ zkCli.sh -server 192.168.32.187:2181
[zk: 192.168.32.187:2181(CONNECTED) 0] get /kafka/brokers/topics/test02
{"partitions":{"0":[0,1,2],"1":[2,0,1],"2":[1,2,0],"3":[0,2,1]},"topic_id":"Rpix53M5R8Krby5Wa6ansA","adding_replicas":{},"removing_replicas":{},"version":3}
- Manually adjust the replica order stored in ZooKeeper
[zk: 192.168.32.187:2181(CONNECTED) 1] set /kafka/brokers/topics/test02 {"partitions":{"0":[0,1,2],"1":[2,0,1],"2":[1,2,0],"3":[1,0,2]},"topic_id":"Rpix53M5R8Krby5Wa6ansA","adding_replicas":{},"removing_replicas":{},"version":3}
- Delete the /controller node in ZooKeeper so that the metadata is reloaded and a Leader election is triggered at the same time
[zk: 192.168.32.187:2181(CONNECTED) 2] delete /kafka/controller
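Why delete the Controller's ZooKeeper node?
- After the znode data is edited by hand, no replica has been added, so nothing prompts the Controller to refresh the AR it holds in memory. Even if you trigger a Leader election yourself, the AR is still the old one and the election does not produce the intended result.
- Deleting the /controller node in ZooKeeper triggers a Controller re-election. The newly elected Controller reloads all metadata, so the data we just modified takes effect, and the reload also triggers a Leader election.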
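The payload passed to `set` in the second step can be derived from the `get` output rather than typed by hand. This is a minimal Python sketch under that assumption; the helper name `reorder_znode` is hypothetical, and the script only builds the string, it does not talk to ZooKeeper.

```python
import json

# Znode data returned by `get /kafka/brokers/topics/test02` above.
ZNODE = (
    '{"partitions":{"0":[0,1,2],"1":[2,0,1],"2":[1,2,0],"3":[0,2,1]},'
    '"topic_id":"Rpix53M5R8Krby5Wa6ansA","adding_replicas":{},'
    '"removing_replicas":{},"version":3}'
)


def reorder_znode(payload: str, partition: int, broker: int) -> str:
    """Return the znode JSON with `broker` moved to the front of one partition's AR."""
    data = json.loads(payload)
    key = str(partition)  # partition ids are JSON object keys, hence strings
    replicas = data["partitions"][key]
    if broker not in replicas:
        raise ValueError(f"broker {broker} is not a replica of partition {partition}")
    data["partitions"][key] = [broker] + [b for b in replicas if b != broker]
    # Compact separators keep the output free of spaces, like the original data.
    return json.dumps(data, separators=(",", ":"))


if __name__ == "__main__":
    print(reorder_znode(ZNODE, 3, 1))
```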
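Summary
- Pros: meets the requirement; simple and convenient to carry out.
- Cons: frequent Controller re-elections have some impact on a production environment.
Manually electing the partition leader
Kafka 3.x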
kafka-leader-election.sh --bootstrap-server 192.168.32.187:9092 --election-type PREFERRED --topic test02 --partition 3
Kafka 2.x
Write /tmp/prefered.json, the file listing the partitions to elect
{
"partitions": [
{ "topic": "test02", "partition": 3}
]
}
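When several partitions need an election, the file above can be generated instead of typed. A small Python sketch follows, assuming the same JSON layout as shown; `election_json` is a hypothetical helper name.

```python
import json


def election_json(partitions) -> str:
    """Build the election file content from (topic, partition) pairs."""
    body = {"partitions": [{"topic": t, "partition": p} for t, p in partitions]}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Redirect this output to /tmp/prefered.json to use it with the commands below.
    print(election_json([("test02", 3)]))
```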
Run either of the following
# Via ZooKeeper
zk_conn=$(awk -F= '/^zookeeper.connect=/ {print $2}' /app/kafka/config/server.properties)
./kafka-preferred-replica-election.sh --zookeeper ${zk_conn} --path-to-json-file /tmp/prefered.json
# Via the Kafka brokers
./kafka-preferred-replica-election.sh --admin.config ../config/sasl.properties --bootstrap-server kafka01:9092 --path-to-json-file /tmp/prefered.json