Partition Leader Re-election

Background

Producer and consumer clients both read and write through the partition leader, so brokers hosting leaders carry relatively higher load. In day-to-day operations it can happen that a host with poor performance ends up serving as a leader, which degrades Kafka's performance. In that case we need to re-elect the partition leader so that it lands on the intended host.

Symptom

The demo environment is a 3-node Kafka cluster with a topic of 4 partitions, so one of the brokers inevitably hosts two partition leaders. The walkthrough below moves one of those leaders to another broker.

$ kafka-topics.sh --bootstrap-server 192.168.32.188:9092 --describe --topic test02                                               
Topic: test02   TopicId: Rpix53M5R8Krby5Wa6ansA PartitionCount: 4       ReplicationFactor: 3    Configs: segment.bytes=1073741824
        Topic: test02   Partition: 0    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
        Topic: test02   Partition: 1    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: test02   Partition: 2    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: test02   Partition: 3    Leader: 0       Replicas: 0,2,1 Isr: 0,2,1
  • The leaders of partitions 0 and 3 are on the same broker (brokerId=0).
  • The goal is to re-elect the leader of partition 3 onto brokerId=1.

Solution

The rule Kafka follows when re-electing a partition leader:

  • Preferred replica mechanism: if the first broker in the Replicas list is also present in the Isr list, that first broker becomes the partition leader.
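
A quick way to apply this rule is to compare each partition's current leader with its first replica. A minimal sketch, assuming the kafka-topics.sh --describe output format shown above:

# List partitions whose leader differs from the preferred (first) replica
$ kafka-topics.sh --bootstrap-server 192.168.32.188:9092 --describe --topic test02 \
    | awk '$3 == "Partition:" {split($8, r, ","); if ($6 != r[1]) print "partition " $4 ": leader=" $6 ", preferred=" r[1]}'

Any partition printed by this command would change leader on a preferred-leader election.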

Once you understand this election rule, the operation splits into two steps:

  1. Change the Replicas (AR) order so that the intended leader comes first in Replicas; skip this step if it already does.
  2. Manually trigger a leader election.

Change the Replicas (AR) order

Partition replica reassignment (low-cost approach)

  1. Generate Kafka's recommended replica reassignment plan
$ cat > topic-json-file.json <<-EOF
{"topics":[{"topic":"test02"}],"version":1}
EOF

$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --broker-list "0,1,2" --topics-to-move-json-file topic-json-file.json --generate
Current partition replica assignment
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[0,2,1],"log_dirs":["any","any","any"]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[2,1,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[0,2,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,0,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[2,0,1],"log_dirs":["any","any","any"]}]}
  • The first JSON block is the current assignment.
  • The second is the proposed assignment; if you need the replicas in a specific order, edit that order before executing.
  2. Execute the reassignment
$ cat reassignment-file.json 
{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[1,0,2],"log_dirs":["any","any","any"]}]}

$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --reassignment-json-file reassignment-file.json --execute
Current partition replica assignment

{"version":1,"partitions":[{"topic":"test02","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test02","partition":1,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test02","partition":2,"replicas":[1,2,0],"log_dirs":["any","any","any"]},{"topic":"test02","partition":3,"replicas":[2,1,0],"log_dirs":["any","any","any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignments for test02-0,test02-1,test02-2,test02-3

For reassignment-file.json you can simply copy the proposed plan. Since I only want to change the leader of partition 3, I copied the current assignment and changed only the replica order of partition 3 (putting broker 1 first).
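
If you would rather not edit the JSON by hand, a small jq sketch like the one below rewrites only partition 3's replica order (current-assignment.json is a hypothetical file holding the current assignment printed by --generate; any recent jq should work):

# Put broker 1 first for partition 3, leaving the other partitions untouched
$ jq '(.partitions[] | select(.partition == 3) | .replicas) = [1,0,2]' current-assignment.json > reassignment-file.json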

  3. Verify. Note that the reassignment only changes the replica (AR) order: in the describe output below, partition 3's Replicas are now 1,0,2 but its leader is still broker 0; the leader does not move until the election step in the last section is triggered.
$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --reassignment-json-file reassignment-file.json --verify 
Status of partition reassignment:
Reassignment of partition test02-0 is complete.
Reassignment of partition test02-1 is complete.
Reassignment of partition test02-2 is complete.
Reassignment of partition test02-3 is complete.

Clearing broker-level throttles on brokers 0,1,2
Clearing topic-level throttles on topic test02

$ kafka-topics.sh --bootstrap-server 192.168.32.187:9092 --describe --topic test02
Topic: test02   TopicId: Rpix53M5R8Krby5Wa6ansA PartitionCount: 4       ReplicationFactor: 3    Configs: segment.bytes=1073741824
        Topic: test02   Partition: 0    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
        Topic: test02   Partition: 1    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
        Topic: test02   Partition: 2    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
        Topic: test02   Partition: 3    Leader: 0       Replicas: 1,0,2 Isr: 0,2,1

Summary

  • Pros: meets the requirement without touching the source code and without any extra development work.
  • Cons: the procedure is fairly involved and error-prone. You first have to fetch the current assignment and then hand-edit the JSON file, which is easy to get wrong and can have a sizable impact, although validation checks can limit the risk. Most importantly, only one replica reassignment task can exist at a time!
    If a replica reassignment task is already running, this operation cannot be executed.
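
On newer Kafka versions the reassignment tool can report whether such a task is already running; a rough sketch (the --list option exists in recent releases, to the best of my knowledge):

# Lists partitions with an active reassignment; empty output means none is in progress
$ kafka-reassign-partitions.sh --bootstrap-server 192.168.32.187:9092 --list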

Manually edit the AR order (high-cost approach)

  1. Get the /brokers/topics/{topic name} node data from ZooKeeper
$ zkCli.sh -server 192.168.32.187:2181
[zk: 192.168.32.187:2181(CONNECTED) 0] get /kafka/brokers/topics/test02 
{"partitions":{"0":[0,1,2],"1":[2,0,1],"2":[1,2,0],"3":[0,2,1]},"topic_id":"Rpix53M5R8Krby5Wa6ansA","adding_replicas":{},"removing_replicas":{},"version":3}
  2. Manually adjust the replica order in the ZooKeeper node
[zk: 192.168.32.187:2181(CONNECTED) 1] set /kafka/brokers/topics/test02 {"partitions":{"0":[0,1,2],"1":[2,0,1],"2":[1,2,0],"3":[1,0,2]},"topic_id":"Rpix53M5R8Krby5Wa6ansA","adding_replicas":{},"removing_replicas":{},"version":3}
  3. Delete the /controller node in ZooKeeper so the controller reloads its metadata and, at the same time, triggers a leader election
[zk: 192.168.32.187:2181(CONNECTED) 2] delete /kafka/controller

Why delete the controller's ZooKeeper node?

  • After we edit the ZooKeeper node by hand, no replicas are actually added or removed, so the controller is never prompted to refresh the AR it keeps in memory. Even if you then trigger a leader election manually, the controller still works from the old AR and the change has no effect.
  • Deleting the /controller node in ZooKeeper triggers a controller re-election. The newly elected controller reloads all metadata, so the data we just wrote takes effect, and the controller failover also triggers a leader election.
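
Before and after deleting the node you can check which broker currently holds the controller role by reading the same znode (assuming the /kafka chroot used in this demo; the brokerid field in the returned JSON identifies the controller):

[zk: 192.168.32.187:2181(CONNECTED) 3] get /kafka/controller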

Summary

  • Pros: meets the goal; simple and easy to operate.
  • Cons: frequent controller re-elections have some impact on a production environment.

Manually electing the partition leader

Kafka 3.x

kafka-leader-election.sh --bootstrap-server 192.168.32.187:9092 --election-type PREFERRED --topic test02 --partition 3
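
The same tool can also elect leaders for several partitions at once via a JSON file (the same format as the 2.x file shown below) or for all partitions with --all-topic-partitions; a sketch of the batch form, reusing /tmp/prefered.json from the next section:

# Elect the preferred leader for every partition listed in the file
kafka-leader-election.sh --bootstrap-server 192.168.32.187:9092 --election-type PREFERRED --path-to-json-file /tmp/prefered.json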

Kafka 2.x

Create /tmp/prefered.json with the partitions to elect:

{
  "partitions": [
    { "topic": "test02", "partition": 3}
  ]
}

Run either one of the following:

# via ZooKeeper
zk_conn=$(awk -F= '/^zookeeper.connect=/ {print $2}' /app/kafka/config/server.properties)
./kafka-preferred-replica-election.sh --zookeeper ${zk_conn} --path-to-json-file /tmp/prefered.json

# via Kafka
./kafka-preferred-replica-election.sh --admin.config ../config/sasl.properties --bootstrap-server kafka01:9092 --path-to-json-file /tmp/prefered.json 
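
Whichever method you use, you can confirm the result with the same describe command used earlier; after the election, partition 3 should report broker 1 as its leader, matching the first entry in its Replicas list:

$ kafka-topics.sh --bootstrap-server 192.168.32.187:9092 --describe --topic test02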