Kafka原理-分区leader选举

0.说明

kafka源码版本为1.0

1.分区状态

kafka源码定义了4种状态

NewPartition：表示正在创建新的分区，是一个中间状态,只是在Controller的内存中存了状态信息

OnlinePartition：表示在线状态, 只有在线的分区才能提供服务.

OfflinePartition：表示下线状态, 分区可能因为Broker宕机或者删除Topic等原因流转到这个状态, 下线后不能提供服务

NonExistentPartition：表示分区不存在

2.选举源码分析

源码入口PartitionStateMachine#electLeaderForPartition

注释说明leader选举发生在OfflinePartition,OnlinePartition->OnlinePartition状态变更的时候

 /**
   * Invoked on the OfflinePartition,OnlinePartition->OnlinePartition state change.
   * It invokes the leader election API to elect a leader for the input offline partition
   * @param topic               The topic of the offline partition
   * @param partition           The offline partition
   * @param leaderSelector      Specific leader selector (e.g., offline/reassigned/etc.)
   */
  def electLeaderForPartition(topic: String, partition: Int, leaderSelector: PartitionLeaderSelector) {
    val topicAndPartition = TopicAndPartition(topic, partition)
    val stateChangeLog = stateChangeLogger.withControllerEpoch(controller.epoch)
    // handle leader election for the partitions whose leader is no longer alive
    stateChangeLog.trace(s"Started leader election for partition $topicAndPartition")
    try {
      var zookeeperPathUpdateSucceeded: Boolean = false
      var newLeaderAndIsr: LeaderAndIsr = null
      var replicasForThisPartition: Seq[Int] = Seq.empty[Int]
      while(!zookeeperPathUpdateSucceeded) {
        // 01：从zk种获取分区元数据
        val currentLeaderIsrAndEpoch = getLeaderIsrAndEpochOrThrowException(topic, partition)
        val currentLeaderAndIsr = currentLeaderIsrAndEpoch.leaderAndIsr
        val controllerEpoch = currentLeaderIsrAndEpoch.controllerEpoch
        // 02：这里表示其他controller成为新的首领，旧的请求抛异常
        if (controllerEpoch > controller.epoch) {
          val failMsg = s"Aborted leader election for partition $topicAndPartition since the LeaderAndIsr path was " +
            s"already written by another controller. This probably means that the current controller $controllerId went " +
            s"through a soft failure and another controller was elected with epoch $controllerEpoch."
          stateChangeLog.error(failMsg)
          throw new StateChangeFailedException(stateChangeLog.messageWithPrefix(failMsg))
        }
        // 03：选举的实现
        val (leaderAndIsr, replicas) = leaderSelector.selectLeader(topicAndPartition, currentLeaderAndIsr)
        val (updateSucceeded, newVersion) = ReplicationUtils.updateLeaderAndIsr(zkUtils, topic, partition,
          leaderAndIsr, controller.epoch, currentLeaderAndIsr.zkVersion)
        newLeaderAndIsr = leaderAndIsr.withZkVersion(newVersion)
        zookeeperPathUpdateSucceeded = updateSucceeded
        replicasForThisPartition = replicas
      }
      val newLeaderIsrAndControllerEpoch = LeaderIsrAndControllerEpoch(newLeaderAndIsr, controller.epoch)
      // 04：更新ControllerContext的leader信息（内存中的缓存）
      controllerContext.partitionLeadershipInfo.put(TopicAndPartition(topic, partition), newLeaderIsrAndControllerEpoch)
      stateChangeLog.trace(s"Elected leader ${newLeaderAndIsr.leader} for Offline partition $topicAndPartition")
      val replicas = controllerContext.partitionReplicaAssignment(TopicAndPartition(topic, partition))
      // 05：向broker添加LeaderAndIsr请求
      brokerRequestBatch.addLeaderAndIsrRequestForBrokers(replicasForThisPartition, topic, partition,
        newLeaderIsrAndControllerEpoch, replicas)
    } catch {
      //省略一些异常处理
    }
    debug(s"After leader election, leader cache for $topicAndPartition is updated to ${controllerContext.partitionLeadershipInfo(topicAndPartition)}")
  }

实际选举leaderSelector.selectLeader(topicAndPartition, currentLeaderAndIsr)策略来自PartitionLeaderSelector，具体有以下四种

（1）OfflinePartitionLeaderSelector

触发场景：

Controller 重新加载
脚本执行脏选举
Controller 监听到有Broker启动了
Controller 监听 LeaderAndIsrResponseReceived请求：
Controller 监听 UncleanLeaderElectionEnable请求：unclean.leader.election.enable设置为 true时

选举规则

找AR中第一个在线且在isr中的副本，若找不到，且unclean.leader.election.enable为true，找AR中第一个在线的副本。

（2）ReassignedPartitionLeaderSelector

触发场景：

分区副本重分配：只有当之前的Leader副本在经过重分配之后不存在了，或者故障下线了才会触发

选举规则

找AR中第一个在线且在isr中的副本

（3）PreferredReplicaPartitionLeaderSelector

触发场景：

自动定时执行优先副本选举任务：auto.leader.rebalance.enable=true （源码KafkaController#checkAndTriggerAutoLeaderRebalance)
Controller 重新加载的时候：先执行OfflinePartitionLeaderSelector再执行PreferredReplicaPartitionLeaderSelector （源码KafkaController#onControllerFailover)
手动执行优先副本选举脚本：kafka-leader-election.sh 并且选择的模式是 PREFERRED (先写zk节点，后controller触发PreferredReplicaElectionListener)

选举规则

只有满足以下条件才会选举成功：是第一个副本 && 副本在线 && 副本在ISR列表中。

代码如下

class PreferredReplicaPartitionLeaderSelector(controllerContext: ControllerContext) extends PartitionLeaderSelector with Logging {

  logIdent = "[PreferredReplicaPartitionLeaderSelector]: "

  def selectLeader(topicAndPartition: TopicAndPartition,
                   currentLeaderAndIsr: LeaderAndIsr): (LeaderAndIsr, Seq[Int]) = {
    // 从内存中（AR）拿到第一个副本，就是首选副本
    val assignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)
    val preferredReplica = assignedReplicas.head
    // 判断实际的leader是不是首选副本，若是就不需要选举了
    val currentLeader = controllerContext.partitionLeadershipInfo(topicAndPartition).leaderAndIsr.leader
    if (currentLeader == preferredReplica) {
      throw new LeaderElectionNotNeededException("Preferred replica %d is already the current leader for partition %s"
                                                   .format(preferredReplica, topicAndPartition))
    } else {
      info("Current leader %d for partition %s is not the preferred replica.".format(currentLeader, topicAndPartition) +
        " Triggering preferred replica leader election")
      // 否则，首选副本在线并且在isr中，就选举其为新leader
      if (controllerContext.isReplicaOnline(preferredReplica, topicAndPartition) && currentLeaderAndIsr.isr.contains(preferredReplica)) {
        val newLeaderAndIsr = currentLeaderAndIsr.newLeader(preferredReplica)
        (newLeaderAndIsr, assignedReplicas)
      } else {
        throw new StateChangeFailedException(s"Preferred replica $preferredReplica for partition $topicAndPartition " +
          s"is either not alive or not in the isr. Current leader and ISR: [$currentLeaderAndIsr]")
      }
    }
  }
}

（4）ControlledShutdownLeaderSelector

触发场景：

Broker关机的时候：当Broker关机的过程中,会向Controller发起一个请求, 让它重新发起一次选举, 把在所有正在关机(也就是发起请求的那个Broker,或其它同时正在关机的Broker) 的Broker里面的副本给剔除掉。

选举规则

在AR中找到第一个满足条件的副本：副本在线 && 副本在ISR中 && 副本所在的Broker不在正在关闭的Broker集合中。

posted @ 2023-02-13 21:48 OUYM 阅读(535) 评论(0) 收藏举报

刷新页面返回顶部

OUYM

life is binary !

Kafka原理-分区leader选举

公告