Kafka async producer (2): kafka.producer.async.DefaultEventHandler
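- When messages are addressed to many different topics, how does the async producer tell the topics apart while still sending in batches?
- How does it use the key to choose a partition?
- How does it compress messages in batches?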
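How the async producer tells topics apart while sending in batches
The answer is that DefaultEventHandler takes apart each batch of messages handed to it (actually a Seq[KeyedMessage[K,V]]) and determines which broker each message should go to. The messages bound for each broker are then grouped by topic and partition. In short: unpack the batch, then regroup it according to the metadata.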
    def partitionAndCollate(messages: Seq[KeyedMessage[K,Message]]): Option[Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]]
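It returns an Option whose element is a Map: the key is a brokerId and the value is the messages destined for that broker. For each message, the handler first determines which partition of which topic it goes to, then finds that partition's leader broker. It looks up that broker in the Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]] and appends the message to the Seq[KeyedMessage[K,Message]] for that topic+partition. The result describes which messages go to each broker and in what structure. At send time, the grouped messages are sent to the different brokers according to brokerId.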
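The grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in DefaultEventHandler: `Msg`, `leaders`, `partitionCounts` and `choosePartition` are hypothetical stand-ins for the real message type, the broker metadata and the partitioner.

```scala
object CollateSketch {
  // Simplified stand-ins for the Kafka types; not the real classes.
  final case class Msg(topic: String, key: String, payload: String)
  final case class TopicAndPartition(topic: String, partition: Int)

  // Hypothetical metadata: leader broker id for each topic+partition.
  val leaders: Map[TopicAndPartition, Int] = Map(
    TopicAndPartition("orders", 0) -> 1,
    TopicAndPartition("orders", 1) -> 2,
    TopicAndPartition("clicks", 0) -> 1
  )

  // Hypothetical partition count per topic.
  val partitionCounts: Map[String, Int] = Map("orders" -> 2, "clicks" -> 1)

  // Toy partitioner: non-negative hash of the key modulo the partition count.
  def choosePartition(m: Msg): Int =
    (m.key.hashCode & 0x7fffffff) % partitionCounts(m.topic)

  // brokerId -> (topic+partition -> messages), keeping the input order
  // of the messages inside each group.
  def collate(batch: Seq[Msg]): Map[Int, Map[TopicAndPartition, Seq[Msg]]] =
    batch
      .map(m => (TopicAndPartition(m.topic, choosePartition(m)), m))
      .groupBy { case (tp, _) => leaders(tp) }
      .map { case (brokerId, pairs) =>
        brokerId -> pairs.groupBy(_._1).map { case (tp, ps) => tp -> ps.map(_._2) }
      }
}
```

Each entry of the outer map corresponds to one request to one broker, and the inner map is what that request carries.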
First, look at how the Kafka protocol describes the structure of a Produce Request:
ProduceRequest => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
RequiredAcks => int16
Timeout => int32
Partition => int32
MessageSetSize => int32
The Kafka wiki also says this about the Produce API:
The produce API is used to send message sets to the server. For efficiency it allows sending message sets intended for many topic partitions in a single request.
That is, a single produce request can carry messages for several topic+partition combinations at once. Each produce request, of course, goes to one broker.
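To make the nesting in the request grammar concrete, here is a minimal sketch that regroups a flat per-broker map by topic, mirroring `[TopicName [Partition MessageSetSize MessageSet]]`. The types `PartitionData` and `TopicData` and the function `toTopicData` are made up for this illustration; they are not Kafka classes, and the message set is left as raw bytes.

```scala
object ProduceRequestSketch {
  final case class TopicAndPartition(topic: String, partition: Int)

  // Simplified view of the nested part of a produce request;
  // MessageSetSize is simply messageSet.length here.
  final case class PartitionData(partition: Int, messageSet: Array[Byte])
  final case class TopicData(topic: String, partitions: Seq[PartitionData])

  // Regroup the flat map for one broker by topic. Topics and partitions
  // are sorted only to give the sketch a stable, testable layout.
  def toTopicData(perBroker: Map[TopicAndPartition, Array[Byte]]): Seq[TopicData] =
    perBroker.toSeq
      .groupBy { case (tp, _) => tp.topic }
      .toSeq
      .sortBy { case (topic, _) => topic }
      .map { case (topic, entries) =>
        val parts = entries
          .sortBy { case (tp, _) => tp.partition }
          .map { case (tp, bytes) => PartitionData(tp.partition, bytes) }
        TopicData(topic, parts)
      }
}
```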
    send(brokerid, messageSetPerBroker)
    case class KeyedMessage[K, V](val topic: String, val key: K, val partKey: Any, val message: V) {
      if (topic == null)
        throw new IllegalArgumentException("Topic cannot be null.")

      def this(topic: String, message: V) = this(topic, null.asInstanceOf[K], null, message)

      def this(topic: String, key: K, message: V) = this(topic, key, key, message)

      def partitionKey = {
        if (partKey != null)
          partKey
        else if (hasKey)
          key
        else
          null
      }

      def hasKey = key != null
    }
With the three-argument constructor, partKey equals key. partKey is used only to choose the partition; it is not stored as part of the message.
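A small example may help show how the three constructors affect partitionKey. `SketchKeyedMessage` below is a trimmed, renamed copy of the KeyedMessage class shown above so that the snippet runs without Kafka on the classpath; the topic, key and payload values are invented.

```scala
object PartitionKeySketch {
  // Trimmed copy of KeyedMessage, renamed so it is not mistaken for the real class.
  case class SketchKeyedMessage[K, V](topic: String, key: K, partKey: Any, message: V) {
    def this(topic: String, message: V) = this(topic, null.asInstanceOf[K], null, message)
    def this(topic: String, key: K, message: V) = this(topic, key, key, message)
    def hasKey: Boolean = key != null
    def partitionKey: Any = if (partKey != null) partKey else if (hasKey) key else null
  }

  // Two arguments: no key, so partitionKey is null.
  val noKey = new SketchKeyedMessage[String, String]("orders", "payload")

  // Three arguments: partKey defaults to key.
  val keyOnly = new SketchKeyedMessage[String, String]("orders", "user-42", "payload")

  // Four arguments: partition on "eu-west" while keeping "user-42" as the key.
  val explicitPartKey =
    new SketchKeyedMessage[String, String]("orders", "user-42", "eu-west", "payload")
}
```

The four-argument form is the one to reach for when the value used for partitioning should differ from the key stored with the message.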
    val topicPartitionsList = getPartitionListForTopic(message) // partition info for the topic this message is sent to
    val partitionIndex = getPartition(message.topic, message.partitionKey, topicPartitionsList) // decide which partition this message goes to
    private def getPartition(topic: String, key: Any, topicPartitionList: Seq[PartitionAndLeader]): Int = {
      val numPartitions = topicPartitionList.size
      if (numPartitions <= 0)
        throw new UnknownTopicOrPartitionException("Topic " + topic + " doesn't exist")
      val partition =
        if (key == null) {
          // If the key is null, we don't really need a partitioner
          // So we look up in the send partition cache for the topic to decide the target partition
          val id = sendPartitionPerTopicCache.get(topic)
          id match {
            case Some(partitionId) =>
              // directly return the partitionId without checking availability of the leader,
              // since we want to postpone the failure until the send operation anyways
              partitionId
            case None =>
              val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
              if (availablePartitions.isEmpty)
                throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
              val index = Utils.abs(Random.nextInt) % availablePartitions.size
              val partitionId = availablePartitions(index).partitionId
              sendPartitionPerTopicCache.put(topic, partitionId)
              partitionId
          }
        } else
          partitioner.partition(key, numPartitions)
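When partKey is null, the handler first looks in sendPartitionPerTopicCache, which is a Map, for a partitionId cached for this topic. If one was cached earlier via sendPartitionPerTopicCache.put(topic, partitionId), it is used directly. Otherwise the handler picks one at random from the available partitions and caches it in sendPartitionPerTopicCache. As a result, while the cache holds a partitionId for a topic, every keyless message for that topic goes to that same partition. So if all messages have a null partKey, only one partition receives messages for a period of time. It is "a period" rather than forever because the handler periodically re-fetches the metadata for the topics it has sent to; the interval is set by topic.metadata.refresh.interval.ms. After re-fetching the metadata it clears some caches, including sendPartitionPerTopicCache, so the next keyless message causes a new random partitionId to be cached.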
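The sticky behaviour for keyless messages can be reproduced with a small sketch. It assumes a simplified input (a list of partition ids that currently have a leader), and the names `partitionFor` and `onMetadataRefresh` are invented for the illustration. It also uses `require` where the real code throws LeaderNotAvailableException.

```scala
import scala.collection.mutable
import scala.util.Random

object StickyPartitionSketch {
  // topic -> cached partition id for messages without a partition key
  private val cache = mutable.Map.empty[String, Int]

  // Pick a partition for a keyless message. `available` holds the ids of
  // partitions that currently have a leader (a simplified, hypothetical input).
  def partitionFor(topic: String, available: Seq[Int], rnd: Random): Int =
    cache.getOrElseUpdate(topic, {
      require(available.nonEmpty, s"No leader for any partition in topic $topic")
      available(rnd.nextInt(available.size))
    })

  // Stand-in for the periodic metadata refresh, which clears the cache
  // so that the next keyless message picks a fresh random partition.
  def onMetadataRefresh(): Unit = cache.clear()
}
```

Between two refreshes every call for the same topic returns the same partition id, which is why a topic fed only keyless messages can look unbalanced over short time windows.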
    if (topicMetadataRefreshInterval >= 0 &&
        SystemTime.milliseconds - lastTopicMetadataRefreshTime > topicMetadataRefreshInterval) { // time to refresh the topic metadata, so do the refresh
      Utils.swallowError(brokerPartitionInfo.updateInfo(topicMetadataToRefresh.toSet, correlationId.getAndIncrement))
      sendPartitionPerTopicCache.clear()
      topicMetadataToRefresh.clear
      lastTopicMetadataRefreshTime = SystemTime.milliseconds
    }
    if (brokerId < 0) {
      warn("Failed to send data since partitions %s don't have a leader".format(messagesPerTopic.map(_._1).mkString(",")))
      messagesPerTopic.keys.toSeq
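When brokerId < 0, it returns a non-empty Seq containing every topic+partition combination that has no leader. If the messages still cannot be sent after the configured number of retries, the handle method eventually throws a FailedToSendMessageException.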
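The retry behaviour can be sketched as a loop that keeps re-dispatching whatever could not be sent. This is a simplified illustration, not the body of handle: `FailedToSendException` is a hypothetical stand-in for the Kafka exception, `dispatch` is an assumed callback that returns the unsent messages, and the backoff and metadata refresh between attempts are left out.

```scala
object RetrySketch {
  // Hypothetical stand-in for kafka.common.FailedToSendMessageException.
  final class FailedToSendException(msg: String) extends RuntimeException(msg)

  // `dispatch` tries to send the pending messages and returns the ones
  // that failed. One initial attempt plus up to `maxRetries` retries are
  // made. Returns the number of attempts used.
  def handle[A](batch: Seq[A], maxRetries: Int)(dispatch: Seq[A] => Seq[A]): Int = {
    var outstanding = batch
    var attempts = 0
    while (outstanding.nonEmpty && attempts <= maxRetries) {
      outstanding = dispatch(outstanding)
      attempts += 1
    }
    if (outstanding.nonEmpty)
      throw new FailedToSendException(s"Failed to send messages after $attempts tries.")
    attempts
  }
}
```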
    private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]])
    /** enforce the compressed.topics config here.
     *  If the compression codec is anything other than NoCompressionCodec,
     *    Enable compression only for specified topics if any
     *    If the list of compressed topics is empty, then enable the specified compression codec for all topics
     *  If the compression codec is NoCompressionCodec, compression is disabled for all topics
     */

    /**
     * A sequence of messages stored in a byte buffer
     *
     * There are two ways to create a ByteBufferMessageSet
     *
     * Option 1: From a ByteBuffer which already contains the serialized message set. Consumers will use this method.
     *
     * Option 2: Give it a list of messages along with instructions relating to serialization format. Producers will use this method.
     */
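The compressed.topics rules quoted in the first comment above fit in one small function. The sketch below uses its own `Codec` values rather than Kafka's codec objects, and `codecFor` is a name invented for the illustration.

```scala
object CodecSelectionSketch {
  // Simplified codecs for the sketch; not Kafka's CompressionCodec objects.
  sealed trait Codec
  case object NoCompression extends Codec
  case object Gzip extends Codec
  case object Snappy extends Codec

  // configured: the producer-wide codec.
  // compressedTopics: the compressed.topics list (empty means "all topics").
  def codecFor(topic: String, configured: Codec, compressedTopics: Set[String]): Codec =
    configured match {
      // No codec configured: compression is off for every topic.
      case NoCompression => NoCompression
      // Codec configured, and the topic is listed or the list is empty.
      case codec if compressedTopics.isEmpty || compressedTopics.contains(topic) => codec
      // Codec configured, but this topic is not in the list.
      case _ => NoCompression
    }
}
```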
    compressionCodec match {
      case DefaultCompressionCodec => new GZIPOutputStream(stream)
      case GZIPCompressionCodec => new GZIPOutputStream(stream)
      case SnappyCompressionCodec =>
        import org.xerial.snappy.SnappyOutputStream
        new SnappyOutputStream(stream)
      case _ =>
        throw new kafka.common.UnknownCodecException("Unknown Codec: " + compressionCodec)
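To illustrate what compressing a whole batch means, here is a minimal sketch that writes several messages through a single GZIPOutputStream so they are compressed together instead of one by one. The length-prefixed framing and the names `compress` and `decompress` are assumptions made for the example; this is not Kafka's on-disk or wire format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object BatchCompressionSketch {
  // Write every message as [length][bytes] through one gzip stream,
  // so the whole batch shares a single compression context.
  def compress(messages: Seq[Array[Byte]]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(new GZIPOutputStream(bytes))
    try {
      messages.foreach { m =>
        out.writeInt(m.length)
        out.write(m)
      }
    } finally {
      out.close() // closing finishes the gzip stream and flushes it
    }
    bytes.toByteArray
  }

  // Read back `count` length-prefixed messages from the compressed blob.
  def decompress(blob: Array[Byte], count: Int): Seq[Array[Byte]] = {
    val in = new DataInputStream(new GZIPInputStream(new ByteArrayInputStream(blob)))
    try {
      (0 until count).map { _ =>
        val m = new Array[Byte](in.readInt())
        in.readFully(m)
        m
      }
    } finally {
      in.close()
    }
  }
}
```

Compressing the batch as one stream usually gives a better ratio than compressing each small message separately, because repeated content across messages is shared.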
