sarama Kafka客户端生产者与消费者梳理

生产者

sarama 库提供了同步生产者和异步生产者。

SyncProducer

SyncProducer 是在 AsyncProducer 基础上加以条件限制实现的。

type SyncProducer interface {

	// SendMessage produces a given message, and returns only when it either has
	// succeeded or failed to produce. It will return the partition and the offset
	// of the produced message, or an error if the message failed to produce.
	SendMessage(msg *ProducerMessage) (partition int32, offset int64, err error)

	// SendMessages produces a given set of messages, and returns only when all
	// messages in the set have either succeeded or failed. Note that messages
	// can succeed and fail individually; if some succeed and some fail,
	// SendMessages will return an error.
	SendMessages(msgs []*ProducerMessage) error

	// Close shuts down the producer; you must call this function before a producer
	// object passes out of scope, as it may otherwise leak memory.
	// You must call this before calling Close on the underlying client.
	Close() error
}

func NewSyncProducer(addrs []string, config *Config) (SyncProducer, error) {
	if config == nil {
		config = NewConfig()
		config.Producer.Return.Successes = true
	}

	if err := verifyProducerConfig(config); err != nil {
		return nil, err
	}

	p, err := NewAsyncProducer(addrs, config)
	if err != nil {
		return nil, err
	}
	return newSyncProducerFromAsyncProducer(p.(*asyncProducer)), nil
}

func newSyncProducerFromAsyncProducer(p *asyncProducer) *syncProducer {
	sp := &syncProducer{producer: p}

	sp.wg.Add(2)
	go withRecover(sp.handleSuccesses)
	go withRecover(sp.handleErrors)

	return sp
}

可以看到 newSyncProducerFromAsyncProducer 方法中开启两个 goroutine 处理消息成功与失败的“回调”。这里使用 waitGroup 进行等待，从而将异步操作转变为同步操作。

SyncProducer 提供了批量发送 API, 同一批次中的消息失败与否是独立的，当出现部分失败时，SendMessages 会返回类型为 []*ProducerError 的error。

type ProducerError struct {
	Msg *ProducerMessage
	Err error
}
func (sp *syncProducer) SendMessages(msgs []*ProducerMessage) error {
	expectations := make(chan chan *ProducerError, len(msgs))
	go func() {
		for _, msg := range msgs {
			expectation := make(chan *ProducerError, 1)
			msg.expectation = expectation
			sp.producer.Input() <- msg
			expectations <- expectation
		}
		close(expectations)
	}()

	var errors ProducerErrors
	for expectation := range expectations {
		if pErr := <-expectation; pErr != nil {
			errors = append(errors, pErr)
		}
	}

	if len(errors) > 0 {
		return errors
	}
	return nil
}

所以每个我们可以清楚的知道是哪一个消息发生错误以及得到错误的具体信息。

AsyncProducer

type AsyncProducer interface {

	// AsyncClose triggers a shutdown of the producer. The shutdown has completed
	// when both the Errors and Successes channels have been closed. When calling
	// AsyncClose, you *must* continue to read from those channels in order to
	// drain the results of any messages in flight.
	AsyncClose()

	// Close shuts down the producer and waits for any buffered messages to be
	// flushed. You must call this function before a producer object passes out of
	// scope, as it may otherwise leak memory. You must call this before process
	// shutting down, or you may lose messages. You must call this before calling
	// Close on the underlying client.
	Close() error

	// Input is the input channel for the user to write messages to that they
	// wish to send.
	Input() chan<- *ProducerMessage

	// Successes is the success output channel back to the user when Return.Successes is
	// enabled. If Return.Successes is true, you MUST read from this channel or the
	// Producer will deadlock. It is suggested that you send and read messages
	// together in a single select statement.
	Successes() <-chan *ProducerMessage

	// Errors is the error output channel back to the user. You MUST read from this
	// channel or the Producer will deadlock when the channel is full. Alternatively,
	// you can set Producer.Return.Errors in your config to false, which prevents
	// errors to be returned.
	Errors() <-chan *ProducerError
}

AsyncProducer 提供5个操作行为，分别提供了异步关闭、同步关闭和操作消息的三个行为。我们往 Input() 里丢数据，在 Successe() 和 Errors(）处理“回调” 结果。这种接口设计对用调用者十分友好。

总结

不管是 SyncProducer 还是 AsyncProducer，我们都需要关注“回调”结果，及时的处理提交失败的消息是保障消息不丢失的前提。

消费者

type ConsumerGroup interface {
	Consume(ctx context.Context, topics []string, handler ConsumerGroupHandler) error

	// Errors returns a read channel of errors that occurred during the consumer life-cycle.
	// By default, errors are logged and not returned over this channel.
	// If you want to implement any custom error handling, set your config's
	// Consumer.Return.Errors setting to true, and read from this channel.
	Errors() <-chan error

	// Close stops the ConsumerGroup and detaches any running sessions. It is required to call
	// this function before the object passes out of scope, as it will otherwise leak memory.
	Close() error

	// Pause suspends fetching from the requested partitions. Future calls to the broker will not return any
	// records from these partitions until they have been resumed using Resume()/ResumeAll().
	// Note that this method does not affect partition subscription.
	// In particular, it does not cause a group rebalance when automatic assignment is used.
	Pause(partitions map[string][]int32)

	// Resume resumes specified partitions which have been paused with Pause()/PauseAll().
	// New calls to the broker will return records from these partitions if there are any to be fetched.
	Resume(partitions map[string][]int32)

	// Pause suspends fetching from all partitions. Future calls to the broker will not return any
	// records from these partitions until they have been resumed using Resume()/ResumeAll().
	// Note that this method does not affect partition subscription.
	// In particular, it does not cause a group rebalance when automatic assignment is used.
	PauseAll()

	// Resume resumes all partitions which have been paused with Pause()/PauseAll().
	// New calls to the broker will return records from these partitions if there are any to be fetched.
	ResumeAll()
}

从上述代码得知，一个消费者组实例提供了上述方法，当我们调用 Consume 时，我们需要消费的主题以及处理函数

type ConsumerGroupHandler interface {
	// Setup is run at the beginning of a new session, before ConsumeClaim.
	Setup(ConsumerGroupSession) error

	// Cleanup is run at the end of a session, once all ConsumeClaim goroutines have exited
	// but before the offsets are committed for the very last time.
	Cleanup(ConsumerGroupSession) error

	// ConsumeClaim must start a consumer loop of ConsumerGroupClaim's Messages().
	// Once the Messages() channel is closed, the Handler must finish its processing
	// loop and exit.
	ConsumeClaim(ConsumerGroupSession, ConsumerGroupClaim) error
}

这个 ConsumerGroupHandler 对象是我们传入的，也就是说我们要实现 ConsumerGroupHandler 这个接口中约定的行为。其中 ConsumeClaim 是我们的主体逻辑。ConsumeClaim 有两个入参，分别是

type ConsumerGroupSession interface {
	// Claims returns information about the claimed partitions by topic.
	Claims() map[string][]int32

	// MemberID returns the cluster member ID.
	MemberID() string

	// GenerationID returns the current generation ID.
	GenerationID() int32

	// MarkOffset marks the provided offset, alongside a metadata string
	// that represents the state of the partition consumer at that point in time. The
	// metadata string can be used by another consumer to restore that state, so it
	// can resume consumption.
	//
	// To follow upstream conventions, you are expected to mark the offset of the
	// next message to read, not the last message read. Thus, when calling `MarkOffset`
	// you should typically add one to the offset of the last consumed message.
	//
	// Note: calling MarkOffset does not necessarily commit the offset to the backend
	// store immediately for efficiency reasons, and it may never be committed if
	// your application crashes. This means that you may end up processing the same
	// message twice, and your processing should ideally be idempotent.
	MarkOffset(topic string, partition int32, offset int64, metadata string)

	// Commit the offset to the backend
	//
	// Note: calling Commit performs a blocking synchronous operation.
	Commit()

	// ResetOffset resets to the provided offset, alongside a metadata string that
	// represents the state of the partition consumer at that point in time. Reset
	// acts as a counterpart to MarkOffset, the difference being that it allows to
	// reset an offset to an earlier or smaller value, where MarkOffset only
	// allows incrementing the offset. cf MarkOffset for more details.
	ResetOffset(topic string, partition int32, offset int64, metadata string)

	// MarkMessage marks a message as consumed.
	MarkMessage(msg *ConsumerMessage, metadata string)

	// Context returns the session context.
	Context() context.Context
}

type ConsumerGroupClaim interface {
	// Topic returns the consumed topic name.
	Topic() string

	// Partition returns the consumed partition.
	Partition() int32

	// InitialOffset returns the initial offset that was used as a starting point for this claim.
	InitialOffset() int64

	// HighWaterMarkOffset returns the high water mark offset of the partition,
	// i.e. the offset that will be used for the next message that will be produced.
	// You can use this to determine how far behind the processing is.
	HighWaterMarkOffset() int64

	// Messages returns the read channel for the messages that are returned by
	// the broker. The messages channel will be closed when a new rebalance cycle
	// is due. You must finish processing and mark offsets within
	// Config.Consumer.Group.Session.Timeout before the topic/partition is eventually
	// re-assigned to another group member.
	Messages() <-chan *ConsumerMessage
}

其中 ConsumerGroupSession 是消费者组连接信息，包括我们的消费位移管理，ConsumerGroupClaim 我们消费的消息数据管理

简单梳理一下消费者组主干流程：

新增消费者组实例

client, err := sarama.NewConsumerGroup(strings.Split(brokers, ","), group, config)

2.请求消费数据

 err := client.Consume(ctx, strings.Split(topics, ","), &consumer)

其中 consumer 为我们自定义的 Handler 实例

3.Consume 方法里初始化连接信息

sess, err := c.newSession(ctx, topics, handler, c.config.Consumer.Group.Rebalance.Retry.Max)

4. newSession 方法中处理从 kafka 集群的协调者获取的各种信息、各种逻辑，最终返回 newConsumerGroupSession

return newConsumerGroupSession(ctx, c, claims, join.MemberId, join.GenerationId, handler)

5.newConsumerGroupSession 里会初始化位移管理实例、处理心跳、管理分区等，

offsets, err := newOffsetManagerFromClient(parent.groupID, memberID, generationID, parent.client)
同时调用 sess.consume(topic, partition)

6.consume 方法里初始化 PartitionConsumer

claim, err := newConsumerGroupClaim(s, topic, partition, offset)

PartitionConsumer 的声明如下

type PartitionConsumer interface {
	AsyncClose()
	Close() error
	Messages() <-chan *ConsumerMessage
	Errors() <-chan *ConsumerError
	HighWaterMarkOffset() int64
	Pause()
	Resume()
	IsPaused() bool
}

其实 PartitionConsumer 才是最终消费数据的地方。

7.最终 PartitionConsumer 接口类型的实例会传递给实现 ConsumerGroupClaim 的实体。然后这个实体的实例会作为我们自定义 Handler 的第二个参数,这时就回到了我们自身实现的逻辑上。

if err := s.handler.ConsumeClaim(s, claim); err != nil {
		s.parent.handleError(err, topic, partition)
	}

其中需要注意的是

ConsumerGroupSession 提供的 Commit 是一个阻塞方法，如果要使用非阻塞标记位移提交则可以选择MarkMessage。其实就是只做了一个标记，在需要的时候在调用 Commit。sarama 会在在新增 newOffsetManagerFromClient 判断段是否开启自动位移提交，当开启自动位移提交时会启动一个 goroutine 定期检测执行位移提交操作。

if conf.Consumer.Offsets.AutoCommit.Enable {
		om.ticker = time.NewTicker(conf.Consumer.Offsets.AutoCommit.Interval)
		go withRecover(om.mainLoop)
	} 
func (om *offsetManager) mainLoop() {
	defer om.ticker.Stop()
	defer close(om.closed)

	for {
		select {
		case <-om.ticker.C:
			om.Commit()
		case <-om.closing:
			return
		}
	}
}

func (om *offsetManager) Commit() {
	om.flushToBroker()
	om.releasePOMs(false)
}

一般来说消费者端消息丢失较为少见，消息的重复处理场景较多。这种情况下一般需要业务方做幂等性处理。

Loading

sarama Kafka客户端生产者与消费者梳理

生产者

SyncProducer

AsyncProducer

总结

消费者

推荐阅读

公告