kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date
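The error is the one shown in the title.
Scenario: a Go program running in a k8s container creates an AsyncProducer with sarama.
Finding the cause
1. Enable sarama's logging (implement the logging interface yourself and reassign Logger)
1.1 sarama source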
    package sarama

    import (
        "io/ioutil"
        "log"
    )

    // Logger is the instance of a StdLogger interface that Sarama writes connection
    // management events to. By default it is set to discard all log messages via ioutil.Discard,
    // but you can set it to redirect wherever you want.
    var Logger StdLogger = log.New(ioutil.Discard, "[Sarama] ", log.LstdFlags)

    // StdLogger is used to log error messages.
    type StdLogger interface {
        Print(v ...interface{})
        Printf(format string, v ...interface{})
        Println(v ...interface{})
    }
1.2 A concrete implementation from the source
    // Requires the standard "fmt" and "log" packages.
    type Feedback struct {
        out *log.Logger
        log *log.Logger
    }

    func (fb *Feedback) Println(v ...interface{}) {
        fb.output(fmt.Sprintln(v...))
    }

    func (fb *Feedback) Printf(format string, v ...interface{}) {
        fb.output(fmt.Sprintf(format, v...))
    }

    func (fb *Feedback) Print(v ...interface{}) {
        fb.output(fmt.Sprint(v...))
    }

    // output writes s to whichever of the two loggers is set.
    func (fb *Feedback) output(s string) {
        if fb.out != nil {
            fb.out.Output(2, s)
        }
        if fb.log != nil {
            fb.log.Output(2, s)
        }
    }
1.3 A custom logging type
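    // ourLog forwards sarama's log output to the application's logger.
    // Note: log here is the application's own logging package, which must
    // provide a Debug function; the standard library log package has none.
    type ourLog struct{}

    func (fb *ourLog) Println(v ...interface{}) {
        log.Debug(fmt.Sprintln(v...))
    }

    func (fb *ourLog) Printf(format string, v ...interface{}) {
        log.Debug(fmt.Sprintf(format, v...))
    }

    func (fb *ourLog) Print(v ...interface{}) {
        log.Debug(fmt.Sprint(v...))
    }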
1.4 Reassign sarama.Logger
    // Add this in main:
    sarama.Logger = &ourLog{}
2. Restart the docker service in k8s, then check the program's execution log
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [retrying-1]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 abandoning broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [closed] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 shut down
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) client/metadata fetching metadata for [bi-data-cti-prod-31] from broker sany-onprem-repm-node03:9092
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 starting up
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [open] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 selected broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [flushing-1]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [normal]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [retrying] on bi-data-cti-prod-31/4 because kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date.
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [retrying-2]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 abandoning broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [closed] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 shut down
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) client/metadata fetching metadata for [bi-data-cti-prod-31] from broker sany-onprem-repm-node03:9092
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 starting up
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [open] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 selected broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [flushing-2]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [normal]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [retrying] on bi-data-cti-prod-31/4 because kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date.
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [retrying-3]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 abandoning broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [closed] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 shut down
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) client/metadata fetching metadata for [bi-data-cti-prod-31] from broker sany-onprem-repm-node03:9092
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 starting up
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [open] on bi-data-cti-prod-31/4
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 selected broker 1003
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [flushing-3]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/leader/bi-data-cti-prod-31/4 state change to [normal]
[11:25:58 CST 2020/08/06] [DEBG] (main.(*ourLog).Printf:47) producer/broker/1003 state change to [retrying] on bi-data-cti-prod-31/4 because kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date.
[11:25:58 CST 2020/08/06] [DEBG] (app/kafka.(*AsyncProducer).run:92) p-,&{addrs:[sany-onprem-repm-node02:9092 sany-onprem-repm-node03:9092 sany-onprem-repm-node01:9092] username: password: certFile: channelBufferSize:102400 producer:0xc4202ec8c0 done:0xc4202142a0}
[11:25:58 CST 2020/08/06] [DEBG] (app/kafka.(*AsyncProducer).run:93) p.producer-, &{client:0xc420218300 conf:0xc420092300 ownClient:true errors:0xc4202c8300 input:0xc4202c8360 successes:0xc4202c83c0 retries:0xc4202c8420 inFlight:{noCopy:{} state1:[0 0 0 0 0 0 0 0 0 0 0 0] sema:0} brokers: map [0xc4200ea160:0xc4200a4240 0xc4200ea6e0:0xc4204990e0 0xc4200eb080:0xc4200a48a0] brokerRefs: map [0xc4200a4240:2 0xc4204990e0:1 0xc4200a48a0:1] brokerLock:{state:0 sema:0}}
[11:25:58 CST 2020/08/06] [DEBG] (app/kafka.(*AsyncProducer).run:94) p.producer.Input-,0xc4202c8360
[11:25:58 CST 2020/08/06] [DEBG] (app/kafka.(*AsyncProducer).run:95) p.producer.Successes-, 0xc4202c8360
[11:25:58 CST 2020/08/06] [EROR] (app/kafka.(*AsyncProducer).run:96) kafka: Failed to produce message to topic bi-data-cti-prod-31: kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date., &{Topic:bi-data-cti-prod-31 Key:119 Value:{ "MainType" :2, "ExtType" :9, "Mode" :2, "ModeParm" : "ag-19" , "MSGID" : "753" , "TELID" : "def" , "MSG" :{ "vcc_id" : "1" , "ag_id" : "19" , "que_id" :[ "1" ], "grp_id" : "0" , "ag_sta" : "3" , "ag_sta_reason" : "1" , "ag_sta_id" : "7" , "ag_sta_bef" : "2" , "ag_sta_time" : "1596684344" }} Metadata:<nil> Offset:0 Partition:4 Timestamp:0001-01-01 00:00:00 +0000 UTC retries:0 flags:0}
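The retry cycle in a log like this can be picked out mechanically. The following is a minimal sketch, assuming the line format shown in the log; `maxRetries` and `retryRe` are hypothetical names introduced for illustration, not part of sarama.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// retryRe matches sarama's per-partition retry transitions, for example
// "producer/leader/bi-data-cti-prod-31/4 state change to [retrying-3]".
var retryRe = regexp.MustCompile(`producer/leader/(\S+)/(\d+) state change to \[retrying-(\d+)\]`)

// maxRetries returns, for each "topic/partition", the highest retry
// number that appears in logText.
func maxRetries(logText string) map[string]int {
	out := make(map[string]int)
	for _, line := range strings.Split(logText, "\n") {
		m := retryRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a retry transition
		}
		n, err := strconv.Atoi(m[3])
		if err != nil {
			continue
		}
		key := m[1] + "/" + m[2]
		if n > out[key] {
			out[key] = n
		}
	}
	return out
}

func main() {
	sample := strings.Join([]string{
		"producer/leader/bi-data-cti-prod-31/4 state change to [retrying-1]",
		"producer/leader/bi-data-cti-prod-31/4 state change to [normal]",
		"producer/leader/bi-data-cti-prod-31/4 state change to [retrying-2]",
		"producer/leader/bi-data-cti-prod-31/4 state change to [retrying-3]",
	}, "\n")
	for partition, n := range maxRetries(sample) {
		fmt.Printf("%s reached retry %d\n", partition, n)
	}
}
```

A partition whose highest retry number equals the configured retry limit is one whose message was eventually reported as failed.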
3. Log analysis
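The client periodically pulls the latest metadata from the Kafka brokers, every 10 minutes by default. A newly created Producer retries 3 times by default. In the log above, all three retries fall within the same second, and each one re-fetches metadata yet selects broker 1003 again. When none of the 3 attempts obtains up-to-date metadata, the current message cannot be written to Kafka and the error above is returned:
kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date.
When the program was deployed on the servers where Kafka itself is installed, the metadata problem never appeared; when it runs in a k8s container, the problem occurs intermittently.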
4. Resolution
No fix has been found yet; increasing the value of retries is worth trying.
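As a starting point for that experiment, the retry settings live on the sarama Config. This is a configuration sketch only, not a confirmed fix: the values (10 retries, 500ms backoff, 1-minute metadata refresh) are assumptions that have not been tested against this cluster, `newProducerConfig` is a hypothetical helper, and the import path shown is the one in use at the time of the log (newer releases are published under github.com/IBM/sarama).

```go
package main

import (
	"log"
	"time"

	"github.com/Shopify/sarama"
)

// newProducerConfig (hypothetical helper) returns a config with a larger
// retry budget than the defaults. All values are illustrative assumptions.
func newProducerConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.Producer.Retry.Max = 10                         // default is 3
	cfg.Producer.Retry.Backoff = 500 * time.Millisecond // default is 100ms
	cfg.Metadata.RefreshFrequency = 1 * time.Minute     // default is 10 minutes
	return cfg
}

func main() {
	// Broker addresses as they appear in the log.
	brokers := []string{
		"sany-onprem-repm-node01:9092",
		"sany-onprem-repm-node02:9092",
		"sany-onprem-repm-node03:9092",
	}
	producer, err := sarama.NewAsyncProducer(brokers, newProducerConfig())
	if err != nil {
		log.Fatalf("creating producer: %v", err)
	}
	defer producer.Close()
	log.Println("producer created with extended retry settings")
}
```

Raising the backoff matters as much as raising the count: with the default 100ms pause, even 10 retries are used up in about a second.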