[Kafka Learning Notes 4] Kafka Cluster Performance Testing

A Kafka cluster's performance is constrained by the JVM parameters, the server hardware, and the Kafka configuration. Before deploying, you should therefore run performance tests on the target machines and use the results to find the configuration that best fits your workload.

1. Kafka broker JVM parameters
The broker JVM heap is controlled by the KAFKA_HEAP_OPTS variable in the startup script kafka-server-start.sh; if it is not set, the default is 1G.
You can add a KAFKA_HEAP_OPTS export near the top of that script. Note that if you want to use the G1 garbage collector, the heap should be at least 4G and the JDK at least 7u51. (The example below instead uses the CMS collector together with PermGen sizing, which applies to JDK 7 and earlier.)
Example:
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G -Xmn2G -XX:PermSize=64m -XX:MaxPermSize=128m -XX:SurvivorRatio=6 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
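If you do want G1 as the text suggests, a minimal sketch of the export is below. The G1 flag values follow the commonly published Kafka production example and are illustrative assumptions, not a measured recommendation for your hardware:

```shell
# Sketch: place near the top of bin/kafka-server-start.sh, before its
# built-in 'if [ -z "$KAFKA_HEAP_OPTS" ]' block applies the 1G default.
# G1 flags are illustrative; size the heap to your machine.
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"
```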

2. Kafka cluster performance test tools (based on kafka_2.11-0.11.0.0)
Kafka ships with two test tools: kafka-producer-perf-test.sh for producers and kafka-consumer-perf-test.sh for consumers.
2.1 kafka-producer-perf-test.sh
Parameters:
--help                    show usage help
--topic                   topic name
--record-size             size of each message in bytes
--throughput              target throughput: messages to send per second (-1 for no throttling)
--producer-props          producer config properties such as bootstrap.servers and client.id; these take precedence over --producer.config
--producer.config         producer config file, e.g. producer.properties
--print-metrics           print metrics at the end of the test (default: false)
--num-records             total number of messages to send
--payload-file            file containing the message payloads to send; exactly one of --payload-file and --record-size must be specified
--payload-delimiter       delimiter between payloads in the payload file
--transactional-id        transactional id, used when testing transaction performance (default: performance-producer-default-transactional-id)
--transaction-duration-ms maximum age of a transaction: the open transaction is committed once it exceeds this many milliseconds; transactions are only used when this is greater than 0 (default: 0)

Example:
./bin/kafka-producer-perf-test.sh --topic test-pati3-rep2 --throughput 500000 --num-records 1500000 --record-size 1000 --producer.config config/producer.properties --producer-props bootstrap.servers=10.1.8.16:9092,10.1.8.15:9092,10.1.8.14:9092 acks=1

Test dimensions: you can vary the JVM settings, the partition count, the replica count, --throughput, --record-size, acks (the replica acknowledgment mode), and the compression codec (compression.type in --producer-props).
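To cover those dimensions systematically, one option is a small sweep script. The sketch below only echoes the commands it would run (swap `echo` for the real invocation to execute them); the broker list and topic are taken from the example above, and the sweep values are arbitrary assumptions:

```shell
# Dry-run sweep over acks and record-size: prints one
# kafka-producer-perf-test.sh command per combination.
BROKERS="10.1.8.16:9092,10.1.8.15:9092,10.1.8.14:9092"
for acks in 0 1 all; do
  for size in 1000 10000 102400; do
    echo ./bin/kafka-producer-perf-test.sh --topic test-pati3-rep2 \
      --num-records 1500000 --throughput 500000 --record-size "$size" \
      --producer-props bootstrap.servers="$BROKERS" acks="$acks"
  done
done
```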

[cluster@PCS101 bin]$ ./kafka-producer-perf-test.sh --topic REC-CBBO-MSG-TOPIC --throughput 50000 --num-records 150000 --record-size 102400 --producer-props bootstrap.servers=134.32.123.101:9092,134.32.123.102:9092,134.32.123.103:9092 acks=all --print-metrics
12786 records sent, 2556.7 records/sec (249.68 MB/sec), 122.1 ms avg latency, 231.0 max latency.
14827 records sent, 2965.4 records/sec (289.59 MB/sec), 109.4 ms avg latency, 291.0 max latency.
14587 records sent, 2917.4 records/sec (284.90 MB/sec), 111.6 ms avg latency, 374.0 max latency.
14292 records sent, 2858.4 records/sec (279.14 MB/sec), 114.8 ms avg latency, 389.0 max latency.
14557 records sent, 2910.8 records/sec (284.26 MB/sec), 112.3 ms avg latency, 354.0 max latency.
14524 records sent, 2904.2 records/sec (283.62 MB/sec), 113.1 ms avg latency, 362.0 max latency.
14686 records sent, 2937.2 records/sec (286.84 MB/sec), 111.4 ms avg latency, 348.0 max latency.
14637 records sent, 2927.4 records/sec (285.88 MB/sec), 111.8 ms avg latency, 378.0 max latency.
15186 records sent, 3037.2 records/sec (296.60 MB/sec), 107.9 ms avg latency, 343.0 max latency.
14584 records sent, 2916.2 records/sec (284.79 MB/sec), 112.4 ms avg latency, 356.0 max latency.
150000 records sent, 2888.170055 records/sec (282.05 MB/sec), 112.78 ms avg latency, 389.00 ms max latency, 11 ms 50th, 321 ms 95th, 340 ms 99th, 375 ms 99.9th.

The last line is the overall summary: total records sent, average throughput (records/sec and MB/sec), average latency, maximum latency, and the 50th/95th/99th/99.9th percentile latencies. The interval line with the fewest records sent gives a rough indication of the producer's bottleneck.
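The MB/sec figure in the summary is just records/sec multiplied by the record size, converted to mebibytes, which you can verify for the run above:

```shell
# Sanity-check the summary line: MB/sec = records/sec * record-size / 2^20.
# 2888.170055 records/sec at 102400 bytes per record:
awk 'BEGIN { printf "%.2f MB/sec\n", 2888.170055 * 102400 / 1048576 }'
# prints: 282.05 MB/sec, matching the reported value
```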

If --print-metrics is passed, the following metrics statistics are printed at the end:

Metric Name                                                                                 Value
kafka-metrics-count:count:{client-id=producer-1}                                          : 84.000
producer-metrics:batch-size-avg:{client-id=producer-1}                                    : 102472.000
producer-metrics:batch-size-max:{client-id=producer-1}                                    : 102472.000
producer-metrics:batch-split-rate:{client-id=producer-1}                                  : 0.000
producer-metrics:buffer-available-bytes:{client-id=producer-1}                            : 33554432.000
producer-metrics:buffer-exhausted-rate:{client-id=producer-1}                             : 0.000
producer-metrics:buffer-total-bytes:{client-id=producer-1}                                : 33554432.000
producer-metrics:bufferpool-wait-ratio:{client-id=producer-1}                             : 0.857
producer-metrics:compression-rate-avg:{client-id=producer-1}                              : 1.000
producer-metrics:connection-close-rate:{client-id=producer-1}                             : 0.000
producer-metrics:connection-count:{client-id=producer-1}                                  : 5.000
producer-metrics:connection-creation-rate:{client-id=producer-1}                          : 0.091
producer-metrics:incoming-byte-rate:{client-id=producer-1}                                : 87902.611
producer-metrics:io-ratio:{client-id=producer-1}                                          : 0.138
producer-metrics:io-time-ns-avg:{client-id=producer-1}                                    : 69622.263
producer-metrics:io-wait-ratio:{client-id=producer-1}                                     : 0.329
producer-metrics:io-wait-time-ns-avg:{client-id=producer-1}                               : 166147.404
producer-metrics:metadata-age:{client-id=producer-1}                                      : 55.104
producer-metrics:network-io-rate:{client-id=producer-1}                                   : 1557.405
producer-metrics:outgoing-byte-rate:{client-id=producer-1}                                : 278762290.882
producer-metrics:produce-throttle-time-avg:{client-id=producer-1}                         : 0.000
producer-metrics:produce-throttle-time-max:{client-id=producer-1}                         : 0.000
producer-metrics:record-error-rate:{client-id=producer-1}                                 : 0.000
producer-metrics:record-queue-time-avg:{client-id=producer-1}                             : 110.963
producer-metrics:record-queue-time-max:{client-id=producer-1}                             : 391.000
producer-metrics:record-retry-rate:{client-id=producer-1}                                 : 0.000
producer-metrics:record-send-rate:{client-id=producer-1}                                  : 2724.499
producer-metrics:record-size-avg:{client-id=producer-1}                                   : 102487.000
producer-metrics:record-size-max:{client-id=producer-1}                                   : 102487.000
producer-metrics:records-per-request-avg:{client-id=producer-1}                           : 3.493
producer-metrics:request-latency-avg:{client-id=producer-1}                               : 7.011
producer-metrics:request-latency-max:{client-id=producer-1}                               : 56.000
producer-metrics:request-rate:{client-id=producer-1}                                      : 778.702
producer-metrics:request-size-avg:{client-id=producer-1}                                  : 357989.537
producer-metrics:request-size-max:{client-id=producer-1}                                  : 614940.000
producer-metrics:requests-in-flight:{client-id=producer-1}                                : 0.000
producer-metrics:response-rate:{client-id=producer-1}                                     : 778.731
producer-metrics:select-rate:{client-id=producer-1}                                       : 1979.326
producer-metrics:waiting-threads:{client-id=producer-1}                                   : 0.000
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--1}          : 19.601
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--2}          : 3.956
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-0}           : 31220.396
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-1}           : 29885.883
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-2}           : 26920.163
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node--1}          : 1.324
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node--2}          : 0.436
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-0}           : 98518580.943
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-1}           : 82114190.903
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-2}           : 98518948.091
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node--1}         : 0.000
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node--2}         : 0.000
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-0}          : 6.891
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-1}          : 5.135
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-2}          : 11.202
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--1}         : -Infinity
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--2}         : -Infinity
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-0}          : 56.000
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-1}          : 46.000
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-2}          : 55.000
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node--1}                : 0.036
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node--2}                : 0.018
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-0}                 : 279.365
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-1}                 : 340.136
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-2}                 : 160.233
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node--1}            : 36.500
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node--2}            : 24.000
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-0}             : 352658.869
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-1}             : 241415.634
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-2}             : 614858.709
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node--1}            : 49.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node--2}            : 24.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-0}             : 614940.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-1}             : 512460.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-2}             : 614940.000
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node--1}               : 0.036
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node--2}               : 0.018
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-0}                : 279.486
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-1}                : 340.284
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-2}                : 160.233
producer-topic-metrics:byte-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC}         : 279184829.991
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC}  : 1.000
producer-topic-metrics:record-error-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 0.000
producer-topic-metrics:record-retry-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC} : 0.000
producer-topic-metrics:record-send-rate:{client-id=producer-1, topic=REC-CBBO-MSG-TOPIC}  : 2724.548


2.2 kafka-consumer-perf-test.sh
Parameters:
--help                 show usage help
--batch-size           number of messages written in a single batch (default: 200)
--broker-list          broker list; required with the new consumer, not needed with the old consumer
--compression-codec    compression codec: 0 = NoCompressionCodec (default, no compression), 1 = GZIPCompressionCodec, 2 = SnappyCompressionCodec, 3 = LZ4CompressionCodec
--consumer.config      consumer config file, e.g. consumer.properties
--date-format          format string used for the time fields (default: yyyy-MM-dd HH:mm:ss:SSS)
--fetch-size           number of bytes to fetch in a single request (default: 1048576, i.e. 1 MB)
--from-latest          if the consumer does not already have an established offset, start from the latest message in the log instead of the earliest
--group                consumer group id (default: a generated id such as perf-consumer-29512)
--hide-header          skip printing the header line for the stats
--message-size         size of each message (default: 100 bytes)
--messages             required; total number of messages to consume
--new-consumer         use the new consumer (the default)
--num-fetch-threads    number of fetcher threads (default: 1)
--print-metrics        print metrics at the end; only applies to the new consumer
--reporting-interval   interval in milliseconds at which to print progress info (default: 5000)
--show-detailed-stats  report statistics at every reporting interval, as configured by --reporting-interval
--socket-buffer-size   TCP receive buffer size (default: 2097152, i.e. 2 MB)
--threads              number of processing threads (default: 10)
--topic                required; topic name
--zookeeper            ZooKeeper connect string; required when using the old consumer

Test dimensions: vary the parameter values listed above.

[cluster@PCS101 bin]$ ./kafka-consumer-perf-test.sh --topic REC-CBBO-MSG-TOPIC --messages 500000 --message-size 102400 --batch-size 50000 --fetch-size 1048576 --num-fetch-threads 17 --threads 10 --zookeeper 134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181 --print-metrics
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-10-01 09:50:43:707, 2018-10-01 09:51:40:553, 84487.7018, 1486.2559, 874167, 15377.8102

Consumer bottleneck: 1486.2559 MB/sec, 15377.8 messages/sec.
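Those two figures are simply the consumed totals divided by the elapsed time, which can be recomputed from the output line (09:51:40.553 minus 09:50:43.707 is 56.846 seconds):

```shell
# Sanity-check the consumer summary: throughput = totals / elapsed seconds.
awk 'BEGIN {
  d = 56.846                               # elapsed seconds between the two timestamps
  printf "%.2f MB/sec, %.1f msg/sec\n", 84487.7018 / d, 874167 / d
}'
# prints: 1486.26 MB/sec, 15377.8 msg/sec
```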


3. Visual performance analysis tool: Kafka Manager (Yammer Metrics)

-- to be covered in a later post

posted @ 2018-10-01 09:55 cac2020