Apache/Inlong Sort指标监控埋点开发心得
wsl启动flink,但是windows无法访问localhost:8081 taskmanager,
原因:wsl2是基于hyper-v的虚拟机,采用的是新一套的NAT方案,较为独立。所以对于wsl2也要用和普通虚拟机一样的网络方案,如NAT转发和桥接。
方法:
%USERPROFILE%增加.wslconfig配置文件,
[experimental]
networkingMode=mirrored
dnsTunneling=true
firewall=true
autoProxy=true
hostAddressLoopback=true
这样搞还有一个好处就是不用搞clash代理了
Sort埋点到底要干什么,这是一个问题
1. 数据处理异常指标:
-
反序列化失败次数 (
deserializationErrors
):- 描述:记录数据从字节流反序列化为对象时失败的次数。这有助于发现数据格式错误或兼容性问题。
-
写入失败次数 (
writeFailures
):- 描述:记录数据写入下游存储系统(如 Kafka、HBase 等)时的失败次数,帮助监控目标系统的稳定性或网络问题。
-
数据转换错误 (
dataTransformationErrors
):- 描述:记录在数据转换过程中(如格式转换、类型转换)发生的错误次数,帮助检测数据清洗和处理中的问题。
-
脏数据记录 (
dirtyRecords
):- 描述:记录因格式不符或内容异常而无法处理的数据条目数。这类数据可以帮助识别和排查不合规的数据源。
2. Checkpoint 指标:
-
Checkpoint 成功次数 (
checkpointSuccesses
):- 描述:记录成功完成的 Checkpoint 数量,有助于监控作业的稳定性和数据一致性。
-
Checkpoint 失败次数 (
checkpointFailures
):- 描述:记录由于超时或资源不足等原因导致的 Checkpoint 失败次数。通过监控这个指标,可以提前预警潜在的系统瓶颈。
-
最后一个 Checkpoint 的耗时 (
lastCheckpointDuration
):- 描述:记录最后一个成功完成的 Checkpoint 所花费的时间,可以帮助优化 Checkpoint 的间隔和配置。
-
当前未完成的 Checkpoint 数量 (
pendingCheckpoints
):- 描述:记录当前正在进行但尚未完成的 Checkpoint 数量,以监控系统的压力和潜在的延迟。
3. 性能和延迟指标:
-
处理延迟 (
processingLatency
):- 描述:记录从数据进入系统到被处理完成的延迟时间,帮助分析处理效率和性能瓶颈。
-
消息堆积情况 (
messageBacklog
):- 描述:记录消息在队列中的堆积数量或字节数,可以帮助判断系统是否存在处理滞后的问题。
4. 资源使用情况指标:
-
内存使用情况 (
memoryUsage
):- 描述:记录作业内存的使用情况,包括堆内存和非堆内存的占用,帮助监控和优化资源配置。
-
CPU 使用率 (
cpuUsage
):- 描述:记录作业的 CPU 使用率,有助于判断系统的计算压力。
-
网络 I/O 使用情况 (
networkIOUsage
):- 描述:记录作业的网络输入输出数据量,帮助监控数据传输的性能和瓶颈。
5. 自定义业务指标:
- 关键业务事件处理次数 (
businessEventCount
):- 描述:根据具体的业务需求,记录关键事件的处理次数,比如订单创建、支付成功等,以便于业务层面的监控。
这些指标可以分类到 Counter
、Meter
、Gauge
和 Histogram
等不同的指标类型中
计数器是累积增加的指标类型,通常用于记录事件的发生次数。适用于以下指标:
- 反序列化失败次数 (
deserializationErrors
)- 描述:记录反序列化失败的总次数。
- 写入失败次数 (
writeFailures
)- 描述:记录写入下游系统失败的总次数。
- 数据转换错误 (
dataTransformationErrors
)- 描述:记录数据转换过程中发生的错误总次数。
- 脏数据记录 (
dirtyRecords
)- 描述:记录被标记为脏数据的条目总数。
- Checkpoint 成功次数 (
checkpointSuccesses
)- 描述:记录成功完成的 Checkpoint 总次数。
- Checkpoint 失败次数 (
checkpointFailures
)- 描述:记录由于各种原因导致的 Checkpoint 失败总次数。
- 关键业务事件处理次数 (
businessEventCount
)- 描述:记录特定业务事件(如订单创建、支付成功等)的处理次数。
2. Meter(速率计)
Meter 用于度量事件的发生速率(如每秒多少次)。适用于以下指标:
- 每秒输入/输出记录数 (
numRecordsInPerSecond
,numRecordsOutPerSecond
)- 描述:每秒处理的记录数量。
- 每秒输入/输出字节数 (
numBytesInPerSecond
,numBytesOutPerSecond
)- 描述:每秒处理的数据字节数。
- Checkpoint 成功/失败速率(派生自
checkpointSuccesses
和checkpointFailures
)- 描述:Checkpoint 成功或失败的速率。
3. Gauge(测量器)
Gauge 用于测量单一的即时值,通常用于监控系统的当前状态。适用于以下指标:
- 当前未完成的 Checkpoint 数量 (
pendingCheckpoints
)- 描述:当前正在进行但尚未完成的 Checkpoint 数量。
- 最后一个 Checkpoint 的耗时 (
lastCheckpointDuration
)- 描述:最后一个成功完成的 Checkpoint 耗费的时间。
- 处理延迟 (
processingLatency
)- 描述:从数据进入到处理完成的时间延迟。
- 内存使用情况 (
memoryUsage
)- 描述:当前作业的内存使用量(可以细分为堆内存和非堆内存)。
- CPU 使用率 (
cpuUsage
)- 描述:当前作业的 CPU 使用率。
- 消息堆积情况 (
messageBacklog
)- 描述:消息在队列中的当前堆积数量或字节数。
- 网络 I/O 使用情况 (
networkIOUsage
)- 描述:当前网络 I/O 的使用情况。
4. Histogram(直方图)
Histogram 用于度量数据的分布情况,适用于记录多个数据点的分布。适用于以下指标:
- 处理延迟分布(可从
processingLatency
扩展而来)- 描述:记录延迟的分布情况,可以了解系统的响应时间是否稳定。
- Checkpoint 时间分布(可从
lastCheckpointDuration
扩展而来)- 描述:记录每个 Checkpoint 所花费时间的分布情况。
5. 其他分类
-
Timer(计时器):实际上是
Meter
和Histogram
的结合,用于测量事件的时间间隔(例如处理延迟)。在 Flink 中,它可以通过Gauge
和Histogram
的组合来实现。获取原生flink metric
进入taksmanager和jobmanager容器,conf/flink-conf.properties中增加
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter metrics.reporter.slf4j.interval: 60 SECONDS
使用slf4j日志打印flink日志
我突然发现sort里面自己写的注册的counter比如
private Counter dirtyRecordsOut;
也会被打印出来那为什么还要自己实现PrintReporter呢,这是一个问题
TaskManager日志
TaskManager日志
2024-09-01 22:41:16 2024-09-01 14:41:16,750 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
2024-09-01 22:41:16 =========================== Starting metrics report ===========================
2024-09-01 22:41:16
2024-09-01 22:41:16 -- Counters -------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInLocal: 252
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesOut: 11466
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesIn: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOutForMeter: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOutForMeter: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInLocal: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.dirtyBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsInErrors: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInLocal: 252
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersOut: 5
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.dirtyRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInLocal: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsIn: 498
2024-09-01 22:41:16
2024-09-01 22:41:16 -- Gauges ---------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.AvailableMemory: 134119424
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.MemoryUsed: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outputQueueSize: -4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.isBackPressured: false
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Young Generation.Count: 6
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.softBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outPoolUsage: 0.1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Network.TotalMemorySegments: 4096
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.MemoryUsed: 135033857
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.maxHardBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.ClassLoader.ClassesUnloaded: 57
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.checkpointAlignmentTime: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Young Generation.Time: 94
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.UsedMemorySegments: 3
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Network.AvailableMemorySegments: 4093
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Used: 126174120
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.idleTimeMsPerSecond: 1000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Flink.Memory.Managed.Used: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.maxHardBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Used: 84064344
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.sourceIdleTime: 293
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Max: 780140544
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.TotalMemorySegments: 4096
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.hardBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.hardBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Old Generation.Time: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.Count: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.AvailableMemorySegments: 4093
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.CPU.Load: 4.0944947186201034E-4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.Count: 4121
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.maxSoftBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.backPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Max: 268435456
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.busyTimeMsPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Old Generation.Count: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.maxSoftBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.RequestedMemoryUsage: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Used: 223104576
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.ClassLoader.ClassesLoaded: 14177
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Max: 536870912
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.softBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.idleTimeMsPerSecond: 1000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.UsedMemory: 98304
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.isBackPressured: false
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.backPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Threads.Count: 119
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentEmitEventTimeLag: -1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outputQueueSize: -4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Flink.Memory.Managed.Total: 536870920
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.checkpointStartDelayNanos: 3000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentFetchEventTimeLag: 64
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Committed: 536870912
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Committed: 87420928
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Committed: 133357568
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.checkpointStartDelayNanos: 3000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.busyTimeMsPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outPoolUsage: 0.1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.TotalMemory: 134217728
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.CPU.Time: 33960000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.TotalCapacity: 135033856
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.TotalCapacity: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inPoolUsage: 0.0
2024-09-01 22:41:16
2024-09-01 22:41:16 -- Meters ---------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInLocalPerSecond: 0.9666666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInLocalPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersOutPerSecond: 0.016666666666666666
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsInPerSecond: 1.9166666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesOutPerSecond: 47.36666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInLocalPerSecond: 0.9666666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInLocalPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16
2024-09-01 22:41:16 -- Histograms -----------------------------------------------------------------
2024-09-01 22:41:16
2024-09-01 22:41:16 =========================== Finished metrics report ===========================
2024-09-01 22:41:16
JobManager日志
2024-09-01 22:44:29 =========================== Starting metrics report ===========================
2024-09-01 22:44:29
2024-09-01 22:44:29 -- Counters -------------------------------------------------------------------
2024-09-01 22:44:29
2024-09-01 22:44:29 -- Gauges ---------------------------------------------------------------------
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Young Generation.Count: 4
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Old Generation.Count: 0
2024-09-01 22:44:29 jobmanager.jobmanager.taskSlotsAvailable: 1
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfFailedCheckpoints: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Old Generation.Time: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.initializingTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Max: 780140544
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.TotalCapacity: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Committed: 80605184
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.totalNumberOfCheckpoints: 8
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointRestoreTimestamp: -1
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.Count: 66
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.ClassLoader.ClassesLoaded: 13354
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.restartingTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.CPU.Time: 31700000000
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.CPU.Load: 2.020442735990302E-4
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointFullSize: 32085
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.MemoryUsed: 168607432
2024-09-01 22:44:29 jobmanager.jobmanager.numRegisteredTaskManagers: 1
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.ClassLoader.ClassesUnloaded: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.uptime: 451501
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Used: 117687288
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.runningTime: 451501
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Max: 268435456
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.MemoryUsed: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfCompletedCheckpoints: 8
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Used: 447316088
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointProcessedData: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.failingTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.taskSlotsTotal: 2
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointPersistedData: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Committed: 124903424
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.Count: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Committed: 1073741824
2024-09-01 22:44:29 jobmanager.jobmanager.numRunningJobs: 1
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.TotalCapacity: 168607431
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointExternalPath: <checkpoint-not-externally-addressable>
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Threads.Count: 161
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.cancellingTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Used: 77083448
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.createdTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.downtime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointSize: 32085
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Young Generation.Time: 50
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.fullRestarts: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.deployingTime: 0
2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Max: 1073741824
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfInProgressCheckpoints: 0
2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointDuration: 7
2024-09-01 22:44:29
2024-09-01 22:44:29 -- Meters ---------------------------------------------------------------------
2024-09-01 22:44:29
2024-09-01 22:44:29 -- Histograms -----------------------------------------------------------------
2024-09-01 22:44:29
2024-09-01 22:44:29 =========================== Finished metrics report ===========================
2024-09-01 22:44:29
最终实现
埋点分类
针对 InLong Sort Flink Connector 中的 metric
埋点策略,以下是 Postgres-CDC
中的埋点分类,并对应到代码中的具体变量名。此策略适用于大多数 Connector,但每个 Connector 可能需要根据具体业务逻辑做特定的调整。
1. 序列化/反序列化埋点
序列化和反序列化是数据处理中的核心环节。埋点可以帮助监控数据格式转换的性能瓶颈。
-
埋点内容:
-
反序列化成功次数:每次反序列化成功时增加统计。
- 代码变量:
Counter numDeserializeSuccess
- 代码变量:
-
反序列化失败次数:反序列化过程中出现异常时记录错误次数。
- 代码变量:
Counter numDeserializeError
- 代码变量:
-
反序列化延迟:记录从反序列化开始到结束的耗时,帮助分析性能瓶颈。
- 代码变量:
Gauge<Long> deserializeTimeLag
- 代码变量:
-
序列化成功失败次数:序列化已经实现在sinkMetricData中,但是未完全迁移到sinkExactlyMetric。invokeDirty可以记录脏数据条数和Bytes数,可以认为是序列化失败埋点,参考
org.apache.inlong.sort.elasticsearch.table.ElasticsearchSinkFunctionBase#process
对于脏数据的处理如果如果认领的Connector中有invokeDirty/invokeDirtyWithEstimate逻辑,记得从SinkMetricData中迁移到SinkExactlyMetric中。 -
序列化延迟:考虑为涉及serialize的sink(如ElasticSearch)增加序列化延迟(
Counter serializeTimeLag
)()
-
2. SnapshotState 相关的埋点
Flink 的状态处理机制允许对 Connector 状态进行快照和恢复。监控这些状态可以帮助识别状态管理过程中可能的性能问题。
- 埋点内容:
- 快照创建次数:每次执行
snapshotState
时记录快照的创建次数。- 代码变量:
Counter numSnapshotCreate
- 代码变量:
- 快照错误次数:在快照过程中出现错误时增加错误统计。
- 代码变量:
Counter numSnapshotError
- 代码变量:
- 快照创建次数:每次执行
3. 通知完成 (notifyComplete)
Flink 在完成某个批次或事务后,会触发 notifyComplete
回调,通知任务的完成状态。在这一步埋点,可以帮助确保事务或批次的正确完成。
- 埋点内容:
- 完成的事务或批次数:统计每次
notifyComplete
调用时的成功事务或批次次数。- 代码变量:
Counter numCompletedSnapshots
- 代码变量:
- 处理时间:记录每次
notifyComplete
从调用到结束的处理时间,分析可能的延迟。- 代码变量:
Gauge<Long> snapshotToCheckpointTimeLag
- 代码变量:
- 完成的事务或批次数:统计每次
4. source 读取/sink 写入的条数及数据大小(可能已经实现)
监控数据的读取(source)或写入(sink)情况是至关重要的。通过埋点,可以统计读取和写入的数据条数和数据大小。
埋点内容: 读取条数:记录从 source 中读取的数据记录数。 - 代码变量:numRecordsIn
写入条数:记录 sink 中写入的数据记录数。 - 代码变量:numRecordsOut
读取/写入数据大小:记录每批读取或写入的数据的总大小,用于评估吞吐量。 - 代码变量:numBytesIn
, numBytesOut
例子:Postgres-CDC v1.15
inlong-sort/sort-flink/sort-flink-v1.15/sort-connectors/postgres-cdc
1. 反序列化埋点
实现细节:
-
反序列化时间:通过计算
deserialize
方法调用时的起始时间和结束时间,记录每次反序列化操作的时间延迟。 -
反序列化成功/错误:反序列化成功后调用
incNumDeserializeSuccess()
增加成功计数,反序列化出错时调用incNumDeserializeError()
记录错误次数。 -
相关代码位置:
-
类
RowDataDebeziumDeserializeSchema
中的deserialize()
方法:@Override public void deserialize(SourceRecord record, Collector<RowData> out) throws Exception { long deserializeStartTime = System.currentTimeMillis(); try { doDeserialize(record, out, deserializeStartTime); } catch (Exception e) { if (sourceExactlyMetric != null) { sourceExactlyMetric.incNumDeserializeError(); } throw e; } } private void doDeserialize(SourceRecord record, Collector<RowData> out, long deserializeStartTime) throws Exception { // Perform deserialization logic... if (sourceExactlyMetric != null) { sourceExactlyMetric.incNumDeserializeSuccess(); sourceExactlyMetric.recordDeserializeDelay(System.currentTimeMillis() - deserializeStartTime); } }
-
埋点内容:
- 反序列化延迟:记录反序列化的耗时。
- 反序列化成功/失败计数:统计反序列化成功和错误次数。
2. SnapshotState相关的埋点
实现细节:
-
SnapshotState的埋点:在每次
snapshotState
调用时,通过sourceExactlyMetric
记录快照的创建次数以及在遇到错误时记录错误次数。 -
相关代码位置:
-
类
DebeziumSourceFunction
中的snapshotState()
方法:@Override public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception { try { doSnapshotState(functionSnapshotContext); } catch (Exception e) { if (sourceExactlyMetric != null) { sourceExactlyMetric.incNumSnapshotError(); } throw e; } } private void doSnapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception { // snapshot代码 // record the start time of each checkpoint Long checkpointId = functionSnapshotContext.getCheckpointId(); if (checkpointStartTimeMap != null) { checkpointStartTimeMap.put(checkpointId, System.currentTimeMillis()); } else { LOG.error("checkpointStartTimeMap is null, can't record the start time of checkpoint"); } if (sourceExactlyMetric != null) { sourceExactlyMetric.incNumSnapshotCreate(); } }
-
埋点内容:
- 快照创建次数:每次快照创建时调用
incNumSnapshotCreate()
。 - 快照错误次数:如果快照过程中遇到错误,调用
incNumSnapshotError()
记录错误次数。
3. 通知Checkpoint完成 (notifyCheckpointComplete)
实现细节:
-
notifyComplete埋点:在
notifyCheckpointComplete
方法中埋点,统计每次检查点完成后的时间延迟,并记录成功的检查点数量。 -
相关代码位置:
-
类
DebeziumSourceFunction
中的notifyCheckpointComplete()
方法:@Override public void notifyCheckpointComplete(long checkpointId) { // Record completed checkpoint and delay time // get the start time of the currently completed checkpoint if (checkpointStartTimeMap != null) { Long snapShotStartTimeById = checkpointStartTimeMap.remove(checkpointId); if (snapShotStartTimeById != null && sourceExactlyMetric != null) { sourceExactlyMetric.incNumCompletedSnapshots(); sourceExactlyMetric .recordSnapshotToCheckpointDelay(System.currentTimeMillis() - snapShotStartTimeById); } } else { LOG.error("checkpointStartTimeMap is null, can't get the start time of checkpoint"); } }
-
说明:多重checkpoint和checkpoint重叠的情况下,可能一个checkpoint还没有完成另一个新的checkpoint就开始被创建,所以这里用map来管理checkpoint开始的时间(待确认合理性)
埋点内容:
- 检查点完成次数:每当检查点完成时调用
incNumCompletedSnapshots()
。 - 快照到检查点延迟:记录从快照开始到检查点完成的时间延迟,帮助分析系统性能。
效果展示
![e3a03ea57cdf477591c32e41d283d482](file:///D:/%E5%9B%BE%E7%89%87/Typedown/e3a03ea5-7cdf-4775-91c3-2e41d283d482.png?msec=1726991091239)
测试方法
v1.13已经废弃,1.18用于批处理,所以我们基本在sort-end-to-end-tests-v1.15
编写测试。
STEP1: 在你需要使用的测试类对应的SQL中增加metric label
记得在flinksql中添加
'inlong.metric.labels' = 'groupId=pggroup&streamId=pgStream&nodeId=pgNode',
把group/stream/nodeid换成需要的名字(名字不影响功能,但一定要有这个'inlong.metric.labels'
)
例子:
![f69fc634e35b48e5917cbd5d986b44bd](file:///D:/%E5%9B%BE%E7%89%87/Typedown/f69fc634-e35b-48e5-917c-bd5d986b44bd.png?msec=1726991091240)
STEP2 想办法让flink把metrics打印出来
(当然也可以用fllnk web dashboard里面的metrics)end2endTest里面的Test都继承了FlinkContainerTestEnv
,观察代码发现FlinkContainerTestEnv
启动了Flink环境,所以在这个类中增加打印metrics的配置即可。具体就是在private static final String FLINK_PROPERTIES
中增加以下配置(注意格式),如图所示。然后就可以在Flink容器中看到metrics的日志了。🥳
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 5 SECONDS
![fa9dd42f2f0043b48ecbf99fcfb29588](file:///D:/%E5%9B%BE%E7%89%87/Typedown/fa9dd42f-2f00-43b4-8ecb-f99fcfb29588.png?msec=1726991091250)
STEP3 运行End2EndTest
进入sort-end-to-end-tests-v1.15文件夹,运行
mvn test -Dtest=Postgres2StarRocksTest
把-DTest=后面的类换成你写的/想要使用的end2end测试类
运行test报错的话看这里
从GitHub Action上弄下来的原始测试方法,如果报错的话,先在/inlong-sort
目录下运行一遍这个(应该有更简单的方法,但是这个能用)
mvn --update-snapshots -e -V verify -pl :sort-core,:sort-end-to-end-tests-v1.15 -am -Pv1.15
然后应该可以跑起来了。以后如果还要测试的话,用这个就可以了。(当然如果改动了Test以外的文件还是要重新编译(记得用install)的~~)如何不完全编译
mvn clean install -pl :sort-connector-postgres-cdc -am
然后再运行刚才的test命令
STEP4 持久化测试的对话,方便观察指标以及执行自己的 SQL
但是现在有个问题就是Test运行完了container马上就关了,看不见打印出来的metric,也没法动数据库,让flink跑我们自己的数据(假设你没有动end2endtest的话数据库应该只处理了一点点数据),所以想办法让test完成之后不停下来就行,比如在你用的方法最后加上while(true){}
System.out.println("Test Paused");
while(true) {}
tips: container启动的速度可能非常慢,大概要一分钟,得耐心一点
如图:
![9cc1be46ccca48c1934733067279690b](file:///D:/%E5%9B%BE%E7%89%87/Typedown/9cc1be46-ccca-48c1-9347-33067279690b.png?msec=1726991091250)
Debug和注意点:
1. end2end测试报错
先确保在inlong-sort的根目录下运行:
mvn --update-snapshots -e -V verify -pl :sort-core,:sort-end-to-end-tests-v1.15 -am -Pv1.15
如果显示什么connector没有,就从target里面弄出来放进去,但是一般全量编译之后都会有。只有driver确实可能会没有,需要自行网上下载jar包,比如postgres-driver.jar
如下:
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.131 s <<< FAILURE! - in org.apache.inlong.sort.tests.Postgres2ClickHouseTest
[ERROR] org.apache.inlong.sort.tests.Postgres2ClickHouseTest Time elapsed: 0.131 s <<< ERROR!
java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No resource file could be found that matches the pattern postgres-driver.jar. This could mean that the test module must be rebuilt via maven.
at org.apache.inlong.sort.tests.utils.TestUtils.getResource(TestUtils.java:72)
at org.apache.inlong.sort.tests.Postgres2ClickHouseTest.<clinit>(Postgres2ClickHouseTest.java:56)
... 33 more
Caused by: java.io.FileNotFoundException: No resource file could be found that matches the pattern postgres-driver.jar. This could mean that the test module must be rebuilt via maven.
... 35 more
2. 自定义指标没有注册(大概率会发生的问题)
如果发现自定义指标没有打印出来,基本上是因为connector new一个SourceExactlyMetric
的时候只传了MetricOption
,没有传MetricGroup
,然后只传MetricOption
的构造函数不会注册Metric。文松佬说是因为以前不需要这些Metric,所以搞了个不会注册指标的构造函数。自己传一个MetricGroup
即可
P.S. 一般一个connector共用一个sourceExactlyMetric对象。
3. transient关键字的注意点
注意sourceFunction会在jobmanager中被创建实例,然后序列化后传到各个taskManager中反序列化。
此时transient关键字修饰的成员变量会被忽略,即使使用的是private final transient Map<Long, Long> map = new HashMap(), map最终也会是null
关于是否对成员变量用transient关键字修饰:
van:flink是这样, 在jobmanager 构造好 source sink function, 然后序列化传给每个taskmanager, 然后taskmanager 再调用 open 方法进行初始化
van:所以如果你是一些需要taskmanager本地信息的字段, 或者要构造网络连接之类的操作, 需要放在open里面进行初始化, 这些字段可以配置成transient
van:其他有些字段,比如你在jobmanager 构造 sourcesink function的时候需要从启动参数拿的,就不用transient,在构造函数里面配置上,序列化到tm来
4. 创建metric对象的时候的注意点
创建metric对象的时候需要传入MetricOption,注意一定要判断MetricOption非空再创建metric,否则调用metricOptions.getLbels()
的时候会报空指针
使用sourceExactlyMetric对象的时候也一定要判断非空,否则如果用户的sql中如果没有inlong.metric.labels的话,sourceExactlyMetric是null,导致空指针异常