Apache/Inlong Sort指标监控埋点开发心得

wsl启动flink,但是windows无法访问localhost:8081 taskmanager,
原因:wsl2是基于hyper-v的虚拟机,采用的是新一套的NAT方案,较为独立。所以对于wsl2也要用和普通虚拟机一样的网络方案,如NAT转发和桥接。
方法:
%USERPROFILE%增加.wslconfig配置文件,
[experimental]
networkingMode=mirrored
dnsTunneling=true
firewall=true
autoProxy=true
hostAddressLoopback=true

这样搞还有一个好处就是不用搞clash代理了

Sort埋点到底要干什么,这是一个问题

1. 数据处理异常指标

  • 反序列化失败次数 (deserializationErrors):

    • 描述:记录数据从字节流反序列化为对象时失败的次数。这有助于发现数据格式错误或兼容性问题。
  • 写入失败次数 (writeFailures):

    • 描述:记录数据写入下游存储系统(如 Kafka、HBase 等)时的失败次数,帮助监控目标系统的稳定性或网络问题。
  • 数据转换错误 (dataTransformationErrors):

    • 描述:记录在数据转换过程中(如格式转换、类型转换)发生的错误次数,帮助检测数据清洗和处理中的问题。
  • 脏数据记录 (dirtyRecords):

    • 描述:记录因格式不符或内容异常而无法处理的数据条目数。这类数据可以帮助识别和排查不合规的数据源。

2. Checkpoint 指标

  • Checkpoint 成功次数 (checkpointSuccesses):

    • 描述:记录成功完成的 Checkpoint 数量,有助于监控作业的稳定性和数据一致性。
  • Checkpoint 失败次数 (checkpointFailures):

    • 描述:记录由于超时或资源不足等原因导致的 Checkpoint 失败次数。通过监控这个指标,可以提前预警潜在的系统瓶颈。
  • 最后一个 Checkpoint 的耗时 (lastCheckpointDuration):

    • 描述:记录最后一个成功完成的 Checkpoint 所花费的时间,可以帮助优化 Checkpoint 的间隔和配置。
  • 当前未完成的 Checkpoint 数量 (pendingCheckpoints):

    • 描述:记录当前正在进行但尚未完成的 Checkpoint 数量,以监控系统的压力和潜在的延迟。

3. 性能和延迟指标

  • 处理延迟 (processingLatency):

    • 描述:记录从数据进入系统到被处理完成的延迟时间,帮助分析处理效率和性能瓶颈。
  • 消息堆积情况 (messageBacklog):

    • 描述:记录消息在队列中的堆积数量或字节数,可以帮助判断系统是否存在处理滞后的问题。

4. 资源使用情况指标

  • 内存使用情况 (memoryUsage):

    • 描述:记录作业内存的使用情况,包括堆内存和非堆内存的占用,帮助监控和优化资源配置。
  • CPU 使用率 (cpuUsage):

    • 描述:记录作业的 CPU 使用率,有助于判断系统的计算压力。
  • 网络 I/O 使用情况 (networkIOUsage):

    • 描述:记录作业的网络输入输出数据量,帮助监控数据传输的性能和瓶颈。

5. 自定义业务指标

  • 关键业务事件处理次数 (businessEventCount):
    • 描述:根据具体的业务需求,记录关键事件的处理次数,比如订单创建、支付成功等,以便于业务层面的监控。

这些指标可以分类到 CounterMeterGaugeHistogram 等不同的指标类型中

计数器是累积增加的指标类型,通常用于记录事件的发生次数。适用于以下指标:

  • 反序列化失败次数 (deserializationErrors)
    • 描述:记录反序列化失败的总次数。
  • 写入失败次数 (writeFailures)
    • 描述:记录写入下游系统失败的总次数。
  • 数据转换错误 (dataTransformationErrors)
    • 描述:记录数据转换过程中发生的错误总次数。
  • 脏数据记录 (dirtyRecords)
    • 描述:记录被标记为脏数据的条目总数。
  • Checkpoint 成功次数 (checkpointSuccesses)
    • 描述:记录成功完成的 Checkpoint 总次数。
  • Checkpoint 失败次数 (checkpointFailures)
    • 描述:记录由于各种原因导致的 Checkpoint 失败总次数。
  • 关键业务事件处理次数 (businessEventCount)
    • 描述:记录特定业务事件(如订单创建、支付成功等)的处理次数。

2. Meter(速率计)

Meter 用于度量事件的发生速率(如每秒多少次)。适用于以下指标:

  • 每秒输入/输出记录数 (numRecordsInPerSecond, numRecordsOutPerSecond)
    • 描述:每秒处理的记录数量。
  • 每秒输入/输出字节数 (numBytesInPerSecond, numBytesOutPerSecond)
    • 描述:每秒处理的数据字节数。
  • Checkpoint 成功/失败速率(派生自 checkpointSuccessescheckpointFailures
    • 描述:Checkpoint 成功或失败的速率。

3. Gauge(测量器)

Gauge 用于测量单一的即时值,通常用于监控系统的当前状态。适用于以下指标:

  • 当前未完成的 Checkpoint 数量 (pendingCheckpoints)
    • 描述:当前正在进行但尚未完成的 Checkpoint 数量。
  • 最后一个 Checkpoint 的耗时 (lastCheckpointDuration)
    • 描述:最后一个成功完成的 Checkpoint 耗费的时间。
  • 处理延迟 (processingLatency)
    • 描述:从数据进入到处理完成的时间延迟。
  • 内存使用情况 (memoryUsage)
    • 描述:当前作业的内存使用量(可以细分为堆内存和非堆内存)。
  • CPU 使用率 (cpuUsage)
    • 描述:当前作业的 CPU 使用率。
  • 消息堆积情况 (messageBacklog)
    • 描述:消息在队列中的当前堆积数量或字节数。
  • 网络 I/O 使用情况 (networkIOUsage)
    • 描述:当前网络 I/O 的使用情况。

4. Histogram(直方图)

Histogram 用于度量数据的分布情况,适用于记录多个数据点的分布。适用于以下指标:

  • 处理延迟分布(可从 processingLatency 扩展而来)
    • 描述:记录延迟的分布情况,可以了解系统的响应时间是否稳定。
  • Checkpoint 时间分布(可从 lastCheckpointDuration 扩展而来)
    • 描述:记录每个 Checkpoint 所花费时间的分布情况。

5. 其他分类

  • Timer(计时器):实际上是 MeterHistogram 的结合,用于测量事件的时间间隔(例如处理延迟)。在 Flink 中,它可以通过 GaugeHistogram 的组合来实现。

    进入taksmanager和jobmanager容器,conf/flink-conf.properties中增加

    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 60 SECONDS
    

    使用slf4j日志打印flink日志

    我突然发现sort里面自己写的注册的counter比如private Counter dirtyRecordsOut;也会被打印出来

    那为什么还要自己实现PrintReporter呢,这是一个问题

    TaskManager日志

TaskManager日志
2024-09-01 22:41:16 2024-09-01 14:41:16,750 INFO  org.apache.flink.metrics.slf4j.Slf4jReporter                 [] - 
2024-09-01 22:41:16 =========================== Starting metrics report ===========================
2024-09-01 22:41:16 
2024-09-01 22:41:16 -- Counters -------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInLocal: 252
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesOut: 11466
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesIn: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOutForMeter: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOutForMeter: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInLocal: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsOut: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.dirtyBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsInErrors: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInLocal: 252
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInLocal: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersOut: 5
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesIn: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.dirtyRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsOut: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsIn: 246
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInRemote: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInLocal: 12248
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsIn: 498
2024-09-01 22:41:16 
2024-09-01 22:41:16 -- Gauges ---------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.AvailableMemory: 134119424
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.MemoryUsed: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outputQueueSize: -4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.isBackPressured: false
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Young Generation.Count: 6
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.softBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outPoolUsage: 0.1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Network.TotalMemorySegments: 4096
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.MemoryUsed: 135033857
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.maxHardBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.ClassLoader.ClassesUnloaded: 57
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.checkpointAlignmentTime: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Young Generation.Time: 94
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.UsedMemorySegments: 3
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Network.AvailableMemorySegments: 4093
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Used: 126174120
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.idleTimeMsPerSecond: 1000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Flink.Memory.Managed.Used: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.maxHardBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Used: 84064344
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.sourceIdleTime: 293
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Max: 780140544
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.TotalMemorySegments: 4096
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.hardBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.hardBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Old Generation.Time: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.Count: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.AvailableMemorySegments: 4093
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.CPU.Load: 4.0944947186201034E-4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.Count: 4121
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Output.Buffers.outputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.maxSoftBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.backPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Max: 268435456
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.busyTimeMsPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.GarbageCollector.G1 Old Generation.Count: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.maxSoftBackPressureTimeMs: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.RequestedMemoryUsage: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Used: 223104576
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.ClassLoader.ClassesLoaded: 14177
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Max: 536870912
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.softBackPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.idleTimeMsPerSecond: 1000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.outputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.UsedMemory: 98304
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outputQueueLength: 1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.isBackPressured: false
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.backPressuredTimeMsPerSecond: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Threads.Count: 119
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentEmitEventTimeLag: -1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.outputQueueSize: -4
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Flink.Memory.Managed.Total: 536870920
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.checkpointStartDelayNanos: 3000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.currentFetchEventTimeLag: 64
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Heap.Committed: 536870912
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Metaspace.Committed: 87420928
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.NonHeap.Committed: 133357568
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.checkpointStartDelayNanos: 3000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputQueueLength: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputQueueSize: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.currentOutputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.busyTimeMsPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Output.Buffers.outPoolUsage: 0.1
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.Shuffle.Netty.TotalMemory: 134217728
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.currentInputWatermark: -9223372036854775808
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.CPU.Time: 33960000000
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Direct.TotalCapacity: 135033856
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.Status.JVM.Memory.Mapped.TotalCapacity: 0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.buffers.inPoolUsage: 0.0
2024-09-01 22:41:16 
2024-09-01 22:41:16 -- Meters ---------------------------------------------------------------------
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.groupId.group3.streamId.group3.nodeId.sink3.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInLocalPerSecond: 0.9666666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInLocalPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBuffersOutPerSecond: 0.016666666666666666
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsInPerSecond: 1.9166666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInLocalPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.Shuffle.Netty.Input.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInRemotePerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4].0.numRecordsOutPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesOutPerSecond: 47.36666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.Shuffle.Netty.Input.numBuffersInLocalPerSecond: 0.9666666666666667
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.ConstraintEnforcer[4].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3] -> ConstraintEnforcer[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numBytesInLocalPerSecond: 47.18333333333333
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.SinkMaterializer[4] -> Sink: test_sink_table[4].0.numRecordsOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesOutPerSecond: 0.0
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Sink: test_sink_table[4].0.numRecordsInPerSecond: 0.95
2024-09-01 22:41:16 172_18_0_12.taskmanager.172_18_0_12:36367-7951bc.InLong-Sort-group3-group3.Source: table_source4[3].0.numBytesInPerSecond: 0.0
2024-09-01 22:41:16 
2024-09-01 22:41:16 -- Histograms -----------------------------------------------------------------
2024-09-01 22:41:16 
2024-09-01 22:41:16 =========================== Finished metrics report ===========================
2024-09-01 22:41:16 
JobManager日志
 2024-09-01 22:44:29 =========================== Starting metrics report ===========================
 2024-09-01 22:44:29 
 2024-09-01 22:44:29 -- Counters -------------------------------------------------------------------
 2024-09-01 22:44:29 
 2024-09-01 22:44:29 -- Gauges ---------------------------------------------------------------------
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Young Generation.Count: 4
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Old Generation.Count: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.taskSlotsAvailable: 1
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfFailedCheckpoints: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Old Generation.Time: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.initializingTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Max: 780140544
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.TotalCapacity: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Committed: 80605184
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.totalNumberOfCheckpoints: 8
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointRestoreTimestamp: -1
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.Count: 66
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.ClassLoader.ClassesLoaded: 13354
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.restartingTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.CPU.Time: 31700000000
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.CPU.Load: 2.020442735990302E-4
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointFullSize: 32085
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.MemoryUsed: 168607432
 2024-09-01 22:44:29 jobmanager.jobmanager.numRegisteredTaskManagers: 1
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.ClassLoader.ClassesUnloaded: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.uptime: 451501
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Used: 117687288
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.runningTime: 451501
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Max: 268435456
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.MemoryUsed: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfCompletedCheckpoints: 8
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Used: 447316088
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointProcessedData: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.failingTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.taskSlotsTotal: 2
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointPersistedData: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.NonHeap.Committed: 124903424
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Mapped.Count: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Committed: 1073741824
 2024-09-01 22:44:29 jobmanager.jobmanager.numRunningJobs: 1
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Direct.TotalCapacity: 168607431
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointExternalPath: <checkpoint-not-externally-addressable>
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Threads.Count: 161
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.cancellingTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Metaspace.Used: 77083448
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.createdTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.downtime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointSize: 32085
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.GarbageCollector.G1 Young Generation.Time: 50
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.fullRestarts: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.deployingTime: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.Status.JVM.Memory.Heap.Max: 1073741824
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.numberOfInProgressCheckpoints: 0
 2024-09-01 22:44:29 jobmanager.jobmanager.InLong-Sort-group3-group3.lastCheckpointDuration: 7
 2024-09-01 22:44:29 
 2024-09-01 22:44:29 -- Meters ---------------------------------------------------------------------
 2024-09-01 22:44:29 
 2024-09-01 22:44:29 -- Histograms -----------------------------------------------------------------
 2024-09-01 22:44:29 
 2024-09-01 22:44:29 =========================== Finished metrics report ===========================
 2024-09-01 22:44:29 

最终实现

埋点分类

针对 InLong Sort Flink Connector 中的 metric 埋点策略,以下是 Postgres-CDC 中的埋点分类,并对应到代码中的具体变量名。此策略适用于大多数 Connector,但每个 Connector 可能需要根据具体业务逻辑做特定的调整。

1. 序列化/反序列化埋点

序列化和反序列化是数据处理中的核心环节。埋点可以帮助监控数据格式转换的性能瓶颈。

  • 埋点内容

    • 反序列化成功次数:每次反序列化成功时增加统计。

      • 代码变量:Counter numDeserializeSuccess
    • 反序列化失败次数:反序列化过程中出现异常时记录错误次数。

      • 代码变量:Counter numDeserializeError
    • 反序列化延迟:记录从反序列化开始到结束的耗时,帮助分析性能瓶颈。

      • 代码变量:Gauge<Long> deserializeTimeLag
    • 序列化成功失败次数:序列化已经实现在sinkMetricData中,但是未完全迁移到sinkExactlyMetric。invokeDirty可以记录脏数据条数和Bytes数,可以认为是序列化失败埋点,参考org.apache.inlong.sort.elasticsearch.table.ElasticsearchSinkFunctionBase#process对于脏数据的处理如果如果认领的Connector中有invokeDirty/invokeDirtyWithEstimate逻辑,记得从SinkMetricData中迁移到SinkExactlyMetric中。

    • 序列化延迟:考虑为涉及serialize的sink(如ElasticSearch)增加序列化延迟(Counter serializeTimeLag)()


2. SnapshotState 相关的埋点

Flink 的状态处理机制允许对 Connector 状态进行快照和恢复。监控这些状态可以帮助识别状态管理过程中可能的性能问题。

  • 埋点内容
    • 快照创建次数:每次执行 snapshotState 时记录快照的创建次数。
      • 代码变量:Counter numSnapshotCreate
    • 快照错误次数:在快照过程中出现错误时增加错误统计。
      • 代码变量:Counter numSnapshotError

3. 通知完成 (notifyComplete)

Flink 在完成某个批次或事务后,会触发 notifyComplete 回调,通知任务的完成状态。在这一步埋点,可以帮助确保事务或批次的正确完成。

  • 埋点内容
    • 完成的事务或批次数:统计每次 notifyComplete 调用时的成功事务或批次次数。
      • 代码变量:Counter numCompletedSnapshots
    • 处理时间:记录每次 notifyComplete 从调用到结束的处理时间,分析可能的延迟。
      • 代码变量:Gauge<Long> snapshotToCheckpointTimeLag

4. source 读取/sink 写入的条数及数据大小(可能已经实现)

监控数据的读取(source)或写入(sink)情况是至关重要的。通过埋点,可以统计读取和写入的数据条数和数据大小。

埋点内容读取条数:记录从 source 中读取的数据记录数。 - 代码变量:numRecordsIn 写入条数:记录 sink 中写入的数据记录数。 - 代码变量:numRecordsOut 读取/写入数据大小:记录每批读取或写入的数据的总大小,用于评估吞吐量。 - 代码变量:numBytesIn, numBytesOut


例子:Postgres-CDC v1.15

inlong-sort/sort-flink/sort-flink-v1.15/sort-connectors/postgres-cdc

1. 反序列化埋点

实现细节:

  • 反序列化时间:通过计算 deserialize 方法调用时的起始时间和结束时间,记录每次反序列化操作的时间延迟。

  • 反序列化成功/错误:反序列化成功后调用 incNumDeserializeSuccess() 增加成功计数,反序列化出错时调用 incNumDeserializeError() 记录错误次数。

  • 相关代码位置

    • RowDataDebeziumDeserializeSchema 中的 deserialize() 方法:

      @Override
      public void deserialize(SourceRecord record, Collector<RowData> out) throws Exception {
        long deserializeStartTime = System.currentTimeMillis();
        try {
            doDeserialize(record, out, deserializeStartTime);
        } catch (Exception e) {
            if (sourceExactlyMetric != null) {
                sourceExactlyMetric.incNumDeserializeError();
            }
            throw e;
        }
      }
      
      private void doDeserialize(SourceRecord record, Collector<RowData> out, long deserializeStartTime) throws Exception {
        // Perform deserialization logic...
        if (sourceExactlyMetric != null) {
            sourceExactlyMetric.incNumDeserializeSuccess();
            sourceExactlyMetric.recordDeserializeDelay(System.currentTimeMillis() - deserializeStartTime);
        }
      }
      

埋点内容:

  • 反序列化延迟:记录反序列化的耗时。
  • 反序列化成功/失败计数:统计反序列化成功和错误次数。

2. SnapshotState相关的埋点

实现细节:

  • SnapshotState的埋点:在每次 snapshotState 调用时,通过 sourceExactlyMetric 记录快照的创建次数以及在遇到错误时记录错误次数。

  • 相关代码位置

    • DebeziumSourceFunction 中的 snapshotState() 方法:

      @Override
      public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception {
        try {
            doSnapshotState(functionSnapshotContext);
        } catch (Exception e) {
            if (sourceExactlyMetric != null) {
                sourceExactlyMetric.incNumSnapshotError();
            }
            throw e;
        }
      }
      
      private void doSnapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception {
         // snapshot代码
       // record the start time of each checkpoint
              Long checkpointId = functionSnapshotContext.getCheckpointId();
              if (checkpointStartTimeMap != null) {
                  checkpointStartTimeMap.put(checkpointId, System.currentTimeMillis());
              } else {
                  LOG.error("checkpointStartTimeMap is null, can't record the start time of checkpoint");
              }
              if (sourceExactlyMetric != null) {
                  sourceExactlyMetric.incNumSnapshotCreate();
              }
      }
      

埋点内容:

  • 快照创建次数:每次快照创建时调用 incNumSnapshotCreate()
  • 快照错误次数:如果快照过程中遇到错误,调用 incNumSnapshotError() 记录错误次数。

3. 通知Checkpoint完成 (notifyCheckpointComplete)

实现细节:

  • notifyComplete埋点:在 notifyCheckpointComplete 方法中埋点,统计每次检查点完成后的时间延迟,并记录成功的检查点数量。

  • 相关代码位置

    • DebeziumSourceFunction 中的 notifyCheckpointComplete() 方法:

      @Override
      public void notifyCheckpointComplete(long checkpointId) {
        // Record completed checkpoint and delay time
        // get the start time of the currently completed checkpoint
                  if (checkpointStartTimeMap != null) {
                      Long snapShotStartTimeById = checkpointStartTimeMap.remove(checkpointId);
                      if (snapShotStartTimeById != null && sourceExactlyMetric != null) {
                          sourceExactlyMetric.incNumCompletedSnapshots();
                          sourceExactlyMetric
                                  .recordSnapshotToCheckpointDelay(System.currentTimeMillis() - snapShotStartTimeById);
                      }
                  } else {
                      LOG.error("checkpointStartTimeMap is null, can't get the start time of checkpoint");
                  }
      }
      

说明:多重checkpoint和checkpoint重叠的情况下,可能一个checkpoint还没有完成另一个新的checkpoint就开始被创建,所以这里用map来管理checkpoint开始的时间(待确认合理性)

埋点内容:

  • 检查点完成次数:每当检查点完成时调用 incNumCompletedSnapshots()
  • 快照到检查点延迟:记录从快照开始到检查点完成的时间延迟,帮助分析系统性能。

效果展示

![e3a03ea57cdf477591c32e41d283d482](file:///D:/%E5%9B%BE%E7%89%87/Typedown/e3a03ea5-7cdf-4775-91c3-2e41d283d482.png?msec=1726991091239)

测试方法

v1.13已经废弃,1.18用于批处理,所以我们基本在sort-end-to-end-tests-v1.15编写测试。

STEP1: 在你需要使用的测试类对应的SQL中增加metric label

记得在flinksql中添加

    'inlong.metric.labels' = 'groupId=pggroup&streamId=pgStream&nodeId=pgNode',

把group/stream/nodeid换成需要的名字(名字不影响功能,但一定要有这个'inlong.metric.labels'

例子:

![f69fc634e35b48e5917cbd5d986b44bd](file:///D:/%E5%9B%BE%E7%89%87/Typedown/f69fc634-e35b-48e5-917c-bd5d986b44bd.png?msec=1726991091240)

STEP2 想办法让flink把metrics打印出来

(当然也可以用fllnk web dashboard里面的metrics)end2endTest里面的Test都继承了FlinkContainerTestEnv,观察代码发现FlinkContainerTestEnv启动了Flink环境,所以在这个类中增加打印metrics的配置即可。具体就是在private static final String FLINK_PROPERTIES中增加以下配置(注意格式),如图所示。然后就可以在Flink容器中看到metrics的日志了。🥳

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter

metrics.reporter.slf4j.interval: 5 SECONDS

![fa9dd42f2f0043b48ecbf99fcfb29588](file:///D:/%E5%9B%BE%E7%89%87/Typedown/fa9dd42f-2f00-43b4-8ecb-f99fcfb29588.png?msec=1726991091250)

STEP3 运行End2EndTest

进入sort-end-to-end-tests-v1.15文件夹,运行

mvn test -Dtest=Postgres2StarRocksTest

把-DTest=后面的类换成你写的/想要使用的end2end测试类


运行test报错的话看这里

从GitHub Action上弄下来的原始测试方法,如果报错的话,先在/inlong-sort目录下运行一遍这个(应该有更简单的方法,但是这个能用)

 mvn --update-snapshots -e -V verify -pl :sort-core,:sort-end-to-end-tests-v1.15 -am -Pv1.15

然后应该可以跑起来了。以后如果还要测试的话,用这个就可以了。(当然如果改动了Test以外的文件还是要重新编译(记得用install)的~~)如何不完全编译

mvn clean install -pl :sort-connector-postgres-cdc -am

然后再运行刚才的test命令


STEP4 持久化测试的对话,方便观察指标以及执行自己的 SQL

但是现在有个问题就是Test运行完了container马上就关了,看不见打印出来的metric,也没法动数据库,让flink跑我们自己的数据(假设你没有动end2endtest的话数据库应该只处理了一点点数据),所以想办法让test完成之后不停下来就行,比如在你用的方法最后加上while(true){}

        System.out.println("Test Paused");
        while(true) {}

tips: container启动的速度可能非常慢,大概要一分钟,得耐心一点

如图:

![9cc1be46ccca48c1934733067279690b](file:///D:/%E5%9B%BE%E7%89%87/Typedown/9cc1be46-ccca-48c1-9347-33067279690b.png?msec=1726991091250)

Debug和注意点:

1. end2end测试报错

先确保在inlong-sort的根目录下运行:

mvn --update-snapshots -e -V verify -pl :sort-core,:sort-end-to-end-tests-v1.15 -am -Pv1.15

如果显示什么connector没有,就从target里面弄出来放进去,但是一般全量编译之后都会有。只有driver确实可能会没有,需要自行网上下载jar包,比如postgres-driver.jar

如下:

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.131 s <<< FAILURE! - in org.apache.inlong.sort.tests.Postgres2ClickHouseTest
[ERROR] org.apache.inlong.sort.tests.Postgres2ClickHouseTest  Time elapsed: 0.131 s  <<< ERROR!
java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No resource file could be found that matches the pattern postgres-driver.jar. This could mean that the test module must be rebuilt via maven.
        at org.apache.inlong.sort.tests.utils.TestUtils.getResource(TestUtils.java:72)
        at org.apache.inlong.sort.tests.Postgres2ClickHouseTest.<clinit>(Postgres2ClickHouseTest.java:56)
        ... 33 more
Caused by: java.io.FileNotFoundException: No resource file could be found that matches the pattern postgres-driver.jar. This could mean that the test module must be rebuilt via maven.
        ... 35 more

2. 自定义指标没有注册(大概率会发生的问题)

如果发现自定义指标没有打印出来,基本上是因为connector new一个SourceExactlyMetric的时候只传了MetricOption,没有传MetricGroup,然后只传MetricOption的构造函数不会注册Metric。文松佬说是因为以前不需要这些Metric,所以搞了个不会注册指标的构造函数。自己传一个MetricGroup即可

P.S. 一般一个connector共用一个sourceExactlyMetric对象。

3. transient关键字的注意点

注意sourceFunction会在jobmanager中被创建实例,然后序列化后传到各个taskManager中反序列化。

此时transient关键字修饰的成员变量会被忽略,即使使用的是private final transient Map<Long, Long> map = new HashMap(), map最终也会是null

关于是否对成员变量用transient关键字修饰:

van:flink是这样, 在jobmanager 构造好 source sink function, 然后序列化传给每个taskmanager, 然后taskmanager 再调用 open 方法进行初始化

van:所以如果你是一些需要taskmanager本地信息的字段, 或者要构造网络连接之类的操作, 需要放在open里面进行初始化, 这些字段可以配置成transient

van:其他有些字段,比如你在jobmanager 构造 sourcesink function的时候需要从启动参数拿的,就不用transient,在构造函数里面配置上,序列化到tm来

4. 创建metric对象的时候的注意点

创建metric对象的时候需要传入MetricOption,注意一定要判断MetricOption非空再创建metric,否则调用metricOptions.getLbels()的时候会报空指针

使用sourceExactlyMetric对象的时候也一定要判断非空,否则如果用户的sql中如果没有inlong.metric.labels的话,sourceExactlyMetric是null,导致空指针异常

posted @ 2024-08-27 22:12  peterzh6  阅读(17)  评论(0编辑  收藏  举报