Flink: Custom Assigning Timestamps and Watermarks, Using Scala
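For event time to work, Flink needs to know each event's timestamp, which means every element in the stream must be assigned its event timestamp. This is usually done by extracting the timestamp from, or reading it out of, some field of the event. Timestamp assignment goes hand in hand with watermark generation: watermarks tell the system how far event time has progressed. Below are several ways to assign custom event timestamps.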
1. Defining them in the data source
See the article "Flink Static Session Windows", which covers this approach.
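2. Using assignAscendingTimestamps from the DataStream API to specify timestamps. By default, the system generates watermarks from these timestamps. Note: the timestamps within each source task must be ascending (non-decreasing); this is essential.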
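To make the semantics concrete, here is a plain-Scala model (no Flink dependency) of what the example program below computes: it checks that the timestamps are non-decreasing, puts each element into a 4 ms tumbling window, and sums the second field per key and window. It assumes zero window offset and non-negative timestamps, so the window containing ts starts at ts - ts % size. The ascendingWatermark helper models the watermark reported by Flink's AscendingTimestampExtractor (which assignAscendingTimestamps uses), one millisecond behind the latest timestamp; treat that detail as an assumption to check against your Flink version. All object and function names are hypothetical.

```scala
// Plain-Scala model (no Flink dependency) of the assignAscendingTimestamps example:
// key by the first field, 4 ms tumbling event-time windows, sum of the second field.
object AscendingWindowModel {
  // Start of the tumbling window that contains ts (assumes offset 0 and ts >= 0).
  def windowStart(ts: Long, size: Long): Long = ts - ts % size

  // An ascending assigner expects non-decreasing timestamps.
  def isAscending(ts: Seq[Long]): Boolean =
    ts.zip(ts.drop(1)).forall { case (a, b) => a <= b }

  // Modeled watermark after an element with timestamp ts: one millisecond behind it.
  def ascendingWatermark(ts: Long): Long = ts - 1

  // Sum of the second field per (key, window start), sorted for stable output.
  def windowSums(input: Seq[(String, Long)], size: Long): Seq[((String, Long), Long)] =
    input
      .groupBy { case (key, ts) => (key, windowStart(ts, size)) }
      .map { case (keyAndStart, elems) => keyAndStart -> elems.map(_._2).sum }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val input = List(("a", 1L), ("b", 1L), ("b", 5L), ("b", 5L))
    println(s"ascending: ${isAscending(input.map(_._2))}")
    windowSums(input, 4L).foreach { case ((key, start), sum) =>
      println(s"key=$key window=[$start, ${start + 4}) sum=$sum")
    }
  }
}
```

For the example input, the model gives sum 1 for key a in [0, 4), sum 1 for key b in [0, 4), and sum 10 for key b in [4, 8): the two ("b",5) elements share a window because windows are aligned by timestamp, not by arrival.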
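import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object AscendingTimestampsExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val input = env.fromCollection(List(("a", 1L), ("b", 1L), ("b", 5L), ("b", 5L)))
    // Use the second tuple field as the event timestamp; it must be non-decreasing
    val timeWindow = input.assignAscendingTimestamps(t => t._2)
    // Per key, sum the second field over 4 ms tumbling event-time windows
    val result = timeWindow.keyBy(0).timeWindow(Time.milliseconds(4)).sum("_2")
    result.print()
    env.execute()
  }
}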
Result:
3. Implementing the BoundedOutOfOrdernessTimestampExtractor class
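import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object BoundedOutOfOrdernessExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Periodic watermark interval of 90 s (set after the time characteristic so it takes effect)
    env.getConfig.setAutoWatermarkInterval(90000)
    val input = env.fromCollection(List(("b", 1L), ("b", 2L), ("b", 3L), ("b", 4L), ("b", 5L), ("b", 6L), ("b", 7L), ("b", 8L), ("b", 9L)))
    // Timestamp = second field; watermark = largest timestamp seen so far - 1 ms
    val timeWindow = input.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[(String, Long)](Time.milliseconds(1)) {
        override def extractTimestamp(element: (String, Long)): Long = element._2
      })
    val result = timeWindow.keyBy(0).timeWindow(Time.milliseconds(4)).sum("_2")
    result.print()
    env.execute()
  }
}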
Result:
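Note: the constructor argument Time.milliseconds(1) is the maximum out-of-orderness, i.e. the longest an element may lag behind, here 1 ms. As the source code shows, the extractor emits a watermark equal to the largest timestamp seen so far minus this bound.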
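The rule in the note can be modeled in plain Scala (this is not Flink code): the watermark after each element is the largest timestamp seen so far minus the bound, and an element is late if its window had already fired before it arrived. As a simplifying assumption, the model computes a watermark after every element, whereas Flink emits one per auto-watermark interval; it also assumes window offset 0, non-negative timestamps and no allowed lateness. The object name and the out-of-order input are hypothetical.

```scala
// Plain-Scala model of a bounded-out-of-orderness watermark (not Flink code).
// Simplifying assumption: a watermark is computed after every element, whereas
// Flink emits it periodically at the auto-watermark interval.
object BoundedWatermarkModel {
  // Watermark after each element: largest timestamp seen so far minus the bound.
  def watermarks(ts: Seq[Long], bound: Long): Seq[Long] =
    ts.scanLeft(Long.MinValue)((hi, t) => math.max(hi, t)).tail.map(_ - bound)

  // Timestamps that arrive after their tumbling window [start, start + size) has
  // fired, i.e. start + size - 1 <= the watermark in force when they arrive.
  // Assumes window offset 0, non-negative timestamps and zero allowed lateness.
  def lateElements(ts: Seq[Long], bound: Long, size: Long): Seq[Long] = {
    val watermarkOnArrival = Long.MinValue +: watermarks(ts, bound).dropRight(1)
    ts.zip(watermarkOnArrival).collect {
      case (t, wm) if t - t % size + size - 1 <= wm => t
    }
  }

  def main(args: Array[String]): Unit = {
    val inOrder = (1L to 9L).toList         // the input of the example above
    val shuffled = List(1L, 5L, 3L, 9L, 2L) // hypothetical out-of-order input
    println(watermarks(shuffled, 1L))
    println(lateElements(inOrder, 1L, 4L))
    println(lateElements(shuffled, 1L, 4L))
  }
}
```

With a 1 ms bound and 4 ms windows, the in-order input from the example has no late elements, while in the hypothetical sequence 1, 5, 3, 9, 2 the elements 3 and 2 arrive after window [0, 4) has fired. A larger bound trades latency for tolerance: with a 3 ms bound, element 3 would no longer be late.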
4. Implementing the AssignerWithPeriodicWatermarks interface
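import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time

object PeriodicWatermarkExample {
  def main(args: Array[String]): Unit = {
    val params = ParameterTool.fromArgs(args)
    val senv = StreamExecutionEnvironment.getExecutionEnvironment
    senv.getConfig.setAutoWatermarkInterval(900000)
    // setStreamTimeCharacteristic(EventTime) resets the auto-watermark interval to 200 ms,
    // so the 900000 ms set above does not take effect; set it afterwards if intended.
    senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Each input line is "key,timestamp", e.g. a,13
    val text = senv.socketTextStream("localhost", 9999)
      .assignTimestampsAndWatermarks(new TimestampExtractor)
    val counts = text.map { (m: String) => (m.split(",")(0), 1) }
      .keyBy(0)
      .timeWindow(Time.milliseconds(10))
      .sum(1)
    counts.print()
    senv.execute("EventTime processing example")
  }

  class TimestampExtractor extends AssignerWithPeriodicWatermarks[String] with Serializable {
    private var currentMaxTimestamp = 0L
    private val maxOutOfOrderness = 3L

    override def extractTimestamp(e: String, prevElementTimestamp: Long): Long = {
      val timestamp = e.split(",")(1).toLong
      // Track the largest timestamp seen so far (not the element's previous timestamp)
      currentMaxTimestamp = Math.max(currentMaxTimestamp, timestamp)
      timestamp
    }

    // Called periodically; the watermark trails the max timestamp by maxOutOfOrderness
    override def getCurrentWatermark(): Watermark = {
      println(currentMaxTimestamp - maxOutOfOrderness)
      new Watermark(currentMaxTimestamp - maxOutOfOrderness)
    }
  }
}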
Input:
Result:
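Note: 1. A window fires only when both conditions hold: (1) the watermark has reached the end of the window, i.e. watermark >= window_end_time - 1 (the end time itself is exclusive); (2) the window contains data.
2. This also shows that the watermark has nothing to do with how windows are divided: for example, the inputs (a,13), (a,12), (a,16) all fall into the 10 ms to 20 ms window [10, 20), whatever order they arrive in.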
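The two firing conditions can be sketched as a small plain-Scala model; it is not Flink's EventTimeTrigger, only an approximation of its behavior. It reuses the maxOutOfOrderness of 3 ms and the 10 ms windows from the program above, assumes window offset 0, non-negative timestamps and zero allowed lateness, and, for simplicity, recomputes the watermark after every element rather than every auto-watermark interval. The names TriggerModel, Fired and run are hypothetical, and so are the extra inputs (a,22) and (a,11).

```scala
// Plain-Scala model (not Flink code) of when a 10 ms event-time window fires,
// using the author's maxOutOfOrderness of 3 ms. Simplifying assumption: the
// watermark is recomputed after every element rather than periodically.
object TriggerModel {
  val maxOutOfOrderness = 3L
  val windowSize = 10L

  final case class Fired(key: String, start: Long, end: Long, count: Int)

  // Feeds (key, timestamp) pairs in arrival order; returns windows in firing order.
  // Assumes window offset 0, non-negative timestamps and zero allowed lateness.
  def run(input: Seq[(String, Long)]): Seq[Fired] = {
    var maxTs = Long.MinValue
    var watermark = Long.MinValue
    // Only windows that received data exist here (condition 2).
    val open = scala.collection.mutable.Map.empty[(String, Long), Int]
    val fired = scala.collection.mutable.ArrayBuffer.empty[Fired]
    for ((key, ts) <- input) {
      val start = ts - ts % windowSize
      // If the window is already behind the watermark, the element is late and dropped.
      if (start + windowSize - 1 > watermark)
        open((key, start)) = open.getOrElse((key, start), 0) + 1
      maxTs = math.max(maxTs, ts)
      watermark = maxTs - maxOutOfOrderness
      // Condition 1: watermark >= window end - 1 (the window's last millisecond).
      val ready = open.keys.filter { case (_, s) => s + windowSize - 1 <= watermark }
        .toSeq.sortBy { case (k, s) => (s, k) }
      ready.foreach { case k @ (readyKey, s) =>
        fired += Fired(readyKey, s, s + windowSize, open(k))
        open -= k
      }
    }
    fired.toList
  }

  def main(args: Array[String]): Unit = {
    // (a,13), (a,12), (a,16) come from the note; (a,22) and (a,11) are hypothetical.
    println(run(Seq(("a", 13L), ("a", 12L), ("a", 16L))))
    println(run(Seq(("a", 13L), ("a", 12L), ("a", 16L), ("a", 22L), ("a", 11L))))
  }
}
```

With only (a,13), (a,12), (a,16) the modeled watermark stops at 13, so window [10, 20) holds three elements but does not fire. A hypothetical (a,22) lifts the watermark to 19 and the window fires with count 3; a later (a,11) is dropped because its window has already fired.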
5. Implementing the AssignerWithPunctuatedWatermarks interface
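import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time

object PunctuatedWatermarkExample {
  def main(args: Array[String]): Unit = {
    // Checking input parameters
    val params = ParameterTool.fromArgs(args)
    // set up the execution environment
    val senv = StreamExecutionEnvironment.getExecutionEnvironment
    // The auto-watermark interval drives periodic assigners only; a punctuated one ignores it
    senv.getConfig.setAutoWatermarkInterval(900000)
    senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Each input line is "key,timestamp"
    val text = senv.socketTextStream("localhost", 9999)
      .assignTimestampsAndWatermarks(new TimestampExtractor)
    val counts = text.map { (m: String) => (m.split(",")(0), 1) }
      .keyBy(0)
      .timeWindow(Time.milliseconds(10))
      .sum(1)
    counts.print()
    senv.execute("EventTime processing example")
  }

  class TimestampExtractor extends AssignerWithPunctuatedWatermarks[String] with Serializable {
    // Called after extractTimestamp for every element; returning null emits no watermark
    override def checkAndGetNextWatermark(lastElement: String, extractedTimestamp: Long): Watermark = {
      // Emit a watermark only for elements whose timestamp is even
      if (extractedTimestamp % 2 == 0) {
        println(extractedTimestamp)
        new Watermark(extractedTimestamp)
      } else null
    }

    override def extractTimestamp(element: String, previousElementTimestamp: Long): Long =
      element.split(",")(1).toLong
  }
}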
Result:
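Summary: methods 2 to 4 assign timestamps and watermarks with a fixed delay bound (zero in method 2), while method 5 emits watermarks based on a special condition in the events themselves.
This shows what a watermark means: events may be out of order within a fixed delay bound, but the stream as a whole moves forward in order.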
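The contrast drawn in the summary can be sketched in plain Scala. This is a model, not Flink's API: the periodic strategy samples "largest timestamp minus a bound" after every `every` elements (a hypothetical stand-in for the auto-watermark interval), while the punctuated strategy mirrors the rule of method 5 and emits the element's own timestamp whenever it is even. Both keep only increasing watermarks, modeled on the way Flink forwards a new watermark only if it is larger than the previous one. The names and the timestamps in main are hypothetical.

```scala
// Plain-Scala model (not Flink's API) contrasting the two watermark strategies.
object WatermarkStrategies {
  // Keep only strictly increasing watermarks: event time never moves backwards.
  def monotonic(ws: Seq[Long]): Seq[Long] =
    ws.foldLeft(Vector.empty[Long]) { (acc, w) =>
      if (acc.isEmpty || w > acc.last) acc :+ w else acc
    }

  // Periodic (methods 2 to 4): every `every` elements, emit max timestamp - bound.
  // `every` is a hypothetical stand-in for the auto-watermark interval.
  def periodic(ts: Seq[Long], bound: Long, every: Int): Seq[Long] =
    monotonic(
      ts.scanLeft(Long.MinValue)((hi, t) => math.max(hi, t)).tail
        .zipWithIndex
        .collect { case (hi, i) if (i + 1) % every == 0 => hi - bound })

  // Punctuated (method 5): emit a watermark equal to the element's timestamp,
  // but only when a condition on the event holds (here: the timestamp is even).
  def punctuated(ts: Seq[Long]): Seq[Long] =
    monotonic(ts.filter(_ % 2 == 0))

  def main(args: Array[String]): Unit = {
    val ts = List(1L, 4L, 2L, 6L, 5L, 8L) // hypothetical event timestamps
    println(s"periodic:   ${periodic(ts, 3L, 2)}")
    println(s"punctuated: ${punctuated(ts)}")
  }
}
```

For the sequence 1, 4, 2, 6, 5, 8 with a 3 ms bound sampled every two elements, the periodic strategy yields watermarks 1, 3, 5, whereas the punctuated strategy yields 4, 6, 8; the even timestamp 2 produces no watermark because it would move event time backwards.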