Handling late data in Flink

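Flink is mainly used to process real-time data, and out-of-order records are unavoidable when doing so. Taking event time as an example, an event that occurred earlier may reach the processing operator later than one that occurred after it. Flink offers three main ways to handle out-of-order data: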

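  • Delay the watermark. This behaves much like extending the declared window: the generated watermark is held back by 1 second, so the window fires later. The example below declares a 5-second tumbling event-time window and generates watermarks that lag 1 second behind the largest timestamp seen so far. With a first record at 1000 ms, the window covers [0, 5000), so its last valid timestamp is 4999, and it fires only once a record with timestamp 6000 or later arrives, because that is when the watermark reaches 4999.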
.assignTimestampsAndWatermarks(
        WatermarkStrategy
                // the watermark lags 1 second behind the largest timestamp seen so far
                .<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                        // the third field carries the event time in milliseconds
                        return element.f2;
                    }
                })
)
.keyBy(data -> data.f0)
// 5-second tumbling event-time window
.window(TumblingEventTimeWindows.of(Time.seconds(5)))

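Input records (name, city, event time in ms), followed by the line the job prints once the record 6000 has arrived:
Bob,成都,1000
Bob,成都,4000
Bob,成都,5000
Bob,成都,6000
 窗口 0 - 5000 中共有 2 个元素,窗口关闭时,当前水位线4999
The output line says that window 0 - 5000 holds 2 elements (1000 and 4000) and that the watermark was 4999 when the window fired.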
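To make the arithmetic concrete, here is a minimal plain-Java sketch with no Flink dependency. It assumes that the bounded-out-of-orderness watermark equals the largest timestamp seen so far, minus the bound, minus 1 ms, and that an event-time window [start, end) fires once the watermark reaches end - 1. The class and method names (WatermarkSketch, watermark, fires) are made up for this illustration. The sketch also recomputes the watermark after every record, whereas Flink emits watermarks periodically, so treat it as an approximation of the timing rather than a replica of the runtime.

```java
/** Plain-Java sketch of the watermark arithmetic; hypothetical names, no Flink dependency. */
final class WatermarkSketch {

    /** Watermark after the largest timestamp seen so far, for a given out-of-orderness bound (ms). */
    static long watermark(long maxTimestampSeen, long outOfOrdernessMs) {
        return maxTimestampSeen - outOfOrdernessMs - 1;
    }

    /** An event-time window [start, end) fires once the watermark reaches end - 1. */
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        long[] input = {1000, 4000, 5000, 6000}; // the timestamps from the example above
        long maxSeen = Long.MIN_VALUE;
        for (long ts : input) {
            maxSeen = Math.max(maxSeen, ts);
            long wm = watermark(maxSeen, 1000);
            System.out.println("ts=" + ts + " watermark=" + wm
                    + " window [0, 5000) fires=" + fires(5000, wm));
        }
    }
}
```

Under these assumptions the record 5000 only moves the watermark to 3999, and it takes the record 6000 to push it to 4999 and fire the window, which matches the output shown above.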
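  • allowedLateness: keep the window open longer
    In the approach above, once the watermark reaches the end of the window, the computation fires and the window is closed at the same moment, so records that arrive after that are no longer processed. allowedLateness postpones the closing of the window: when the watermark reaches the end of the window the computation still fires, but the window stays open for the allowedLateness period, and a late record that arrives within that period is added to the window and triggers the computation again.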
ds.map(new MapFunction<String, Tuple3<String,String,Long>>() {
                    @Override
                    public Tuple3<String, String, Long> map(String value) throws Exception {
                        String[] split = value.split(",");

                        System.out.println(value);
                        return Tuple3.of(split[0],split[1],Long.valueOf(split[2]));
                    }
                }).assignTimestampsAndWatermarks(WatermarkStrategy.
                        <Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                                return element.f2;
                            }
                        })
                ).keyBy(data -> data.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                // keep the window open for 1 extra minute of event time after it first fires
                .allowedLateness(Time.minutes(1))

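Input records interleaved with the lines the job prints. The record 4000 arrives after window 0 - 5000 has already fired, but still within the allowed lateness, so that window fires a second time, now with 3 elements:
Bob,成都,1000
Bob,成都,3000
Bob,成都,5000
Bob,成都,6000
 窗口 0 - 5000 中共有 2 个元素,窗口关闭时,当前水位线4999
Bob,成都,16000
 窗口 5000 - 10000 中共有 2 个元素,窗口关闭时,当前水位线14999
Bob,成都,4000
 窗口 0 - 5000 中共有 3 个元素,窗口关闭时,当前水位线14999
Each output line reports the window range, the number of elements in it, and the watermark at the moment the window fired.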
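The decision Flink makes for each incoming record can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Flink's own code: it assumes tumbling windows aligned to 0 with non-negative timestamps, that a window's last valid timestamp is end - 1, and that the window is discarded once the watermark reaches end - 1 plus the allowed lateness. The names AllowedLatenessSketch, Fate and classify are hypothetical.

```java
/** Plain-Java sketch of the allowed-lateness decision; hypothetical names, no Flink dependency. */
final class AllowedLatenessSketch {

    enum Fate { ON_TIME, LATE_BUT_ACCEPTED, DROPPED }

    /** Classifies a record by comparing the current watermark with the record's window. */
    static Fate classify(long timestamp, long currentWatermark,
                         long windowSizeMs, long allowedLatenessMs) {
        long windowEnd = timestamp - (timestamp % windowSizeMs) + windowSizeMs;
        long maxTimestamp = windowEnd - 1; // last timestamp that belongs to the window
        if (currentWatermark < maxTimestamp) {
            return Fate.ON_TIME;           // the window has not fired yet
        }
        if (currentWatermark < maxTimestamp + allowedLatenessMs) {
            return Fate.LATE_BUT_ACCEPTED; // the window already fired and will fire again
        }
        return Fate.DROPPED;               // the window is gone
    }

    public static void main(String[] args) {
        // Record 4000 arriving at watermark 14999, as in the example above.
        System.out.println(classify(4000, 14_999, 5_000, 60_000)); // 1-minute lateness
        System.out.println(classify(4000, 14_999, 5_000, 0));      // no lateness
    }
}
```

With a 1-minute allowance the record 4000 is still accepted at watermark 14999, which is why window 0 - 5000 is printed again with 3 elements; with no allowance the same record would be dropped.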

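  • sideOutputLateData: side output
    Both approaches above work by delaying the watermark or by keeping the window open longer, and both hold on to resources while they do so; the watermark cannot be delayed indefinitely, nor can a window be kept forever. Once the records within the tolerated range have been handled, a fallback is still needed for the extreme cases: send the records that are still late straight to a side output.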
OutputTag<Tuple3<String, String, Long>> lateTag = new OutputTag<Tuple3<String, String, Long>>("late"){};
        env.setParallelism(1);
        SingleOutputStreamOperator<String> process = ds.map(new MapFunction<String, Tuple3<String, String, Long>>() {
                    @Override
                    public Tuple3<String, String, Long> map(String value) throws Exception {
                        String[] split = value.split(",");

                        System.out.println(value);
                        return Tuple3.of(split[0], split[1], Long.valueOf(split[2]));
                    }
                }).assignTimestampsAndWatermarks(WatermarkStrategy.
                        <Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                                return element.f2;
                            }
                        })
                ).keyBy(data -> data.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .allowedLateness(Time.minutes(1))
                // records that arrive after the allowed lateness has expired go to the side output tagged "late"
                .sideOutputLateData(lateTag)

The complete test code is as follows:

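import java.time.Duration;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutPutLateTest {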

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> ds = env.socketTextStream("hadoop103", 9999);

        OutputTag<Tuple3<String, String, Long>> lateTag = new OutputTag<Tuple3<String, String, Long>>("late"){};
        env.setParallelism(1);
        SingleOutputStreamOperator<String> process = ds.map(new MapFunction<String, Tuple3<String, String, Long>>() {
                    @Override
                    public Tuple3<String, String, Long> map(String value) throws Exception {
                        String[] split = value.split(",");

                        System.out.println(value);
                        return Tuple3.of(split[0], split[1], Long.valueOf(split[2]));
                    }
                }).assignTimestampsAndWatermarks(WatermarkStrategy.
                        <Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                                return element.f2;
                            }
                        })
                ).keyBy(data -> data.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .allowedLateness(Time.minutes(1))
                .sideOutputLateData(lateTag)
                .process(new ProcessWindowFunction<Tuple3<String, String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String s, ProcessWindowFunction<Tuple3<String, String, Long>, String, String, TimeWindow>.Context context, Iterable<Tuple3<String, String, Long>> iterable, Collector<String> collector) throws Exception {
                        long start = context.window().getStart();
                        long end = context.window().getEnd();
                        long watermark = context.currentWatermark();
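                        // getExactSizeIfKnown() returns -1 if the iterable is not backed by a sized collection
                        long count = iterable.spliterator().getExactSizeIfKnown();

                        // message: "window <start> - <end> contains <count> elements; watermark when the window closed: <watermark>"
                        collector.collect(" 窗口 " + start + " - " + end + " 中共有 " + count + " 个元素,窗口关闭时,当前水位线"
                                + watermark);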

                    }
                });
//        ds.print();
        process.print();
        // late records arrive on the side output; the print prefix "迟到数据" means "late data"
        process.getSideOutput(lateTag).print("迟到数据");
        env.execute();
    }
}
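As a rough cross-check of which records the job above would route to the side output, the following plain-Java sketch replays a sequence of timestamps. It is a simplification under stated assumptions: the watermark is recomputed after every record (Flink emits it periodically), windows are 5-second tumbling windows aligned to 0, and a record counts as late once the watermark has reached its window's end - 1 plus the allowed lateness. SideOutputSketch and lateRecords are hypothetical names, and the input sequence in main is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java replay of the late-record routing; hypothetical names, no Flink dependency. */
final class SideOutputSketch {

    static final long WINDOW_SIZE_MS = 5_000;
    static final long OUT_OF_ORDERNESS_MS = 1_000;

    /** Returns the timestamps that would end up in the late side output. */
    static List<Long> lateRecords(long[] timestamps, long allowedLatenessMs) {
        List<Long> late = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE; // no watermark before the first record
        for (long ts : timestamps) {
            long windowEnd = ts - (ts % WINDOW_SIZE_MS) + WINDOW_SIZE_MS;
            long cleanupTime = windowEnd - 1 + allowedLatenessMs;
            // The record is checked against the watermark produced by earlier records.
            if (cleanupTime <= watermark) {
                late.add(ts);
            }
            maxSeen = Math.max(maxSeen, ts);
            watermark = maxSeen - OUT_OF_ORDERNESS_MS - 1;
        }
        return late;
    }

    public static void main(String[] args) {
        // Made-up input: 4000 is late for window [0, 5000), 2000 is even later.
        long[] input = {1000, 3000, 5000, 6000, 16000, 4000, 70000, 2000};
        System.out.println("allowedLateness 1 min: " + lateRecords(input, 60_000));
        System.out.println("allowedLateness 0:     " + lateRecords(input, 0));
    }
}
```

In this replay only the record 2000 reaches the side output when the allowed lateness is 1 minute, because it arrives after the watermark has passed 64999; with no allowed lateness both 4000 and 2000 are routed there.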

posted @ clouderzheng