Flink Stream-Merging Operations: CoProcessFunction
Introduction to CoProcessFunction
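Processing a ConnectedStreams requires a separate transformation for each of the two input streams, so the function interface declares two parallel methods, distinguished by the suffixes "1" and "2", which are called when an element arrives on the first or the second stream respectively. Such an interface is called a "co-process function". As with CoMapFunction for .map(), calling .flatMap() requires a CoFlatMapFunction, which implements flatMap1() and flatMap2(); calling .process() takes a CoProcessFunction. The abstract class CoProcessFunction is defined in the Flink source as follows: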
@PublicEvolving
public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {

    private static final long serialVersionUID = 1L;

    /**
     * This method is called for each element in the first of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     *     timers and querying the time. The context is only valid during the invocation of this
     *     method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     *     and go into recovery.
     */
    public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * This method is called for each element in the second of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     *     timers and querying the time. The context is only valid during the invocation of this
     *     method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     *     and go into recovery.
     */
    public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * Called when a timer set using {@link TimerService} fires.
     *
     * @param timestamp The timestamp of the firing timer.
     * @param ctx An {@link OnTimerContext} that allows querying the timestamp of the firing timer,
     *     querying the {@link TimeDomain} of the firing timer and getting a {@link TimerService}
     *     for registering timers and querying the time. The context is only valid during the
     *     invocation of this method, do not store it.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}

    /**
     * Information available in an invocation of {@link #processElement1(Object, Context,
     * Collector)}/ {@link #processElement2(Object, Context, Collector)} or {@link #onTimer(long,
     * OnTimerContext, Collector)}.
     */
    public abstract class Context {

        /**
         * Timestamp of the element currently being processed or timestamp of a firing timer.
         *
         * <p>This might be {@code null}, for example if the time characteristic of your program is
         * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
         */
        public abstract Long timestamp();

        /** A {@link TimerService} for querying time and registering timers. */
        public abstract TimerService timerService();

        /**
         * Emits a record to the side output identified by the {@link OutputTag}.
         *
         * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
         * @param value The record to emit.
         */
        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }

    /**
     * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class OnTimerContext extends Context {
        /** The {@link TimeDomain} of the firing timer. */
        public abstract TimeDomain timeDomain();
    }
}
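As the source shows, CoProcessFunction is clearly a member of the "process function" family and is used in much the same way. You implement processElement1() and processElement2(); for each arriving element, Flink calls one of the two depending on which stream the element came from. Like the other process functions, CoProcessFunction can read the element timestamp through the context ctx, read the current watermark and register timers through ctx.timerService(), and define what happens when a timer fires by overriding .onTimer(). Timers and keyed state are only available when the connected streams are keyed, which is why the example keys both inputs with keyBy.

Here is a concrete example of CoProcessFunction: real-time reconciliation, that is, a two-stream join between the app's payment events and the third-party payment platform's payment events. An app payment event and a third-party payment event wait up to 5 seconds for each other; if the matching event does not arrive in time, an alert is emitted.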
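To make the timer behavior concrete, here is a minimal plain-Java sketch. It assumes that an event-time timer fires once the watermark reaches its timestamp, and that a given timestamp is registered at most once per key. The class EventTimeTimers and its methods are hypothetical stand-ins for illustration only, not Flink API; in a real job the registration call is ctx.timerService().registerEventTimeTimer(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for event-time timer handling for ONE key; not Flink API.
class EventTimeTimers {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private long watermark = Long.MIN_VALUE;

    /** Registers a timer; registering the same timestamp twice keeps a single timer. */
    void register(long timestamp) {
        if (!timers.contains(timestamp)) {
            timers.add(timestamp);
        }
    }

    /** Advances the watermark (it never moves backwards) and returns the timers that fire. */
    List<Long> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek() <= watermark) {
            fired.add(timers.poll());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeTimers t = new EventTimeTimers();
        t.register(1000L + 5000L); // element with timestamp 1000 waits 5 s
        t.register(2000L + 5000L); // element with timestamp 2000 waits 5 s
        System.out.println(t.advanceWatermark(5999L));          // no timer is due yet
        System.out.println(t.advanceWatermark(6000L));          // the first timer is due
        System.out.println(t.advanceWatermark(Long.MAX_VALUE)); // end of input: the rest fire
    }
}
```

The last call mirrors what happens with a bounded source: when the input ends, the watermark jumps to Long.MAX_VALUE, so every pending event-time timer fires.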
Reference code
import java.time.Duration;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Real-time reconciliation demo.
 */
public class BillCheckExample0828 {

    public static void main(String[] args) throws Exception {
        // 1. Get the execution environment.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 1.1 Parallelism 1 makes testing easier; in production, remember to set it to the
        //     number of partitions of the Kafka topic.
        env.setParallelism(1);

        // 2. Read the data and declare the watermark strategy.
        // 2.1 Simulated data from the app; field f2 is the event timestamp.
        SingleOutputStreamOperator<Tuple3<String, String, Long>> appStream = env.fromElements(
                Tuple3.of("order-1", "app", 1000L),
                Tuple3.of("order-2", "app", 2000L),
                Tuple3.of("order-3", "app", 3500L)
        ).assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                                return element.f2;
                            }
                        }));

        // 2.2 Simulated data from the third-party payment platform; field f3 is the event timestamp.
        SingleOutputStreamOperator<Tuple4<String, String, String, Long>> thirdPartStream = env.fromElements(
                Tuple4.of("order-1", "third-party", "success", 3000L),
                Tuple4.of("order-3", "third-party", "success", 4000L)
        ).assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple4<String, String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple4<String, String, String, Long> element, long recordTimestamp) {
                                return element.f3;
                            }
                        }));

        // 3. Connect the two streams, key both by order id (f0), and use the CoProcessFunction
        //    below to check whether the same order appears in both streams.
        appStream.connect(thirdPartStream)
                .keyBy(data -> data.f0, data -> data.f0)
                .process(new OrderMatchResult0828())
                .print();

        env.execute();
    }

    /**
     * Custom CoProcessFunction implementation.
     */
    public static class OrderMatchResult0828
            extends CoProcessFunction<Tuple3<String, String, Long>, Tuple4<String, String, String, Long>, String> {

        // Keyed state holding the event that has already arrived from each side.
        private ValueState<Tuple3<String, String, Long>> appEventState;
        private ValueState<Tuple4<String, String, String, Long>> thirdPartyEventState;

        @Override
        public void open(Configuration parameters) throws Exception {
            appEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple3<String, String, Long>>(
                            "app-state", Types.TUPLE(Types.STRING, Types.STRING, Types.LONG)));
            thirdPartyEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple4<String, String, String, Long>>(
                            "third-party-state", Types.TUPLE(Types.STRING, Types.STRING, Types.STRING, Types.LONG)));
        }

        @Override
        public void processElement1(Tuple3<String, String, Long> value, Context ctx, Collector<String> out)
                throws Exception {
            // An app event arrived: check whether the third-party event is already here.
            if (thirdPartyEventState.value() != null) {
                // "对账成功" = "reconciliation succeeded"
                out.collect("对账成功:" + value + " " + thirdPartyEventState.value());
                // The state can be cleared once the order is matched.
                thirdPartyEventState.clear();
            } else {
                // Remember the app event.
                appEventState.update(value);
                // Register an event-time timer and wait 5 s for the other stream.
                ctx.timerService().registerEventTimeTimer(value.f2 + 5000L);
            }
        }

        @Override
        public void processElement2(Tuple4<String, String, String, Long> value, Context ctx, Collector<String> out)
                throws Exception {
            // A third-party event arrived: check whether the app event is already here.
            if (appEventState.value() != null) {
                out.collect("对账成功:" + appEventState.value() + " " + value);
                // The state can be cleared once the order is matched.
                appEventState.clear();
            } else {
                // Remember the third-party event.
                thirdPartyEventState.update(value);
                // Register an event-time timer and wait 5 s for the other stream.
                ctx.timerService().registerEventTimeTimer(value.f3 + 5000L);
            }
        }

        // Called when a timer fires.
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            // If one state is still set, the other stream never delivered its event.
            if (appEventState.value() != null) {
                // "对账失败 ... 第三方差数据" = "reconciliation failed ... third-party record missing"
                out.collect("对账失败 " + appEventState.value() + " 第三方差数据");
            }
            if (thirdPartyEventState.value() != null) {
                // "对账失败 ... app差数据" = "reconciliation failed ... app record missing"
                out.collect("对账失败 " + thirdPartyEventState.value() + " app差数据");
            }
            // Clear both states.
            appEventState.clear();
            thirdPartyEventState.clear();
        }
    }
}
Output
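对账成功:(order-1,app,1000) (order-1,third-party,success,3000)
对账成功:(order-3,app,3500) (order-3,third-party,success,4000)
对账失败 (order-2,app,2000) 第三方差数据

In these messages, 对账成功 means "reconciliation succeeded", 对账失败 means "reconciliation failed", and 第三方差数据 means "the third-party record is missing". order-1 and order-3 are matched, while order-2 never receives a third-party event, so its timer reports the failure.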
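The order of the three result lines can be checked without a cluster. The following plain-Java replay is only a sketch under stated assumptions: the five input events are processed in timestamp order, and no timer fires until the bounded input ends (the earliest timer is at 6000, later than any event timestamp). ReconcileSketch, Event, and the English result strings are hypothetical names chosen for illustration; nothing here is Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java replay of the reconciliation logic; hypothetical names, no Flink involved.
class ReconcileSketch {
    /** One input event; source is "app" or "third-party". */
    static final class Event {
        final String orderId;
        final String source;
        final long ts;

        Event(String orderId, String source, long ts) {
            this.orderId = orderId;
            this.source = source;
            this.ts = ts;
        }
    }

    static final long WAIT_MS = 5000L;

    static List<String> run(List<Event> eventsInArrivalOrder) {
        Map<String, Event> appState = new HashMap<>();        // per-key state, app side
        Map<String, Event> thirdPartyState = new HashMap<>(); // per-key state, third-party side
        TreeMap<Long, List<String>> timers = new TreeMap<>(); // timer timestamp -> order ids
        List<String> out = new ArrayList<>();

        for (Event e : eventsInArrivalOrder) {
            boolean isApp = e.source.equals("app");
            Map<String, Event> own = isApp ? appState : thirdPartyState;
            Map<String, Event> other = isApp ? thirdPartyState : appState;
            if (other.remove(e.orderId) != null) {
                // The counterpart was waiting: match and drop its state.
                out.add("matched " + e.orderId);
            } else {
                // Store this event and wait WAIT_MS of event time for the counterpart.
                own.put(e.orderId, e);
                timers.computeIfAbsent(e.ts + WAIT_MS, k -> new ArrayList<>()).add(e.orderId);
            }
        }

        // End of bounded input: all pending timers fire in timestamp order.
        for (List<String> ids : timers.values()) {
            for (String id : ids) {
                if (appState.remove(id) != null) {
                    out.add("unmatched " + id + ": third-party record missing");
                }
                if (thirdPartyState.remove(id) != null) {
                    out.add("unmatched " + id + ": app record missing");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("order-1", "app", 1000L),
                new Event("order-2", "app", 2000L),
                new Event("order-1", "third-party", 3000L),
                new Event("order-3", "app", 3500L),
                new Event("order-3", "third-party", 4000L));
        run(events).forEach(System.out::println);
    }
}
```

Timers for already matched orders still fire in this replay, just as in the Flink job, but they find empty state and emit nothing. In a real job the interleaving of the two inputs is not guaranteed to follow timestamp order, so this replay illustrates one possible schedule rather than the only one.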