Sometimes we need to build an execution graph that contains a cycle, for example to send data that still does not satisfy some condition after processing back to the beginning to be processed again.
In the past, the approach I would consider was to write the data that still did not satisfy the condition into a separate Topic and consume and process it again from there.
Today I came across the Flink Iterate operator and found that it can also meet this need.
Official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/
Creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. The following code starts with a stream and applies the iteration body continuously. Elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.
Official Demo (Java)
// Create the IterativeStream
IterativeStream<Long> iteration = initialStream.iterate();

// Iteration body
DataStream<Long> iterationBody = iteration.map(/* do something */);

// filter out the elements that need to be fed back
DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>() {
    @Override
    public boolean filter(Long value) throws Exception {
        // elements matching this condition are fed back
        return value > 0;
    }
});

// feed the feedback stream back into the iteration
iteration.closeWith(feedback);

// output part
DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>() {
    @Override
    public boolean filter(Long value) throws Exception {
        // elements matching this condition are forwarded downstream
        return value <= 0;
    }
});
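The Scala DataStream API expresses the same loop a bit differently: instead of closeWith, iterate takes a step function that returns a (feedback, output) pair of streams. The following is only a minimal sketch of that shape, not the official demo; the fromElements input, the decrement in the map, the parallelism of 1 and the 5000 ms wait time are assumptions I added so the snippet is self-contained.

import org.apache.flink.streaming.api.scala._

object IterateSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // parallelism 1 keeps the feedback edge aligned with the iteration head
    env.setParallelism(1)

    // hypothetical bounded input, just to make the sketch self-contained
    val initialStream: DataStream[Long] = env.fromElements(3L, 1L, -2L, 5L)

    val output: DataStream[Long] = initialStream.iterate(iteration => {
      // iteration body: decrement each element so positives eventually fall to 0
      val iterationBody = iteration.map(_ - 1)
      // first stream of the tuple is fed back, second is forwarded downstream
      (iterationBody.filter(_ > 0), iterationBody.filter(_ <= 0))
    }, 5000) // max time (ms) the iteration head waits for feedback before finishing

    output.print()
    env.execute("IterateSketch")
  }
}

The wait time matters for a test like this: without it the iteration head keeps waiting for feedback records, so a job with a bounded input would not finish on its own.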
Scala Demo
Business scenario: a keyed window sum; if the window result does not satisfy the condition, the data re-enters the window and is summed again.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object FeedbackStreamDemo {
  def main(args: Array[String]): Unit = {
    // environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val source = env.addSource(new SimpleStringSource)
    val mapStream = source.map(str => {
        val arr = str.split(",")
        println("map : " + str)
        (arr(0), arr(1).toLong)
      })
      .disableChaining()

    val itStream = mapStream.iterate(ds => {
      // iteration body
      val dsMap = ds.map(str => {
          (str._1, str._2 + 1)
        })
        .keyBy(_._1)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .process(new ProcessWindowFunction[(String, Long), (String, Long), String, TimeWindow] {
          override def process(key: String, context: Context, elements: Iterable[(String, Long)],
                               out: Collector[(String, Long)]): Unit = {
            // simple window sum
            val it = elements.toIterator
            var sum = 0L
            while (it.hasNext) {
              val current = it.next()
              sum = sum + current._2
            }
            out.collect((key, sum))
          }
        })

      // feedback branch: window results below 500 go back to mapStream and are summed again
      (dsMap.filter(s => s._2 < 500),
        // output branch: results of at least 500 are done and emitted downstream
        dsMap.filter(s => s._2 >= 500))
    })
    .disableChaining()

    itStream.print("result:")
    env.execute("FeedbackStreamDemo")
  }
}
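The demo relies on a SimpleStringSource that is not shown in the post. Below is a minimal sketch of what such a source might look like, assuming it simply emits one comma-separated "key,value" string per second; the key pattern and value range are made up for illustration.

import org.apache.flink.streaming.api.functions.source.SourceFunction

// Hypothetical stand-in for the SimpleStringSource used above:
// emits one "key,value" record per second.
class SimpleStringSource extends SourceFunction[String] {
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    var i = 0L
    while (running) {
      // e.g. "key_1,42"
      ctx.collect(s"key_${i % 3},${i % 100}")
      i += 1
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = running = false
}

With values in this range, the first window sums should stay well below 500, so the data makes a few trips around the feedback loop before it crosses the threshold and is emitted.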
Feel free to follow the Flink菜鸟 WeChat official account, which publishes posts on Flink development topics from time to time.