(二)Flink滑动窗口单词统计实例
需求一:通过socket实时产生单词,使用flink实时接收数据,对指定时间窗口内(例如:2秒)的数据进行聚合统计,并把时间窗口内计算的结果打印出来。
Flink程序开发步骤:
1、获得一个执行环境
2、加载/创建 初始化数据
3、指定操作数据的transaction算子
4、指定把计算好的数据放在哪里
5、调用execute()触发执行程序
注意:Flink程序是延迟计算的,只有最后调用execute()方法的时候才会真正出发执行程序。
延迟计算的好处是:你可以开发复杂的程序,但是Flink可以将复杂的程序转成一个Plan,将Plan作为一个整体单元执行!
实现代码:
Java版本:
import org.apache.flink.api.common.functions.FlatMapFunction; import org.apache.flink.api.java.utils.ParameterTool; import org.apache.flink.contrib.streaming.state.RocksDBStateBackend; import org.apache.flink.runtime.state.filesystem.FsStateBackend; import org.apache.flink.runtime.state.memory.MemoryStateBackend; import org.apache.flink.streaming.api.datastream.DataStream; import org.apache.flink.streaming.api.datastream.DataStreamSource; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.streaming.api.windowing.time.Time; import org.apache.flink.util.Collector; /** * 滑动窗口计算 * * 通过socket模拟产生单词数据 * flink对数据进行统计计算 * * 需要实现每隔1秒对最近2秒内的数据进行汇总计算 * */ public class SocketWindowWordCountJava { public static void main(String[] args) throws Exception{ //获取需要的端口号 int port; try { ParameterTool parameterTool = ParameterTool.fromArgs(args); port = parameterTool.getInt("port"); }catch (Exception e){ System.err.println("No port set. use default port 9000--java"); port = 9000; } //获取flink的运行环境 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); String hostname = "hadoop100"; String delimiter = "\n"; //连接socket获取输入的数据 DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter); // a a c // a 1 // a 1 // c 1 DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() { public void flatMap(String value, Collector<WordWithCount> out) throws Exception { String[] splits = value.split("\\s"); for (String word : splits) { out.collect(new WordWithCount(word, 1L)); } } }).keyBy("word") .timeWindow(Time.seconds(2), Time.seconds(1))//指定时间窗口大小为2秒,指定时间间隔为1秒 .sum("count");//在这里使用sum或者reduce都可以 /*.reduce(new ReduceFunction<WordWithCount>() { public WordWithCount reduce(WordWithCount a, WordWithCount b) throws Exception { return new WordWithCount(a.word,a.count+b.count); } })*/ //把数据打印到控制台并且设置并行度 windowCounts.print().setParallelism(1); //这一行代码一定要实现,否则程序不执行 env.execute("Socket window count"); } public static class WordWithCount{ public String word; public long count; public WordWithCount(){} public WordWithCount(String word,long count){ this.word = word; this.count = count; } @Override public String toString() { return "WordWithCount{" + "word='" + word + '\'' + ", count=" + count + '}'; } } }
Scala版本:
后续补充......
转载请注明地址: https://www.cnblogs.com/wynjauu/articles/10542807.html