|NO.Z.00008|——————————|BigDataEnd|——|Hadoop&Flink.V08|——|Flink.v08|Quick Start|Word Count Example|Streaming Data|Java Version|
1. Word Count Example (Streaming Data)
### --- Requirement
~~~ A socket simulates words being sent in real time, and Flink receives the data as a stream.
~~~ The data within a specified time window (e.g. 5 s) is aggregated, with the summary recomputed every 1 s,
~~~ and the result for each window is printed.
2. Code Implementation
### --- Implementation
package com.yanqi.java;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountJavaStream {
    public static void main(String[] args) throws Exception {
        // 1) Obtain the streaming execution environment
        StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2) Read a text stream from the socket on hadoop01:7777
        DataStreamSource<String> dataStream = executionEnvironment.socketTextStream("hadoop01", 7777);
        // 3) Split each line into words, emit (word, 1), key by the word and keep a running sum
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = dataStream
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                        for (String word : s.split(" ")) {
                            collector.collect(new Tuple2<String, Integer>(word, 1));
                        }
                    }
                })
                .keyBy(0)
                .sum(1);
        // 4) Print the running counts to stdout
        sum.print();
        // 5) Trigger execution of the job
        executionEnvironment.execute();
    }
}
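Note that this program does not actually use a time window: keyBy(0).sum(1) keeps a running cumulative count per word, which is exactly what the output in section 3 shows. To meet the requirement stated in section 1 (a 5 s window evaluated every 1 s), the aggregation step could be swapped for a sliding processing-time window. The fragment below is only a sketch against the Flink 1.11 API used in this project; it reuses dataStream from the program above and is not the code that produced the output shown later.

import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Sketch: aggregate (word, 1) pairs over a 5 s window that slides every 1 s,
// instead of keeping a running total. Only the aggregation part changes.
SingleOutputStreamOperator<Tuple2<String, Integer>> windowedSum = dataStream
        .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                for (String word : s.split(" ")) {
                    collector.collect(new Tuple2<String, Integer>(word, 1));
                }
            }
        })
        .keyBy(0)
        .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(1)))
        .sum(1);
windowedSum.print();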
3. Build and Run
### --- Prepare the IDEA run configuration
~~~ # IDEA configuration:
——>IDEA Edit Configurations——>Application: WordCountJavaStream——>
——>check: Include dependencies with "Provided" scope——>END
### --- Prepare the input data: send words with nc on hadoop01
[root@hadoop01 ~]# nc -lp 7777
hello
you him she
hello # "hello" is sent again; the count accumulates and becomes 2
### --- Run output
D:\JAVA\jdk1.8.0_231\bin\java.exe "-javaagent:..." -Dfile.encoding=UTF-8 -classpath <JDK jars; project classes; Flink 1.11.1 and Scala 2.12 dependency jars from the local Maven repository> com.yanqi.java.WordCountJavaStream
2> (hello,1)
3> (you,1)
2> (him,1)
2> (she,1)
2> (hello,2) # cumulative count is now 2
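The 2>/3> prefixes in the output are the indices of the parallel subtasks of the print sink, so which prefix a record carries depends on the key and on the job parallelism. If a single unprefixed output stream is preferred, one option (an optional tweak, not part of the original program) is to set the parallelism to 1 on the environment:

// Optional: run the job with a single parallel subtask; with parallelism 1
// the print() sink no longer prefixes records with a subtask index.
executionEnvironment.setParallelism(1);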
4. Summary of the Flink Program Development Workflow
### --- The development of a Flink program can be summarized as the following steps:
~~~ 1) Obtain an execution environment
~~~ 2) Load/create the initial data
~~~ 3) Specify the operators that transform the data
~~~ 4) Specify where the result data is written
~~~ 5) Call execute() to trigger the program
~~~ Note: Flink programs are evaluated lazily; the program is only actually executed when execute() is finally called (see the sketch below).
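As an illustration of these five steps and of lazy execution, here is a minimal, self-contained skeleton. The in-memory source, the upper-casing map, and the print sink are placeholder choices made for this sketch, not part of the original example.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkSkeleton {
    public static void main(String[] args) throws Exception {
        // 1) Obtain an execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2) Load/create the initial data (a small bounded in-memory source for illustration)
        // 3) Specify the operators that transform the data
        // 4) Specify where the result data is written (stdout here)
        env.fromElements("hello", "flink")
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   return value.toUpperCase();
               }
           })
           .print();
        // 5) Nothing has run yet: execute() is what actually triggers the job
        env.execute("skeleton job");
    }
}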