Flink Learning with Java - 1 - Local Batch and Stream Processing
I studied Java basics a while ago and have forgotten most of it; at work I write SQL every day, and skills fade when you don't use them, so I want to practice by following a project and learn Flink along the way.
I'm mainly following the Shangguigu tutorial: 尚硅谷Java版Flink(武老师清华硕士,原IBM-CDL负责人)_哔哩哔哩_bilibili. Thanks to Shangguigu for the free course (a weekend class in Shanghai would be even nicer), and thanks to Bilibili.
1. First, configure the pom file (I copied this one):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.shihuo</groupId>
    <artifactId>FlinkTutorial</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.10.1</version>
        </dependency>
    </dependencies>
</project>
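Both examples below read a small local text file from src/main/resources. Its content is up to you; purely as an assumption for the sample outputs sketched later in this post, suppose hello.txt contains these two lines:
hello world
hello flink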
2. The batch job code
package com.shihuo.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Batch word-count example
public class WorldCount {
    public static void main(String[] args) throws Exception {
        // Create the batch execution environment
        ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();
        // Read the data from a file
        String inputPath = "/Users/wangyuhang/Desktop/FlinkTutorial/src/main/resources/hello.txt";
        DataSet<String> stringDataSet = executionEnvironment.readTextFile(inputPath);
        // Process the data set: split each line on spaces, flatten into (word, 1) tuples, then group and count
        DataSet<Tuple2<String, Integer>> resultSet = stringDataSet.flatMap(new MyFlatMapper())
                .groupBy(0) // group by the first field of the tuple (the word)
                .sum(1);    // sum the second field of the tuple (the count)
        resultSet.print();
    }

    // Custom class implementing the FlatMapFunction interface
    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            // Split the line on spaces
            String[] words = value.split(" ");
            // Wrap every word into a (word, 1) tuple and emit it
            for (String word : words) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
Then the result. A big pile of logs was printed in the middle, which I first mistook for an error. Note that with the DataSet API, print() already triggers execution of the job, so there is no explicit execute() call here.
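As a rough sketch of what resultSet.print() produces, assuming the hypothetical hello.txt above (the tuple order may differ):
(world,1)
(flink,1)
(hello,2)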
3. The stream processing example
package com.shihuo.wc;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Streaming word-count example
public class StreamingWorldCount {
    public static void main(String[] args) throws Exception {
        // Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read the data from a file
        String inputPath = "/Users/wangyuhang/Desktop/FlinkTutorial/src/main/resources/hello.txt";
        DataStream<String> stringDataStream = env.readTextFile(inputPath);
        // Transform the stream: reuse the batch job's MyFlatMapper, key by the word, and keep a running sum
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultStream = stringDataStream.flatMap(new WorldCount.MyFlatMapper())
                .keyBy(0)
                .sum(1);
        resultStream.print();
        // Unlike the batch job, a streaming job only runs when execute() is called
        env.execute();
    }
}
Here is the result; the number in front of each record marks the parallel subtask (thread) that emitted it, using the default parallelism:
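A rough sketch of the console output, again assuming the hypothetical hello.txt above (the actual subtask numbers and ordering will differ on your machine; because keyBy(0).sum(1) keeps a running count, intermediate values such as (hello,1) appear before (hello,2)):
1> (hello,1)
3> (world,1)
2> (flink,1)
1> (hello,2)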
You can add this call to set the number of threads (the parallelism):
env.setParallelism(4);
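A minimal sketch of where the call goes (in StreamingWorldCount, right after creating the environment; 4 is just an example value):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// every operator now runs with 4 parallel subtasks unless overridden per operator
env.setParallelism(4);

With the parallelism set to 4, the subtask number printed in front of each record ranges from 1 to 4.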