I. Flink Stream Processing API: Environment and Source
I. Environment
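1. getExecutionEnvironment
① ExecutionEnvironment.getExecutionEnvironment returns the execution environment for batch programs.
② StreamExecutionEnvironment.getExecutionEnvironment returns the execution environment for streaming programs.
③ If the program does not set a parallelism, the value configured in flink-conf.yaml (parallelism.default) applies; its default is 1.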
2.createLocalEnvironment
①val env = StreamExecutionEnvironment.createLocalEnvironment(1)
Returns a local execution environment. The default parallelism is passed as the argument (1 here).
3.createRemoteEnvironment
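val env = ExecutionEnvironment.createRemoteEnvironment("jobmanager-hostname", 6123, "YOURPATH//wordcount.jar")
Returns a cluster execution environment and submits the Jar to the remote cluster. The call takes the JobManager's host (IP or hostname) and port, plus the path of the Jar to run on the cluster.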
II. Source
① Reading data from a collection
DataStream<SensorReading> sensorDataStream = env.fromCollection(
        Arrays.asList(
                new SensorReading("sensor_1", 1547718199L, 35.8),
                new SensorReading("sensor_6", 1547718201L, 15.4),
                new SensorReading("sensor_7", 1547718202L, 6.7),
                new SensorReading("sensor_10", 1547718205L, 38.1)
        )
);
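SensorReading is used throughout this section but is not defined in it. The following is a minimal sketch of what such a class could look like, assuming three fields that match the constructor calls above; the field and accessor names (id, timestamp, temperature) are assumptions, not taken from the original.

```java
// Hypothetical sensor POJO assumed by the examples in this section.
// Field and accessor names are illustrative assumptions.
// When used with Flink, declare the class public in its own file
// (SensorReading.java): Flink's POJO handling expects a public class
// with a public no-argument constructor and accessible fields.
class SensorReading {
    private String id;
    private long timestamp;
    private double temperature;

    // No-argument constructor, needed for POJO-style (de)serialization
    public SensorReading() {
    }

    public SensorReading(String id, long timestamp, double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public double getTemperature() {
        return temperature;
    }

    public void setTemperature(double temperature) {
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "SensorReading{id='" + id + "', timestamp=" + timestamp
                + ", temperature=" + temperature + "}";
    }
}
```

With a class of this shape, print() on a stream of readings shows the three fields through toString().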
② Reading data from a file
val stream2 = env.readTextFile("YOUR_FILE_PATH")
③ Using a Kafka message queue as the source
This requires the Kafka connector dependency; add it to pom.xml:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
The code is as follows:
// Kafka configuration
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "hadoop102:9092");
properties.setProperty("group.id", "consumer-group");
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("auto.offset.reset", "latest");
// Read data from Kafka
DataStream<String> dataStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), properties));
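④ Custom Source
Besides the sources above, a custom source can also be defined; all that is needed is to pass a SourceFunction to addSource. The call looks like this:
DataStream<SensorReading> dataStream = env.addSource(new CustomerSource());
To generate random sensor data, CustomerSource can be implemented as follows: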
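import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.HashMap;
import java.util.Random;

// Class name assumed from the job name; SensorReading is the sensor POJO used throughout this section.
public class Flink04_Source_Customer {
    public static void main(String[] args) throws Exception {
        //1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Read data from the custom Source
        DataStreamSource<SensorReading> sensorDS = env.addSource(new CustomerSource());
        //3. Print
        sensorDS.print();
        //4. Start the job
        env.execute("Flink04_Source_Customer");
    }

    public static class CustomerSource implements SourceFunction<SensorReading> {
        // Flag that controls the emit loop; volatile because cancel() is called from another thread
        private volatile boolean running = true;
        private final Random random = new Random();

        @Override
        public void run(SourceContext<SensorReading> ctx) throws Exception {
            // Map from sensor id to its latest temperature
            HashMap<String, Double> tempMap = new HashMap<String, Double>();
            // Seed the map with a baseline temperature per sensor
            for (int i = 0; i < 10; i++) {
                tempMap.put("_Sensor" + i, 50 + random.nextGaussian() * 20);
            }
            while (running) {
                for (String id : tempMap.keySet()) {
                    // Take this sensor's previous temperature and add Gaussian noise
                    Double temp = tempMap.get(id);
                    double newTemp = temp + random.nextGaussian();
                    ctx.collect(new SensorReading(id, System.currentTimeMillis(), newTemp));
                    // Store the new temperature as the baseline for the next round
                    tempMap.put(id, newTemp);
                }
                // Emit one batch of readings every 2 seconds
                Thread.sleep(2000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}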
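The generation logic and the run/cancel handshake of the custom source can be tried out without a Flink runtime. The sketch below is a stand-alone mirror using only the JDK; the class TemperatureWalk, its methods, the fixed seed, and the "sensor_" id prefix are hypothetical names introduced for illustration and are not part of Flink or of the original code. The flag is declared volatile on the assumption that cancel() is invoked from a different thread than run(), so the write has to be visible to the loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-alone mirror of the source's emit loop; not a Flink class.
class TemperatureWalk implements Runnable {
    // volatile: cancel() is called from a different thread than run()
    private volatile boolean running = true;
    private final Random random;
    private final Map<String, Double> temps = new LinkedHashMap<>();
    private final long pauseMillis;

    TemperatureWalk(long seed, long pauseMillis) {
        this.random = new Random(seed);
        this.pauseMillis = pauseMillis;
        // Baseline per sensor: 50 plus Gaussian noise scaled by 20
        for (int i = 0; i < 10; i++) {
            temps.put("sensor_" + i, 50 + random.nextGaussian() * 20);
        }
    }

    // One pass: shift every sensor by Gaussian noise and keep the result as the next baseline
    Map<String, Double> step() {
        for (Map.Entry<String, Double> entry : temps.entrySet()) {
            entry.setValue(entry.getValue() + random.nextGaussian());
        }
        return new LinkedHashMap<>(temps); // snapshot copy for the caller
    }

    @Override
    public void run() {
        try {
            while (running) {
                step();
                Thread.sleep(pauseMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void cancel() {
        running = false;
    }

    // Starts the loop on a worker thread, cancels it, and reports whether the thread ended
    static boolean stopsAfterCancel() {
        TemperatureWalk walk = new TemperatureWalk(42L, 10L);
        Thread worker = new Thread(walk);
        worker.start();
        walk.cancel();
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        TemperatureWalk walk = new TemperatureWalk(42L, 10L);
        System.out.println(walk.step());
        System.out.println("stopped after cancel: " + stopsAfterCancel());
    }
}
```

Keeping the per-step logic in its own method, separate from the sleep loop, is a design choice that makes the random walk checkable in a plain unit test; the same split could be applied inside a SourceFunction.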