一. Flink Streaming API: Environment and Source


 一.Environment

  1.getExecutionEnvironment

    ①ExecutionEnvironment.getExecutionEnvironment

    ②StreamExecutionEnvironment.getExecutionEnvironment

    ③ If no parallelism is explicitly set, the value from flink-conf.yaml is used; the default is 1 (see the sketch below)
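    A minimal Java sketch of the typical usage (the setParallelism call is optional and overrides the flink-conf.yaml value):

    // Returns a local environment when run in an IDE, a cluster environment when submitted to a cluster
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);  // optional: overrides the parallelism configured in flink-conf.yaml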

  2.createLocalEnvironment

    ①val env = StreamExecutionEnvironment.createLocalEnvironment(1)

    Returns a local execution environment; the default parallelism is specified in the call

  3.createRemoteEnvironment

    val env = ExecutionEnvironment.createRemoteEnvironment("jobmanager-hostname", 6123, "YOURPATH/wordcount.jar")

    Returns a cluster execution environment and submits the Jar to the remote server. The JobManager's IP and port must be specified in the call, along with the Jar package to run on the cluster

二.Source

  ① Reading data from a collection

    DataStream<SensorReading> sensorDataStream = env.fromCollection(
                Arrays.asList(
                        new SensorReading("sensor_1", 1547718199L, 35.8),
                        new SensorReading("sensor_6", 1547718201L, 15.4),
                        new SensorReading("sensor_7", 1547718202L, 6.7),
                        new SensorReading("sensor_10", 1547718205L, 38.1)
                )
        );
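    The SensorReading type used throughout these examples is never defined in the post; the following is a minimal sketch of a POJO matching the constructor calls above (field names are assumptions):

    // Hypothetical POJO inferred from the calls above: (id, timestamp, temperature)
    public class SensorReading {
        private String id;          // sensor id, e.g. "sensor_1"
        private Long timestamp;     // event timestamp in milliseconds
        private Double temperature; // temperature reading

        public SensorReading() { }  // no-arg constructor, required for Flink POJO types

        public SensorReading(String id, Long timestamp, Double temperature) {
            this.id = id;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public Long getTimestamp() { return timestamp; }
        public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }
        public Double getTemperature() { return temperature; }
        public void setTemperature(Double temperature) { this.temperature = temperature; }

        @Override
        public String toString() {
            return "SensorReading{id='" + id + "', timestamp=" + timestamp + ", temperature=" + temperature + "}";
        }
    }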

  ② Reading data from a file

  val stream2 = env.readTextFile("YOUR_FILE_PATH")

  ③ Using a Kafka message queue as the source

    The Kafka connector dependency is required; add it to the pom.xml file:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
        <version>1.10.1</version>
    </dependency>

    The consumer code is as follows:

    // Kafka configuration
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "hadoop102:9092");
    properties.setProperty("group.id", "consumer-group");
    properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("auto.offset.reset", "latest");

    // Read data from Kafka
    DataStream<String> dataStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), properties));
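    To run this end to end, the stream can simply be printed and the job started (a sketch; the job name here is an arbitrary label, not from the original post):

    dataStream.print();
    env.execute("Flink_Source_Kafka");  // hypothetical job name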

  ④ Custom source


  Besides the built-in sources above, we can also define our own source. All that is required is to pass in a SourceFunction. The call looks like this:

  DataStream<SensorReading> dataStream = env.addSource(new CustomerSource());

  We want to generate random sensor data; the concrete implementation of CustomerSource is as follows:

 

import java.util.HashMap;
import java.util.Random;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class Flink04_Source_Customer {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Read data from the custom source
        DataStreamSource<SensorReading> sensorDS = env.addSource(new CustomerSource());

        // 3. Print the stream
        sensorDS.print();

        // 4. Start the job
        env.execute("Flink04_Source_Customer");
    }

    public static class CustomerSource implements SourceFunction<SensorReading> {

        // Flag controlling data emission; volatile because cancel() is called from another thread
        private volatile boolean running = true;

        private final Random random = new Random();

        @Override
        public void run(SourceContext<SensorReading> ctx) throws Exception {
            // Map holding the latest temperature per sensor
            HashMap<String, Double> tempMap = new HashMap<String, Double>();
            // Seed the map with baseline values
            for (int i = 0; i < 10; i++) {
                tempMap.put("_Sensor" + i, 50 + random.nextGaussian() * 20);
            }
            while (running) {
                for (String id : tempMap.keySet()) {
                    // Take the previous temperature of this sensor
                    Double temp = tempMap.get(id);
                    // Random walk: add Gaussian noise to the previous value
                    double newTemp = temp + random.nextGaussian();
                    ctx.collect(new SensorReading(id, System.currentTimeMillis(), newTemp));
                    // Store the new temperature as the baseline for the next round
                    tempMap.put(id, newTemp);
                }
                Thread.sleep(2000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}

