flink (2)

1. DataSet and DataStream

  1).DataSet API

    The DataSet API breaks a program into three kinds of building blocks (a minimal sketch follows this list):

      Source: creates the initial DataSet, for example from a file or a Java collection.
      Transformation: turns one or more DataSets into a new DataSet.
      Sink: stores or returns the computed result.
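
    A minimal end-to-end sketch of these three parts, assuming the classic Flink 1.x DataSet API; the class name DataSetWordCount and the sample input lines are made up for illustration:

package demo;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class DataSetWordCount {

    public static void main(String[] args) throws Exception {

        // batch execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source: build the initial DataSet from a Java collection
        DataSet<String> text = env.fromElements("hello flink", "hello world");

        // Transformation: split lines into words, then count per word
        DataSet<Tuple2<String, Integer>> counts = text
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split("\\s")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .groupBy(0)  // group by the word (tuple field 0)
                .sum(1);     // sum the counts (tuple field 1)

        // Sink: print the result; in the DataSet API, print() also triggers execution
        counts.print();
    }
}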
  2).DataStream API

    DataStream operators transform one or more DataStreams into a new DataStream. A program can combine multiple transformations into a complex dataflow topology.

    The main difference between the DataStream API and the DataSet API lies in the Transformation part: the DataSet API groups bounded data (groupBy, as in the sketch above), while the DataStream API keys an unbounded stream (keyBy) and usually applies windows before aggregating, as the Socket Window WordCount example in the next section shows.

2. Source

  A source is where Flink's data originates. Before execution starts, Flink performs the necessary checks and builds a directed acyclic graph (DAG) of the job. In stream mode the program keeps receiving data indefinitely; in batch mode the program exits automatically once the bounded input has been fully consumed.

  The short sketch below lists common ways to create a source; the complete Socket Window WordCount example that follows reads from an unbounded socket source.
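
  A small sketch (not from the original post) of common source constructors in the DataStream API; the file path /tmp/input.txt and port 9000 are placeholders:

package demo;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceExamples {

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // bounded source: the job finishes on its own once all elements are emitted
        DataStream<String> fromCollection = env.fromElements("hello", "flink");

        // bounded source: reads the file once, then the job finishes (placeholder path)
        DataStream<String> fromFile = env.readTextFile("/tmp/input.txt");

        // unbounded source: keeps receiving data until the job is cancelled (placeholder port)
        DataStream<String> fromSocket = env.socketTextStream("localhost", 9000, "\n");

        fromCollection.print();
        fromFile.print();
        fromSocket.print();

        env.execute("Source Examples");
    }
}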

  

package demo;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class Main {

    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) throws Exception {


        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", 55901, "\n");

        DataStream<WordWithCount> windowCounts = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String s, Collector<WordWithCount> collector) throws Exception {
                        for (String word : s.split("\\s")) {
                            collector.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                .keyBy("word")
                .timeWindow(Time.seconds(5), Time.seconds(1))
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                });

        // print the results with a single thread, rather than in parallel
        windowCounts.print();

        env.execute("Socket Window WordCount");
    }

    // Data type for a word and its count. The public no-argument constructor is required
    // so that Flink treats this class as a POJO; without it, keyBy("word") cannot use
    // the field name as a key.
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;

            LOG.info(this.toString());
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
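
  To try the example, something must be listening on the chosen socket port before the job starts; on Linux/macOS this is typically done with netcat, e.g. nc -lk 55901 (the port only has to match the one passed to socketTextStream). The job then prints, once per second, the word counts over the last 5 seconds of whatever is typed into the netcat session.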

3. Sink

  Saving the data is the end point of the whole Flink dataflow. The operations that can terminate a flow are collect and the sinks; print is itself just one built-in form of sink. Once a sink has been built, it only needs to be attached to the corresponding upstream node, as in the sketch below.
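
  A small sketch (not from the original post) of attaching sinks to a DataStream; the output path is a placeholder and the custom SinkFunction is only illustrative:

package demo;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class SinkExamples {

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> words = env.fromElements("hello", "flink", "sink");

        // built-in sink: print every record to stdout
        words.print();

        // built-in sink: write the records as text (placeholder path)
        words.writeAsText("/tmp/wordcount-output");

        // custom sink: any SinkFunction can be attached to the end of the stream
        words.addSink(new SinkFunction<String>() {
            @Override
            public void invoke(String value) {
                // per-record side effect, e.g. writing to an external system
                System.out.println("custom sink received: " + value);
            }
        });

        env.execute("Sink Examples");
    }
}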
