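Flink (Part 2)
1. DataSet and DataStream
1). DataSet API
Categories:
Source: creates the initial data set, for example from a file or a Java collection
Transformation: converts one or more DataSets into a new DataSet
Sink: stores or returns the result of the computation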
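The three roles above can be sketched without Flink itself. The following is only a plain-Java analogy, not Flink API: the class PipelineSketch and its methods source, transform and sink are hypothetical names chosen for illustration, and a word count stands in for an arbitrary transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogy of Source -> Transformation -> Sink (not Flink API).
public class PipelineSketch {
    // Source: create the initial data set from a Java collection.
    public static List<String> source() {
        return Arrays.asList("to be", "or not", "to be");
    }

    // Transformation: turn one data set (lines) into a new one (word counts).
    public static Map<String, Integer> transform(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sink: store or return the result; here it is rendered as text lines.
    public static List<String> sink(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + " : " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        sink(transform(source())).forEach(System.out::println);
    }
}
```

In a real Flink job each of these steps would be a call on the execution environment or on a DataSet rather than a static method.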
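2). DataStream API
DataStream operators convert one or more DataStreams into a new DataStream. A program can combine multiple transformations into a complex dataflow topology.
The main difference between the DataStream API and the DataSet API lies in the Transformation part.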
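2. Source
The source is where data enters a Flink job. Before execution starts, Flink performs some necessary checks and then builds a directed acyclic graph (DAG). In streaming mode the program keeps receiving data indefinitely; in batch mode it exits automatically once all data has been received.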
package demo;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) throws Exception {
        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", 55901, "\n");

        DataStream<WordWithCount> windowCounts = text
            // split each line into words and emit (word, 1)
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String s, Collector<WordWithCount> collector) throws Exception {
                    for (String word : s.split("\\s")) {
                        collector.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            // keyBy(String) and timeWindow(...) are the older shortcuts; newer Flink releases deprecate them
            .keyBy("word")
            // sliding window: 5 seconds long, evaluated every 1 second
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        // Flink POJO types need a public no-argument constructor; keyBy("word") relies on it
        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
            LOG.info(this.toString());
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
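The call timeWindow(Time.seconds(5), Time.seconds(1)) in the program above defines a sliding window, so each incoming word is counted in several overlapping windows. The following is a minimal plain-Java sketch of that assignment, assuming non-negative timestamps in milliseconds and no window offset. SlidingWindowSketch and windowStarts are hypothetical names for illustration, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding-window assignment (not Flink API).
public class SlidingWindowSketch {
    // Returns the start time (ms) of every window that contains the timestamp.
    // Each window covers the half-open interval [start, start + sizeMs).
    public static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An element arriving at 7.5 s, with a 5 s window sliding every 1 s.
        System.out.println(windowStarts(7500L, 5000L, 1000L));
        // prints [7000, 6000, 5000, 4000, 3000]
    }
}
```

With a 5 second size and a 1 second slide, every element belongs to 5 windows, which is why the same word shows up in several consecutive printed results.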
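3. Sink
Saving the data is also the end point of the whole Flink execution flow. The end points that can terminate a flow are collect and sink, and print is also a form of sink. Once a sink has been built, it only needs to be attached to the corresponding node.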
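One way to picture why an end point matters: transformations only describe a plan, and the plan runs when a terminal step such as collect is reached. The following is a plain-Java sketch of that idea under simplified assumptions. LazyPlanSketch, map, collect and the applied counter are hypothetical names for illustration and do not mirror Flink's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of a lazily executed plan (not Flink API).
public class LazyPlanSketch {
    private final List<Integer> input;
    private final List<Function<Integer, Integer>> steps = new ArrayList<>();
    public int applied = 0; // how many times a transformation has actually run

    public LazyPlanSketch(List<Integer> input) {
        this.input = input;
    }

    // Records a transformation in the plan; nothing is computed yet.
    public LazyPlanSketch map(Function<Integer, Integer> f) {
        steps.add(f);
        return this;
    }

    // Terminal operation: runs the recorded plan and returns the results.
    public List<Integer> collect() {
        List<Integer> out = new ArrayList<>();
        for (Integer value : input) {
            Integer v = value;
            for (Function<Integer, Integer> f : steps) {
                v = f.apply(v);
                applied++;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        LazyPlanSketch plan = new LazyPlanSketch(Arrays.asList(1, 2, 3))
            .map(x -> x + 1)
            .map(x -> x * 10);
        System.out.println(plan.applied);   // prints 0, the plan has not run yet
        System.out.println(plan.collect()); // prints [20, 30, 40]
    }
}
```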