Post category - Apache Flink


Stream Computing in Practice - CEP
Summary: CEP, complex event processing. The Wikipedia definition: "Complex event processing, or CEP, is event processing that combines data from multiple sources[2] to infer events..."

posted @ 2017-12-22 11:46 fxjwind Views(16939) Comments(4) Recommended(0)

Stream Computing in Practice - The Very Large Dimension Table Problem
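Summary: A dimension table is a data-warehouse concept: a collection of dimension attributes, such as a time dimension or a location dimension. Here, however, we discuss the dimension table problem in stream computing, which differs from the data-warehouse case. The data collected by agents is usually limited, so before any business processing the missing dimension information has to be filled in in real time. The problem is essentially a join between the main data stream and several static or semi-static tables. In Flink this is called the side input problem, https://cwiki.a...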

posted @ 2017-11-02 11:25 fxjwind Views(3909) Comments(2) Recommended(0)
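The summary above frames dimension completion as a join between the main stream and static or semi-static tables. One common way to approximate this is to enrich each main-stream event from a locally cached copy of the dimension rows that is reloaded after a time-to-live. The following is a minimal plain-Java sketch of that idea, not Flink's side input API; the `DimCache` class, its TTL parameter, and the loader function are hypothetical, and time is passed in explicitly so the refresh behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-key cache for a semi-static dimension table (illustration only). */
class DimCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Function<K, V> loader;   // e.g. a lookup against an external store
    private final long ttlMillis;          // how long a cached row is considered fresh
    private final Map<K, Entry<V>> cache = new HashMap<>();
    private int loads = 0;                 // number of loader calls, for inspection

    DimCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the dimension row for key, reloading it if missing or older than the TTL. */
    V lookup(K key, long nowMillis) {
        Entry<V> e = cache.get(key);
        if (e == null || nowMillis - e.loadedAt >= ttlMillis) {
            e = new Entry<>(loader.apply(key), nowMillis);
            loads++;
            cache.put(key, e);
        }
        return e.value;
    }

    int loadCount() { return loads; }
}

class DimJoinDemo {
    /** Enriches a main-stream event (a city id) with the city name from the dimension table. */
    static String enrich(DimCache<Integer, String> dim, int cityId, long now) {
        return cityId + ":" + dim.lookup(cityId, now);
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>(Map.of(1, "Hangzhou", 2, "Beijing"));
        DimCache<Integer, String> dim = new DimCache<>(table::get, 1000);
        System.out.println(enrich(dim, 1, 0));    // loaded from the table
        table.put(1, "Shanghai");                  // the dimension table changes
        System.out.println(enrich(dim, 1, 500));   // still served from the cache
        System.out.println(enrich(dim, 1, 1500));  // TTL expired, reloaded
    }
}
```

The TTL trades freshness for lookup load: a shorter TTL sees dimension changes sooner but hits the external store more often.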

Flink - allowedLateness
Summary: WindowOperator.processElement: if clearing only registers an EventTimeTimer, then onEventTime must contain the clearing logic. WindowOperator.onEventTime: sure enough, onEventTime checks whether the timer's time equals...

posted @ 2017-10-31 11:54 fxjwind Views(1142) Comments(0) Recommended(0)
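The cleanup-timer logic the summary describes reduces to a few comparisons. Below is a plain-Java sketch, assuming event-time windows `[start, end)` whose last timestamp is `end - 1` (as with Flink's `TimeWindow.maxTimestamp()`): the window's state is cleaned up at `maxTimestamp + allowedLateness`, a firing timer clears state only if its time equals that cleanup time, and a window whose cleanup time is already at or below the current watermark is treated as late. The class and method names are illustrative, not Flink's.

```java
/** Plain-Java sketch of allowedLateness bookkeeping for event-time windows (illustrative). */
class LatenessRules {
    /** Last timestamp that belongs to the window [start, end). */
    static long maxTimestamp(long windowEnd) {
        return windowEnd - 1;
    }

    /** Time at which the window's state may be cleared; guards against long overflow. */
    static long cleanupTime(long windowEnd, long allowedLateness) {
        long max = maxTimestamp(windowEnd);
        long cleanup = max + allowedLateness;
        return cleanup >= max ? cleanup : Long.MAX_VALUE;
    }

    /** A window is late once the watermark has reached its cleanup time; its elements are dropped. */
    static boolean isWindowLate(long windowEnd, long allowedLateness, long watermark) {
        return cleanupTime(windowEnd, allowedLateness) <= watermark;
    }

    /** When an event-time timer fires, clear the window state only if it is the cleanup timer. */
    static boolean isCleanupTimer(long timerTime, long windowEnd, long allowedLateness) {
        return timerTime == cleanupTime(windowEnd, allowedLateness);
    }

    public static void main(String[] args) {
        long end = 10_000;      // window [0, 10000)
        long lateness = 2_000;  // allowedLateness of 2 seconds
        System.out.println(cleanupTime(end, lateness));           // 11999
        System.out.println(isWindowLate(end, lateness, 11_000));  // false: late data still accepted
        System.out.println(isWindowLate(end, lateness, 11_999));  // true: window state is cleared
    }
}
```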

Flink – process watermark
Summary: WindowOperator.processElement. Its main job is to add the current element's value to the corresponding window: windowState.setCurrentNamespace(window); windowState.add(element.getValue()); triggerContex...

posted @ 2017-10-12 17:08 fxjwind Views(1945) Comments(0) Recommended(0)
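To make the element/watermark interplay concrete, here is a plain-Java simulation, assuming tumbling event-time windows, zero allowed lateness, and an EventTimeTrigger-like rule that fires a window once the watermark reaches its last timestamp (`end - 1`). It mirrors the shape described above (processElement adds the value to its window's state; a watermark fires the windows that are ready), but `WatermarkWindowSim` and its sum aggregation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of tumbling event-time windows driven by watermarks. */
class WatermarkWindowSim {
    private final long size;
    // window start -> running sum of values (stands in for windowState)
    private final TreeMap<Long, Long> windowState = new TreeMap<>();
    private final List<String> fired = new ArrayList<>();

    WatermarkWindowSim(long size) { this.size = size; }

    /** Like processElement: assign the element to its window and add the value to that window's state. */
    void processElement(long timestamp, long value) {
        long start = timestamp - Math.floorMod(timestamp, size);
        windowState.merge(start, value, Long::sum);
    }

    /** Like watermark handling: fire (and here also drop) every window whose maxTimestamp <= watermark. */
    void processWatermark(long watermark) {
        while (!windowState.isEmpty()) {
            Map.Entry<Long, Long> first = windowState.firstEntry();
            long maxTimestamp = first.getKey() + size - 1;
            if (maxTimestamp > watermark) {
                break; // windows are ordered by start, so later ones are not ready either
            }
            fired.add("[" + first.getKey() + "," + (first.getKey() + size) + ")=" + first.getValue());
            windowState.pollFirstEntry();
        }
    }

    List<String> firedWindows() { return fired; }

    public static void main(String[] args) {
        WatermarkWindowSim sim = new WatermarkWindowSim(10);
        sim.processElement(1, 5);
        sim.processElement(12, 7);
        sim.processElement(8, 1);
        sim.processWatermark(9);   // fires [0,10)
        sim.processWatermark(15);  // [10,20) is not ready yet
        System.out.println(sim.firedWindows()); // [[0,10)=6]
    }
}
```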

Flink - InputGate
Summary: Initialization in Task: List consumedPartitions = tdd.getInputGates(); // Consumed intermediate result partitions this.inputGates = new SingleInputGate[consumedPartitions.size()]; this.inputGatesById = new Has...

posted @ 2017-10-09 15:35 fxjwind Views(1209) Comments(0) Recommended(0)

Flink - ResultPartition
Summary: Data is generally sent through collector.collect: public interface Collector<T> { /** * Emits a record. * * @param record The record to collect. */ void collect(T record); /** ...

posted @ 2017-10-09 15:34 fxjwind Views(1329) Comments(0) Recommended(0)
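The summary starts from `Collector.collect`. As a rough sketch of the path from `collect` down to result subpartitions, the following plain-Java code declares a local interface with the same shape as Flink's `Collector<T>` (`collect(T)`, `close()`) and a hypothetical `PartitioningCollector` that routes each record to one subpartition queue by key hash. This is a simplification: the real path goes through a record writer, serialization, and network buffers, none of which are modeled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

/** Local stand-in with the same shape as Flink's Collector<T>. */
interface Collector<T> {
    void collect(T record);
    void close();
}

/** Hypothetical collector that writes records into per-subpartition queues. */
class PartitioningCollector<T> implements Collector<T> {
    private final List<Queue<T>> subpartitions = new ArrayList<>();
    private final Function<T, Object> keySelector;
    private boolean closed = false;

    PartitioningCollector(int numSubpartitions, Function<T, Object> keySelector) {
        for (int i = 0; i < numSubpartitions; i++) {
            subpartitions.add(new ArrayDeque<>());
        }
        this.keySelector = keySelector;
    }

    @Override
    public void collect(T record) {
        if (closed) {
            throw new IllegalStateException("collector already closed");
        }
        // Pick the target subpartition from the key's hash (non-negative modulo).
        int channel = Math.floorMod(keySelector.apply(record).hashCode(), subpartitions.size());
        subpartitions.get(channel).add(record);
    }

    @Override
    public void close() {
        closed = true;
    }

    Queue<T> subpartition(int index) {
        return subpartitions.get(index);
    }

    public static void main(String[] args) {
        PartitioningCollector<Integer> out = new PartitioningCollector<>(2, x -> x);
        for (int i = 0; i < 5; i++) {
            out.collect(i);
        }
        out.close();
        System.out.println(out.subpartition(0) + " " + out.subpartition(1)); // [0, 2, 4] [1, 3]
    }
}
```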

Flink -- Keyed State
Summary: /* {@code * DataStream stream = ...; * KeyedStream keyedStream = stream.keyBy("id"); * * keyedStream.map(new RichMapFunction>() { * * private ValueState count;...

posted @ 2017-09-28 16:52 fxjwind Views(1211) Comments(0) Recommended(0)
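The Javadoc quoted in the summary counts elements per key with a `ValueState`. Conceptually, keyed state is a map scoped by the current key: after `keyBy`, each `value()`/`update()` call reads or writes only the entry for the key of the element being processed. The plain-Java sketch below imitates that scoping with a hypothetical `KeyedValueState` class; it is not Flink's state backend and ignores checkpointing and key groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical imitation of a ValueState scoped to a "current key". */
class KeyedValueState<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final V defaultValue;
    private K currentKey;

    KeyedValueState(V defaultValue) { this.defaultValue = defaultValue; }

    void setCurrentKey(K key) { this.currentKey = key; }

    V value() { return store.getOrDefault(currentKey, defaultValue); }

    void update(V v) { store.put(currentKey, v); }
}

class KeyedCountDemo {
    /** For each element, emits "key=count", where count is how often that key has been seen so far. */
    static <T> List<String> countPerKey(List<T> elements, Function<T, String> keySelector) {
        KeyedValueState<String, Long> count = new KeyedValueState<>(0L);
        List<String> out = new ArrayList<>();
        for (T element : elements) {
            String key = keySelector.apply(element);
            count.setCurrentKey(key);    // the runtime sets the key before calling map()
            long c = count.value() + 1;  // reads only this key's state
            count.update(c);
            out.add(key + "=" + c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(countPerKey(List.of("a", "b", "a", "a"), s -> s)); // [a=1, b=1, a=2, a=3]
    }
}
```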

Flink – CEP NFA
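Summary: How does Flink CEP convert a pattern into an NFA? And when an event arrives, how is it executed in the NFA? The call chain leading there: CEP –> PatternStream –> select –> CEPOperatorUtils.createPatternStream. 1. Generate NFACompiler.compileFactory, which converts the pattern into states: final NFACompiler...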

posted @ 2017-09-26 16:02 fxjwind Views(2022) Comments(0) Recommended(0)
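To illustrate what "converting a pattern into an NFA" and "running an event through the NFA" mean, here is a deliberately tiny plain-Java matcher. It assumes a pattern that is just a non-empty sequence of predicates with relaxed contiguity (non-matching events in between are skipped, similar in spirit to `followedBy`), and keeps a list of partial matches that each event may advance. Flink's actual `NFACompiler`, states, and shared buffer also handle quantifiers, time windows, and shared storage of partial matches; none of that is modeled, and `Step`/`TinyNfa` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** One step of a hypothetical pattern: a name plus a condition on the event. */
final class Step<E> {
    final String name;
    final Predicate<E> condition;
    Step(String name, Predicate<E> condition) { this.name = name; this.condition = condition; }
}

/** Tiny NFA-style matcher over a sequence of steps with relaxed contiguity. */
class TinyNfa<E> {
    private final List<Step<E>> steps;
    // Each partial match holds the events accepted so far; its size is the current state index.
    private final List<List<E>> partialMatches = new ArrayList<>();

    TinyNfa(List<Step<E>> steps) {
        if (steps.isEmpty()) throw new IllegalArgumentException("pattern needs at least one step");
        this.steps = steps;
    }

    /** Feeds one event and returns the complete matches it finishes. */
    List<List<E>> process(E event) {
        List<List<E>> completed = new ArrayList<>();
        List<List<E>> next = new ArrayList<>();
        for (List<E> partial : partialMatches) {
            if (steps.get(partial.size()).condition.test(event)) {
                List<E> advanced = new ArrayList<>(partial);
                advanced.add(event);
                if (advanced.size() == steps.size()) completed.add(advanced); // final state reached
                else next.add(advanced);
            } else {
                next.add(partial); // relaxed contiguity: ignore the event, keep waiting
            }
        }
        // Every event that satisfies the first step also starts a new partial match.
        if (steps.get(0).condition.test(event)) {
            List<E> started = new ArrayList<>(List.of(event));
            if (steps.size() == 1) completed.add(started);
            else next.add(started);
        }
        partialMatches.clear();
        partialMatches.addAll(next);
        return completed;
    }
}

class TinyNfaDemo {
    public static void main(String[] args) {
        TinyNfa<String> nfa = new TinyNfa<>(List.of(
                new Step<String>("start", s -> s.startsWith("login")),
                new Step<String>("end", s -> s.startsWith("pay"))));
        for (String e : List.of("login:1", "view", "pay:1")) {
            System.out.println(e + " -> " + nfa.process(e)); // only pay:1 completes [login:1, pay:1]
        }
    }
}
```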

FlinkCEP - Complex event processing for Flink
Summary: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/cep.html The goal is to match a pattern sequence, which is made up of multiple patterns. DataStream input = ... Pattern pattern = Pattern.begin("start").w...

posted @ 2017-08-08 16:31 fxjwind Views(1264) Comments(0) Recommended(0)

Flink - CoGroup
Summary: Usage: dataStream.coGroup(otherStream) .where(0).equalTo(1) .window(TumblingEventTimeWindows.of(Time.seconds(3))) .apply (new CoGroupFunction () {...}); As we can see, coGroup only produces a CoGroupedStr...

posted @ 2017-07-21 12:00 fxjwind Views(2209) Comments(0) Recommended(0)
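The coGroup usage above groups both inputs by key within each window and hands the function one group per input; unlike a join, either group may be empty. The plain-Java sketch below reproduces that grouping for tumbling windows (3000 ms, matching the 3-second window in the snippet), assuming records are `(timestamp, key, value)` triples; the `Rec` type and the `CoGroupFn` callback are hypothetical stand-ins for the real stream types and `CoGroupFunction`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical input record: event timestamp, key, and payload. */
final class Rec {
    final long ts; final String key; final String value;
    Rec(long ts, String key, String value) { this.ts = ts; this.key = key; this.value = value; }
}

/** Windowed coGroup semantics: per (window, key), pass both sides' groups to the function. */
class WindowCoGroup {
    interface CoGroupFn {
        String coGroup(String key, List<String> left, List<String> right);
    }

    static final class Sides {
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
    }

    static List<String> coGroup(List<Rec> left, List<Rec> right, long windowSize, CoGroupFn fn) {
        // window start -> key -> both groups; TreeMaps keep the output order deterministic
        TreeMap<Long, TreeMap<String, Sides>> groups = new TreeMap<>();
        for (Rec r : left) sidesFor(groups, r, windowSize).left.add(r.value);
        for (Rec r : right) sidesFor(groups, r, windowSize).right.add(r.value);
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, TreeMap<String, Sides>> w : groups.entrySet()) {
            for (Map.Entry<String, Sides> k : w.getValue().entrySet()) {
                out.add(w.getKey() + ":" + fn.coGroup(k.getKey(), k.getValue().left, k.getValue().right));
            }
        }
        return out;
    }

    private static Sides sidesFor(TreeMap<Long, TreeMap<String, Sides>> groups, Rec r, long windowSize) {
        long start = r.ts - Math.floorMod(r.ts, windowSize); // tumbling window assignment
        return groups.computeIfAbsent(start, s -> new TreeMap<>()).computeIfAbsent(r.key, k -> new Sides());
    }

    public static void main(String[] args) {
        List<Rec> orders = List.of(new Rec(1000, "u1", "o1"), new Rec(4000, "u1", "o2"));
        List<Rec> payments = List.of(new Rec(2000, "u1", "p1"), new Rec(2500, "u2", "p2"));
        System.out.println(coGroup(orders, payments, 3000, (key, l, r) -> key + " " + l + " " + r));
        // [0:u1 [o1] [p1], 0:u2 [] [p2], 3000:u1 [o2] []]
    }
}
```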

Flink – Stream Task执行过程
Summary: Task.run: if (invokable instanceof StatefulTask) { StatefulTask op = (StatefulTask) invokable; op.setInitialState(taskStateHandles);} // run the invokable invokable.invoke(); The invokable is a StreamT...

posted @ 2017-07-06 20:21 fxjwind Views(1115) Comments(0) Recommended(0)

Flink - Asynchronous I/O
Summary: https://docs.google.com/document/d/1Lr9UYXEz6s6R_3PWg3bZQLF3upGaNEkc0rQCFSzaYDI/edit // create the original stream DataStream stream = ...; // apply the async I/O transformation DataStream> re...

posted @ 2017-06-15 17:55 fxjwind Views(1336) Comments(0) Recommended(1)
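Async I/O lets an operator keep several external lookups in flight instead of blocking on each one. The sketch below shows an "ordered" result mode in plain Java with `CompletableFuture`: requests are issued immediately, at most `capacity` are pending at once, and results are emitted strictly in input order even when later requests finish first. The `blockingLookup` function and the capacity handling are assumptions for illustration; this is not Flink's `AsyncDataStream`/`AsyncFunction` implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical ordered async enrichment with a bounded number of in-flight requests. */
class OrderedAsyncIo {
    static <I, O> List<O> run(List<I> inputs, Function<I, O> blockingLookup, int capacity) {
        ExecutorService pool = Executors.newFixedThreadPool(capacity);
        Deque<CompletableFuture<O>> inFlight = new ArrayDeque<>();
        List<O> out = new ArrayList<>();
        try {
            for (I input : inputs) {
                if (inFlight.size() == capacity) {
                    out.add(inFlight.removeFirst().join()); // wait for the oldest request only
                }
                inFlight.addLast(CompletableFuture.supplyAsync(() -> blockingLookup.apply(input), pool));
            }
            while (!inFlight.isEmpty()) {
                out.add(inFlight.removeFirst().join());     // drain in input order
            }
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        // Earlier inputs sleep longer and finish last; the output still follows input order.
        List<String> result = run(List.of(3, 2, 1), n -> {
            try {
                Thread.sleep(n * 50L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "user-" + n;
        }, 3);
        System.out.println(result); // [user-3, user-2, user-1]
    }
}
```

An "unordered" mode would instead emit each result as soon as its future completes, which lowers latency at the cost of reordering.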

Flink - FlinkKafkaConsumer010
Summary: Properties properties = new Properties(); properties.setProperty("bootstrap.servers", "localhost:9092"); // only required for Kafka 0.8 properties.setProperty("zookeeper.connect", "localhost:2181"); p...

posted @ 2017-06-07 16:43 fxjwind Views(6837) Comments(0) Recommended(0)

Flink - FlinkKafkaProducer010
Summary: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html Usage: DataStream stream = ...; FlinkKafkaProducer010Configuration myProducerConfig = FlinkKafkaProducer010....

posted @ 2017-06-07 11:36 fxjwind Views(1819) Comments(0) Recommended(0)

Calcite - StreamingSQL
Summary: https://calcite.apache.org/docs/stream.html Calcite’s SQL is an extension to standard SQL, not another ‘SQL-like’ language. The distinction is important, for several reasons: Streaming SQL is ...

posted @ 2017-04-27 17:17 fxjwind Views(2476) Comments(0) Recommended(0)

Flink - ShipStrategyType
Summary: For a DataStream, the following strategies are available: /** * Sets the partitioning of the {@link DataStream} so that the output elements * are broadcasted to every parallel instance of the next operation. ...

posted @ 2017-04-14 10:55 fxjwind Views(819) Comments(0) Recommended(0)
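The strategies in the Javadoc above decide which downstream parallel instance(s) receive each record. The following plain-Java sketch expresses four common ones as target selectors: forward (conceptually, subtask i feeds subtask i), broadcast, rebalance (plain round-robin here), and hash partitioning by key. The hash variant is simplified to `floorMod(key.hashCode(), n)`; Flink's `keyBy` hashes keys into key groups first, so concrete target numbers differ. All names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical target selectors for a few ship strategies. */
class ShipStrategies {
    /** Forward: upstream subtask i sends to downstream subtask i (requires equal parallelism). */
    static List<Integer> forward(int senderIndex, int numTargets) {
        if (senderIndex >= numTargets) {
            throw new IllegalArgumentException("forward requires equal parallelism");
        }
        return List.of(senderIndex);
    }

    /** Broadcast: every record goes to every downstream instance. */
    static List<Integer> broadcast(int numTargets) {
        return IntStream.range(0, numTargets).boxed().collect(Collectors.toList());
    }

    /** Rebalance: round-robin over the targets; the caller keeps a non-negative record counter. */
    static int rebalance(long recordCounter, int numTargets) {
        return (int) (recordCounter % numTargets);
    }

    /** Hash partitioning (simplified): the same key always lands on the same target. */
    static int hash(Object key, int numTargets) {
        return Math.floorMod(key.hashCode(), numTargets);
    }

    public static void main(String[] args) {
        System.out.println(forward(1, 4));    // [1]
        System.out.println(broadcast(3));     // [0, 1, 2]
        System.out.println(rebalance(5, 4));  // 1
        System.out.println(hash("user-42", 4) == hash("user-42", 4)); // true
    }
}
```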

Flink - Scheduler
Summary: How resources are allocated to a job. submitJob generates the ExecutionGraph and eventually calls executionGraph.scheduleForExecution(scheduler). Then, in ExecutionGraph: public void scheduleForExecution(SlotProvider slotProvider) throws JobException...

posted @ 2017-04-13 15:13 fxjwind Views(1319) Comments(0) Recommended(0)

Flink – SlotSharingGroup
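Summary: A SlotSharingGroup means that different tasks may share a slot, but this is a soft constraint: they may also end up in different slots. By default, the whole StreamGraph uses a single "default" SlotSharingGroup, so the tasks of all JobVertices can share a slot (one subtask of each JobVertex per slot). /** * A slot sharing units defines which dif...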

posted @ 2017-04-13 12:17 fxjwind Views(6454) Comments(0) Recommended(0)
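A practical consequence of the default slot sharing group is that a job needs only as many slots as its highest operator parallelism, because one slot can hold one subtask of every JobVertex in the group. With several groups, tasks of different groups cannot share a slot, so the estimate becomes the sum of each group's maximum parallelism. The plain-Java sketch below computes that estimate, assuming every vertex is tagged with a group name and ignoring co-location constraints; the `Vertex` type is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical job vertex description: name, parallelism, and slot sharing group. */
final class Vertex {
    final String name; final int parallelism; final String slotSharingGroup;
    Vertex(String name, int parallelism, String slotSharingGroup) {
        this.name = name; this.parallelism = parallelism; this.slotSharingGroup = slotSharingGroup;
    }
}

class SlotEstimate {
    /** Slots needed = sum over groups of the largest parallelism inside that group. */
    static int requiredSlots(List<Vertex> vertices) {
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (Vertex v : vertices) {
            maxPerGroup.merge(v.slotSharingGroup, v.parallelism, Math::max);
        }
        return maxPerGroup.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Vertex> allDefault = List.of(
                new Vertex("source", 2, "default"),
                new Vertex("map", 4, "default"),
                new Vertex("sink", 1, "default"));
        System.out.println(requiredSlots(allDefault)); // 4

        List<Vertex> isolatedSink = List.of(
                new Vertex("source", 2, "default"),
                new Vertex("map", 4, "default"),
                new Vertex("sink", 3, "sink-group"));
        System.out.println(requiredSlots(isolatedSink)); // 7
    }
}
```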

Flink – JobManager.submitJob
Summary: The JobManager, as an actor: case SubmitJob(jobGraph, listeningBehaviour) => val client = sender() val jobInfo = new JobInfo(client, listeningBehaviour, System.currentTimeMillis(), jo...

posted @ 2017-04-05 17:25 fxjwind Views(2337) Comments(0) Recommended(0)

Flink - StreamJob
Summary: First, the simplest example: final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream> stream = env.addSource(...); stream .map(new MapFunction() {...}) .add...

posted @ 2017-04-01 13:51 fxjwind Views(2613) Comments(0) Recommended(0)
