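Windows are one of Flink's many strengths.

A window splits an unbounded stream into finite-size "buckets" so that each bucket can be processed on its own.

As real-time data warehouses and Flink SQL have matured, Flink SQL has become the most commonly used Flink API.

In that context, let's walk through the source code behind Flink SQL windows.

(This is also a chance to practice my writing, so expect a few digressions.)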

Window example

First, here is how a Group Window query is written in Flink SQL:

select date_format(now(), 'yyyy-MM-dd HH:mm:ss')
     ,date_format(TUMBLE_START(proc_time, INTERVAL '1' minute), 'yyyy-MM-dd HH:mm:ss') AS wStart
     ,date_format(TUMBLE_END(proc_time, INTERVAL '1' minute), 'yyyy-MM-dd HH:mm:ss') AS wEnd
     ,count(user_id) pv
     ,count(distinct user_id) uv
     ,max(user_id)
from user_log
group by TUMBLE(proc_time, INTERVAL '1' minute)
;

For the complete SQL, see kafka_window_agg.sql in the sqlSubmit project on GitHub.
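To make TUMBLE_START and TUMBLE_END concrete, here is a minimal sketch of how a timestamp is mapped to its 1-minute tumbling window. It follows the same remainder arithmetic as Flink's `TimeWindow.getWindowStartWithOffset`, but `TumbleMath` and its methods are hypothetical names for illustration, and the sketch ignores the time-zone shifting (`shiftTimeZone`) that shows up later in the operator code.

```java
// Hypothetical helper (not a Flink class): maps an epoch-millis timestamp to
// its tumbling window [start, end), using the same remainder arithmetic as
// Flink's TimeWindow.getWindowStartWithOffset. Time-zone shifting is ignored.
final class TumbleMath {

    private TumbleMath() {}

    /** Inclusive start of the window containing timestamp (corresponds to TUMBLE_START). */
    static long windowStart(long timestamp, long offset, long size) {
        long remainder = (timestamp - offset) % size;
        // Java's % returns a negative value for timestamps before the offset,
        // so step back one more full window in that case.
        return remainder < 0 ? timestamp - (remainder + size) : timestamp - remainder;
    }

    /** Exclusive end of that window (corresponds to TUMBLE_END). */
    static long windowEnd(long timestamp, long offset, long size) {
        return windowStart(timestamp, offset, size) + size;
    }

    public static void main(String[] args) {
        long size = 60_000L; // INTERVAL '1' MINUTE in milliseconds
        long ts = 61_000L;   // one minute and one second after the epoch
        // prints [60000, 120000)
        System.out.println("[" + windowStart(ts, 0L, size) + ", " + windowEnd(ts, 0L, size) + ")");
    }
}
```

Every record whose time falls into [60000, 120000) lands in the same bucket, so it contributes to the same pv/uv row when that window fires.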

Group Windows source code

First, the execution graph:

[Figure: Group Windows execution graph]

Let's start with the simpler case, Group Windows. A Group Window is similar to a window in the DataStream API: after the SQL is parsed, StreamExecGroupWindowAggregate.translateToPlanInternal is called to create the window operator.

StreamExecGroupWindowAggregate.translateToPlanInternal


  @SuppressWarnings("unchecked")
    @Override
    protected Transformation<RowData> translateToPlanInternal(
            PlannerBase planner, ExecNodeConfig config) {
        final boolean isCountWindow;
        
        ......

        // aggregate function info (see aggInfoList below)
        final AggregateInfoList aggInfoList =
                transformToStreamAggregateInfoList(
                        inputRowType,
                        JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)),
                        aggCallNeedRetractions,
                        needRetraction,
                        true, // isStateBackendDataViews
                        true); // needDistinctInfo
        // generated code for the aggregate functions
        final GeneratedClass<?> aggCodeGenerator =
                createAggsHandler(
                        aggInfoList,
                        config,
                        planner.getRelBuilder(),
                        inputRowType.getChildren(),
                        shiftTimeZone);

        .......

        // create the window operator
        final WindowOperator<?, ?> operator =
                createWindowOperator(
                        config,
                        aggCodeGenerator,
                        equaliser,
                        accTypes,
                        windowPropertyTypes,
                        aggValueTypes,
                        inputRowType.getChildren().toArray(new LogicalType[0]),
                        inputTimeFieldIndex,
                        shiftTimeZone,
                        inputCountIndex);
        .........
    }

  • Note: the method is long, so only excerpts are shown.

[Figure: aggInfoList]

[Figure: aggCodeGenerator]

StreamExecGroupWindowAggregate.createWindowOperator

    private WindowOperator<?, ?> createWindowOperator(
            ReadableConfig config,
            GeneratedClass<?> aggsHandler,
            GeneratedRecordEqualiser recordEqualiser,
            LogicalType[] accTypes,
            LogicalType[] windowPropertyTypes,
            LogicalType[] aggValueTypes,
            LogicalType[] inputFields,
            int timeFieldIndex,
            ZoneId shiftTimeZone,
            int inputCountIndex) {
        WindowOperatorBuilder builder =
                WindowOperatorBuilder.builder()
                        .withInputFields(inputFields)
                        .withShiftTimezone(shiftTimeZone)
                        .withInputCountIndex(inputCountIndex);
        // configure the builder according to the window type
        if (window instanceof TumblingGroupWindow) {
            // tumbling window
            TumblingGroupWindow tumblingWindow = (TumblingGroupWindow) window;
            FieldReferenceExpression timeField = tumblingWindow.timeField();
            ValueLiteralExpression size = tumblingWindow.size();
            if (isProctimeAttribute(timeField) && hasTimeIntervalType(size)) {
                // processing-time window
                builder = builder.tumble(toDuration(size)).withProcessingTime();
            } else if (isRowtimeAttribute(timeField) && hasTimeIntervalType(size)) {
                // event-time window
                builder = builder.tumble(toDuration(size)).withEventTime(timeFieldIndex);
            } else if (isProctimeAttribute(timeField) && hasRowIntervalType(size)) {
                // count window
                builder = builder.countWindow(toLong(size));
            } else {
                // TODO: EventTimeTumblingGroupWindow should sort the stream on event time
                // before applying the  windowing logic. Otherwise, this would be the same as a
                // ProcessingTimeTumblingGroupWindow
                throw new UnsupportedOperationException(
                        "Event-time grouping windows on row intervals are currently not supported.");
            }
        } else if (window instanceof SlidingGroupWindow) {
            // sliding window
            
            .....
            }
        } else if (window instanceof SessionGroupWindow) {
            // session window
            .....
        } else {
            throw new TableException("Unsupported window: " + window.toString());
        }

        // window emit strategy: e.g. for the example SQL (processing time, no early fire), results are emitted only when the window ends
        WindowEmitStrategy emitStrategy = WindowEmitStrategy.apply(config, window);
        if (emitStrategy.produceUpdates()) {
            // mark this operator will send retraction and set new trigger
            builder.produceUpdates()
                    .triggering(emitStrategy.getTrigger())
                    .withAllowedLateness(Duration.ofMillis(emitStrategy.getAllowLateness()));
        }

        if (aggsHandler instanceof GeneratedNamespaceAggsHandleFunction) {
            // regular aggregate functions
            return builder.aggregate(
                            (GeneratedNamespaceAggsHandleFunction<?>) aggsHandler,
                            recordEqualiser,
                            accTypes,
                            aggValueTypes,
                            windowPropertyTypes)
                    // build() creates the AggregateWindowOperator
                    .build();
        } else if (aggsHandler instanceof GeneratedNamespaceTableAggsHandleFunction) {
            // table aggregate functions
            return builder.aggregate(
                            (GeneratedNamespaceTableAggsHandleFunction<?>) aggsHandler,
                            accTypes,
                            aggValueTypes,
                            windowPropertyTypes)
                    .build();
        } else {
            throw new TableException(
                    "Unsupported agg handler class: " + aggsHandler.getClass().getSimpleName());
        }
    }
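The tumbling-window branch above is essentially a decision table over two properties: whether the time field is a processing-time or row-time attribute, and whether the size is a time interval or a row count. The sketch below restates only that table; `TumbleDispatch`, `TimeAttr` and `SizeType` are hypothetical names, and the returned strings merely name the builder calls from the source instead of invoking Flink.

```java
// Hypothetical restatement of the TumblingGroupWindow branch in
// createWindowOperator; it returns a description of the builder call
// instead of configuring a real WindowOperatorBuilder.
final class TumbleDispatch {

    enum TimeAttr { PROCTIME, ROWTIME }

    enum SizeType { TIME_INTERVAL, ROW_INTERVAL }

    private TumbleDispatch() {}

    static String chooseBuilderCall(TimeAttr timeField, SizeType size) {
        if (timeField == TimeAttr.PROCTIME && size == SizeType.TIME_INTERVAL) {
            return "tumble(size).withProcessingTime()";
        } else if (timeField == TimeAttr.ROWTIME && size == SizeType.TIME_INTERVAL) {
            return "tumble(size).withEventTime(timeFieldIndex)";
        } else if (timeField == TimeAttr.PROCTIME && size == SizeType.ROW_INTERVAL) {
            return "countWindow(size)";
        } else {
            // row-time attribute combined with a row-count size is rejected, as in the source
            throw new UnsupportedOperationException(
                    "Event-time grouping windows on row intervals are currently not supported.");
        }
    }

    public static void main(String[] args) {
        // print the whole table, marking the rejected combination
        for (TimeAttr t : TimeAttr.values()) {
            for (SizeType s : SizeType.values()) {
                String result;
                try {
                    result = chooseBuilderCall(t, s);
                } catch (UnsupportedOperationException e) {
                    result = "unsupported";
                }
                System.out.println(t + " + " + s + " -> " + result);
            }
        }
    }
}
```

The example SQL (`TUMBLE(proc_time, INTERVAL '1' minute)`) takes the PROCTIME + TIME_INTERVAL path.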

Window emit strategy:

[Figure: WindowEmitStrategy]

----------- Addendum -----------
If the early-fire options are configured:

set table.exec.emit.early-fire.enabled = true;
set table.exec.emit.early-fire.delay = 5000;
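These options matter because of the `if (emitStrategy.produceUpdates())` branch in createWindowOperator: when it is true, the builder switches to produceUpdates() with the early-fire trigger, and the operator later emits update messages. Based on my reading of `WindowEmitStrategy` in recent Flink versions (treat this as an assumption, the rule may differ between releases), the decision looks roughly like the paraphrase below; `EmitStrategySketch` and its parameters are hypothetical.

```java
// Hypothetical paraphrase of WindowEmitStrategy.produceUpdates(); the real
// class derives these inputs from the table config (early/late fire options,
// allowed lateness). Not Flink's actual code.
final class EmitStrategySketch {

    private EmitStrategySketch() {}

    static boolean produceUpdates(
            boolean isEventTime,
            boolean earlyFireEnabled,
            boolean lateFireEnabled,
            long allowLatenessMs) {
        if (isEventTime) {
            // event time: early firing, late firing or allowed lateness all
            // mean a window may be emitted more than once
            return allowLatenessMs > 0 || earlyFireEnabled || lateFireEnabled;
        }
        // processing time: only early firing causes repeated emissions
        return earlyFireEnabled;
    }

    public static void main(String[] args) {
        // the example SQL: processing-time window, early fire enabled via SET
        System.out.println(produceUpdates(false, true, false, 0L)); // true
        // the same query without the SET statements
        System.out.println(produceUpdates(false, false, false, 0L)); // false
    }
}
```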


WindowOperatorBuilder.build

        public AggregateWindowOperator build() {
            checkNotNull(windowOperatorBuilder.trigger, "trigger is not set");
            if (generatedAggregateFunction != null && generatedEqualiser != null) {
                // both the aggregate function and the record equaliser are set
                //noinspection unchecked
                return new AggregateWindowOperator(
                        generatedAggregateFunction,
                        generatedEqualiser,
                        windowOperatorBuilder.windowAssigner,
                        windowOperatorBuilder.trigger,
                        windowOperatorBuilder.windowAssigner.getWindowSerializer(
                                new ExecutionConfig()),
                        windowOperatorBuilder.inputFieldTypes,
                        windowOperatorBuilder.accumulatorTypes,
                        windowOperatorBuilder.aggResultTypes,
                        windowOperatorBuilder.windowPropertyTypes,
                        windowOperatorBuilder.rowtimeIndex,
                        windowOperatorBuilder.produceUpdates,
                        windowOperatorBuilder.allowedLateness,
                        windowOperatorBuilder.shiftTimeZone,
                        windowOperatorBuilder.inputCountIndex);
            } else {
               
               ..... 
            }
        }

The AggregateWindowOperator constructor simply calls the constructor of its parent class, WindowOperator:

WindowOperator(
            WindowAssigner<W> windowAssigner,
            Trigger<W> trigger,
            TypeSerializer<W> windowSerializer,
            LogicalType[] inputFieldTypes,
            LogicalType[] accumulatorTypes,
            LogicalType[] aggResultTypes,
            LogicalType[] windowPropertyTypes,
            int rowtimeIndex,
            boolean produceUpdates,
            long allowedLateness,
            ZoneId shiftTimeZone,
            int inputCountIndex) {
        checkArgument(allowedLateness >= 0);
        this.windowAssigner = checkNotNull(windowAssigner);
        this.trigger = checkNotNull(trigger);
        this.windowSerializer = checkNotNull(windowSerializer);
        this.inputFieldTypes = checkNotNull(inputFieldTypes);
        this.accumulatorTypes = checkNotNull(accumulatorTypes);
        this.aggResultTypes = checkNotNull(aggResultTypes);
        this.windowPropertyTypes = checkNotNull(windowPropertyTypes);
        this.allowedLateness = allowedLateness;
        this.produceUpdates = produceUpdates;

        // rowtime index should >= 0 when in event time mode
        checkArgument(!windowAssigner.isEventTime() || rowtimeIndex >= 0);
        this.rowtimeIndex = rowtimeIndex;
        this.shiftTimeZone = shiftTimeZone;
        this.recordCounter = RecordCounter.of(inputCountIndex);

        setChainingStrategy(ChainingStrategy.ALWAYS);
    }

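The WindowOperator used by the SQL API works essentially the same way as the WindowOperator in the DataStream API.

Its core methods:

// called for each incoming record
public void processElement(StreamRecord<RowData> record) 

// called when an event-time timer fires
public void onEventTime(InternalTimer<K, W> timer)

// called when a processing-time timer fires
public void onProcessingTime(InternalTimer<K, W> timer)

plus the abstract method emitWindowResult, which AggregateWindowOperator implements:

// emit the result of a window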
protected abstract void emitWindowResult(W window) throws Exception;

[Figure: WindowOperator]
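To see how these methods cooperate, here is a toy, single-key model of the lifecycle: each record is assigned to a window and accumulated (processElement), and once the watermark passes a window's max timestamp the window fires, its result is emitted and its state is cleared (what the event-time timers do via onEventTime). It is deliberately simplified: event time only, COUNT(*) as the aggregate, no real triggers, timers or allowed lateness, and `ToyWindowOperator` is a hypothetical class, not Flink's implementation. The example SQL uses processing time, where firing is driven by processing-time timers (onProcessingTime) rather than by the watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-key model of WindowOperator's lifecycle (hypothetical class):
// event-time tumbling windows, COUNT(*) as the aggregate, allowedLateness = 0.
final class ToyWindowOperator {

    private final long size;
    // window start -> accumulator (record count), ordered by window start
    private final TreeMap<Long, Long> accPerWindow = new TreeMap<>();
    private final List<String> output = new ArrayList<>();
    private long currentWatermark = Long.MIN_VALUE;

    ToyWindowOperator(long size) {
        this.size = size;
    }

    /** Plays the role of processElement: assign the record to its window and accumulate. */
    void processElement(long rowtime) {
        long start = rowtime - Math.floorMod(rowtime, size);
        long maxTimestamp = start + size - 1;
        if (maxTimestamp <= currentWatermark) {
            return; // the window has already fired: the late record is dropped
        }
        accPerWindow.merge(start, 1L, Long::sum);
    }

    /** Plays the role of event-time timers calling onEventTime for every due window. */
    void advanceWatermark(long watermark) {
        currentWatermark = watermark;
        while (!accPerWindow.isEmpty()) {
            Map.Entry<Long, Long> first = accPerWindow.firstEntry();
            if (first.getKey() + size - 1 > watermark) {
                break; // the earliest remaining window is not complete yet
            }
            emitWindowResult(first.getKey(), first.getValue());
            accPerWindow.pollFirstEntry(); // clear the window's state after firing
        }
    }

    /** Plays the role of emitWindowResult without updates: one +I row per window. */
    private void emitWindowResult(long start, long count) {
        output.add("+I[" + start + ", " + (start + size) + ", " + count + "]");
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        ToyWindowOperator op = new ToyWindowOperator(60_000L);
        op.processElement(1_000L);
        op.processElement(30_000L);
        op.processElement(61_000L);
        op.advanceWatermark(59_999L);  // fires [0, 60000) with count 2
        op.processElement(2_000L);     // late for [0, 60000): dropped
        op.advanceWatermark(119_999L); // fires [60000, 120000) with count 1
        op.output().forEach(System.out::println);
    }
}
```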

AggregateWindowOperator.emitWindowResult:

 @Override
protected void emitWindowResult(W window) throws Exception {
    windowFunction.prepareAggregateAccumulatorForEmit(window);
    RowData acc = aggWindowAggregator.getAccumulators();
    RowData aggResult = aggWindowAggregator.getValue(window);
    // should update (-U/+U) messages be emitted? e.g. windows with early fire configured produce updates
    if (produceUpdates) {
        // load the result already emitted for this same window, if any
        previousState.setCurrentNamespace(window);
        RowData previousAggResult = previousState.value();

        // does the accumulator still count any records?
        if (!recordCounter.recordCountIsZero(acc)) {
            // has emitted result for the window
            // was a result already emitted for this window?
            if (previousAggResult != null) {
                // yes: a previously emitted result exists
                // current agg is not equal to the previous emitted, should emit retract

                // compare the current result with the previously emitted one
                if (!equaliser.equals(aggResult, previousAggResult)) {
                    // they differ: retract the old result, then send the new one
                    // send UPDATE_BEFORE
                    collect(
                            RowKind.UPDATE_BEFORE,
                            (RowData) getCurrentKey(),
                            previousAggResult);
                    // send UPDATE_AFTER
                    collect(RowKind.UPDATE_AFTER, (RowData) getCurrentKey(), aggResult);
                    // update previousState
                    previousState.update(aggResult);
                }
                // if the previous agg equals to the current agg, no need to send retract and
                // accumulate
            }
            // the first fire for the window, only send INSERT
            else {
                // nothing emitted yet for this window: send the current result
                // send INSERT
                collect(RowKind.INSERT, (RowData) getCurrentKey(), aggResult);
                // update previousState
                previousState.update(aggResult);
            }
        } else {
            // the record count is zero: there is no data left to emit

            // if a result was emitted earlier, retract it with a DELETE
            // has emitted result for the window
            // we retracted the last record for this key
            if (previousAggResult != null) {
                // send DELETE
                collect(RowKind.DELETE, (RowData) getCurrentKey(), previousAggResult);
                // clear previousState
                previousState.clear();
            }
            // if the counter is zero, no need to send accumulate
        }
    } else {
        // updates not needed: emit +I directly
        if (!recordCounter.recordCountIsZero(acc)) {
            // send INSERT
            collect(RowKind.INSERT, (RowData) getCurrentKey(), aggResult);
        }
        // if the counter is zero, no need to send accumulate
        // there is no possible skip `if` branch when `produceUpdates` is false
    }
}
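Stripped of Flink's state and RowData types, the branching in emitWindowResult can be replayed as a small state machine over a single window. In the hypothetical sketch below, `previous` stands in for `previousState`, `recordCount` for `recordCounter`, and the aggregate result is a single long; it is only meant to trace which changelog messages come out in which situation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-window model of AggregateWindowOperator.emitWindowResult.
final class EmitSketch {

    private final boolean produceUpdates;
    private Long previous; // stands in for previousState; null = nothing emitted yet
    private final List<String> out = new ArrayList<>();

    EmitSketch(boolean produceUpdates) {
        this.produceUpdates = produceUpdates;
    }

    /** Called each time the window fires; recordCount == 0 means the accumulator is empty. */
    void emitWindowResult(long recordCount, long result) {
        if (produceUpdates) {
            if (recordCount != 0) {
                if (previous == null) {
                    out.add("+I " + result); // first firing for this window
                    previous = result;
                } else if (previous.longValue() != result) {
                    out.add("-U " + previous); // retract the old result
                    out.add("+U " + result);   // then send the new one
                    previous = result;
                }
                // same result as last time: nothing is sent
            } else if (previous != null) {
                out.add("-D " + previous); // everything was retracted: delete the old row
                previous = null;
            }
        } else if (recordCount != 0) {
            out.add("+I " + result); // no updates: a single INSERT per firing
        }
    }

    List<String> output() {
        return out;
    }

    public static void main(String[] args) {
        EmitSketch early = new EmitSketch(true); // e.g. early fire enabled
        early.emitWindowResult(3, 3); // +I 3
        early.emitWindowResult(3, 3); // unchanged: no output
        early.emitWindowResult(5, 5); // -U 3, +U 5
        early.emitWindowResult(0, 0); // -D 5
        early.output().forEach(System.out::println);
    }
}
```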

The logic is a bit involved, so I drew a diagram:

[Figure: emitWindowResult decision flow]

That's it.

Next post: "Flink SQL Window TVF source code analysis" (TBD)

Follow the Flink菜鸟 WeChat official account for occasional posts on Flink development.

posted on 2022-06-14 18:11  Flink菜鸟