


public class ProcessFunctionTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        SingleOutputStreamOperator<Event> eventDS = env.addSource(new ClickSource())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forMonotonousTimestamps() // assumed: the strategy call is missing from the original listing
                                .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                                    @Override
                                    public long extractTimestamp(Event element, long recordTimestamp) {
                                        return element.timestamp;
                                    }
                                }));

        eventDS.process(new ProcessFunction<Event, String>() {
            @Override
            public void processElement(Event value, Context ctx, Collector<String> out) throws Exception {
                if (value.user.equals("依琳")) {
                    out.collect(value.user + " clicks " + value.url); // emitted once
                } else if (value.user.equals("令狐冲")) {
                    out.collect(value.user); // emitted twice
                    out.collect(value.user);
                }
                // printed for every element, whatever the user
                System.out.println("timestamp -> " + ctx.timestamp());
                System.out.println("currentWatermark -> " + ctx.timerService().currentWatermark());
            }
        }).print();
        env.execute();
    }
}

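Here the .processElement() method is overridden in the ProcessFunction to define custom processing logic: when the user of a record is "依琳", the record is emitted once; when the user is "令狐冲", the user is emitted twice. Output is produced by calling out.collect(). We can also call ctx.timerService().currentWatermark() to obtain the current watermark and print it. ProcessFunction is therefore something like an upgraded FlatMapFunction: it can do everything that Map, Filter, and FlatMap can. Because it can also read timestamps and watermarks, register timers, and write to side outputs through the Context, a process function can do many things those operators cannot.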
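To make the comparison with Map, Filter, and FlatMap concrete, here is a minimal plain-Java sketch that needs no Flink dependency. The `Collector` and `Processor` interfaces and the `run` helper are hypothetical stand-ins that only mirror the shape of Flink's `Collector` and `processElement()`. The idea it illustrates: calling `collect()` exactly once per input behaves like a map, at most once like a filter, and any number of times like a flatMap.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; these types are illustrative stand-ins, not Flink classes.
class CollectSketch {

    /** Stand-in for Flink's Collector: receives every emitted record. */
    interface Collector<T> {
        void collect(T record);
    }

    /** Stand-in for the processElement() contract (Context omitted for brevity). */
    interface Processor<I, O> {
        void processElement(I value, Collector<O> out);
    }

    /** Feeds each input to the processor and gathers everything it emits, in order. */
    static <I, O> List<O> run(List<I> input, Processor<I, O> fn) {
        List<O> result = new ArrayList<>();
        for (I value : input) {
            fn.processElement(value, result::add);
        }
        return result;
    }

    // map: exactly one output per input
    static final Processor<String, Integer> LENGTH =
            (value, out) -> out.collect(value.length());

    // filter: zero or one output per input
    static Processor<String, String> onlyUser(String name) {
        return (value, out) -> {
            if (value.equals(name)) {
                out.collect(value);
            }
        };
    }

    // flatMap: any number of outputs per input (here always two)
    static final Processor<String, String> TWICE = (value, out) -> {
        out.collect(value);
        out.collect(value);
    };

    public static void main(String[] args) {
        List<String> users = List.of("依琳", "令狐冲");
        System.out.println(run(users, LENGTH));
        System.out.println(run(users, onlyUser("依琳")));
        System.out.println(run(users, TWICE));
    }
}
```

The example in this post mixes the last two cases in one function: one branch emits once, the other emits twice, and any other user produces no output at all.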


public abstract class ProcessFunction<I, O> extends AbstractRichFunction {

    private static final long serialVersionUID = 1L;

    /**
     * Process one element from the input stream.
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     * @param value The input value.
     * @param ctx A {@link Context} that allows querying the timestamp of the element and getting a
     *     {@link TimerService} for registering timers and querying the time. The context is only
     *     valid during the invocation of this method, do not store it.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    public abstract void processElement(I value, Context ctx, Collector<O> out) throws Exception;

    /**
     * Called when a timer set using {@link TimerService} fires.
     * @param timestamp The timestamp of the firing timer.
     * @param ctx An {@link OnTimerContext} that allows querying the timestamp of the firing timer,
     *     querying the {@link TimeDomain} of the firing timer and getting a {@link TimerService}
     *     for registering timers and querying the time. The context is only valid during the
     *     invocation of this method, do not store it.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception {}

    /**
     * Information available in an invocation of {@link #processElement(Object, Context, Collector)}
     * or {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class Context {

        /**
         * Timestamp of the element currently being processed or timestamp of a firing timer.
         * <p>This might be {@code null}, for example if the time characteristic of your program is
         * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
         */
        public abstract Long timestamp();

        /** A {@link TimerService} for querying time and registering timers. */
        public abstract TimerService timerService();

        /**
         * Emits a record to the side output identified by the {@link OutputTag}.
         * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
         * @param value The record to emit.
         */
        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }

    /**
     * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class OnTimerContext extends Context {
        /** The {@link TimeDomain} of the firing timer. */
        public abstract TimeDomain timeDomain();
    }
}
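The `onTimer()` callback is easier to picture with a small model. The sketch below is plain Java with no Flink dependency; `TimerSketch` and its methods are hypothetical stand-ins rather than Flink's `TimerService`. The only behavior it borrows from Flink is the rule that an event-time timer fires once the watermark reaches or passes the timer's timestamp, and that registering the same timestamp twice yields a single timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of event-time timers; illustrative only, not Flink code.
class TimerSketch {

    /** Stand-in for the onTimer() callback: receives the firing timer's timestamp. */
    interface OnTimer {
        void onTimer(long timestamp);
    }

    private final TreeSet<Long> timers = new TreeSet<>(); // sorted; duplicates collapse
    private long currentWatermark = Long.MIN_VALUE;       // value before any watermark arrives
    private final OnTimer callback;

    TimerSketch(OnTimer callback) {
        this.callback = callback;
    }

    long currentWatermark() {
        return currentWatermark;
    }

    void registerEventTimeTimer(long time) {
        timers.add(time);
    }

    /** Advances the watermark and fires every timer at or before it, oldest first. */
    void advanceWatermark(long watermark) {
        if (watermark <= currentWatermark) {
            return; // the watermark never moves backwards
        }
        currentWatermark = watermark;
        while (!timers.isEmpty() && timers.first() <= currentWatermark) {
            callback.onTimer(timers.pollFirst());
        }
    }

    public static void main(String[] args) {
        List<Long> fired = new ArrayList<>();
        TimerSketch service = new TimerSketch(fired::add);
        service.registerEventTimeTimer(2000L);
        service.registerEventTimeTimer(1000L);
        service.advanceWatermark(1500L); // only the 1000 timer is due
        System.out.println(fired);
        service.advanceWatermark(2500L); // now the 2000 timer is due as well
        System.out.println(fired);
    }
}
```

In this model nothing fires at registration time; the callback runs only when a later watermark passes the registered timestamp.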


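The Context gives access to the timestamp of the element currently being processed and provides a "timer service" (TimerService) for querying time and registering timers, as well as the .output() method for sending data to a "side output". The Context abstract class is defined as follows: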
    /**
     * Information available in an invocation of {@link #processElement(Object, Context, Collector)}
     * or {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class Context {

        /**
         * Timestamp of the element currently being processed or timestamp of a firing timer.
         * <p>This might be {@code null}, for example if the time characteristic of your program is
         * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
         */
        public abstract Long timestamp();

        /** A {@link TimerService} for querying time and registering timers. */
        public abstract TimerService timerService();

        /**
         * Emits a record to the side output identified by the {@link OutputTag}.
         * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
         * @param value The record to emit.
         */
        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }
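How `output()` keeps some records out of the main stream can also be modelled without Flink. In the sketch below, `Tag`, `Ctx`, `collected`, and `route` are hypothetical stand-ins (in Flink itself the tag type is `OutputTag`, and the side output is read back from the resulting stream, not from the context). It only illustrates that `out.collect()` feeds the main output while `ctx.output(tag, value)` feeds a separate output identified by the tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of main output vs. side output; illustrative only, not Flink code.
class SideOutputSketch {

    /** Stand-in for OutputTag: identifies one side output by its id. */
    static final class Tag<X> {
        final String id;

        Tag(String id) {
            this.id = id;
        }
    }

    /** Stand-in for Context, reduced to the side-output part. */
    static final class Ctx {
        private final Map<String, List<Object>> sideOutputs = new HashMap<>();

        /** Appends the value to the side output identified by the tag. */
        <X> void output(Tag<X> tag, X value) {
            sideOutputs.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
        }

        /** Returns everything emitted to the given tag so far (empty if nothing was). */
        List<Object> collected(Tag<?> tag) {
            return sideOutputs.getOrDefault(tag.id, List.of());
        }
    }

    static final Tag<String> OTHERS = new Tag<>("others");

    /** Sends records equal to mainUser to the main output and all others to OTHERS. */
    static List<String> route(List<String> users, String mainUser, Ctx ctx) {
        List<String> main = new ArrayList<>();
        for (String user : users) {
            if (user.equals(mainUser)) {
                main.add(user);           // plays the role of out.collect(user)
            } else {
                ctx.output(OTHERS, user); // plays the role of ctx.output(tag, user)
            }
        }
        return main;
    }

    public static void main(String[] args) {
        Ctx ctx = new Ctx();
        List<String> main = route(List.of("依琳", "令狐冲", "依琳"), "依琳", ctx);
        System.out.println("main: " + main);
        System.out.println("side: " + ctx.collected(OTHERS));
    }
}
```

Each record ends up in exactly one of the two outputs here, which is the usual way to split a stream with a process function.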



posted @ 2022-07-07 12:51  晓枫的春天