Flink CEP: Basic Usage

Environment Setup

Before using Flink CEP, add the FlinkCEP dependency to your project. This article is based on Flink 1.9.1.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-cep_2.11</artifactId>
    <version>1.9.1</version>
</dependency>

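Basic Concepts

Event Definitions

Simple events

A simple event is a single event whose meaning can be read off directly. Processing it does not require looking at relationships between multiple events, and the result can be computed with ordinary data-processing operations.

Complex events

In contrast to simple events, complex event processing handles not only single events but also composite events made up of several events. It monitors and analyzes event streams and triggers actions when specific events occur.

Event Relationships

Events within a complex event can be related in several ways. Common relationship types are temporal, aggregation, hierarchical, dependency, and causal.

Temporal relationships

There is a time order between action events, and between action events and state-change events. The temporal relationship between events determines most temporal rules, for example: event B's state changes to 0 while event A's state remains 1.

Aggregation relationships

Action events can be aggregated with other action events, and state events with other state events, so that individual events form a collective whole. For example: an alert is triggered once event A has had state 1 ten times.

Hierarchical relationships

A category event and its sub-category events form a hierarchy. Going from parent to child is specialization; going from child to parent is generalization.

Dependency relationships

The state attributes of things depend on and constrain one another. For example, if event A can only be triggered after event B has been triggered, A depends on B.

Causal relationships

For a complete action, the resulting state is the effect, and both the initial state and the action can be regarded as causes. For example, if a state change of event A triggers event B, then A is the cause and B is the effect.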

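API Usage

API	Meaning	Example	Explanation
where()	Specifies a match condition	pattern.where(_ == 1)	Matches events equal to 1
or()	Adds an alternative (OR) condition	pattern.or(_ == 2)	Also matches events equal to 2
times()	Number of occurrences	pattern.times(2,4)	The pattern occurs 2 to 4 times
oneOrMore()	Number of occurrences	pattern.oneOrMore()	Occurs one or more times
timesOrMore()	Number of occurrences	pattern.timesOrMore(2)	Occurs two or more times
optional()	The pattern either occurs or does not	pattern.times(2).optional()	Occurs 0 or 2 times
greedy()	Greedy matching	pattern.times(2,4).greedy()	Occurs 2, 3, or 4 times, repeating as often as possible (takes effect in sequences of several Patterns)
until()	Stop condition	pattern.oneOrMore().until(_ == 0)	Stops when a 0 is encountered
subtype()	Subtype condition	pattern.subtype(Event.class)	Matches only events of type Event
within()	Time constraint	pattern.within(Time.seconds(5))	The match must occur within 5 s
begin()	Starts a rule	Pattern.begin("start")	Defines the start of the rule; the event type is Event
next()	Strict contiguity	start.next("next").where(_ == 1)	The immediately following event must be 1
followedBy()	Relaxed contiguity	start.followedBy("middle").where()	Skips events that do not match the condition
followedByAny()	Non-deterministic relaxed contiguity	start.followedByAny("middle").where()	May also skip events that have already matched
notNext()	Forbids an event from directly following the previous event	start.notNext("not").where(_ == 1)	The next event must not be 1
notFollowedBy()	Forbids an event from occurring between two events	start.notFollowedBy("not").where()...	The given event must not occur between two other events
consecutive()	Strict matching	start.where(_ == 1).times(3).consecutive()	Three 1s in a row are required for a match
allowCombinations()	Non-deterministic contiguity	start.where(_ == 1).times(2).allowCombinations()	Any two 1s produce a match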

Examples

Matching with where()/or()

public class SimpleConditionsTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        streamEnv.setParallelism(1);

        // Input source
        DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
                new Event(1L, "a1", "add", 1588298400L),
                new Event(2L, "c1", "add", 1588298400L),
                new Event(3L, "a2", "add", 1588298400L),
                new Event(4L, "b1", "add", 1588298400L),
                new Event(5L, "b1", "add", 1588298400L),
                new Event(6L, "a3", "add", 1588298400L)
        ));

        // 1. Define the rule: match users whose name starts with a or c
        Pattern<Event, Event> pattern = Pattern.<Event>begin("start").where(
                new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) throws Exception {
                        return event.getName().startsWith("a");
                    }
                }
        ).or(
                        new SimpleCondition<Event>() {
                            @Override
                            public boolean filter(Event event) throws Exception {
                                return event.getName().startsWith("c");
                            }
                        }
                );

        // 2. Pattern detection
        PatternStream<Event> patternStream = CEP.pattern(input, pattern);

        patternStream.select(
            new PatternSelectFunction<Event, String>() {
                // Return the ids of the matched events
                @Override
                public String select(Map<String, List<Event>> map) throws Exception {
                    StringBuffer sb = new StringBuffer();
                    for (Map.Entry<String, List<Event>> entry : map.entrySet()) {
                        Iterator<Event> iterator = entry.getValue().iterator();
                        iterator.forEachRemaining(i -> sb.append(i.getId()).append(","));
                    }
                    sb.deleteCharAt(sb.length() - 1);
                    return sb.toString();
                }
            }
        ).print();

        streamEnv.execute("simpleCEPTest");

    }

}

Output:
1
2
3
6
The output shows that events are matched one at a time.
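
The examples in this article construct Event objects, but the class itself is not shown. Below is a minimal sketch of what it might look like, assuming the four constructor arguments are an id, a name, a type, and a timestamp in epoch seconds. The field names type and timestamp are guesses; only getId() and getName() appear in the original code. In a real project the class would be declared public in its own file so that Flink can treat it as a POJO.

```java
/** Hypothetical event type for the examples in this article; not part of Flink. */
class Event {
    private Long id;
    private String name;
    private String type;      // assumed meaning: kind of action, e.g. "add"
    private Long timestamp;   // assumed unit: epoch seconds

    /** No-argument constructor, one of the requirements for a Flink POJO. */
    public Event() {
    }

    public Event(Long id, String name, String type, Long timestamp) {
        this.id = id;
        this.name = name;
        this.type = type;
        this.timestamp = timestamp;
    }

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getType() { return type; }
    public void setType(String type) { this.type = type; }

    public Long getTimestamp() { return timestamp; }
    public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }

    @Override
    public String toString() {
        return "Event{id=" + id + ", name='" + name + "', type='" + type
                + "', timestamp=" + timestamp + "}";
    }
}
```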

Matching with the times() quantifier

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "c1", "add", 1588298400L),
        new Event(3L, "a2", "add", 1588298400L),
        new Event(4L, "b1", "add", 1588298400L),
        new Event(5L, "b1", "add", 1588298400L),
        new Event(6L, "a3", "add", 1588298400L)
));

Pattern rule:

Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
        }
).times(2);

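Output:
1,3
3,6
A result is output once two events starting with a have been matched, strictly following input order. The record with id = 1 is read first and satisfies the where condition; id = 2 does not and is skipped; id = 3 satisfies the condition, and together with the previously read id = 1 that makes exactly two records, so the match succeeds and is output. Records 4 and 5 fail the where condition and are skipped; 6 matches, and 3 and 6 form another group that satisfies the rule.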

Pattern rule:

Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
        }
).times(1,3);

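Output:
1
1,3
3
1,3,6
3,6
6
The pattern matches 1, 2, or 3 occurrences. When id = 1 arrives, it matches on its own; when id = 3 arrives, [1,3] matches (2 occurrences) and then [3] is output (1 occurrence), and so on. Matches are emitted in the order of the input data.
oneOrMore() and timesOrMore() behave analogously.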
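
The two results above can be reproduced with a small plain-Java model that has no Flink dependency. It is a simplified sketch of the observed behaviour, not Flink's NFA: it assumes that non-matching events are skipped and that every run of neighbouring matching events with a length between from and to is reported, ordered by the event that completes the run. The names RelaxedTimesSketch and match are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RelaxedTimesSketch {

    /** Models where(name starts with prefix).times(from, to) with the default contiguity. */
    static List<List<Integer>> match(List<String> names, String prefix, int from, int to) {
        // 1-based ids of the events that satisfy the where() condition
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).startsWith(prefix)) {
                hits.add(i + 1);
            }
        }
        List<List<Integer>> matches = new ArrayList<>();
        for (int end = 0; end < hits.size(); end++) {
            // Longest run first, which mirrors the order of the output above
            for (int start = Math.max(0, end - to + 1); start <= end - from + 1; start++) {
                matches.add(new ArrayList<>(hits.subList(start, end + 1)));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a1", "c1", "a2", "b1", "b1", "a3");
        // times(2): prints [[1, 3], [3, 6]]
        System.out.println(match(names, "a", 2, 2));
        // times(1,3): prints [[1], [1, 3], [3], [1, 3, 6], [3, 6], [6]]
        System.out.println(match(names, "a", 1, 3));
    }
}
```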

Strict matching with consecutive()

Input data:

DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "c1", "add", 1588298400L),
        new Event(3L, "a2", "add", 1588298400L),
        new Event(4L, "b1", "add", 1588298400L),
        new Event(5L, "b1", "add", 1588298400L),
        new Event(6L, "a3", "add", 1588298400L)
));

Pattern rule:

Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
        }
).times(2).consecutive();

Output:
The output is empty: no data satisfies the rule.

Change the input (the record with id = 2 now has the name a1):

DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a1", "add", 1588298400L),
        new Event(3L, "a2", "add", 1588298400L),
        new Event(4L, "b1", "add", 1588298400L),
        new Event(5L, "b1", "add", 1588298400L),
        new Event(6L, "a3", "add", 1588298400L)
));

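Output:
1,2
2,3
Two groups now match, [1,2] and [2,3]. With consecutive(), the matching events must be directly adjacent: a non-matching event in between breaks the match instead of being skipped, similar to next().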
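
The difference from the default behaviour can be modelled in plain Java. This is a simplified sketch under the assumption that times(n).consecutive() reports every window of n directly adjacent events in the raw stream that all satisfy the condition; ConsecutiveSketch and match are hypothetical names, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConsecutiveSketch {

    /** Models where(name starts with prefix).times(n).consecutive(). */
    static List<List<Integer>> match(List<String> names, String prefix, int n) {
        List<List<Integer>> matches = new ArrayList<>();
        for (int end = n - 1; end < names.size(); end++) {
            List<Integer> window = new ArrayList<>();
            for (int i = end - n + 1; i <= end; i++) {
                if (!names.get(i).startsWith(prefix)) {
                    break; // a non-matching event breaks the run
                }
                window.add(i + 1); // 1-based id
            }
            if (window.size() == n) {
                matches.add(window);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // First input: a1 and a2 are separated by c1, prints []
        System.out.println(match(Arrays.asList("a1", "c1", "a2", "b1", "b1", "a3"), "a", 2));
        // Second input: prints [[1, 2], [2, 3]]
        System.out.println(match(Arrays.asList("a1", "a1", "a2", "b1", "b1", "a3"), "a", 2));
    }
}
```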

Non-deterministic matching with allowCombinations()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "c1", "add", 1588298400L),
        new Event(3L, "a2", "add", 1588298400L),
        new Event(4L, "b1", "add", 1588298400L),
        new Event(5L, "b1", "add", 1588298400L),
        new Event(6L, "a3", "add", 1588298400L)
));

Pattern rule:

Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
        }
).times(2).allowCombinations();

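Output:
1,3
1,6
3,6
The output shows that any two records in the input stream that satisfy the where condition produce a match. Without allowCombinations(), [1,6] would not be output. With allowCombinations(), an element that has already been matched can still be combined with later elements.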
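
In other words, the result is every ordered pair of matching events. A plain-Java sketch of that enumeration, assuming the pairs are reported in the order of the event that completes them (which agrees with the output above); CombinationsSketch and pairs are hypothetical names and this is a model, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinationsSketch {

    /** Models where(name starts with prefix).times(2).allowCombinations(). */
    static List<List<Integer>> pairs(List<String> names, String prefix) {
        // 1-based ids of the events that satisfy the where() condition
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).startsWith(prefix)) {
                hits.add(i + 1);
            }
        }
        List<List<Integer>> matches = new ArrayList<>();
        // Each matching event is paired with every earlier matching event
        for (int second = 1; second < hits.size(); second++) {
            for (int first = 0; first < second; first++) {
                matches.add(Arrays.asList(hits.get(first), hits.get(second)));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a1", "c1", "a2", "b1", "b1", "a3");
        // prints [[1, 3], [1, 6], [3, 6]]
        System.out.println(pairs(names, "a"));
    }
}
```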

Using greedy()

greedy() only takes effect in a sequence of several patterns. On a single Pattern it behaves the same as omitting greedy().

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
    streamEnv.setParallelism(1);

    // Input source
    DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
            new Event(1L, "a1", "add", 1588298400L),
            new Event(2L, "a2", "add", 1588298400L),
            new Event(3L, "a12", "add", 1588298400L),
            new Event(4L, "b11", "add", 1588298400L),
            new Event(5L, "b12", "add", 1588298400L),
            new Event(6L, "a3", "add", 1588298400L)
    ));

    // 1. Define the rule: 2 to 3 names starting with a, directly followed by 1 to 2 names of length 3

    Pattern<Event, Event> pattern = Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).times(2,3).next("middle").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().length() == 3;
                }
            }).times(1,2);

    // 2. Pattern detection
    PatternStream<Event> patternStream = CEP.pattern(input, pattern);

    patternStream.select(
            new PatternSelectFunction<Event, String>() {
                // Return the names of the matched events
                @Override
                public String select(Map<String, List<Event>> map) throws Exception {
                    StringBuffer sb = new StringBuffer();
                    for (Map.Entry<String, List<Event>> entry : map.entrySet()) {
                        Iterator<Event> iterator = entry.getValue().iterator();
                        iterator.forEachRemaining(i -> sb.append(i.getName()).append(","));
                        sb.append("|").append(",");
                    }
                    // Drop the trailing ",|," left by the last loop iteration
                    sb.delete(sb.length() - 3, sb.length());
                    return sb.toString();
                }
            }
    ).print();

    streamEnv.execute("simpleCEPTest");

}

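The rule matches names starting with a that are directly followed by names of length 3. Without greedy(), the output is:
a1,a2,|,a12
a1,a2,a12,|,b11
a1,a2,|,a12,b11
a2,a12,|,b11
a1,a2,a12,|,b11,b12
a2,a12,|,b11,b12
The separator | divides the events matched by the first where from those matched by the second. The output shows that the record a12 is tried against both where conditions.
1. Read a1: it satisfies the start condition (starts with a); buffer [a1].
2. Read a2: it satisfies the start condition, and with the previous [a1] the combination [a1,a2] satisfies the whole start pattern (2 to 3 names starting with a); store [a1,a2].
3. Read a12: it satisfies the start condition; store [a1,a2,a12] and [a2,a12]. It also satisfies the middle condition; store [a12]. The whole pattern is now satisfied, so [a1,a2,|,a12] is output.
4. Read b11: it satisfies the middle condition; store [a12,b11] and [b11]. The whole pattern is satisfied, so [a1,a2,a12,|,b11], [a1,a2,|,a12,b11], and [a2,a12,|,b11] are output.
5. Read b12: it satisfies the middle condition; store [b11,b12] and [b12]. The whole pattern is satisfied, so [a1,a2,a12,|,b11,b12] and [a2,a12,|,b11,b12] are output. Because b12 directly follows b11, b11 cannot be skipped in order to use b12 alone, so there is no result in which middle contains only b12.
6. Read a3: it satisfies the start condition, but matching effectively starts over from the beginning; store [a3].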

Pattern with greedy():

Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).times(2,3).greedy().next("middle").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().length() == 3;
                }
}).times(1,2);

Output:
a1,a2,a12,|,b11
a2,a12,|,b11
a1,a2,a12,|,b11,b12
a2,a12,|,b11,b12
Compared with the run without greedy(), there are fewer results: a12 is matched only by the first where and is no longer tried against the second.

Using groups of patterns

Using next()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

 Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
    }).times(1,2).next("middle").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("c");
                }
            }
    ).times(1,2);

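Output:
a1,a2,c12
a2,c12
a1,a2,c12,c3
a2,c12,c3
next() requires strict contiguity.
1. Read a1: it satisfies start; store [a1].
2. Read a2: it satisfies start; store [a2] and [a1,a2].
3. Read c12: it satisfies middle; store the intermediate result [c12]. The whole pattern is satisfied, so [a1,a2,c12] and [a2,c12] are output. [a1,c12] is not output because next() is strictly contiguous and cannot skip a2.
4. Read b11: it satisfies no condition and is ignored.
5. Read b12: it satisfies no condition and is ignored.
6. Read c3: it satisfies middle; store [c12,c3] and [c3]. The whole pattern is satisfied, so [a1,a2,c12,c3] and [a2,c12,c3] are output.
7. Read c7: it produces no match with the earlier events, just as c3 produces none on its own.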

Using followedBy()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.getName().startsWith("a");
            }
    }).times(1,2).followedBy("followedBy").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("c");
                }
            }
    ).times(1,2);

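Output:
a1,a2,c12
a1,c12
a2,c12
a1,a2,c12,c3
a1,c12,c3
a2,c12,c3
The pattern matches 1 to 2 events starting with a, followed by 1 to 2 events starting with c.
1. Read a1: it satisfies start; store [a1].
2. Read a2: it satisfies start; store [a2] and [a1,a2].
3. Read c12: it satisfies followedBy; store the intermediate result [c12]. The whole pattern is satisfied, so [a1,a2,c12], [a1,c12], and [a2,c12] are output.
4. Read b11: it satisfies no condition and is ignored.
5. Read b12: it satisfies no condition and is ignored.
6. Read c3: it satisfies followedBy; store [c12,c3] and [c3]. The whole pattern is satisfied, so [a1,a2,c12,c3], [a1,c12,c3], and [a2,c12,c3] are output. No result pairs the a events with c3 alone: c12 comes right after a1 and a2, so it cannot be skipped during matching.
7. Read c7: it produces no match with the earlier events, just as c3 produces none on its own.
Unlike next(), followedBy() uses relaxed contiguity, so the buffered intermediate state [a1] (compare step 3) can still be used.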

Using followedByAny()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).times(1,2).followedByAny("followedByAny").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("c");
                }
            }
).times(1,2);

Output:
a1,a2,c12
a1,c12
a2,c12
a1,a2,c12,c3
a1,a2,c3
a1,c12,c3
a1,c3
a2,c12,c3
a2,c3
a1,a2,c3,c7
a1,a2,c7
a1,c3,c7
a1,c7
a2,c3,c7
a2,c7
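1. Read a1: it satisfies the start condition; store [a1].
2. Read a2: it satisfies the start condition; store [a1,a2] and [a2].
3. Read c12: it satisfies the followedByAny condition; store [c12]. The whole pattern is satisfied, so [a1,a2,c12], [a1,c12], and [a2,c12] are output.
4. Read b11: it does not match and is ignored.
5. Read b12: it does not match and is ignored.
6. Read c3: it satisfies the followedByAny condition; store [c12,c3] and [c3]. The whole pattern is satisfied, so [a1,a2,c12,c3], [a1,a2,c3], [a1,c12,c3], [a1,c3], [a2,c12,c3], and [a2,c3] are output.
7. Read c7: it satisfies the followedByAny condition; store [c3,c7] and [c7]. The whole pattern is satisfied, so [a1,a2,c3,c7], [a1,a2,c7], [a1,c3,c7], [a1,c7], [a2,c3,c7], and [a2,c7] are output.
followedByAny() uses non-deterministic relaxed contiguity. The output shows that c3 can skip c12 and match with the earlier events (compare followedBy()); it is enough for the a events to come before the c events.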

Using until()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).oneOrMore().until(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("b");
                }
            }
    );

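Output:
a1
a1,a2
a2
until() can only follow a looping quantifier such as oneOrMore() or timesOrMore() and defines a stop condition. The pattern above matches one or more events starting with a until an event starting with b arrives.
1. Read a1: it satisfies start; store [a1].
2. Read a2: it satisfies start; store [a1,a2] and [a2].
3. Read c12: it does not satisfy start and is ignored.
4. Read b11: it satisfies the until condition and detection stops; the results [a1], [a1,a2], and [a2] are output.
5. Read b12: it also satisfies the until condition, but the results were already triggered at b11, so nothing is triggered again and matching restarts from start.
6. Read c3: it does not satisfy start and is ignored.
7. Read c7: it does not match and is ignored.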

Using notNext()

Input data:

DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

 Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).times(1,2).notNext("notNext").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("c");
                }
            }
    );

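Output:
a1
notNext() forbids a given event from directly following the previous event. The pattern matches 1 to 2 events starting with a that must not be directly followed by an event starting with c.
1. Read a1: it satisfies start; store [a1].
2. Read a2: it satisfies start; store [a1,a2] and [a2].
3. Read c12: it satisfies the notNext condition. Because an a event must not be directly followed by a c event, the candidates [a1,a2] and [a2], which would continue as [a1,a2,c12] and [a2,c12], are rejected, and only a1 is output.
4. The remaining events are read in turn and evaluated in the same way.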

Using notFollowedBy()

Input data:

// Input source
DataStream<Event> input = streamEnv.fromCollection(Arrays.asList(
        new Event(1L, "a1", "add", 1588298400L),
        new Event(2L, "a2", "add", 1588298400L),
        new Event(3L, "c12", "add", 1588298400L),
        new Event(4L, "b11", "add", 1588298400L),
        new Event(5L, "b12", "add", 1588298400L),
        new Event(6L, "c3", "add", 1588298400L),
        new Event(7L, "c7", "add", 1588298400L)
));

Pattern:

 Pattern.<Event>begin("start").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("a");
                }
            }
    ).times(1,2).notFollowedBy("notFollowed").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("b");
                }
            }
    ).next("next").where(
            new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event event) throws Exception {
                    return event.getName().startsWith("c");
                }
            }
    );

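Output:
a1,a2,c12
a2,c12
notFollowedBy() forbids a given event from occurring between two other events; a pattern sequence cannot end with .notFollowedBy(). Here, no event starting with b may appear between the events starting with a and the event starting with c.
1. Read a1: it satisfies start; store [a1].
2. Read a2: it satisfies start; store [a1,a2] and [a2].
3. Read c12: it satisfies the condition; store [a1,a2,c12] and [a2,c12].
4. Read b11: not processed for now.
5. Read b12: not processed for now.
6. Read c3: the whole rule is evaluated. The sequences [a1,a2,c12,b11,b12,c3] and [a2,c12,b11,b12,c3] contain b events between a and c, so they do not qualify; the results output are [a1,a2,c12] and [a2,c12].
7. Read c7: matching has to start over from the beginning, because the results were already triggered at c3.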

Using within()

Example:

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();

    // Use event time
    streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    streamEnv.setParallelism(1);

    DataStream<LoginEvent> input = streamEnv.fromCollection(Arrays.asList(
            new LoginEvent(1, "张三", "fail", 1577080457L),
            new LoginEvent(2, "张三", "fail", 1577080458L),
            new LoginEvent(3, "张三", "fail", 1577080460L),
            new LoginEvent(4, "李四", "fail", 1577080458L),
            new LoginEvent(5, "李四", "success", 1577080462L),
            new LoginEvent(6, "张三", "fail", 1577080462L)
    ))
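      // Assign timestamps and watermarks; allowed out-of-orderness is 0
      .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<LoginEvent>() {

        // Largest event timestamp (in ms) seen so far
        long maxEventTime = 0L;

        @Nullable
        @Override
        public Watermark getCurrentWatermark() {
            // Minus 1 so that a later event sharing the maximum timestamp is not treated as late
            return new Watermark(maxEventTime - 1);
        }

        @Override
        public long extractTimestamp(LoginEvent loginEvent, long previousElementTimestamp) {
            // The event time is in seconds; Flink expects milliseconds
            long timestamp = loginEvent.getEventTime() * 1000;
            // Track the maximum so the watermark actually advances
            maxEventTime = Math.max(maxEventTime, timestamp);
            return timestamp;
        }
    });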

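    // 1. Define the rule
    // Match a malicious-login pattern (a user fails three times in a row within 10 seconds)
    // Every rule starts with begin(); each step of the rule is then spelled out
    /*
        Pattern sequences
        1. Strict contiguity, next(): all events must satisfy the pattern conditions in order; no non-matching event may be skipped.
        2. Relaxed contiguity, followedBy(): events that do not match the condition are skipped.
        3. Non-deterministic relaxed contiguity, followedByAny(): compared with relaxed contiguity, events that have already matched may also be skipped.
     */
    Pattern<LoginEvent, LoginEvent> nextPattern = Pattern.<LoginEvent>begin("start")
            // First fail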
            .where(new SimpleCondition<LoginEvent>() {
                @Override
                public boolean filter(LoginEvent loginEvent) throws Exception {
                    return "fail".equals(loginEvent.getEventType());
                }
            }).next("fail2").where(new SimpleCondition<LoginEvent>() {
                // Second fail
                @Override
                public boolean filter(LoginEvent loginEvent) throws Exception {
                    return "fail".equals(loginEvent.getEventType());
                }
            }).next("fail3").where(new SimpleCondition<LoginEvent>() {
                // Third fail
                @Override
                public boolean filter(LoginEvent loginEvent) throws Exception {
                    return "fail".equals(loginEvent.getEventType());
                }
            })
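            // Time constraint: the match must complete within 10 seconds, otherwise it is discarded
            .within(Time.seconds(10));

    // 2. Pattern detection, keyed by user
    PatternStream<LoginEvent> patternStream = CEP.pattern(input.keyBy("userName"), nextPattern);

    /*
        3. Select results
        3.1  select function: extracts the matched events
        3.2  flat select function: extracts the matched events and may return any number of results
        3.3  process function
     */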
    SingleOutputStreamOperator<String> result = patternStream.select(new PatternSelectFunction<LoginEvent, String>() {
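        /**
         * The map key is the name of a pattern in the sequence; the value is the list of events accepted by that pattern.
         *
         * @param map matched events grouped by pattern name
         * @return the user name followed by the event times of the matched events
         * @throws Exception
         */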
        @Override
        public String select(Map<String, List<LoginEvent>> map) throws Exception {
            StringBuffer sb = new StringBuffer();
            String userName = null;
            for (Map.Entry<String, List<LoginEvent>> entry : map.entrySet()) {
                String patternName = entry.getKey();
                List<LoginEvent> patternValue = entry.getValue();
                System.out.println(patternName + ":" + patternValue.toString());
                if (userName == null) {
                    userName = patternValue.get(0).getUserName();
                }
                sb.append(patternValue.get(0).getEventTime()).append(",");
            }
            return userName + " -> " + sb.toString();
        }
    });

    // Print the matches
    result.print("result:");

    streamEnv.execute("cepTest");

}

Output:
start:[LoginEvent{id=1, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:17.0}]
fail2:[LoginEvent{id=2, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:18.0}]
fail3:[LoginEvent{id=3, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:20.0}]
result:> 张三 -> 1577080457,1577080458,1577080460,
start:[LoginEvent{id=2, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:18.0}]
fail2:[LoginEvent{id=3, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:20.0}]
fail3:[LoginEvent{id=6, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:22.0}]
result:> 张三 -> 1577080458,1577080460,1577080462,
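The stream is keyed by userName and matched against three consecutive fail events within 10 seconds (a match is only valid inside that window). 张三 failed four times between 1577080457 and 1577080462, so two results are output.

Update the data so that 张三's last login time becomes: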

 new LoginEvent(6, "张三", "fail", 1577080468L)

Output:
start:[LoginEvent{id=1, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:17.0}]
fail2:[LoginEvent{id=2, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:18.0}]
fail3:[LoginEvent{id=3, userName='张三', eventType='fail', eventTime=2019-12-23 13:54:20.0}]
result:> 张三 -> 1577080457,1577080458,1577080460,
Records 2 and 6 are now 10 s apart, so the second match is no longer valid: a match must complete within the 10-second window.
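
The timing argument can be checked with plain arithmetic. Below is a minimal sketch, assuming the window is measured from the first event of a match and is exclusive at its upper bound (an assumption consistent with the result above, where a gap of exactly 10 s no longer matches). WithinSketch and completesWithin are hypothetical names, not Flink API.

```java
class WithinSketch {

    /**
     * Returns true if a match that starts at firstEpochSec and ends at
     * lastEpochSec fits into a window of windowSec seconds.
     */
    static boolean completesWithin(long firstEpochSec, long lastEpochSec, long windowSec) {
        // Strictly less than: a gap equal to the window length is already too late
        return lastEpochSec - firstEpochSec < windowSec;
    }

    public static void main(String[] args) {
        // Records 2 and 6 in the original data: 4 s apart, prints true
        System.out.println(completesWithin(1577080458L, 1577080462L, 10L));
        // Records 2 and 6 after the update: 10 s apart, prints false
        System.out.println(completesWithin(1577080458L, 1577080468L, 10L));
    }
}
```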

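Notes

1. Every pattern sequence must start with .begin().
2. A pattern sequence cannot end with .notFollowedBy().
3. A "not" pattern cannot be modified by optional().

Summary

This article only covers first steps with the patterns and APIs of Flink CEP, and my own understanding may be off in places. Corrections are welcome wherever something is stated poorly, so that we can learn from each other.

Reference: https://juejin.im/post/5de1f32af265da05cc3190f9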

posted @ 2020-05-11 09:56 kevin_cy
