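Flink CEP: Examples and Basic Usage
1. Flink CEP overview
CEP (Complex Event Processing) detects event patterns in an unbounded event stream, so that you can capture the important parts of the data.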
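2. The four steps of Flink CEP programming
1>. Create the input data stream
2>. Define the pattern (Pattern)
3>. Apply the Pattern to the event stream for detection
4>. Select the results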
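3. Common contiguity modes for individual patterns:
Strict contiguity, relaxed contiguity, and non-deterministic relaxed contiguity. There are also NOT variants of strict contiguity (notNext()) and relaxed contiguity (notFollowedBy()); these two are used less often. The code below illustrates the three common modes.
Dependency required for Flink CEP programming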
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-cep_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
package org.stsffap.cep.monitoring;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class MyCEPTest {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> dataStream = env.fromElements(("a"), ("c"), ("b1"), ("b2"));
/*--------- Strict contiguity: next() ----------------------*/
Pattern strictPattern = Pattern.begin("start").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object s, Context<Object> context) {
return s.toString().equalsIgnoreCase("a");
}
}).next("middle").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("b");
}
});
CEP.pattern(dataStream, strictPattern).select(map -> {
System.out.println("strictPattern:" + map.get("start").toString());
System.out.println("strictPattern:" + map.get("middle").toString());
return map;
}).print();
/*---------------------------------------------*/
/*--------- Relaxed contiguity: followedBy() ----------------------*/
Pattern relaxedPattern = Pattern.begin("start").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object s, Context<Object> context) {
return s.toString().equalsIgnoreCase("a");
}
}).followedBy("middle").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("b");
}
});
CEP.pattern(dataStream, relaxedPattern).select(map -> {
System.out.println("relaxedPattern:" + map.get("start").toString());
System.out.println("relaxedPattern:" + map.get("middle").toString());
return map;
}).print();
/*---------------------------------------------*/
/*--------- Non-deterministic relaxed contiguity: followedByAny() ----------------------*/
Pattern nonDeterminPattern = Pattern.begin("start").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object s, Context<Object> context) {
return s.toString().equalsIgnoreCase("a");
}
}).followedByAny("middle").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("b");
}
});
CEP.pattern(dataStream, nonDeterminPattern).select(map -> {
System.out.println("nonDeterminPattern:" + map.get("start").toString());
System.out.println("nonDeterminPattern:" + map.get("middle").toString());
return map;
}).print();
/*---------------------------------------------*/
env.execute("Flink CEP Test");
}
}
Output
nonDeterminPattern:[a]
nonDeterminPattern:[b1]
relaxedPattern:[a]
relaxedPattern:[b1]
nonDeterminPattern:[a]
nonDeterminPattern:[b2]
2> {start=[a], middle=[b2]}
1> {start=[a], middle=[b1]}
1> {start=[a], middle=[b1]}
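As the output shows, the strict contiguity pattern produces no result, because c sits between a and b1. Relaxed contiguity yields (a, b1), while non-deterministic relaxed contiguity yields both (a, b1) and (a, b2).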
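The three results above can be reproduced with a small model in plain Java. This is only a sketch: it is not Flink code and does not reflect how Flink's NFA works internally. The class `ContiguityModel` and its methods are hypothetical names introduced for illustration, and the matching rules are a simplified reading of what `next()`, `followedBy()`, and `followedByAny()` do for this two-stage pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model (not part of Flink) of the three contiguity modes
// for the two-stage pattern: start == "a", middle contains "b".
class ContiguityModel {

    static boolean isStart(String e) { return e.equalsIgnoreCase("a"); }

    static boolean isMiddle(String e) { return e.contains("b"); }

    static String pair(String start, String middle) {
        return "(" + start + ", " + middle + ")";
    }

    // Strict contiguity (next): the event directly after "a" must match.
    static List<String> strict(List<String> events) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            if (isStart(events.get(i)) && isMiddle(events.get(i + 1))) {
                matches.add(pair(events.get(i), events.get(i + 1)));
            }
        }
        return matches;
    }

    // Relaxed contiguity (followedBy): skip non-matching events,
    // then stop at the first matching one.
    static List<String> relaxed(List<String> events) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (!isStart(events.get(i))) continue;
            for (int j = i + 1; j < events.size(); j++) {
                if (isMiddle(events.get(j))) {
                    matches.add(pair(events.get(i), events.get(j)));
                    break;
                }
            }
        }
        return matches;
    }

    // Non-deterministic relaxed contiguity (followedByAny):
    // every later matching event produces its own match.
    static List<String> relaxedAny(List<String> events) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (!isStart(events.get(i))) continue;
            for (int j = i + 1; j < events.size(); j++) {
                if (isMiddle(events.get(j))) {
                    matches.add(pair(events.get(i), events.get(j)));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "c", "b1", "b2");
        System.out.println("strict:     " + strict(events));     // []
        System.out.println("relaxed:    " + relaxed(events));    // [(a, b1)]
        System.out.println("relaxedAny: " + relaxedAny(events)); // [(a, b1), (a, b2)]
    }
}
```

The only difference between `relaxed` and `relaxedAny` is the `break`: relaxed contiguity stops at the first matching event, whereas the non-deterministic variant keeps scanning and emits one match per matching event.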
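4. Combined pattern examples
The examples above only cover simple individual patterns. The following examples show slightly more complex combined patterns.
Pattern a b+ c: relaxed contiguity between a and b, strict contiguity between b and c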
DataStream<String> dataStream = env.fromElements(("a"), ("b1"), ("d1"), ("b2"),("d2"),("b3"),("c"));
// Pattern a b+ c: relaxed contiguity between a and b, strict contiguity between b and c
Pattern pattern = Pattern.begin("start").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object s, Context<Object> context) {
return s.toString().equalsIgnoreCase("a");
}
}).followedBy("middle").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("b");
}
}).oneOrMore().next("last").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("c");
}
});
CEP.pattern(dataStream, pattern).select(map -> {
System.out.println("pattern:" + map.get("start").toString());
System.out.println("pattern:" + map.get("middle").toString());
System.out.println("pattern:" + map.get("last").toString());
return map;
}).print();
Output:
pattern:[a]
pattern:[b1, b2, b3]
pattern:[c]
1> {start=[a], middle=[b1, b2, b3], last=[c]}
Pattern a b+ c: strict contiguity between a and b, relaxed contiguity between b and c
DataStream<String> dataStream = env.fromElements(("a"), ("b1"), ("d1"), ("b2"),("d2"),("b3"),("c"));
// Pattern a b+ c: strict contiguity between a and b, relaxed contiguity between b and c
Pattern pattern = Pattern.begin("start").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object s, Context<Object> context) {
return s.toString().equalsIgnoreCase("a");
}
}).next("middle").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("b");
}
}).oneOrMore().followedBy("last").where(new IterativeCondition<Object>() {
@Override
public boolean filter(Object o, Context<Object> context) {
return o.toString().contains("c");
}
});
CEP.pattern(dataStream, pattern).select(map -> {
System.out.println("--------------------------------------");
System.out.println("pattern:" + map.get("start").toString());
System.out.println("pattern:" + map.get("middle").toString());
System.out.println("pattern:" + map.get("last").toString());
return map;
}).print();
Output:
--------------------------------------
pattern:[a]
pattern:[b1, b2, b3]
pattern:[c]
--------------------------------------
pattern:[a]
pattern:[b1, b2]
pattern:[c]
--------------------------------------
pattern:[a]
pattern:[b1]
pattern:[c]
3> {start=[a], middle=[b1], last=[c]}
1> {start=[a], middle=[b1, b2, b3], last=[c]}
2> {start=[a], middle=[b1, b2], last=[c]}
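To see why the two combined patterns yield one match and three matches respectively, it can help to model the looping stage by hand. The following plain-Java sketch is not Flink code; `LoopModel`, `match`, and `findLast` are hypothetical names, and the rules are assumptions chosen to reproduce the outputs shown above: the `b+` loop skips non-matching events between iterations, every non-empty prefix of the collected b events is a candidate, and the a-to-b contiguity is not modeled because b1 directly follows a in the sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model (not part of Flink) of the pattern a b+ c.
class LoopModel {

    // Returns one string per match. strictBeforeLast == true models
    // next("last"); false models followedBy("last").
    static List<String> match(List<String> events, boolean strictBeforeLast) {
        List<String> results = new ArrayList<>();
        int start = events.indexOf("a");
        if (start < 0) return results;

        List<String> taken = new ArrayList<>(); // b events collected so far
        for (int i = start + 1; i < events.size(); i++) {
            if (!events.get(i).contains("b")) continue; // loop skips d1, d2, ...
            taken.add(events.get(i));
            // Each prefix of the b events may be completed by a c event.
            int last = findLast(events, i, strictBeforeLast);
            if (last >= 0) {
                results.add("a " + taken + " " + events.get(last));
            }
        }
        return results;
    }

    // Index of the c event that completes a match whose final b is at lastB,
    // or -1 if there is none.
    static int findLast(List<String> events, int lastB, boolean strict) {
        if (strict) {
            // Strict: c must be the very next event.
            int next = lastB + 1;
            boolean ok = next < events.size() && events.get(next).contains("c");
            return ok ? next : -1;
        }
        // Relaxed: the first c anywhere after the final b.
        for (int j = lastB + 1; j < events.size(); j++) {
            if (events.get(j).contains("c")) return j;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b1", "d1", "b2", "d2", "b3", "c");
        // next("last"): only the prefix ending in b3 is directly followed by c.
        System.out.println(match(events, true));
        // followedBy("last"): every prefix can skip ahead to c.
        System.out.println(match(events, false));
    }
}
```

With strict contiguity before c, the prefixes [b1] and [b1, b2] are rejected because d1 and d2 come right after them. With relaxed contiguity, all three prefixes can skip ahead to c, which corresponds to the three matches in the output above.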
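5. Flink CEP use cases and summary
In real-time stream processing, Flink CEP goes well beyond the simple examples shown above and supports far more complex applications. For details, see the official Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/libs/cep.html).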