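[Original] StreamInsight Query Series (13): Query Patterns, Basic Patterns
The previous post covered the event alignment part of the query patterns; this post covers the basic patterns.
Basic Patterns
Question 1: How do you check whether event B occurs within 90 seconds after event A?
Let's answer this with an example. First, prepare some test data: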
    var sourceDataAB = new[]
    {
        new { SourceId = "A", Value = 22, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:12:00 PM") },
        new { SourceId = "A", Value = 24, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:13:00 PM") },
        new { SourceId = "A", Value = 31, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:14:00 PM") },
        new { SourceId = "A", Value = 67, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:15:00 PM") },
        new { SourceId = "A", Value = 54, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:16:00 PM") },
        new { SourceId = "A", Value = 50, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:30:00 PM") },
        new { SourceId = "A", Value = 87, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:35:00 PM") },
    };

    var sourceAB = sourceDataAB.ToPointStream(
        Application,
        ev => PointEvent.CreateInsert(ev.TimeStamp.ToLocalTime(), ev),
        AdvanceTimeSettings.StrictlyIncreasingStartTime);
Next, we look in sourceAB for an event B that occurs within 90 seconds after an event A and whose Value exceeds the Value of event A by more than 30:
    var resultAB = from first in sourceAB.AlterEventDuration(e => TimeSpan.FromSeconds(90))
                   join second in sourceAB on first.SourceId equals second.SourceId
                   where second.Value > first.Value + 30
                   select new { second.SourceId, second.Value, delta = second.Value - first.Value };
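The code above first extends the duration of every point event in the stream to 90 seconds, then joins the result with the original stream (recall the two ingredients of a StreamInsight join: overlapping event lifetimes and a join condition that holds). It returns each event B whose Value exceeds that of event A by more than 30, together with the difference. With the test data above this yields a single output event: the 4:15 PM reading (Value 67), which exceeds the 4:14 PM reading (Value 31) by 36.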
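To make the join semantics concrete, here is a minimal sketch that emulates the query with plain LINQ to Objects instead of StreamInsight. It assumes that a stretched event lives in the half-open interval [Time, Time + 90s) and that a point event matches when its timestamp falls inside that interval. The names `At`, `readings`, `window`, and `matches` are hypothetical and not part of the original sample.

```csharp
using System;
using System.Linq;

// Hypothetical helper: a timestamp on 10/23/2009.
DateTime At(int hour, int minute) => new DateTime(2009, 10, 23, hour, minute, 0);

// Same Value/TimeStamp pairs as sourceDataAB (Status and SourceId omitted).
var readings = new[]
{
    new { Value = 22, Time = At(16, 12) },
    new { Value = 24, Time = At(16, 13) },
    new { Value = 31, Time = At(16, 14) },
    new { Value = 67, Time = At(16, 15) },
    new { Value = 54, Time = At(16, 16) },
    new { Value = 50, Time = At(16, 30) },
    new { Value = 87, Time = At(16, 35) },
};
var window = TimeSpan.FromSeconds(90);

// Assumption: "first" is stretched to [Time, Time + 90s) and "second" stays
// a point event, so the two overlap when second.Time lies in that interval.
var matches =
    (from first in readings
     from second in readings
     where first.Time <= second.Time && second.Time < first.Time + window
     where second.Value > first.Value + 30
     select new { second.Value, Delta = second.Value - first.Value }).ToList();

foreach (var m in matches)
    Console.WriteLine($"Value={m.Value}, delta={m.Delta}");
```

Only the 4:14 PM and 4:15 PM pair is both close enough in time and far enough apart in Value, so the sketch prints one line.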
Question 2: How do you determine whether event B appears within 5 minutes after event A?
Again, we use an example. First, prepare the data source:
    var sourceDataNoAB = new[]
    {
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:00:00 PM") },
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:10:00 PM") },
        new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:12:00 PM") },
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:13:00 PM") },
        new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:14:00 PM") },
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:15:00 PM") },
        new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:20:00 PM") },
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:30:00 PM") },
        new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:35:00 PM") },
    };

    var sourceNoAB = sourceDataNoAB.ToPointStream(
        Application,
        ev => PointEvent.CreateInsert(ev.TimeStamp.ToLocalTime(), ev),
        AdvanceTimeSettings.StrictlyIncreasingStartTime);
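In the static data above, SourceId identifies the event type. To detect whether event B appears within 5 minutes after event A, one workable approach is to shift the start time of every event A forward by 5 minutes, extend the duration of every event B to 5 minutes, and then perform a left anti semi join. A separate post will cover the left anti semi join in detail; for now, think of it as the set difference A - B. The implementation follows.
Shift all A events forward by 5 minutes: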
    var forwardA = (from e in sourceNoAB
                    where e.SourceId == "A"
                    select e).ShiftEventTime(e => e.StartTime + TimeSpan.FromMinutes(5));
Extend the duration of all B events to 5 minutes:
    var stretchB = (from e in sourceNoAB
                    where e.SourceId == "B"
                    select e).AlterEventDuration(e => TimeSpan.FromMinutes(5));
Finally, perform the left anti semi join:
    // Named resultNoAB so it does not clash with resultAB from Question 1
    var resultNoAB = from e in forwardA
                     where (from p in stretchB select p).IsEmpty()
                     select e;
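The left anti semi join keeps only the A events that have no matching B. In this data set, only the event A that occurred at "10/23/2009 4:20:00 PM" has no event B within the following 5 minutes.
Note: alternatively, you can use AlterEventLifetime directly to move the start time of event B back by 5 minutes and set its duration to 5 minutes. The advantage is that the lifetime of event A does not have to be modified; Question 4 shows how to do this.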
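The shift-and-stretch approach can be checked without StreamInsight. The following is a minimal sketch in plain LINQ to Objects, assuming half-open lifetimes: a shifted A point at `Time + 5 min` overlaps a stretched B when `B.Start <= point < B.End`. The names `At`, `events`, `shiftedA`, `stretchedB`, and `unanswered` are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: a timestamp on 10/23/2009.
DateTime At(int hour, int minute) => new DateTime(2009, 10, 23, hour, minute, 0);

// Same SourceId/TimeStamp pairs as sourceDataNoAB.
var events = new[]
{
    new { SourceId = "B", Time = At(16, 0) },
    new { SourceId = "B", Time = At(16, 10) },
    new { SourceId = "A", Time = At(16, 12) },
    new { SourceId = "B", Time = At(16, 13) },
    new { SourceId = "A", Time = At(16, 14) },
    new { SourceId = "B", Time = At(16, 15) },
    new { SourceId = "A", Time = At(16, 20) },
    new { SourceId = "B", Time = At(16, 30) },
    new { SourceId = "B", Time = At(16, 35) },
};
var window = TimeSpan.FromMinutes(5);

// Emulates ShiftEventTime: each A becomes a point 5 minutes later.
var shiftedA = events.Where(e => e.SourceId == "A")
    .Select(e => new { Original = e.Time, Point = e.Time + window });

// Emulates AlterEventDuration: each B lives in [Start, End).
var stretchedB = events.Where(e => e.SourceId == "B")
    .Select(e => new { Start = e.Time, End = e.Time + window });

// Emulates the left anti semi join: keep A only if no B overlaps it.
var unanswered = shiftedA
    .Where(a => !stretchedB.Any(b => b.Start <= a.Point && a.Point < b.End))
    .Select(a => a.Original)
    .ToList();

foreach (var t in unanswered)
    Console.WriteLine(t.ToString("HH:mm"));
```

Under these assumptions only the 16:20 event A remains, because the next B arrives at 16:30, outside the window.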
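Question 3: How do you detect whether events A, B, and C all occur within 5 minutes of one another?
Once again, a simple example shows how to solve this.
First, create a basic data stream. In this example, we want to find events (2), (3), and (4) occurring within 5 minutes of one another.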
    int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    var inputStream = data.ToPointStream(
        Application,
        payload => PointEvent.CreateInsert(DateTime.Now + TimeSpan.FromMinutes(payload), new { payload }),
        AdvanceTimeSettings.IncreasingStartTime);
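Stretch the duration of the events in the original stream to 5 minutes and join the stream with itself to find a point in time at which events 2 and 3 are both present (that is, both fall inside the 5-minute window):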
    var selfJoin = from e1 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
                   from e2 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
                   where e1.payload == 2 && e2.payload == 3
                   select new { a = e1.payload, b = e2.payload };
Join the resulting stream selfJoin with the stretched input stream once more, so that event 4 also falls inside the time window:
    var selfJoin2 = from e3 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
                    from e1 in selfJoin
                    where e1.a == 2 && e1.b == 3 && e3.payload == 4
                    select new { a = e1.a, b = e1.b, c = e3.payload };
The output is a single event with a = 2, b = 3, and c = 4.
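The two chained joins can be pictured as repeated interval intersection. Below is a minimal sketch in plain LINQ to Objects, assuming that a join result lives for the intersection of its inputs' half-open lifetimes. The fixed `origin` (used instead of DateTime.Now so the run is repeatable) and the names `Later`, `Earlier`, `stretched`, `join23`, and `join234` are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helpers for intersecting half-open intervals.
DateTime Later(DateTime x, DateTime y) => x > y ? x : y;
DateTime Earlier(DateTime x, DateTime y) => x < y ? x : y;

var origin = new DateTime(2009, 1, 1); // assumed fixed base time
var window = TimeSpan.FromMinutes(5);

// Payload p starts at origin + p minutes and is stretched to 5 minutes.
var stretched = Enumerable.Range(1, 10)
    .Select(p => new { Payload = p, Start = origin.AddMinutes(p), End = origin.AddMinutes(p) + window })
    .ToList();

// First join: events 2 and 3 must overlap.
var join23 =
    (from e1 in stretched
     from e2 in stretched
     where e1.Payload == 2 && e2.Payload == 3
     let start = Later(e1.Start, e2.Start)
     let end = Earlier(e1.End, e2.End)
     where start < end // non-empty intersection
     select new { A = e1.Payload, B = e2.Payload, Start = start, End = end }).ToList();

// Second join: event 4 must overlap the result of the first join.
var join234 =
    (from e3 in stretched
     from j in join23
     where e3.Payload == 4
     let start = Later(e3.Start, j.Start)
     let end = Earlier(e3.End, j.End)
     where start < end
     select new { j.A, j.B, C = e3.Payload, Start = start, End = end }).ToList();

foreach (var r in join234)
    Console.WriteLine($"a={r.A}, b={r.B}, c={r.C}, from +{(r.Start - origin).TotalMinutes}min to +{(r.End - origin).TotalMinutes}min");
```

In this emulation the single result covers minutes 4 to 7 after `origin`, the period during which all three stretched events are alive at once.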
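Question 4: How do you find the events for which A, B, and C do not all occur within 5 minutes (taking event A as the reference)?
A good way to solve Question 4 is by elimination: first compute all the events for which A, B, and C all occur within 5 minutes, then perform a left anti semi join against the original event stream to obtain the result.
First, prepare the data source: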
    var sourceData = new[]
    {
        new { StartTime = new DateTime(2009, 6, 25, 0, 00, 00), ID = "B"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 00, 01), ID = "C"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 00, 02), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 05, 00), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "B"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "C"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 15, 59), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 15, 59), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 16, 00), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 16, 00), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 18, 00), ID = "B"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 20, 59), ID = "C"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 25, 59), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 26, 01), ID = "B"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 29, 59), ID = "A"},
        new { StartTime = new DateTime(2009, 6, 25, 0, 30, 59), ID = "C"},
    };

    var source = sourceData.ToPointStream(
        Application,
        ev => PointEvent.CreateInsert(ev.StartTime.ToLocalTime(), ev),
        AdvanceTimeSettings.IncreasingStartTime);
For convenience, a few time constants are also defined:
    // Window size and a small offset of one tick
    var windowSize = TimeSpan.FromMinutes(5);
    var oneTick = new TimeSpan(1);
Use filters to obtain three streams that contain only A events, only B events, and only C events:
    // Filter the source into one stream per ID
    var aStream = from e in source where e.ID == "A" select e;
    var bStream = from e in source where e.ID == "B" select e;
    var cStream = from e in source where e.ID == "C" select e;
First, find the A events that are followed by an event B within 5 minutes:
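    // Join to find the A events that are followed by an event B within 5 minutes.
    // To do this, move each event B back (earlier) by 5 minutes*, extend its
    // duration to 5 minutes, and join it with event A.
    var abStream = from a in aStream
                   // *Note: event lifetimes are half-open intervals (closed on the left,
                   // open on the right), so when altering B's lifetime we add one tick
                   // to make sure the original StartTime stays inside the new lifetime.
                   from b in bStream.AlterEventLifetime(
                       e => e.StartTime + oneTick - windowSize,
                       e => windowSize)
                   where true
                   select a;

Next, building on abStream, find the A events that are also followed by an event C within 5 minutes: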
    var abcStream = from a in abStream
                    from c in cStream.AlterEventLifetime(
                        e => e.StartTime + oneTick - windowSize,
                        e => windowSize)
                    where true
                    select a;
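The one-tick adjustment is easy to check in isolation. The sketch below uses only System types and assumes that a point event at `point` overlaps a lifetime [start, start + duration) exactly when `start <= point < start + duration`. The helper `Covers` and the variables `a`, `b`, `withoutTick`, `withTick`, and `atFiveMinutes` are hypothetical names.

```csharp
using System;

var windowSize = TimeSpan.FromMinutes(5);
var oneTick = new TimeSpan(1);

// Assumed half-open overlap test for a point event against [start, start + duration).
bool Covers(DateTime start, TimeSpan duration, DateTime point) =>
    start <= point && point < start + duration;

var a = new DateTime(2009, 6, 25, 0, 11, 0);
var b = a; // B occurs at exactly the same instant as A

// Without the extra tick, B's altered lifetime [b - 5 min, b) ends just before a.
bool withoutTick = Covers(b - windowSize, windowSize, a);

// With the extra tick, the lifetime [b - 5 min + 1 tick, b + 1 tick) contains a.
bool withTick = Covers(b + oneTick - windowSize, windowSize, a);

// A B that arrives exactly 5 minutes after A is outside the window.
var lateB = a + windowSize;
bool atFiveMinutes = Covers(lateB + oneTick - windowSize, windowSize, a);

Console.WriteLine($"{withoutTick} {withTick} {atFiveMinutes}");
```

So, under this assumption, the altered lifetime matches every B in the interval [A, A + 5 minutes): a simultaneous B counts, and a B exactly 5 minutes later does not.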
Use a left anti semi join to find the A events for which B and C do not both occur within the following 5 minutes:
    // Find the A events for which B and C do not both occur within the following 5 minutes
    var result2 = from a in aStream
                  where (from abc in abcStream where true select abc).IsEmpty()
                  select a;
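The final output, result2, contains exactly those A events that are not followed by both a B and a C within 5 minutes.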
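As a cross-check of the elimination approach, here is a minimal sketch in plain LINQ to Objects that works through the same data. It assumes the window semantics derived from the AlterEventLifetime calls, namely that a B or C counts when its StartTime lies in [A, A + 5 minutes). The names `T`, `FollowedBy`, and `result` are hypothetical, and timestamps are left as written (no ToLocalTime conversion).

```csharp
using System;
using System.Linq;

// Hypothetical helper: a timestamp on 6/25/2009 in hour 0.
DateTime T(int minute, int second) => new DateTime(2009, 6, 25, 0, minute, second);

var sourceData = new[]
{
    new { StartTime = T(0, 0), ID = "B" },   new { StartTime = T(0, 1), ID = "C" },
    new { StartTime = T(0, 2), ID = "A" },   new { StartTime = T(5, 0), ID = "A" },
    new { StartTime = T(11, 0), ID = "A" },  new { StartTime = T(11, 0), ID = "B" },
    new { StartTime = T(11, 0), ID = "C" },  new { StartTime = T(15, 59), ID = "A" },
    new { StartTime = T(15, 59), ID = "A" }, new { StartTime = T(16, 0), ID = "A" },
    new { StartTime = T(16, 0), ID = "A" },  new { StartTime = T(18, 0), ID = "B" },
    new { StartTime = T(20, 59), ID = "C" }, new { StartTime = T(25, 59), ID = "A" },
    new { StartTime = T(26, 1), ID = "B" },  new { StartTime = T(29, 59), ID = "A" },
    new { StartTime = T(30, 59), ID = "C" },
};
var windowSize = TimeSpan.FromMinutes(5);

// True when an event with the given ID starts in [a, a + windowSize).
bool FollowedBy(DateTime a, string id) =>
    sourceData.Any(e => e.ID == id && a <= e.StartTime && e.StartTime < a + windowSize);

// Elimination: drop every A that is followed by both a B and a C.
var result = sourceData
    .Where(e => e.ID == "A")
    .Where(a => !(FollowedBy(a.StartTime, "B") && FollowedBy(a.StartTime, "C")))
    .Select(a => a.StartTime)
    .ToList();

foreach (var t in result)
    Console.WriteLine(t.ToString("HH:mm:ss"));
```

In this emulation the A events at 00:11:00 and 00:16:00 are eliminated. The two A events at 00:15:59 stay, because the C at 00:20:59 falls exactly on the open end of their window.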
The next post will cover the distinct count part of the StreamInsight query patterns.