【编译】StreamInsight的LINQ示例101
原文链接:http://blogs.msdn.com/b/masimms/archive/2010/09/16/101-ish-linq-samples-for-streaminsight.aspx
(译者注:Mark Simms在原文中将其称为StreamInsight LINQ示例介绍的第1部分,即过滤和聚合,然而我发现并没有第2部分或者后续的内容,因此我在标题并没有加入第1部分的字样。)
和LINQ示例101一样,我也写了一个针对StreamInsight的LINQ系列,我在考虑以后把它们放在StreamInsight开发者中心中以备查阅。
你可以使用这个应用程序框架(包含数据源)来运行下述查询。
这篇博文中我会重点介绍过滤以及窗口聚合,具体内容如下:
- 过滤
- 窗口和聚合
- 怎样计算聚合,如计算一系列事件的平均值?
- 怎样计算过去5秒内的所有事件的平均值?
- 怎样每隔2秒计算过去5秒内的所有事件的平均值?
- 怎样在每当一个新的事件进入时,计算过去5秒内的所有事件的平均值?
- 怎样计算过去5个事件的平均值?
- 怎样每隔2秒计算事件进入数目?
- 怎样计算过去10秒内每个事件组的平均值?
- 怎样每隔10秒计算过去20秒内的每个事件组的平均值?
- 怎样在每当一个新的事件进入事件组时,计算过去10秒内的每个事件组的平均值?
- 怎样每隔10秒找出具有最大值的事件?
- 怎样每隔5秒的找出过去10秒内具有两个最小值的事件?
- 怎样找出过去10秒内具有最高平均值的事件组?
- 怎样每隔5秒找出过去10秒内具有最高值的事件组?
- 怎样在一个特定的时间段内(例如5分钟的时间窗口)找出具有相同ID值的事件?
var filterQuery = from e in inputStream where e.Value > 20 select e;
[输入数据] 结果:
SimpleFilter,Point,12:00:02.000,1001,77 SimpleFilter,Point,12:00:03.000,1001,44 SimpleFilter,Point,12:00:04.000,1001,22 SimpleFilter,Point,12:00:05.000,1001,51 SimpleFilter,Point,12:00:06.000,1001,46 SimpleFilter,Point,12:00:07.000,1001,71 SimpleFilter,Point,12:00:08.000,1001,37 SimpleFilter,Point,12:00:09.000,1001,45 |
var filterQuery = from e in inputStream where e.Value > 70 && e.SensorId == 1001 select e;
[输入数据] 结果:
MediumFilter,Point,12:00:01.001,1001,77 MediumFilter,Point,12:00:03.010,1001,72 MediumFilter,Point,12:00:12.082,1001,73 MediumFilter,Point,12:00:14.145,1001,75 MediumFilter,Point,12:00:14.154,1001,71 MediumFilter,Point,12:00:15.163,1001,74 MediumFilter,Point,12:00:15.172,1001,73 |
所有的聚合操作,例如求平均值,最小值,最大值等等都针对的是事件集合。而StreamInsight中的事件集合均定义在一个时间窗口内。窗口操作或基于一段时间(跳跃窗口,翻转窗口和滑动窗口),或基于特定数量的事件(计数窗口)。
var query = from window in inputStream.TumblingWindow( TimeSpan.FromSeconds(5), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { Average = window.Avg(e => e.Value) };
[输入数据] 结果:
SimpleTumblingWindow,Point,12:00:00.000,32.2 SimpleTumblingWindow,Point,12:00:05.000,50 |
var query = from window in inputStream.HoppingWindow( TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(2), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { Average = window.Avg(e => e.Value) };
[输入数据] 结果:
SimpleHoppingWindow,Point,11:59:56.000,14 SimpleHoppingWindow,Point,11:59:58.000,31.66666 SimpleHoppingWindow,Point,12:00:00.000,32.2 SimpleHoppingWindow,Point,12:00:02.000,48 SimpleHoppingWindow,Point,12:00:04.000,45.4 SimpleHoppingWindow,Point,12:00:06.000,49.75 SimpleHoppingWindow,Point,12:00:08.000,41 |
怎样在每当一个新的事件进入时,计算过去5秒内的所有事件的平均值?
var query = from window in inputStream .AlterEventDuration(e => TimeSpan.FromSeconds(5)) .SnapshotWindow(SnapshotWindowOutputPolicy.Clip) select new { Average = window.Avg(e => e.Value) };
[输入数据] 结果:
SimpleSlidingWindow,Point,12:00:00.000,14 SimpleSlidingWindow,Point,12:00:01.000,9 SimpleSlidingWindow,Point,12:00:02.000,31.66 SimpleSlidingWindow,Point,12:00:03.000,34.75 SimpleSlidingWindow,Point,12:00:04.000,32.2 SimpleSlidingWindow,Point,12:00:05.000,39.6 SimpleSlidingWindow,Point,12:00:06.000,48 SimpleSlidingWindow,Point,12:00:07.000,46.8 SimpleSlidingWindow,Point,12:00:08.000,45.4 SimpleSlidingWindow,Point,12:00:09.000,50 SimpleSlidingWindow,Point,12:00:10.000,49.75 SimpleSlidingWindow,Point,12:00:11.000,51 SimpleSlidingWindow,Point,12:00:12.000,41 SimpleSlidingWindow,Point,12:00:13.000,45 |
使用计数窗口对一定数量的事件进行聚合。StreamInsight版本1中的计数窗口不可以使用内置的聚合函数(如Count,Avg,Min和Max),取而代之的必须使用用户自定义聚合操作。下面的代码样例展示了在计数窗口中使用用户自定义聚合操作:
var query = from window in inputStream.CountByStartTimeWindow( 5, CountWindowOutputPolicy.PointAlignToWindowEnd) select new { Average = window.UserDefinedAverage(e => e.Value) }; /// <summary> /// 计数窗口使用的简单平均值聚合函数 /// </summary> public class SimpleAggregate : CepAggregate<double, double> { public override double GenerateOutput(IEnumerable<double> payloads) { return payloads.Average(); } } /// <summary> /// LINQ 扩展方法 /// </summary> public static class MyExtensions { [CepUserDefinedAggregate(typeof(SimpleAggregate))] public static double UserDefinedAverage<InputT>(this CepWindow<InputT> window, Expression<Func<InputT, double>> map) { throw CepUtility.DoNotCall(); } }
[输入数据] 结果:
SimpleCountWindow,Point,12:00:04.000,32.2 SimpleCountWindow,Point,12:00:05.000,39.6 SimpleCountWindow,Point,12:00:06.000,48 SimpleCountWindow,Point,12:00:07.000,46.8 SimpleCountWindow,Point,12:00:08.000,45.4 SimpleCountWindow,Point,12:00:09.000,50 |
var query = from window in inputStream.TumblingWindow( TimeSpan.FromSeconds(2), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { Count = window.Count() };
[输入数据] 结果:
SimpleEventCount,Point,12:00:00.000,54 SimpleEventCount,Point,12:00:02.000,54 SimpleEventCount,Point,12:00:04.000,54 SimpleEventCount,Point,12:00:06.000,54 SimpleEventCount,Point,12:00:08.000,54 SimpleEventCount,Point,12:00:10.000,54 SimpleEventCount,Point,12:00:12.000,54 SimpleEventCount,Point,12:00:14.000,54 SimpleEventCount,Point,12:00:16.000,54 |
这个例子中,我们每隔10秒计算一次每个传感器的平均值。首先将事件按照传感器ID进行分组,接下去计算每一个组的平均值(结果包含了事件组键——这个例子中是传感器的ID)。
var query = from e in inputStream group e by e.SensorId into sensorGroups from window in sensorGroups.TumblingWindow( TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { SensorId = sensorGroups.Key, Average = window.Avg(e => e.Value) };
[输入输入数据] 结果:
GroupedTumblingWindow,Point,12:00:00.000,42.1,1001 GroupedTumblingWindow,Point,12:00:00.000,34.6,1002 GroupedTumblingWindow,Point,12:00:00.000,39.7,1003 GroupedTumblingWindow,Point,12:00:00.000,50.9,1004 GroupedTumblingWindow,Point,12:00:00.000,30.7,1005 GroupedTumblingWindow,Point,12:00:00.000,38.3,1006 GroupedTumblingWindow,Point,12:00:00.000,36.8,1007 GroupedTumblingWindow,Point,12:00:00.000,41.7,1008 GroupedTumblingWindow,Point,12:00:00.000,34.4,1009 GroupedTumblingWindow,Point,12:00:10.000,46.375,1001 GroupedTumblingWindow,Point,12:00:10.000,40.2,1002 GroupedTumblingWindow,Point,12:00:10.000,36.1,1003 GroupedTumblingWindow,Point,12:00:10.000,37.3,1004 GroupedTumblingWindow,Point,12:00:10.000,31.8,1005 GroupedTumblingWindow,Point,12:00:10.000,41.4,1006 GroupedTumblingWindow,Point,12:00:10.000,40.3,1007 GroupedTumblingWindow,Point,12:00:10.000,43.5,1008 GroupedTumblingWindow,Point,12:00:10.000,43.3,1009 |
var query = from e in inputStream group e by e.SensorId into sensorGroups from window in sensorGroups.HoppingWindow( TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { SensorId = sensorGroups.Key, Average = window.Avg(e => e.Value) };
[输入数据] 结果:
GroupedHoppingWindow,Point,11:59:50.000,42.1,1001 GroupedHoppingWindow,Point,11:59:50.000,34.6,1002 GroupedHoppingWindow,Point,11:59:50.000,39.7,1003 GroupedHoppingWindow,Point,11:59:50.000,50.9,1004 GroupedHoppingWindow,Point,11:59:50.000,30.7,1005 GroupedHoppingWindow,Point,11:59:50.000,38.3,1006 GroupedHoppingWindow,Point,11:59:50.000,36.8,1007 GroupedHoppingWindow,Point,11:59:50.000,41.7,1008 GroupedHoppingWindow,Point,11:59:50.000,34.4,1009 GroupedHoppingWindow,Point,12:00:00.000,44,1001 GroupedHoppingWindow,Point,12:00:00.000,37.0,1002 GroupedHoppingWindow,Point,12:00:00.000,38.1,1003 GroupedHoppingWindow,Point,12:00:00.000,44.8,1004 GroupedHoppingWindow,Point,12:00:00.000,31.2,1005 GroupedHoppingWindow,Point,12:00:00.000,39.7,1006 GroupedHoppingWindow,Point,12:00:00.000,38.4,1007 GroupedHoppingWindow,Point,12:00:00.000,42.5,1008 GroupedHoppingWindow,Point,12:00:00.000,38.4,1009 GroupedHoppingWindow,Point,12:00:10.000,46.3,1001 GroupedHoppingWindow,Point,12:00:10.000,40.2,1002 GroupedHoppingWindow,Point,12:00:10.000,36.1,1003 GroupedHoppingWindow,Point,12:00:10.000,37.3,1004 GroupedHoppingWindow,Point,12:00:10.000,31.8,1005 GroupedHoppingWindow,Point,12:00:10.000,41.4,1006 GroupedHoppingWindow,Point,12:00:10.000,40.3,1007 GroupedHoppingWindow,Point,12:00:10.000,43.5,1008 GroupedHoppingWindow,Point,12:00:10.000,43.3,1009 |
怎样在每当一个新的事件进入事件组时,计算过去10秒内的每个事件组的平均值?
var query = from e in inputStream .AlterEventDuration(e => TimeSpan.FromSeconds(10)) group e by e.SensorId into sensorGroups from window in sensorGroups .SnapshotWindow(SnapshotWindowOutputPolicy.Clip) select new { SensorId = sensorGroups.Key, Average = window.Avg(e => e.Value) };
[输入数据] 结果:
<snip> GroupedSlidingWindow,Point,12:00:17.017,39.3,1008 GroupedSlidingWindow,Point,12:00:17.018,39.4,1009 GroupedSlidingWindow,Point,12:00:17.019,43,1001 GroupedSlidingWindow,Point,12:00:17.020,43.1,1002 GroupedSlidingWindow,Point,12:00:17.021,37.8,1003 GroupedSlidingWindow,Point,12:00:17.022,41.8,1004 GroupedSlidingWindow,Point,12:00:17.023,33.7,1005 GroupedSlidingWindow,Point,12:00:17.024,39.2,1006 GroupedSlidingWindow,Point,12:00:17.025,40.3,1007 GroupedSlidingWindow,Point,12:00:17.026,38.9,1008 GroupedSlidingWindow,Point,12:00:17.027,40.8,1009 <snip> |
var query = (from window in inputStream.TumblingWindow( TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd) from e in window orderby e.Value descending select e).Take(1);
[输入数据] 结果:
TopK_TumblingDescending,Point,12:00:00.000,1006,80 TopK_TumblingDescending,Point,12:00:00.000,1007,80 TopK_TumblingDescending,Point,12:00:00.000,1009,80 TopK_TumblingDescending,Point,12:00:10.000,1007,81 |
注意第一个事件组在相同的窗口时间内共有3个事件具有同样的最大值(80)。因此,所有具有最大值的事件都被输出了。
var query = (from window in inputStream.HoppingWindow( TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(5), HoppingWindowOutputPolicy.ClipToWindowEnd) from e in window orderby e.Value ascending select e).Take(2);
[输入数据] 结果:
TopK_HoppingAscending,Point,11:59:55.000,1004,2 TopK_HoppingAscending,Point,11:59:55.000,1007,1 TopK_HoppingAscending,Point,11:59:55.000,1008,2 TopK_HoppingAscending,Point,12:00:00.000,1001,1 TopK_HoppingAscending,Point,12:00:00.000,1001,1 TopK_HoppingAscending,Point,12:00:00.000,1003,1 TopK_HoppingAscending,Point,12:00:00.000,1007,1 TopK_HoppingAscending,Point,12:00:05.000,1001,1 TopK_HoppingAscending,Point,12:00:05.000,1001,1 TopK_HoppingAscending,Point,12:00:05.000,1003,1 TopK_HoppingAscending,Point,12:00:10.000,1004,2 TopK_HoppingAscending,Point,12:00:10.000,1004,2 TopK_HoppingAscending,Point,12:00:10.000,1006,2 TopK_HoppingAscending,Point,12:00:10.000,1007,2 TopK_HoppingAscending,Point,12:00:15.000,1004,2 TopK_HoppingAscending,Point,12:00:15.000,1007,2 |
注意这里有同样的现象——窗口内包含多个具有相同最小值的事件。
这个查询分两个阶段执行。第一个查询中在过去的10秒内计算每一个传感器的平均值(使用group by,翻转窗口和Avg()函数)。接下去取出该聚合的结果并在其中执行快照(针对第一个查询中的事件集合),排序并取出最高值。
var avg = from e in inputStream group e by e.SensorId into sensorGroups from window in sensorGroups.TumblingWindow( TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { SensorId = sensorGroups.Key, Average = window.Avg(e => e.Value) }; var query = (from win in avg.SnapshotWindow(SnapshotWindowOutputPolicy.Clip) from e in win orderby e.Average descending select e).Take(1);
[输入数据] 结果:
TopK_GroupTumblingDescending,Point,12:00:00.000,50.9,1004 TopK_GroupTumblingDescending,Point,12:00:10.000,46.3,1001 |
var avg = from e in inputStream group e by e.SensorId into sensorGroups from window in sensorGroups.HoppingWindow( TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(5), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { SensorId = sensorGroups.Key, Maximum = window.Max(e => e.Value), Minimum = window.Min(e => e.Value) }; var query = (from win in avg.SnapshotWindow(SnapshotWindowOutputPolicy.Clip) from e in win orderby e.Maximum descending select e).Take(1);
[输入数据] 结果:
TopK_GroupHoppingDescending,Point,11:59:55.000,80,6,1006 TopK_GroupHoppingDescending,Point,11:59:55.000,80,1,1007 TopK_GroupHoppingDescending,Point,11:59:55.000,80,4,1009 TopK_GroupHoppingDescending,Point,12:00:00.000,80,6,1006 TopK_GroupHoppingDescending,Point,12:00:00.000,80,6,1006 TopK_GroupHoppingDescending,Point,12:00:00.000,80,1,1007 TopK_GroupHoppingDescending,Point,12:00:00.000,80,1,1007 TopK_GroupHoppingDescending,Point,12:00:00.000,80,2,1009 TopK_GroupHoppingDescending,Point,12:00:00.000,80,4,1009 TopK_GroupHoppingDescending,Point,12:00:05.000,80,2,1006 TopK_GroupHoppingDescending,Point,12:00:05.000,80,6,1006 TopK_GroupHoppingDescending,Point,12:00:05.000,80,1,1007 TopK_GroupHoppingDescending,Point,12:00:05.000,80,2,1009 TopK_GroupHoppingDescending,Point,12:00:10.000,81,2,1007 TopK_GroupHoppingDescending,Point,12:00:15.000,81,2,1007 TopK_GroupHoppingDescending,Point,12:00:15.000,81,2,1007 TopK_GroupHoppingDescending,Point,12:00:20.000,81,2,1007 |
再一次注意StreamInsight引擎根据最高值(80或81)输出了对应结果。
怎样在一个特定的时间段内(例如5分钟的时间窗口)找出具有相同ID值的事件?
实现这个查询,我们使用了一个滑动窗口来检查5分钟窗口内到达的事件,并通过它们的ID值进行分组。接下望去我们找出该窗口内具有相同ID的事件数量,并过滤出具有两个事件的事件组。换句话说,我们仅希望得到那些包含超过1个事件的事件组(传感器),而删除那些仅包含1个事件的事件组(不会存在包含0个事件的事件组)。
var eventCount = from e in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5)) group e by e.SensorId into sensorGroups from win in sensorGroups.SnapshotWindow(SnapshotWindowOutputPolicy.Clip) select new { SensorId = sensorGroups.Key, Count = win.Count() }; var query = from e in eventCount where e.Count >= 2 select e;
[输入数据] 结果:
DetectMultipleInWindow,Point,12:05:16.206,4,1008 DetectMultipleInWindow,Point,12:05:16.207,4,1009 DetectMultipleInWindow,Point,12:05:16.208,3,1001 DetectMultipleInWindow,Point,12:05:16.209,3,1002 <snip> |