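[Original] StreamInsight Query Series (7): Basic Query Operations, Basic Ranking
The previous post covered grouping and aggregation among the basic StreamInsight query operations. This post shows how to perform basic ranking (TopK) in a StreamInsight query.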
Preparing the test data
To make the queries easy to test, we first prepare a static test data source:
var weatherData = new[]
{
    new { Timestamp = new DateTime(2010, 1, 1, 0, 00, 00, DateTimeKind.Utc), Temperature = -9.0, StationCode = 71395, WindSpeed = 4},
    new { Timestamp = new DateTime(2010, 1, 1, 0, 30, 00, DateTimeKind.Utc), Temperature = -4.5, StationCode = 71801, WindSpeed = 41},
    new { Timestamp = new DateTime(2010, 1, 1, 1, 00, 00, DateTimeKind.Utc), Temperature = -8.8, StationCode = 71395, WindSpeed = 6},
    new { Timestamp = new DateTime(2010, 1, 1, 1, 30, 00, DateTimeKind.Utc), Temperature = -4.4, StationCode = 71801, WindSpeed = 39},
    new { Timestamp = new DateTime(2010, 1, 1, 2, 00, 00, DateTimeKind.Utc), Temperature = -9.7, StationCode = 71395, WindSpeed = 9},
    new { Timestamp = new DateTime(2010, 1, 1, 2, 30, 00, DateTimeKind.Utc), Temperature = -4.6, StationCode = 71801, WindSpeed = 59},
    new { Timestamp = new DateTime(2010, 1, 1, 3, 00, 00, DateTimeKind.Utc), Temperature = -9.6, StationCode = 71395, WindSpeed = 9},
};
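weatherData represents a series of weather readings (timestamp, temperature, weather station code, and wind speed).
Next, we turn weatherData into a stream of point events: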
var weatherStream = weatherData.ToPointStream(
    Application,
    t => PointEvent.CreateInsert(t.Timestamp, t),
    AdvanceTimeSettings.IncreasingStartTime);
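Basic ranking
Question 1: How do we find the maximum-value event in each 4-hour period?
Applied to the example above, the question becomes "how do we find the event with the highest temperature in each 4-hour period?". Solving it takes a composite query: first use a tumbling window (TumblingWindow) with a fixed size of 4 hours, then sort the events inside each 4-hour window by temperature (orderby) and take the first one. The code is as follows: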
var topKQuery = (from win in weatherStream
                     .TumblingWindow(TimeSpan.FromHours(4), HoppingWindowOutputPolicy.ClipToWindowEnd)
                 from e in win
                 orderby e.Temperature descending   // highest temperature first
                 select e).Take(1);                 // keep only the top event per window
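Running the query in LINQPad returns one event per window. With the test data above, all seven readings fall into the window from 00:00 to 04:00, so the result should be the 01:30 reading from station 71801 (Temperature = -4.4).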
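To make the "window first, then rank" idea concrete without a StreamInsight server, here is a minimal sketch in plain LINQ to Objects. It is only an emulation under stated assumptions: TumblingTopK, WindowStart, and TopK are hypothetical names (not StreamInsight APIs), windows are assumed to be aligned to DateTime.MinValue, and CTIs and event lifetimes are ignored entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TumblingTopK
{
    // Hypothetical helper (not a StreamInsight API): start of the fixed-size
    // window that contains t, assuming windows are aligned to DateTime.MinValue.
    public static DateTime WindowStart(DateTime t, TimeSpan size) =>
        new DateTime(t.Ticks - (t.Ticks % size.Ticks), t.Kind);

    // Buckets readings into tumbling windows, then keeps the k readings
    // with the highest temperature inside each window.
    public static List<(DateTime Window, DateTime Timestamp, double Temperature)> TopK(
        IEnumerable<(DateTime Timestamp, double Temperature)> readings,
        TimeSpan size,
        int k) =>
        readings
            .GroupBy(r => WindowStart(r.Timestamp, size))
            .OrderBy(g => g.Key)
            .SelectMany(g => g.OrderByDescending(r => r.Temperature)
                              .Take(k)
                              .Select(r => (Window: g.Key, r.Timestamp, r.Temperature)))
            .ToList();
}
```

Calling TumblingTopK.TopK with the seven (Timestamp, Temperature) pairs from weatherData, a 4-hour size, and k = 1 yields a single row for the window starting at 00:00, which mirrors what the orderby ... Take(1) query asks StreamInsight to do per window.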
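Question 2: How do we compute, every 2 hours, the two minimum-value events of the past 4 hours?
This is similar to Question 1. Here the orderby clause is combined with a hopping window (HoppingWindow) that is 4 hours long and advances every 2 hours, sorting ascending so that the lowest temperatures come first.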
var topKQuery2 = (from win in weatherStream
                      .HoppingWindow(TimeSpan.FromHours(4), TimeSpan.FromHours(2), HoppingWindowOutputPolicy.ClipToWindowEnd)
                  from e in win
                  orderby e.Temperature ascending   // lowest temperature first
                  select e).Take(2);                // keep the two lowest events per window
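Running the query in LINQPad returns up to two events per window. With the test data above, the windows that contain events start at 22:00 on the previous day, 00:00, and 02:00, so the result should be the readings -9.0 and -8.8 for the first window and -9.7 and -9.6 for each of the other two.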
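The difference from a tumbling window is that hopping windows overlap, so one event can be ranked in more than one window. The following plain LINQ to Objects sketch illustrates this. As before, it is an emulation rather than StreamInsight code: HoppingTopK, WindowStarts, and BottomK are hypothetical names, window starts are assumed to be aligned to multiples of the hop size from DateTime.MinValue, and CTIs and event lifetimes are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HoppingTopK
{
    // Hypothetical helper (not a StreamInsight API): start times of every
    // window [start, start + size) that contains t, assuming window starts
    // are aligned to multiples of hop counted from DateTime.MinValue.
    public static IEnumerable<DateTime> WindowStarts(DateTime t, TimeSpan size, TimeSpan hop)
    {
        long start = t.Ticks - (t.Ticks % hop.Ticks);
        for (; start >= 0 && start + size.Ticks > t.Ticks; start -= hop.Ticks)
            yield return new DateTime(start, t.Kind);
    }

    // Assigns each reading to all windows that contain it, then keeps the
    // k readings with the lowest temperature inside each window.
    public static List<(DateTime Window, DateTime Timestamp, double Temperature)> BottomK(
        IEnumerable<(DateTime Timestamp, double Temperature)> readings,
        TimeSpan size,
        TimeSpan hop,
        int k) =>
        readings
            .SelectMany(r => WindowStarts(r.Timestamp, size, hop)
                                .Select(w => (Window: w, Reading: r)))
            .GroupBy(x => x.Window)
            .OrderBy(g => g.Key)
            .SelectMany(g => g.OrderBy(x => x.Reading.Temperature)
                              .Take(k)
                              .Select(x => (Window: g.Key, x.Reading.Timestamp, x.Reading.Temperature)))
            .ToList();
}
```

With the seven (Timestamp, Temperature) pairs from weatherData, a 4-hour size, a 2-hour hop, and k = 2, this produces six rows across three windows. The 02:00 and 03:00 readings appear twice because they belong to both the 00:00 and the 02:00 window.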
The next post will cover grouped ranking (TopK) among the basic StreamInsight query operations.