【原创】StreamInsight查询系列(十六)——查询模式之左外联接
上篇文章介绍了查询模式中窗口比率部分,这篇博文将介绍StreamInsight中的左外联接(Left Outer Join)。
测试数据准备
为了方便测试查询,我们首先准备一个静态的测试数据源:
var time = DateTime.Parse("10/23/2011 09:37:00 PM"); var inputEvents = new[] { new { Timestamp = time+TimeSpan.FromMinutes(1), accountNo = "800001", branchName="43rd Street", customerName="John", amountWithdrawn = 100.0 }, new { Timestamp = time+TimeSpan.FromMinutes(2), accountNo = "800002", branchName="23rd Street", customerName="Mark", amountWithdrawn = 250.0 }, new { Timestamp = time+TimeSpan.FromMinutes(3), accountNo = "800003", branchName="43rd Street", customerName="Chandler", amountWithdrawn = 500.0 }, new { Timestamp = time+TimeSpan.FromMinutes(4), accountNo = "800004", branchName="43rd Street", customerName="Dave", amountWithdrawn = 380.0 }, new { Timestamp = time+TimeSpan.FromMinutes(4), accountNo = "800005", branchName="43rd Street", customerName="Mike", amountWithdrawn = 5000.0 }, new { Timestamp = time+TimeSpan.FromMinutes(5), accountNo = "800006", branchName="23rd Street", customerName="Sarah", amountWithdrawn = 80.0 }, new { Timestamp = time+TimeSpan.FromMinutes(6), accountNo = "800007", branchName="43rd Street", customerName="Ross", amountWithdrawn = 500.0 }, new { Timestamp = time+TimeSpan.FromMinutes(8), accountNo = "800008", branchName="43rd Street", customerName="Joey", amountWithdrawn = 1500.0 } }; var branchInfo = new[] { new { branchName = "23rd Street", city = "Some Town", zip = "11111" } };
接下去将上述数据源转变为点类型复杂事件流(其中branchStream将作为引用事件流):
var branchStream = branchInfo.ToPointStream(Application, t => PointEvent.CreateInsert(time, t), AdvanceTimeSettings.StrictlyIncreasingStartTime); // 将branchStream事件流中的所有时间持续时间延伸至无穷大值,方便后续作为引用流 var refStream = branchStream.AlterEventDuration(e => TimeSpan.MaxValue); var inputStream = inputEvents.ToPointStream(Application, t => PointEvent.CreateInsert(t.Timestamp, t), AdvanceTimeSettings.StrictlyIncreasingStartTime);
左外联接
熟悉SQL的读者一定很熟悉左外联接(Left Outer Join),这里再举一个左外联接的简单例子:
假设存在两张表A和B,表A记录了人员编号id及人员姓名Name,如下:
id |
name |
1 | 张三 |
2 | 李四 |
3 | 王五 |
表B记录了人员编号id及对应职业job,如下:
id |
job |
1 | 学生 |
2 | 教师 |
4 | 程序员 |
将表A与表B进行左外联接
SELECT A.*, B.* FROM A
LEFT JOIN B ON A.id = B.id
结果如下:
id | name | id | job |
1 | 张三 | 1 | 学生 |
2 | 李四 | 2 | 教师 |
3 | 王五 | NULL | NULL |
好了复习完了左外联接的基础知识,让我们再回顾一下之前学习过的StreamInsight内部联接:
var innerJoinStream = from left in inputStream join right in refStream on left.branchName equals right.branchName select new { accountNo = left.accountNo, amountWithdrawn = left.amountWithdrawn, customerName = left.customerName, branchName = right.branchName, branchCity = right.city, branchZip = right.zip };
结果显示了inputStream和refStream中branchName相同且事件重叠的两个事件:
之所以介绍内部联接,是因为实现StreamInsight中的左外联接要用到内部联接。我们可以把左外联接看做是内部联接和左反半部联接的并集。
下面进行左外半部联接查询找出inputStream存在但是refStream没有的事件:
var lasjStream = from left in inputStream where (from right in refStream where left.branchName == right.branchName select right).IsEmpty() select new { accountNo = left.accountNo, amountWithdrawn = left.amountWithdrawn, customerName = left.customerName, branchName = left.branchName, branchCity = "MISSING", branchZip = "MISSING" };
结果如下:
最后左外联接的结果就是内部链接和左反半部联接的结果并集:
var leftOuterJoinStream = innerJoinStream.Union(lasjStream);
左外联接的最终结果如下:
下一篇将介绍StreamInsight查询模式中如何应对瞬变及报警泛滥。