使用EntityFramework访问数据时的一些效率问题
一、准备工作
1、数据库模型:
如你所见,EF模型是上图中三个表,第四个则是数据库视图。
2、数据:
先在HeadAddress表中插入三条数据,再在EndAddress表中也插入三条数据,最后往Customer表中插入三万条随机数据作为测试数据。
二、效率比较
1、视图 vs 跨表:遍历所有用户信息(HeadAddress、EndAddress、Customer中的字段)
1 // 视图(ToList) 2 var temp = _DataContext.CustomerView; 3 foreach (var item in temp) ; 4 // 跨表(ToList) 5 var temp = _DataContext.CustomerSet.Select(c => new 6 { 7 Name = c.Name, 8 Sex = c.Sex, 9 Street = c.EndAddress.Street, 10 Number = c.EndAddress.Number, 11 Province = c.EndAddress.HeadAddress.Province, 12 City = c.EndAddress.HeadAddress.City, 13 County = c.EndAddress.HeadAddress.County 14 }); 15 foreach (var item in temp) ;
对应的SQL:
1 SELECT 2 [Extent1].[Name] AS [Name], 3 [Extent1].[Sex] AS [Sex], 4 [Extent1].[Province] AS [Province], 5 [Extent1].[City] AS [City], 6 [Extent1].[County] AS [County], 7 [Extent1].[Street] AS [Street], 8 [Extent1].[Number] AS [Number] 9 FROM [dbo].[CustomerView] AS [Extent1]
1 SELECT 2 [Extent1].[EndAddressId] AS [EndAddressId], 3 [Extent1].[Name] AS [Name], 4 [Extent1].[Sex] AS [Sex], 5 [Extent2].[Street] AS [Street], 6 [Extent2].[Number] AS [Number], 7 [Extent3].[Province] AS [Province], 8 [Extent3].[City] AS [City], 9 [Extent3].[County] AS [County] 10 FROM [dbo].[CustomerSet] AS [Extent1] 11 INNER JOIN [dbo].[EndAddressSet] AS [Extent2] ON [Extent1].[EndAddressId] = [Extent2].[Id] 12 INNER JOIN [dbo].[HeadAddressSet] AS [Extent3] ON [Extent2].[HeadAddressId] = [Extent3].[Id]
结果:
在接下来的所有统计中,我都没有把第1次(即上图中的0次)的时间算在平均时间内(因为EF第一次访问有初始时间)。可见使用视图做遍历效果并没有提升,但是当我把测试代码改为:
1 // 视图(ToList) 2 var temp = _DataContext.CustomerView.ToList(); 3 foreach (var item in temp) ; 4 // 跨表(ToList) 5 var temp = _DataContext.CustomerSet.Select(c => new 6 { 7 Name = c.Name, 8 Sex = c.Sex, 9 Street = c.EndAddress.Street, 10 Number = c.EndAddress.Number, 11 Province = c.EndAddress.HeadAddress.Province, 12 City = c.EndAddress.HeadAddress.City, 13 County = c.EndAddress.HeadAddress.County 14 }).ToList();
时,我发现效率发生了明显改变:
我们看到,视图ToList所用时间与上次相比几乎一致,甚至还有缩短,而使用跨表查找然后ToList耗时大大增加。至于原因,我认为可能是视图的ToList结果在数据结构内部为我们节省了非常多的工作。
2、视图 vs 跨表:遍历省份是“湖北”的用户信息(HeadAddress、EndAddress、Customer中的字段)
1 // 视图 2 var temp = _DataContext.CustomerView.Where(c => c.Province == "湖北"); 3 foreach (var item in temp) ; 4 // 跨表 5 var temp = _DataContext.CustomerSet.Where(c => c.EndAddress.HeadAddress.Province == "湖北") 6 .Select(c => new 7 { 8 Name = c.Name, 9 Sex = c.Sex, 10 Street = c.EndAddress.Street, 11 Number = c.EndAddress.Number, 12 Province = c.EndAddress.HeadAddress.Province, 13 City = c.EndAddress.HeadAddress.City, 14 County = c.EndAddress.HeadAddress.County 15 }); 16 foreach (var item in temp) ;
对应的SQL与上面的非常相似,就是在最后多了一个Where语句,结果:
我们发现两者时间消耗基本一致,同样如果改为使用ToList的话,使用视图会比跨表查询快5ms左右。
3、Foreach vs Linq:测试把所有不为性别空的数据输出为List<T>,source是用户信息集
1 var source = _DataContext.CustomerSet; 2 // foreach 3 List<Customer> temp = new List<Customer>(); 4 foreach (var item in source) 5 if (item.Sex != null) 6 temp.Add(item); 7 // Linq 8 source.Where(c => c.Sex != null).ToList();
它们执行的SQL语句:
1 SELECT 2 [Extent1].[Id] AS [Id], 3 [Extent1].[Name] AS [Name], 4 [Extent1].[Sex] AS [Sex], 5 [Extent1].[EndAddressId] AS [EndAddressId] 6 FROM [dbo].[CustomerSet] AS [Extent1]
1 SELECT 2 [Extent1].[Id] AS [Id], 3 [Extent1].[Name] AS [Name], 4 [Extent1].[Sex] AS [Sex], 5 [Extent1].[EndAddressId] AS [EndAddressId] 6 FROM [dbo].[CustomerSet] AS [Extent1] 7 WHERE [Extent1].[Sex] IS NOT NULL
使用Foreach的时候是全部查询出来,然后进行筛选,而使用Linq的Where则是先筛选了Sex Not Null的数据,再组成List。我想大家都能猜到结果了,没错,Linq大大领先传统的foreach:
这也验证了一点,ToList()效率确实非常地高!
4、SelectMany vs Select New:使用SelectMany和new生成两次分组的数据,source是所有的用户信息,生成的分组数据是先按姓别分组,再按省份分组的数据集,要求保存两次分组的依据
1 var source = _DataContext.CustomerView; 2 // SelectMany 3 var result = source 4 .GroupBy(c => c.Sex) 5 .SelectMany(c => c 6 .GroupBy(a => a.Province) 7 .GroupBy(a => c.Key)); 8 foreach (var items in result) 9 foreach (var item in items) 10 ; 11 // Select New 12 var result = source 13 .GroupBy(c => c.Sex) 14 .Select(c => new { 15 Key = c.Key, 16 Value = c.GroupBy(a => a.Province) 17 }); 18 foreach (var items in result) 19 foreach (var item in items) 20 ;
使用SelectMany得到的结果数据类型:
使用Select New得到的结果数据类型:
SelectMany执行的SQL:
1 SELECT 2 [Project6].[Sex] AS [Sex], 3 [Project6].[C2] AS [C1], 4 [Project6].[C1] AS [C2], 5 [Project6].[C4] AS [C3], 6 [Project6].[Province] AS [Province], 7 [Project6].[C3] AS [C4], 8 [Project6].[Name] AS [Name], 9 [Project6].[Sex1] AS [Sex1], 10 [Project6].[Province1] AS [Province1], 11 [Project6].[City] AS [City], 12 [Project6].[County] AS [County], 13 [Project6].[Street] AS [Street], 14 [Project6].[Number] AS [Number] 15 FROM ( SELECT 16 [Project4].[C1] AS [C1], 17 [Project4].[Sex] AS [Sex], 18 [Project4].[C2] AS [C2], 19 [Filter3].[Province1] AS [Province], 20 [Filter3].[Name] AS [Name], 21 [Filter3].[Sex] AS [Sex1], 22 [Filter3].[Province2] AS [Province1], 23 [Filter3].[City] AS [City], 24 [Filter3].[County] AS [County], 25 [Filter3].[Street] AS [Street], 26 [Filter3].[Number] AS [Number], 27 CASE WHEN ([Filter3].[Province1] IS NULL) THEN CAST(NULL AS int) WHEN ([Filter3].[Name] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3], 28 CASE WHEN ([Filter3].[Province1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C4] 29 FROM (SELECT 30 [Distinct3].[C1] AS [C1], 31 [Distinct1].[Sex] AS [Sex], 32 1 AS [C2] 33 FROM (SELECT DISTINCT 34 [Extent1].[Sex] AS [Sex] 35 FROM [dbo].[CustomerView] AS [Extent1] ) AS [Distinct1] 36 CROSS APPLY (SELECT DISTINCT 37 [Distinct1].[Sex] AS [C1] 38 FROM ( SELECT DISTINCT 39 [Extent2].[Province] AS [Province] 40 FROM [dbo].[CustomerView] AS [Extent2] 41 WHERE ([Distinct1].[Sex] = [Extent2].[Sex]) OR (([Distinct1].[Sex] IS NULL) AND ([Extent2].[Sex] IS NULL)) 42 ) AS [Distinct2] ) AS [Distinct3] ) AS [Project4] 43 OUTER APPLY (SELECT [Distinct4].[Province] AS [Province1], [Extent4].[Name] AS [Name], [Extent4].[Sex] AS [Sex], [Extent4].[Province] AS [Province2], [Extent4].[City] AS [City], [Extent4].[County] AS [County], [Extent4].[Street] AS [Street], [Extent4].[Number] AS [Number] 44 FROM (SELECT DISTINCT 45 [Extent3].[Province] AS [Province] 46 FROM [dbo].[CustomerView] AS [Extent3] 47 WHERE ([Project4].[Sex] = [Extent3].[Sex]) OR (([Project4].[Sex] IS NULL) AND ([Extent3].[Sex] IS NULL)) ) AS [Distinct4] 48 LEFT OUTER JOIN [dbo].[CustomerView] AS [Extent4] ON (([Project4].[Sex] = [Extent4].[Sex]) OR (([Project4].[Sex] IS NULL) AND ([Extent4].[Sex] IS NULL))) AND ([Distinct4].[Province] = [Extent4].[Province]) 49 WHERE ([Project4].[C1] = [Project4].[Sex]) OR (([Project4].[C1] IS NULL) AND ([Project4].[Sex] IS NULL)) ) AS [Filter3] 50 ) AS [Project6] 51 ORDER BY [Project6].[Sex] ASC, [Project6].[C1] ASC, [Project6].[C4] ASC, [Project6].[Province] ASC, [Project6].[C3] ASC
Select New执行的SQL:
1 SELECT 2 [Project4].[C1] AS [C1], 3 [Project4].[Sex] AS [Sex], 4 [Project4].[C3] AS [C2], 5 [Project4].[Province] AS [Province], 6 [Project4].[C2] AS [C3], 7 [Project4].[Name] AS [Name], 8 [Project4].[Sex1] AS [Sex1], 9 [Project4].[Province1] AS [Province1], 10 [Project4].[City] AS [City], 11 [Project4].[County] AS [County], 12 [Project4].[Street] AS [Street], 13 [Project4].[Number] AS [Number] 14 FROM ( SELECT 15 [Project2].[Sex] AS [Sex], 16 [Project2].[C1] AS [C1], 17 [Join1].[Province1] AS [Province], 18 [Join1].[Name] AS [Name], 19 [Join1].[Sex] AS [Sex1], 20 [Join1].[Province2] AS [Province1], 21 [Join1].[City] AS [City], 22 [Join1].[County] AS [County], 23 [Join1].[Street] AS [Street], 24 [Join1].[Number] AS [Number], 25 CASE WHEN ([Join1].[Province1] IS NULL) THEN CAST(NULL AS int) WHEN ([Join1].[Name] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2], 26 CASE WHEN ([Join1].[Province1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3] 27 FROM (SELECT 28 [Distinct1].[Sex] AS [Sex], 29 1 AS [C1] 30 FROM ( SELECT DISTINCT 31 [Extent1].[Sex] AS [Sex] 32 FROM [dbo].[CustomerView] AS [Extent1] 33 ) AS [Distinct1] ) AS [Project2] 34 OUTER APPLY (SELECT [Distinct2].[Province] AS [Province1], [Extent3].[Name] AS [Name], [Extent3].[Sex] AS [Sex], [Extent3].[Province] AS [Province2], [Extent3].[City] AS [City], [Extent3].[County] AS [County], [Extent3].[Street] AS [Street], [Extent3].[Number] AS [Number] 35 FROM (SELECT DISTINCT 36 [Extent2].[Province] AS [Province] 37 FROM [dbo].[CustomerView] AS [Extent2] 38 WHERE ([Project2].[Sex] = [Extent2].[Sex]) OR (([Project2].[Sex] IS NULL) AND ([Extent2].[Sex] IS NULL)) ) AS [Distinct2] 39 LEFT OUTER JOIN [dbo].[CustomerView] AS [Extent3] ON (([Project2].[Sex] = [Extent3].[Sex]) OR (([Project2].[Sex] IS NULL) AND ([Extent3].[Sex] IS NULL))) AND ([Distinct2].[Province] = [Extent3].[Province]) ) AS [Join1] 40 ) AS [Project4] 41 ORDER BY [Project4].[Sex] ASC, [Project4].[C3] ASC, [Project4].[Province] ASC, [Project4].[C2] ASC
上面两种方法,都保留了每层的分组依据(性别、省份)的值,语法上来看,可能SelectMany更紧凑,Select New更为清晰,不过效率上由于SelectMany的投影操作,所以耗时会更多一些:
不过我们也可以看到,3W条数据,时间差别也并不是很大。如果我们只需要保存最后一层分组依据(省份)的值,把测试代码改为:
1 // 只保留省份分组Key 2 var result = source 3 .GroupBy(c => c.Sex) 4 .Select(c => c 5 .GroupBy(a => a.Province)); 6 foreach (var items in result) 7 foreach (var item in items) 8 ;
这样,消耗的时间平均会在309.4ms左右,但就不知道哪个组是哪个性别了:
4、SelectMany vs Double Foreach:测试使用SelectMany和双重循环来遍历两次分组后的数据,并统计生成List<T>,source是:先按姓别分组再按省份分组的数据集,List<T>是:各姓别在各省份的人数,T:Sex,Province,Count。
1 // 临时数据结果类 2 class TempDTO { public bool? Sex; public string Province; public int Count;} 3 // 数据源 4 var source = _DataContext.CustomerView.GroupBy(c => c.Sex) 5 .Select(c => new 6 { 7 Key = c.Key, 8 Value = c.GroupBy(b => b.Province) 9 }); 10 // SelectMany 11 var temp = source.SelectMany(c => c.Value.Select(b => new TempDTO() 12 { 13 Sex = c.Key, 14 Province = b.Key, 15 Count = b.Count() 16 })).ToList(); 17 // 双得Foreach 18 List<TempDTO> temp = new List<TempDTO>(); 19 foreach (var items in source) 20 { 21 bool? sex = items.Key; 22 foreach (var item in items.Value) 23 { 24 temp.Add(new TempDTO() 25 { 26 Sex = sex, 27 Province = item.Key, 28 Count = item.Count() 29 }); 30 } 31 }
结果:
产生这么悬殊的结果也出乎我的意料,起初我认为是因为SelectMany中ToList的原因,但是后来更改了测试方法并没有改变这一现象,而且上面两种方法得到的结果也是完全一致的。于是我想可能是由于Linq的延时查询技术在起作用。因为source返回的结果类型是IQuerable<T>,它并没有真实地查询,在使用SelectMany时会对生成的SQL语句一起进行优化,而Foreach则是先把source中的每个结果都算了出来,再一个一个地填。验证的方法很简单,把source添加一个ToList()就行了:
1 var source = _DataContext.CustomerView.GroupBy(c => c.Sex) 2 .Select(c => new 3 { 4 Key = c.Key, 5 Value = c.GroupBy(b => b.Province) 6 }).ToList();
所得测试结果非常小,都在1ms左右,多次测试难以认定哪种方法更好。于是我增加了50次循环量,所得结果:
可见SelectMany和双重Foreach在执行效率上实际上是一致的,当然前提说数据源是已经计算好的。
三、总结
写了一段时间的数据库,经常会被这些问题困扰,担心这担心那,于是便有此文,总的来说,这些方法都差不多,只是在不同的应用环境下(关键是:是否要把数据ToList保存起来)有不同的结果。
转载请注明原址:http://www.cnblogs.com/lekko/archive/2013/01/03/2843080.html