EF如何操作内存中的数据和加载外键数据:延迟加载、贪婪加载、显示加载
系列文章开始的时候提示大家,必须学会且习惯使用sql Profiler(开始 - 程序 - Microsoft SQL Server 2008 - 性能工具 - Sql Server Profiler),它可以监控到ef生成的sql是什么样子的,这不仅可以帮助我们更好的学习EF的API,也可以帮我们监测写出来的EF方法效率如何。好的,废话不多说了,先上解决方案图:
本章节要操作的两个实体分别:
/// <summary> /// 景点类 /// </summary> [Table("Locations", Schema = "baga")] //生成的表名:baga.Locations public class Destination { public Destination() { this.Lodgings = new List<Lodging>(); } [Column("LocationID")] public int DestinationId { get; set; } [Required, Column("LocationName")] [MaxLength(200)] public string Name { get; set; } public string Country { get; set; } [MaxLength(500)] public string Description { get; set; } [Column(TypeName = "image")] public byte[] Photo { get; set; } public string TravelWarnings { get; set; } public string ClimateInfo { get; set; } public List<Lodging> Lodgings { get; set; } }
/// <summary> /// 住宿类 /// </summary> public class Lodging { public int LodgingId { get; set; } [Required] [MaxLength(200)] [MinLength(10)] public string Name { get; set; } public string Owner { get; set; } public decimal MilesFromNearestAirport { get; set; } [Column("destination_id")] public int DestinationId { get; set; } public Destination Destination { get; set; } }
实体不再做过多的介绍了,上个系列讲的很详细了。实体都是通过Data Annotation配置映射的,还不是很了解Data Anntation的看这里。
程序跑起来InitializeDBWithSeedData类会添加了一些测试数据到数据库中。ok,我们先小试下牛刀,添加一个方法PrintAllDestinations在控制台上打印出所有景点:
/// <summary> /// 查出所有景点 /// </summary> private static void PrintAllDestinations() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { foreach (var destination in context.Destinations) { Console.WriteLine(destination.Name); } } }
查出来的结果自然就是所有的景点类的名称:
Grand Canyon
Hawaii
Wine Glass Bay
Great Barrier Reef
如果按照以往手写ado的方式的话,要出来上面的结果得先select * from baga.Locations表,然后用dataTable存下结果集,再遍历dataTable才能拿到所有景点的名称;当然也可以用sqlDataReader的方式一行一行读取。随便一写就是几十行了。EF只要一行。ok,我们继续。再按照景点的名称排序下:
/// <summary> /// 按照名称排序 /// </summary> private static void PrintAllDestinationsSorted() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations orderby d.Name select d; foreach (var destination in query) { Console.WriteLine(destination.Name); } } }
当然,查出来的数据也就是排序了的:
Grand Canyon
Great Barrier Reef
Hawaii
Wine Glass Bay
很简单,就是普通的linq写法,当然也有C# Lambda表达式的写法(文章结尾的源码里有)。下面的方法使用Find方法查询数据库:
/// <summary> /// Find方法查询数据库 /// </summary> private static void FindDestination() { Console.Write("Enter id of Destination to find: "); var id = int.Parse(Console.ReadLine()); using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var destination = context.Destinations.Find(id); if (destination == null) { Console.WriteLine("Destination not found!"); } else { Console.WriteLine(destination.Name); } } }
EF的find方法是先从内存中查询,内存中没有才查询数据库(什么是在内存中查询,本文第二段会详细讲解)为了演示我们添加一个方法:
/// <summary> /// 测试Find方法查询内存中的数据 /// </summary> private static void TestFindLocalData() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Name == "Great Barrier Reef" select d; query.Load(); //加载名称为Great Barrier Reef的景点到内存中 Console.WriteLine(context.Destinations.Local.Count); //输出内存中的数据个数 var destination = context.Destinations.Find(4); if (destination == null) Console.WriteLine("Destination not found!"); else Console.WriteLine(destination.Name); } }
先从数据库加载Name为Great Barrier Reef的景点(主键destinationId为4)到内存中,然后调用find方法找4号id,我们看看sql Profiler监控到的sql:
可见,只有一条查询Name是Great Barrier Reef的sql,并没有发送查询destinationId为4的sql。很明显,Find方法查询的是内存,内存中有就用内存中的。继续,使用Single方法查询单个实体:
/// <summary> /// 获取一个实体对象(Single) /// </summary> private static void FindGreatBarrierReef() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Name == "Great Barrier Reef" select d; var reef = query.Single(); //差不到记录或者多条记录就报错 Console.WriteLine(reef.Description); } }
Single方法不太好用,不管是查不到记录还是查到多条记录都会抛错;使用SingleOrDefault方法,如果查不到记录就返回null,查到多条记录也会报错:
/// <summary> /// 获取一个实体对象(SingleOrDefault) /// </summary> private static void FindGreatBarrierReef() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Name == "Great Barrier Reef" select d; var reef = query.SingleOrDefault(); if (reef == null) Console.WriteLine("Can't find the reef!"); else Console.WriteLine(reef.Description); } }
使用sql Profiler监控到如下sql被发送到了数据库。虽然是查一条记录,但是会Select TOP 2,这是为了肯定有一条满足的记录。
SELECT TOP (2) [Extent1].[LocationID] AS [LocationID], [Extent1].[LocationName] AS [LocationName], [Extent1].[Country] AS [Country], [Extent1].[Description] AS [Description], [Extent1].[Photo] AS [Photo], [Extent1].[TravelWarnings] AS [TravelWarnings], [Extent1].[ClimateInfo] AS [ClimateInfo] FROM [baga].[Locations] AS [Extent1] WHERE N'Great Barrier Reef' = [Extent1].[LocationName]
二、操作内存中的数据
不同于普通的ado.net写sql直接去库里取数据。EF还为我们提供了从内存中取数据的方式,当然内存中的数据也是我们从数据库先取出来的。如上的查询写法都会发送数据到数据库查询数据,如果“Great Barrier Reef”这条数据是新标记添加但是还没插入数据库(没有调用上下文的saveChanges方法),那么如上查询都查不到数据(当然find方法是可以的,上面介绍了)。我们需要在内存中查询(Querying Local Data)。为了演示,我们使用SingleOrDefault方法做个试验:
/// <summary> /// 测试查询新标记添加但是没被插入数据库的数据 /// </summary> private static void TestGetNewAddedData() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var destination = new DbContexts.Model.Destination { Name = "bingmayong", Country = "China", Description = "too boring~~~~~~" }; context.Destinations.Add(destination); //标记添加新实体,还没有调用SaveChanges方法提交到数据库 var query = from d in context.Destinations where d.Name == "bingmayong" select d; var reef = query.SingleOrDefault(); //查询Name是刚才添加的兵马俑的对象 if (reef == null) Console.WriteLine("Can't find the reef!"); else Console.WriteLine(reef.Description); context.SaveChanges(); //调用SaveChanges方法,添加才会被插入数据库 } }
输出结果:
结果显示找不到我们新添加的景点,但是程序跑完了,它明明在数据库里了。很明显我们不调用SaveChanges方法的时候,所有标记添加、删除、修改的数据都是在内存中的。SingleOrDefault方法就是直接去数据库取数据的,当然Find方法直接就可以撸出内存中的数据,内存中没有再去库里找。其他方法请自行查询官方API或者自己试验,这里不做过多演示了。继续介绍一些方法:
/// <summary> /// 查询内存中的数据 /// </summary> private static void GetLocalDestinationCount() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var count = context.Destinations.Local.Count; Console.WriteLine("Destinations in memory: {0}", count); } }
输出结果:Destinations in memory: 0 可见并没有去查询数据库,直接查的内存中的数据。直接用sql profiler跟踪下,也没有任何查询的sql发送到数据库。我们测试下查询数据库:
/// <summary> /// 先加载数据库再查询本地数据 /// </summary> private static void GetLocalDestinationCountAfterCheckDB() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { foreach (var destination in context.Destinations) { Console.WriteLine(destination.Name); } var count = context.Destinations.Local.Count; Console.WriteLine("Destinations in memory: {0}", count); } }
这样就把所有数据都打印出来了,并且输出count值是4,正如我们预期,先把数据库中的值拿到Local本地,再输出count值。使用sql profiler跟踪到的sql:
SELECT [Extent1].[LocationID] AS [LocationID], [Extent1].[LocationName] AS [LocationName], [Extent1].[Country] AS [Country], [Extent1].[Description] AS [Description], [Extent1].[Photo] AS [Photo], [Extent1].[TravelWarnings] AS [TravelWarnings], [Extent1].[ClimateInfo] AS [ClimateInfo] FROM [baga].[Locations] AS [Extent1]
这种使用foreach遍历数据把数据加载到内存的方式有点不妥,不可能每次都先遍历把数据拿到内存中再操作,我们改用Load方法试试:
/// <summary> /// Load方法把数据加载到内存 /// </summary> private static void GetLocalDestinationCountWithLoad() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { context.Destinations.Load(); var count = context.Destinations.Local.Count; Console.WriteLine("Destinations in memory: {0}", count); } }
count值一样是4,产生的sql也跟上一个方法一模一样,但是Load方法还是比较实用的。linq写法是这样的:
/// <summary> /// Load方法把数据加载到内存(LINQ写法) /// </summary> private static void LoadAustralianDestinations() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Country == "Australia" select d; query.Load(); var count = context.Destinations.Local.Count; Console.WriteLine("Aussie destinations in memory: {0}", count); } }
那么把数据加载到内存中有什么好处呢?上方法:
/// <summary> /// 操作内存中的数据 /// </summary> private static void LocalLinqQueries() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { context.Destinations.Load(); var sortedDestinations = from d in context.Destinations.Local orderby d.Name select d; Console.WriteLine("All Destinations:"); foreach (var destination in sortedDestinations) { Console.WriteLine(destination.Name); } var aussieDestinations = from d in context.Destinations.Local where d.Country == "Australia" select d; Console.WriteLine(); Console.WriteLine("Australian Destinations:"); foreach (var destination in aussieDestinations) { Console.WriteLine(destination.Name); } } }
这个方法先把Destination表的数据全部加载到内存中,然后在内存中排序输出;再在内存中已经排序的数据里查出Country为Australia的数据。整个过程操作和输出数据多次,但是只查询了一次数据库。这就是操作内存中的数据的好处。
好处归好处,但是不好的地方也很明显,操作内存中的数据,新添加、删除或者修改的数据,先前加载到内存中的数据就跟数据库里的不一致了。我们可以使用CollectionChanged事件来监控,内存中的数据增删改都会触发这个事件:
/// <summary> /// 监控内存中数据的变化 /// </summary> private static void ListenToLocalChanges() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { context.Destinations.Local.CollectionChanged += (sender, args) => { if (args.NewItems != null) { foreach (DbContexts.Model.Destination item in args.NewItems) { Console.WriteLine("Added: " + item.Name); } } if (args.OldItems != null) { foreach (DbContexts.Model.Destination item in args.OldItems) { Console.WriteLine("Removed: " + item.Name); } } }; context.Destinations.Load(); } }
在把Destination表的数据加载到内存前,我们监控了下CollectionChanged,打印结果:
Added: Grand Canyon
Added: Hawaii
Added: Wine Glass Bay
Added: Great Barrier Reef
正如我们预期,都是添加到内存中,没有删除。实际开发中,我们可以多多利用CollectionChanged事件来监控内存中数据的状态。
三、加载外键数据(Loading Related Data)
所谓加载外键数据就是,通俗点说就是从数据库中取有主外键关系的关联表数据。我们的项目中的表不可能都是单独的表,肯定都是存在一定关系的,以往我们写ado的时候都是先拿到主键表的数据,然后再根据主键的id去外键表查和其关联的数据,稍显笨拙。EF为我们提供了三种方式加载关联数据,写法不同,生成的sql也不同,当然效率也不同。他们分别是:延迟加载、贪婪加载、显示加载:
1.延迟加载(Lazy Loading)
/// <summary> /// 延迟加载LazyLoading /// </summary> private static void TestLazyLoading() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Name == "Grand Canyon" select d; var canyon = query.Single(); Console.WriteLine("Grand Canyon Lodging:"); if (canyon.Lodgings != null) { foreach (var lodging in canyon.Lodgings) { Console.WriteLine(lodging.Name); } } } }
类似使用上下文对象调实体导航属性的方式都是延迟加载(context.Destinations)。上面的方法意思就是先去Locations表里找Name值是Grand Canyon的对象,然后去Lodgings表里找destination_id等于这个对象的LocationID的数据。输入结果:
Grand Canyon Lodging:
Grand Hotel
Dave's Dump
注:使用延迟加载必须标注为virtual。本例是标注Destination类里的Lodgings为virtual:
public virtual List<Lodging> Lodgings { get; set; } //virtual 延迟加载
监控到的两段sql是(主键数据和外键数据分开加载):
SELECT TOP (2) [Extent1].[LocationID] AS [LocationID], [Extent1].[LocationName] AS [LocationName], [Extent1].[Country] AS [Country], [Extent1].[Description] AS [Description], [Extent1].[Photo] AS [Photo], [Extent1].[TravelWarnings] AS [TravelWarnings], [Extent1].[ClimateInfo] AS [ClimateInfo] FROM [baga].[Locations] AS [Extent1] WHERE N'Grand Canyon' = [Extent1].[LocationName]
exec sp_executesql N'SELECT [Extent1].[LodgingId] AS [LodgingId], [Extent1].[Name] AS [Name], [Extent1].[Owner] AS [Owner], [Extent1].[MilesFromNearestAirport] AS [MilesFromNearestAirport], [Extent1].[destination_id] AS [destination_id], [Extent1].[PrimaryContactId] AS [PrimaryContactId], [Extent1].[SecondaryContactId] AS [SecondaryContactId] FROM [dbo].[Lodgings] AS [Extent1] WHERE [Extent1].[destination_id] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1
看到生成的sql就很明显的知道为何叫这种加载方式为延迟加载了吧。因为先发sql去查询主键的id,然后根据主键id去外键表里查相关联的数据。以往写ado都是分两步写的,EF只需要一行。
但是延迟加载并不是那么美好,不正当的使用会发送很多多余的sql到数据库:比如我们要从数据库取Destioation表的50条记录,如果我们使用了延迟加载,那么会同时加载出Lodging表的数据,这么就发送了一条sql取50条Destination表数据,同时发送50条分别取Lodging表数据,其实这50条。这种情况使用join连表查询显然比延迟加载有效率的多。关闭延迟加载有两种方式:
1.去掉virtual;
2.context.Configuration.LazyLoadingEnabled =false;
2.贪婪加载(Eager Loading)
/// <summary> /// 贪婪加载EagerLoading /// </summary> private static void TestEagerLoading() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var allDestinations = context.Destinations.Include(d => d.Lodgings); foreach (var destination in allDestinations) { Console.WriteLine(destination.Name); foreach (var lodging in destination.Lodgings) { Console.WriteLine(" - " + lodging.Name); } } } }
很明显,使用了Include,加载Destination表的所有数据并贪婪加载出了所有相关联的外键表数据。输出结果:
Grand Canyon
- Grand Hotel
- Dave's Dump
Hawaii
Wine Glass Bay
Great Barrier Reef
只有一条sql被发送到了数据库:
贪婪加载是很灵活的,我们可以写下如下延迟加载,具体请自行调试和跟踪:
var AustraliaDestination = context.Destinations.Include(d => d.Lodgings).Where(d => d.Country == "Australia"); context.Lodgings.Include(l => l.PrimaryContact.Photo); context.Destinations.Include(d => d.Lodgings.Select(l => l.PrimaryContact)); context.Lodgings.Include(l => l.PrimaryContact).Include(l => l.SecondaryContact);
当然贪婪加载也有它不好的地方:我们看上面生成的sql自然就知道了,虽然贪婪加载生成的sql不多只有一条,但是随着我们贪婪加载的外键表越多,生成的join也就越复杂,迟早有一个我们都看不懂的多join查询的sql摆在我们面前让我们崩溃。当然越复杂也就越耗费性能,这是明显的。
3.显示加载(Explicit Loading)
/// <summary> /// 显示加载ExplicitLoading /// </summary> private static void TestExplicitLoading() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var query = from d in context.Destinations where d.Name == "Grand Canyon" select d; var canyon = query.Single(); context.Entry(canyon).Collection(d => d.Lodgings).Load(); Console.WriteLine("Grand Canyon Lodging:"); foreach (var lodging in canyon.Lodgings) { Console.WriteLine(lodging.Name); } } }
跟延迟加载一样,主键外键数据分开加载,但是显示加载又几个好处:不需要标记virtual了;生成的sql更加清晰明白。跟踪到的sql:
SELECT TOP (2) [Extent1].[LocationID] AS [LocationID], [Extent1].[LocationName] AS [LocationName], [Extent1].[Country] AS [Country], [Extent1].[Description] AS [Description], [Extent1].[Photo] AS [Photo], [Extent1].[TravelWarnings] AS [TravelWarnings], [Extent1].[ClimateInfo] AS [ClimateInfo] FROM [baga].[Locations] AS [Extent1] WHERE N'Grand Canyon' = [Extent1].[LocationName]
exec sp_executesql N'SELECT [Extent1].[LodgingId] AS [LodgingId], [Extent1].[Name] AS [Name], [Extent1].[Owner] AS [Owner], [Extent1].[MilesFromNearestAirport] AS [MilesFromNearestAirport], [Extent1].[destination_id] AS [destination_id], [Extent1].[PrimaryContactId] AS [PrimaryContactId], [Extent1].[SecondaryContactId] AS [SecondaryContactId] FROM [dbo].[Lodgings] AS [Extent1] WHERE [Extent1].[destination_id] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1
两条查询,分别查出主键,再查出外键数据。查出来的结果:
Grand Canyon Lodging:
Grand Hotel
Dave's Dump
大家可能注意到了,显示加载使用的是Entry方法,延迟加载是Include。Entry方法中文意思为“条目”,提供了对单个实体的各种操作,配合使用Collection和Reference方法,分别是查询主键表和外键表的数据。下面是一条查询主键表数据的方法:
var lodging = context.Lodgings.First(); context.Entry(lodging).Reference(l => l.PrimaryContact).Load();
当然我们也可以使用IsLoaded方法判断想关联的外键数据是否已经从数据库加载出来了:
/// <summary> /// IsLoaded方法判断数据是否加载 /// </summary> private static void TestIsLoaded() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var canyon = (from d in context.Destinations where d.Name == "Grand Canyon" select d).Single(); var entry = context.Entry(canyon); Console.WriteLine("Before Load: {0}", entry.Collection(d => d.Lodgings).IsLoaded); entry.Collection(d => d.Lodgings).Load(); Console.WriteLine("After Load: {0}", entry.Collection(d => d.Lodgings).IsLoaded); } }
输出结果:
Before Load: False
After Load: True
IsLoaded方法有何意义呢?官方给出的解释:
If you are performing an explicit load, and the contents of the navigation property may
have already been loaded, you can use the IsLoaded flag to determine if the load is
required or not.
意思就是:IsLoaded方法是我们判断外键表数据是否已经加载了。
如上的Collection、Reference、Load等都是通过EF或者linq的形式把数据加载到了内存中,然后再进行排序,筛选等操作,然后展示到界面上的,我们来看一个方法:
/// <summary> /// 在内存中操作 /// </summary> private static void QueryLodgingDistance() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var canyonQuery = from d in context.Destinations where d.Name == "Grand Canyon" select d; var canyon = canyonQuery.Single(); var distanceQuery = from l in canyon.Lodgings where l.MilesFromNearestAirport <= 10 select l; foreach (var lodging in distanceQuery) { Console.WriteLine(lodging.Name); } } }
这个方法很简单,意思就是先查出Name为Grand Canyon的Destination,然后去查出外键表Lodging里距离最近几场不到10公里的住宿的地方,最后遍历输出。很明显使用了贪婪加载,我们只需要查出Grand Canyon附近距离机场不到10公里住宿的地方,而上面的方法使用了贪婪加载把Grand Canyon附近所有住宿的地方都加载到了内存中,然后在内存中做筛选的,自然不符合性能的要求,我们修改为:
/// <summary> /// 改进:在数据库中操作 /// </summary> private static void QueryLodgingDistancePro() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var canyonQuery = from d in context.Destinations where d.Name == "Grand Canyon" select d; var canyon = canyonQuery.Single(); var lodgingQuery = context.Entry(canyon).Collection(d => d.Lodgings).Query(); var distanceQuery = from l in lodgingQuery where l.MilesFromNearestAirport <= 10 select l; foreach (var lodging in distanceQuery) { Console.WriteLine(lodging.Name); } } }
有什么区别呢?认真看肯定看的出来了,我们查询操作是在数据库操作的,而不是使用贪婪加载加载到了内存中再筛选的。这就是效率的问题,认真研究EF的API之后,写程序的时候自然就知道对应生成的sql是什么样子的,复不复杂、会有什么性能上的损失。Perfect。再举一例:
/// <summary> /// 查询个数 /// </summary> private static void QueryLodgingCount() { using (var context = new DbContexts.DataAccess.BreakAwayContext()) { var canyonQuery = from d in context.Destinations where d.Name == "Grand Canyon" select d; var canyon = canyonQuery.Single(); var lodgingQuery = context.Entry(canyon) .Collection(d => d.Lodgings) .Query(); var lodgingCount = lodgingQuery.Count(); Console.WriteLine("Lodging at Grand Canyon: " + lodgingCount); } }
从数据库中算好个数再输出,而不是把所有记录都加载到内存中,然后在内存中算个数。当然我们也可以一起使用Query和Load方法,先查询数据库在加载到内存中:
context.Entry(canyon).Collection(d => d.Lodgings).Query().Where(l => l.Name.Contains("Hotel")).Load();
总之,调用Load方法就会加载数据到内存中,加载之前做好过滤就是在数据库中的,加载之后再过滤就是在内存中的操作,如果我们只需要简单求count,显然前者更合适。
感谢阅读,本章源码。后续还有更精彩的文章带你了解EF的各种增删改查,请保持关注!
转载https://www.cnblogs.com/Leo_wl/p/3274183.html