Hibernate N+1 问题
考虑下面的一对多关系:一个Department包含多个Employee:
1 public class Employee 2 { 3 public virtual int Id { get; set; } 4 public virtual string Name { get; set; } 5 public virtual string IdentityNumber { get; set; } 6 public virtual Department Department { get; set; } 7 } 8 9 public class Department 10 { 11 public virtual int Id { get; set; } 12 public virtual string Name { get; set; } 13 public virtual IList<Employee> Employees { get; set; } 14 }
数据库中已有数据如下:
默认情况下,Department.Employees是Lazy Load的,也就是说,当查询Department时,hibernate并不会将对应的employee也从数据库中加载进来,而是在第一次调用Department.Employees时加载。
1. N+1问题是如何出现的
当应用程序需要遍历多个Department及其Employee时,问题就出现了,看测试代码:
[Test] public void show_n_plus_1_problem() { var modelMapper = new ModelMapper(); modelMapper.Class<Department>(cm => { cm.Table("Department"); cm.Id(x => x.Id, map => map.Generator(Generators.Identity)); cm.Property(x => x.Name); cm.Bag(x => x.Employees, map => map.Key(k => k.Column("DepartmentId")), rel => rel.OneToMany()); }); MapEmployee(modelMapper); using (var sessionFactory = BuildSessionFactory(modelMapper)) using(var session = sessionFactory.OpenSession()) { // 1 query for retrieve all departments, got n departments // n query to retrieve all employees var departments = session.QueryOver<Department>().List<Department>(); Assert.AreEqual(3, departments.Count); departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name))); } }
为了能在测试中动态改变nhibernate映射,这里使用了nHibernate 3.2新增的map by code特性。
private void MapEmployee(ModelMapper modelMapper) { modelMapper.Class<Employee>(cm => { cm.Table("Employee"); cm.Id(x => x.Id, map => map.Generator(Generators.Identity)); cm.Property(x => x.Name); cm.Property(x => x.IdentityNumber); cm.ManyToOne(x => x.Department, map => map.Column("DepartmentId")); }); } private ISessionFactory BuildSessionFactory(ModelMapper modelMapper) { var configuration = new Configuration(); configuration.Configure(); HbmMapping hbmMapping = modelMapper.CompileMappingForAllExplicitlyAddedEntities(); Console.WriteLine(hbmMapping.AsString()); configuration.AddDeserializedMapping(hbmMapping, ""); return configuration.BuildSessionFactory(); }
运行测试可以发现,hibernate实际执行了如下sql查询:
SELECT Id, Name FROM Department SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=1 SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=2 SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=3
当执行session.QueryOver<Department>().List<Department>()时, nhibernate执行了1次查询SELECT Id, Name FROM Department,此时department.Employees中并没有加载出对应的Employee,而每次调用department.Employees时,nhibernate都会从Employee表中加载相应的Employee。
因此,对N个Department,实际会执行N+1次查询。这将对性能造成很大影响。
2. 如何解决N+1问题
(1)采用Batch Load批量加载
[Test] public void resolve_n_plus_1_problem_by_batch_load() { var modelMapper = new ModelMapper(); modelMapper.Class<Department>(cm => { cm.Table("Department"); cm.Id(x => x.Id, map => map.Generator(Generators.Identity)); cm.Property(x => x.Name); cm.Bag(x => x.Employees, map => { map.Key(k => k.Column("DepartmentId")); map.BatchSize(2); // add batch size to batch load employees }, rel => rel.OneToMany()); }); MapEmployee(modelMapper); using (var sessionFactory = BuildSessionFactory(modelMapper)) using (var session = sessionFactory.OpenSession()) { // 1 query for retrieve all departments, got n departments // ceil(n/BatchSize) query to retrieve all employees var departments = session.QueryOver<Department>().List<Department>(); Assert.AreEqual(3, departments.Count); departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name))); } }
nHibernate实际执行查询:
SELECT this_.Id as Id0_0_, this_.Name as Name0_0_ FROM Department this_ SELECT ... FROM Employee employees0_ WHERE employees0_.DepartmentId in (1, 2); SELECT ... FROM Employee employees0_ WHERE employees0_.DepartmentId = 3
这种方式并没有完全解决N+1查询问题,但显著减少了查询次数,实际查询为ceil(n/BatchSize) + 1次,BatchSize足够大时,查询次数为2次。
(2)使用Join Fetch一次性加载所有数据
[Test] public void resolve_n_plus_1_problem_by_join_fetch() { var modelMapper = new ModelMapper(); modelMapper.Class<Department>(cm => { cm.Table("Department"); cm.Id(x => x.Id, map => map.Generator(Generators.Identity)); cm.Property(x => x.Name); cm.Bag(x => x.Employees, map => { map.Key(k => k.Column("DepartmentId")); map.Fetch(CollectionFetchMode.Join); // eager load employees }, rel => rel.OneToMany()); }); MapEmployee(modelMapper); using (var sessionFactory = BuildSessionFactory(modelMapper)) using (var session = sessionFactory.OpenSession()) { // 1 query for retrieve all departments and employees var departments = new HashSet<Department>(session.QueryOver<Department>().List<Department>()).ToList(); Assert.AreEqual(3, departments.Count); departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name))); } }
nHibernate实际执行查询:
SELECT ... FROM Department this_ left outer join Employee employees2_ on this_.Id=employees2_.DepartmentId
需要注意的是,这种join fetch会导致重复记录,比如本例中,因为Jim和Rechard同属于Finance部门,所以departments实际查询出4条记录,还需用Set过一遍以保证departments集合元素的唯一性。
另外,这种在映射中直接配置join fetch的做法也不够灵活,在某些场景中,可能并不关心Department.Employees,join fetch没有必要。
(3)Department.Employees映射仍然是Lazy的,仅在需要时通过join fetch加载
[Test] public void resolve_n_plus_1_problem_by_join_fetch_when_necessary() { var modelMapper = new ModelMapper(); modelMapper.Class<Department>(cm => { cm.Table("Department"); cm.Id(x => x.Id, map => map.Generator(Generators.Identity)); cm.Property(x => x.Name); cm.Bag(x => x.Employees, map => map.Key(k => k.Column("DepartmentId")), rel => rel.OneToMany()); }); MapEmployee(modelMapper); using (var sessionFactory = BuildSessionFactory(modelMapper)) using (var session = sessionFactory.OpenSession()) { // 1 query for retrieve all departments and all employees // Note: the retrieved departments will be duplicated if excluding the 'distinct' keyword var departments = session.CreateQuery("select distinct d from Department d left join fetch d.Employees").List<Department>(); Assert.AreEqual(3, departments.Count); departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name))); } }
nHibernate实际执行查询:
select distinct d... from Department department0_ left outer join Employee employees1_ on department0_.Id=employees1_.DepartmentId
实际应用中,往往有些场景只关心department的数据,而不关心department.Employees; 而有些场景恰恰相反,关心department.Employees。上面的例子就很有价值了,默认department.Employees为lazy load,在需要遍历department.Employee的场景使用另外的方法加载department,例如:
public class DepartmentRepository { .... public IList<Department> FindAll() { return session.CreateQuery("select distinct d from Department d left join fetch d.Employees").List<Department>(); } }