The Road to LINQ (1): LINQ Basics
This article covers three aspects: what LINQ is (What), why to use it (Why), and how to use it (How).
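1. What is LINQ
LINQ (Language Integrated Query) is a set of features introduced in Visual Studio 2008 that adds powerful query capabilities to the C# and Visual Basic language syntax. LINQ introduces standard, easy-to-learn patterns for querying and updating data, and the technology can be extended to support any kind of data store. Visual Studio includes LINQ provider assemblies that enable the use of LINQ with .NET Framework collections, SQL Server databases, ADO.NET DataSets, and XML documents. This description is taken from MSDN.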
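2. Why use LINQ
A program typically works with several kinds of data sources, such as relational databases, XML, and JSON. Before LINQ, Microsoft had no mature data mapping solution. The usual approach was to write a separate Helper class for each data source, which greatly increased the workload. LINQ was created to solve exactly this problem: it offers one uniform way to query and update all kinds of data from within the language itself. For example: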
var query = from x in context.T_Developer
            select new
            {
                Developer = x,
                DepartmentName = x.T_Department.Name
            };
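This is a simple LINQ to SQL example. We retrieve data from the database in a more object-oriented way without having to care how the database is actually accessed, and when switching to another data source, only minor changes are needed for the query to keep working.
3. How to use LINQ
Before using LINQ, you first need to know the syntax and the standard query operators. The following LINQ to Objects examples illustrate them: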
var developers = new List<Developer>
{
    new Developer {ID = 1, Name = "Jello", DepartmentID = 1},
    new Developer {ID = 2, Name = "Taffy", DepartmentID = 2},
    new Developer {ID = 3, Name = "Tom", DepartmentID = 1},
    new Developer {ID = 4, Name = "Lily", DepartmentID = 1}
};
var query = from x in developers
            select new
            {
                Developer = x,
                NameLength = x.Name.Length
            };
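This is a basic LINQ to Objects example, in which developers is an in-memory collection. At first sight it feels familiar: isn't this just SQL? In fact, it is not.
The figure above breaks down the structure of this query expression. Several concepts are involved:
1. Implicitly typed local variables, such as var query
2. Anonymous types, such as new {...}
3. Type inference, for example for the query variable and the range variable x. A range variable's type can also be stated explicitly (from Developer x in developers); a variable that holds anonymous-type results, however, has to be declared with var.
4. Extension methods and lambda expressions. The query expression is equivalent to: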
var query = developers.Select(x => new
{
    Developer = x,
    NameLength = x.Name.Length
});
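5. Deferred execution (lazy evaluation). This feature is important. Suppose you need to count how often each character repeats on every line of a 4 GB text file: loading the whole file into memory first would be frightening, whereas pulling the lines lazily, one at a time, greatly reduces memory consumption.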
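To make deferred execution concrete, here is a minimal sketch. The class name DeferredExecutionDemo and its sample data are made up for illustration. It defines a query, changes the source list afterwards, and only then enumerates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original article.
public static class DeferredExecutionDemo
{
    // Builds a query, mutates the source afterwards, then enumerates it.
    public static List<int> Run()
    {
        var names = new List<string> { "Jello", "Taffy" };

        // Defining the query does not run it; nothing is enumerated yet.
        IEnumerable<int> lengths = names.Select(n => n.Length);

        // This element is added after the query was defined...
        names.Add("Tom");

        // ...but it is still seen, because the query only runs here.
        return lengths.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints 5,5,3
    }
}
```

For the large-file scenario, the same idea applies when the source itself is lazy: File.ReadLines hands out lines one at a time, while File.ReadAllLines reads the whole file into an array first.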
Next, building on the query expression above, we look at other standard query operators.
3.1 Sorting and filtering
var query = from x in developers
            where x.DepartmentID == 1
            orderby x.Name
            select new
            {
                Developer = x,
                NameLength = x.Name.Length
            };
foreach (var item in query)
{
    Console.WriteLine("{0}--{1}", item.Developer.Name, item.NameLength);
}
// Output:
// Jello--5
// Lily--4
// Tom--3
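where filter clause: a where clause is translated into a call to the Where extension method. Its expression acts as a predicate applied to every element flowing through the sequence, and only elements for which it returns true appear in the result sequence. When there are several where clauses, all of them apply, so an element must satisfy every one.
orderby sort clause: orderby is followed by one or more sort keys, each either ascending (the default) or descending. An orderby clause with several keys is translated into a call to the extension method OrderBy (or OrderByDescending) followed by ThenBy (or ThenByDescending). When there are several separate orderby clauses, the last one determines the final order; in LINQ to Objects the sort is stable, so earlier orderings survive only as tie-breakers among elements the last clause treats as equal.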
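The difference between one orderby clause with two keys and two separate orderby clauses is easy to miss, so here is a small sketch of it. The class OrderByDemo and its Dev type are illustrative names only; the data mirrors the developers list used in this article.

```csharp
using System;
using System.Linq;

// Hypothetical demo types; not part of the original article.
public class Dev
{
    public string Name { get; set; }
    public int DepartmentID { get; set; }
}

public static class OrderByDemo
{
    static readonly Dev[] Developers =
    {
        new Dev { Name = "Jello", DepartmentID = 1 },
        new Dev { Name = "Taffy", DepartmentID = 2 },
        new Dev { Name = "Tom", DepartmentID = 1 },
        new Dev { Name = "Lily", DepartmentID = 1 }
    };

    // One clause, two keys: becomes OrderBy(DepartmentID).ThenBy(Name).
    public static string[] OneClause()
    {
        var query = from x in Developers
                    orderby x.DepartmentID, x.Name
                    select x.Name;
        return query.ToArray();
    }

    // Two clauses: becomes OrderBy(DepartmentID).OrderBy(Name),
    // so the second sort re-orders the whole sequence by Name.
    public static string[] TwoClauses()
    {
        var query = from x in Developers
                    orderby x.DepartmentID
                    orderby x.Name
                    select x.Name;
        return query.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", OneClause()));  // prints Jello,Lily,Tom,Taffy
        Console.WriteLine(string.Join(",", TwoClauses())); // prints Jello,Lily,Taffy,Tom
    }
}
```

If the intent is "sort by department, then by name", the single clause with two keys is the form to use.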
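3.2 Joins and group joins
In SQL, joins fall roughly into three kinds: inner join, outer join, and cross join; outer joins are further divided into left outer join, right outer join, and full outer join. LINQ has counterparts with the same meaning. First, prepare some data: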
var departments = new List<Department>
{
    new Department {ID = 1, Name = "Product Department"},
    new Department {ID = 2, Name = "Project Department"}
};
Inner join:
var query = from x in developers
            join d in departments
                on x.DepartmentID equals d.ID
            select new
            {
                DeveloperName = x.Name,
                DepartmentName = d.Name
            };
foreach (var item in query)
{
    Console.WriteLine("{0}-{1}", item.DeveloperName, item.DepartmentName);
}
// Output:
// Jello-Product Department
// Taffy-Project Department
// Tom-Product Department
// Lily-Product Department
Left outer join: implemented by grouping with into and then calling DefaultIfEmpty(). To make the effect more visible, the Department with ID = 2 is removed from departments here:
var query = from x in developers
            join d in departments
                on x.DepartmentID equals d.ID into dpts
            from ds in dpts.DefaultIfEmpty()
            select new
            {
                DeveloperName = x.Name,
                DepartmentName = default(Department) == ds ? "No Department" : ds.Name
            };
foreach (var item in query)
{
    Console.WriteLine("{0}-{1}", item.DeveloperName, item.DepartmentName);
}
// Output:
// Jello-Product Department
// Taffy-No Department
// Tom-Product Department
// Lily-Product Department
Right outer join: implemented the same way as the left outer join, with the roles of the two sequences swapped.
Full outer join: this is the Union of the left outer join and right outer join results.
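As a rough sketch of the right and full outer joins described above, the following builds a left outer join, builds the mirrored query, and combines them with Union. The class FullOuterJoinDemo, the trimmed-down data, the "Test Department" with no developers, and the "No Developer" placeholder are all assumptions added for illustration. Rows are projected to strings so that Union can recognize the matching rows as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo types and data; not part of the original article.
public class Developer
{
    public string Name { get; set; }
    public int DepartmentID { get; set; }
}

public class Department
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class FullOuterJoinDemo
{
    public static List<string> Run()
    {
        var developers = new List<Developer>
        {
            new Developer { Name = "Jello", DepartmentID = 1 },
            new Developer { Name = "Taffy", DepartmentID = 2 } // no matching department
        };
        var departments = new List<Department>
        {
            new Department { ID = 1, Name = "Product Department" },
            new Department { ID = 3, Name = "Test Department" } // no developers
        };

        // Left outer join: every developer, with or without a department.
        var left = from x in developers
                   join d in departments on x.DepartmentID equals d.ID into dpts
                   from ds in dpts.DefaultIfEmpty()
                   select x.Name + "-" + (ds == null ? "No Department" : ds.Name);

        // Right outer join: the same pattern with the sequences swapped.
        var right = from d in departments
                    join x in developers on d.ID equals x.DepartmentID into devs
                    from dv in devs.DefaultIfEmpty()
                    select (dv == null ? "No Developer" : dv.Name) + "-" + d.Name;

        // Full outer join: Union drops the rows that both sides produced.
        return left.Union(right).ToList();
    }

    public static void Main()
    {
        foreach (var row in Run())
        {
            Console.WriteLine(row);
        }
        // prints:
        // Jello-Product Department
        // Taffy-No Department
        // No Developer-Test Department
    }
}
```

One caveat of this approach: Union removes all duplicates, so two genuinely identical rows in the source data would also collapse into one.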
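Cross join: there is no dedicated keyword for it; two consecutive from clauses implicitly produce the Cartesian product of the sequences: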
var query = from x in developers
            from d in departments
            select new
            {
                DeveloperName = x.Name,
                DepartmentName = d.Name
            };
foreach (var item in query)
{
    Console.WriteLine("{0}-{1}", item.DeveloperName, item.DepartmentName);
}
// Output:
// Jello-Product Department
// Jello-Project Department
// Taffy-Product Department
// Taffy-Project Department
// Tom-Product Department
// Tom-Project Department
// Lily-Product Department
// Lily-Project Department
Group join: we have in fact already used a group join in the left outer join:
var query = from d in departments
            join x in developers
                on d.ID equals x.DepartmentID into groupedDeveloper
            select new
            {
                Department = d,
                GroupedDeveloper = groupedDeveloper,
                Count = groupedDeveloper.Count()
            };
foreach (var item in query)
{
    Console.WriteLine("DepartmentName:{0},Count:{1}", item.Department.Name, item.Count);
    foreach (var gd in item.GroupedDeveloper)
    {
        Console.WriteLine(" DeveloperID:{0},DeveloperName:{1}", gd.ID, gd.Name);
    }
}
// Output:
// DepartmentName:Product Department,Count:3
//  DeveloperID:1,DeveloperName:Jello
//  DeveloperID:3,DeveloperName:Tom
//  DeveloperID:4,DeveloperName:Lily
// DepartmentName:Project Department,Count:1
//  DeveloperID:2,DeveloperName:Taffy
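In a join query expression such as the left outer join above, LINQ's deferred execution means that when the from-clause sequence is first iterated, the join-clause sequence is buffered and then matched against; later elements of the from-clause sequence are matched against that buffer. Therefore, when joining a large sequence with a small one, put the large sequence in the from clause and the small one in the join clause: the small sequence is buffered, while the large one is streamed lazily and consumes less memory.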
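The buffering behavior can be observed with a small experiment. In this sketch (the class JoinBufferingDemo and its counters are hypothetical names), both sequences are iterators that count how many elements they have handed out, and only the first joined result is requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original article.
public static class JoinBufferingDemo
{
    public static int OuterPulled;
    public static int InnerPulled;

    // Stands in for the large from-clause sequence.
    static IEnumerable<int> Outer()
    {
        foreach (var id in new[] { 1, 2, 1, 1 })
        {
            OuterPulled++;
            yield return id;
        }
    }

    // Stands in for the small join-clause sequence.
    static IEnumerable<int> Inner()
    {
        foreach (var id in new[] { 1, 2 })
        {
            InnerPulled++;
            yield return id;
        }
    }

    public static int Run()
    {
        OuterPulled = 0;
        InnerPulled = 0;

        var query = from o in Outer()
                    join i in Inner() on o equals i
                    select o;

        // Ask for the first result only, then stop enumerating.
        return query.First();
    }

    public static void Main()
    {
        Run();
        Console.WriteLine("outer pulled: {0}, inner pulled: {1}", OuterPulled, InnerPulled);
        // prints "outer pulled: 1, inner pulled: 2"
    }
}
```

The join-clause sequence is read completely even though only one result was requested, while the from-clause sequence is read only as far as needed. This applies to LINQ to Objects; other providers, such as LINQ to SQL, translate the join differently.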
3.3 The let clause
The let clause is used for intermediate calculations:
var query = from x in developers
            let nameLength = x.Name.Length
            orderby nameLength
            select new
            {
                Developer = x,
                NameLength = nameLength
            };
foreach (var item in query)
{
    Console.WriteLine("{0}--{1}", item.Developer.Name, item.NameLength);
}
// Output:
// Tom--3
// Lily--4
// Jello--5
// Taffy--5
It looks magical, but it becomes clear once translated into the equivalent extension methods:
var list = developers.Select(x => new { Developer = x, nameLength = x.Name.Length })
                     .OrderBy(x => x.nameLength)
                     .Select(x => new { Developer = x.Developer, NameLength = x.nameLength });
3.4 Grouping with group
Grouping uses a group [projection] by [key] clause:
var query = from x in developers
            group new { x.ID, x.Name } by x.DepartmentID;
foreach (var group in query)
{
    Console.WriteLine("DepartmentID:{0}", group.Key);
    foreach (var item in group)
    {
        Console.WriteLine(" {0}--{1}", item.ID, item.Name);
    }
}
// Output:
// DepartmentID:1
//  1--Jello
//  3--Tom
//  4--Lily
// DepartmentID:2
//  2--Taffy
When we want to continue the query with another projection over the grouping result, we use group by into as a query continuation.
var query = from x in developers
            group x by x.DepartmentID into groupedDeveloper
            select new
            {
                DepartmentID = groupedDeveloper.Key,
                Count = groupedDeveloper.Count()
            };
foreach (var item in query)
{
    Console.WriteLine("DepartmentID:{0},Count:{1}", item.DepartmentID, item.Count);
}
// Output:
// DepartmentID:1,Count:3
// DepartmentID:2,Count:1
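As with the earlier query expressions, the group ... into continuation can also be written with extension methods. The sketch below shows one equivalent form using GroupBy followed by Select; the class name GroupByDemo is made up, and the data is reduced to the fields the query needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original article.
public static class GroupByDemo
{
    public static List<string> Run()
    {
        var developers = new[]
        {
            new { ID = 1, Name = "Jello", DepartmentID = 1 },
            new { ID = 2, Name = "Taffy", DepartmentID = 2 },
            new { ID = 3, Name = "Tom", DepartmentID = 1 },
            new { ID = 4, Name = "Lily", DepartmentID = 1 }
        };

        // GroupBy yields one group per key; Select then projects each group.
        var query = developers
            .GroupBy(x => x.DepartmentID)
            .Select(g => new { DepartmentID = g.Key, Count = g.Count() });

        return query
            .Select(item => string.Format("DepartmentID:{0},Count:{1}", item.DepartmentID, item.Count))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
        // prints:
        // DepartmentID:1,Count:3
        // DepartmentID:2,Count:1
    }
}
```

In LINQ to Objects the groups come out in the order in which each key first appears in the source, which is why department 1 is listed before department 2.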
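Finally, a slightly more complex example: group the developers by department, list the larger groups first, and within a department list the shorter names first.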
var query = from x in developers
            group x by x.DepartmentID into groupDeveloper
            let count = groupDeveloper.Count()
            from g in groupDeveloper
            let length = g.Name.Length
            orderby count descending, length
            select g;
foreach (var item in query)
{
    Console.WriteLine("[{0}]{1}--{2}", item.DepartmentID, item.ID, item.Name);
}
// Output:
// [1]3--Tom
// [1]4--Lily
// [1]1--Jello
// [2]2--Taffy