LINQ to Objects
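Notes from learning LINQ to Objects.
LINQ to Objects means using LINQ queries directly against any collection type that implements IEnumerable or IEnumerable<T>. That includes the framework's own List<T>, Array, and Dictionary<TKey, TValue>, as well as custom collection types that implement those enumeration interfaces. LINQ queries also work on collections of strings, which makes them very useful for processing semi-structured data in text files.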
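As a minimal sketch of the "custom collection type" case mentioned above: the Playlist class and the PlaylistDemo helper below are hypothetical names introduced only for this illustration. The point is that implementing IEnumerable<T> should be enough for query syntax to work on the type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection type, introduced only for this illustration.
public class Playlist : IEnumerable<string>
{
    private readonly List<string> _tracks = new List<string>();

    // A public Add method plus IEnumerable enables collection-initializer syntax.
    public void Add(string track) => _tracks.Add(track);

    // Implementing IEnumerable<T> is what makes the type queryable with LINQ to Objects.
    public IEnumerator<string> GetEnumerator() => _tracks.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class PlaylistDemo
{
    // Returns the titles longer than minLength characters, in ordinal alphabetical order.
    public static List<string> LongTitles(Playlist playlist, int minLength)
    {
        var query = from track in playlist
                    where track.Length > minLength
                    select track;
        return query.OrderBy(t => t, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        var playlist = new Playlist { "Intro", "Moonlight Sonata", "Clair de Lune" };
        foreach (string title in LongTitles(playlist, 5))
            Console.WriteLine(title);
    }
}
```

With the sample data above, "Intro" is filtered out and the two longer titles are printed in alphabetical order.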
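Counting how many times a word occurs in a string. (ToLowerInvariant returns a lowercase copy of the string, using the casing rules of the invariant culture.)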
string text = @"Historically, the world of data and the world of objects" +
    @" have not been well integrated. Programmers work in C# or Visual Basic" +
    @" and also in SQL or XQuery. On the one side are concepts such as classes," +
    @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
    @" are tables, columns, rows, nodes, and separate languages for dealing with" +
    @" them. Data types often require translation between the two worlds; there are" +
    @" different standard functions. Because the object world has no notion of query, a" +
    @" query can only be represented as a string without compile-time type checking or" +
    @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
    @" objects in memory is often tedious and error-prone.";
string searchTerm = "data";

// Convert the string into an array of words.
string[] source = text.Split(
    new char[] { '.', '?', '!', ' ', ';', ':', ',' },
    StringSplitOptions.RemoveEmptyEntries);

// Create the query. Use ToLowerInvariant to match both "data" and "Data".
var matchQuery = from word in source
                 where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
                 select word;

// Count the matches, which executes the query.
int wordCount = matchQuery.Count();
Console.WriteLine("{0} occurrence(s) of the search term \"{1}\" were found.", wordCount, searchTerm);
Querying for sentences that contain a specified set of words.
string text = @"Historically, the world of data and the world of objects" +
    @" have not been well integrated. Programmers work in C# or Visual Basic" +
    @" and also in SQL or XQuery. On the one side are concepts such as classes," +
    @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
    @" are tables, columns, rows, nodes, and separate languages for dealing with" +
    @" them. Data types often require translation between the two worlds; there are" +
    @" different standard functions. Because the object world has no notion of query, a" +
    @" query can only be represented as a string without compile-time type checking or" +
    @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
    @" objects in memory is often tedious and error-prone.";

// Split the text into sentences.
string[] sentences = text.Split(new char[] { '.', '?', '!' });

// Separators between the words inside a sentence.
var wordSeparators = new char[] { ' ', ';', ':', ',' };

// Words that a sentence must contain.
string[] wordsToMatch = { "Historically", "data", "integrated" };

// A sentence matches when it contains every word in wordsToMatch.
var sentenceQuery = from sentence in sentences
                    let words = sentence.Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries)
                    where words.Intersect(wordsToMatch).Count() == wordsToMatch.Length
                    select sentence;

foreach (string str in sentenceQuery)
{
    Console.WriteLine(str);
}
Querying the characters in a string (String implements IEnumerable<char> and IEnumerable)
string aString = "ABCDE99F-J74-12-89A";

// Select only the digit characters.
IEnumerable<char> stringQuery = from ch in aString
                                where char.IsDigit(ch)
                                select ch;

// Execute the query.
foreach (char c in stringQuery)
    Console.Write(c + " ");

// Call the Count method on the existing query (this executes it again).
int count = stringQuery.Count();
Console.WriteLine("Count = {0}", count);
Combining LINQ queries with regular expressions
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            // Folder path; note the trailing backslash.
            string startFolder = @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\";
            IEnumerable<FileInfo> fileList = GetFiles(startFolder);
            Regex searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|Studio)");

            var queryMatchingFiles =
                from file in fileList
                where file.Extension == ".htm"
                let fileText = File.ReadAllText(file.FullName)
                let matches = searchTerm.Matches(fileText) // match the regex against the file content
                where matches.Count > 0
                select new
                {
                    path = file.FullName,
                    // Cast<Match>() also works on .NET Framework, where MatchCollection is non-generic.
                    matchedValues = matches.Cast<Match>().Select(x => x.Value)
                };

            // Enumerate the query to execute it and print the results.
            foreach (var result in queryMatchingFiles)
            {
                Console.WriteLine("{0}: {1}", result.path, string.Join(", ", result.matchedValues));
            }
        }

        // Returns a FileInfo for every file under the given folder, including subfolders.
        static IEnumerable<FileInfo> GetFiles(string path)
        {
            if (!Directory.Exists(path))
                throw new DirectoryNotFoundException();

            List<FileInfo> files = new List<FileInfo>();
            string[] fileNames = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
            foreach (string name in fileNames)
            {
                files.Add(new FileInfo(name));
            }
            return files;
        }
    }
}
Finding the difference between two lists
string[] names1 = { "aa", "bb", "cc" };
string[] names2 = { "aa", "bb", "cc", "dd", "ee" };

// Elements of names2 that are not in names1.
// names1.Except(names2) would yield an empty sequence.
IEnumerable<string> differenceQuery = names2.Except(names1);

foreach (string s in differenceQuery)
    Console.WriteLine(s);

// Output:
// dd
// ee
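Sorting structured text data by any field (structured text means data laid out in a regular pattern, such as DBF or CSV)
Suppose there is a file, scores.csv, with the following content:
Name,Chinese,Math,English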
小敏,34,45,56
小希,56,65,77
小花,99,99,99
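string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");
Console.WriteLine("Chinese scores, sorted from high to low:");

var scoreQuery = from score in scores.Skip(1)            // skip the header row
                 let fields = score.Split(',')
                 // The Chinese score is at index 1 (the second column).
                 // Parse it so the sort is numeric rather than alphabetical.
                 orderby int.Parse(fields[1]) descending
                 select score;

foreach (string str in scoreQuery)
{
    Console.WriteLine(str);
}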
Merging and comparing string collections (Union is a union in the mathematical set sense; Concat simply appends one sequence to the other)
string[] fileA = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] fileB = System.IO.File.ReadAllLines(@"../../../names2.txt");

// Concat: simple concatenation, duplicates are kept.
IEnumerable<string> concatQuery = fileA.Concat(fileB).OrderBy(s => s);

// Union: set union using the default equality comparer, so duplicates are removed.
IEnumerable<string> uniqueNamesQuery = fileA.Union(fileB).OrderBy(s => s);
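To make the Concat/Union difference concrete without needing the two files, here is a small sketch on in-memory arrays. The MergeDemo class, its helper names, and the sample names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeDemo
{
    // Concat appends b to a and keeps every element, duplicates included.
    public static List<string> ConcatSorted(IEnumerable<string> a, IEnumerable<string> b)
        => a.Concat(b).OrderBy(s => s, StringComparer.Ordinal).ToList();

    // Union yields each distinct element once (default equality comparer).
    public static List<string> UnionSorted(IEnumerable<string> a, IEnumerable<string> b)
        => a.Union(b).OrderBy(s => s, StringComparer.Ordinal).ToList();

    public static void Main()
    {
        string[] listA = { "Ann", "Bob", "Cid" };
        string[] listB = { "Bob", "Dee" };

        Console.WriteLine(string.Join(", ", ConcatSorted(listA, listB))); // Ann, Bob, Bob, Cid, Dee
        Console.WriteLine(string.Join(", ", UnionSorted(listA, listB)));  // Ann, Bob, Cid, Dee
    }
}
```

"Bob" appears in both inputs, so it shows up twice after Concat and once after Union.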
Populating one collection from multiple data sources
Suppose names.csv contains the following:
firstname,lastname,id
Omelchenko,Svetlana,111
mingming,chen,112
Suppose scores.csv contains the following:
id,Chinese,Math,English
111,98,23,67
112,34,90,99
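using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] names = System.IO.File.ReadAllLines(@"../../../names.csv");
            string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");

            // Skip(1) drops the header row of each file; otherwise Convert.ToInt32("id") would throw.
            // RemoveEmptyEntries tolerates a trailing comma at the end of a score line.
            IEnumerable<Student> queryNamesScores =
                from nameLine in names.Skip(1)
                let splitName = nameLine.Split(',')
                from scoreLine in scores.Skip(1)
                let splitScore = scoreLine.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                where Convert.ToInt32(splitName[2]) == Convert.ToInt32(splitScore[0])
                select new Student()
                {
                    FirstName = splitName[0],
                    LastName = splitName[1],
                    ID = Convert.ToInt32(splitName[2]),
                    ExamScores = splitScore.Skip(1).Select(x => Convert.ToInt32(x)).ToList()
                };

            // Enumerate the query to execute it and print each student.
            foreach (Student student in queryNamesScores)
            {
                Console.WriteLine("{0} {1} ({2}): {3}", student.FirstName, student.LastName,
                    student.ID, string.Join(", ", student.ExamScores));
            }
        }
    }

    class Student
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int ID { get; set; }
        public List<int> ExamScores { get; set; }
    }
}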
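The query above pairs every name line with every score line and then filters by ID. As an alternative sketch, the same matching can be written with a join clause. The JoinDemo class and the TotalsById helper are hypothetical names for this illustration, and the code assumes the header rows have already been removed and that IDs can be compared as strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
    // Matches each name line to its score line by ID and returns one summary string per student.
    // Assumes header rows are already stripped and that there are no trailing commas.
    public static List<string> TotalsById(string[] nameLines, string[] scoreLines)
    {
        var query = from nameLine in nameLines
                    let name = nameLine.Split(',')
                    join scoreLine in scoreLines
                        on name[2] equals scoreLine.Split(',')[0]   // ID is the key on both sides
                    let total = scoreLine.Split(',').Skip(1).Sum(s => int.Parse(s))
                    orderby name[2]
                    select $"{name[0]} {name[1]} ({name[2]}): {total}";
        return query.ToList();
    }

    public static void Main()
    {
        string[] names = { "Omelchenko,Svetlana,111", "mingming,chen,112" };
        string[] scores = { "111,98,23,67", "112,34,90,99" };

        foreach (string line in TotalsById(names, scores))
            Console.WriteLine(line);
    }
}
```

A join clause matches keys through a lookup instead of testing every name/score pair, which usually scales better than the nested from ... where form when the files grow.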
Using grouping and merging to split the data into multiple files
Suppose names1.txt contains:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
names2.txt contains:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
string[] fileA = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] fileB = System.IO.File.ReadAllLines(@"../../../names2.txt");

// Merge the two lists, removing duplicates.
var mergeQuery = fileA.Union(fileB);

// Group the names by their first letter.
var groupQuery = from name in mergeQuery
                 group name by name[0] into g
                 orderby g.Key
                 select g;

// Write one output file per group.
foreach (var g in groupQuery)
{
    string fileName = @"../../../testFile_" + g.Key + ".txt";
    Console.WriteLine(g.Key);

    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(fileName))
    {
        foreach (var item in g)
        {
            sw.WriteLine(item);
            Console.WriteLine(" {0}", item);
        }
    }
}
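Result: one file is written per initial letter (testFile_A.txt, testFile_B.txt, and so on), each containing the names that start with that letter.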
Computing values over multiple columns of a CSV text file
Create scores.csv with the following content:
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] lines = System.IO.File.ReadAllLines(@"../../../scores.csv");
            MultiColumns(lines);
        }

        private static void MultiColumns(string[] lines)
        {
            Console.WriteLine("Multi Column Query:");

            // For each line, drop the ID column and convert the remaining fields to integers.
            IEnumerable<IEnumerable<int>> multiColQuery =
                from line in lines
                let elements = line.Split(',')
                let scores = elements.Skip(1)
                select (from str in scores
                        select Convert.ToInt32(str));

            // The number of exam columns is taken from the first row.
            int columnCount = multiColQuery.First().Count();

            for (int column = 0; column < columnCount; column++)
            {
                // Project one exam column out of every row.
                var results2 = from row in multiColQuery
                               select row.ElementAt(column);

                double average = results2.Average();
                int max = results2.Max();
                int min = results2.Min();

                Console.WriteLine("Exam #{0} Average: {1:##.##} High Score: {2} Low Score: {3}",
                    column + 1, average, max, min);
            }
        }
    }
}
Quantitative change leads to qualitative change.