Pattern Matching and Text Processing - 超薄

Regular expressions are used through the System.Text.RegularExpressions namespace.

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        Regex reg = new Regex("the");
        string str1 = "the quick brown fox jumped over the lazy dog";
        Match matchSet;
        int matchPos;

        // Match returns the first occurrence of the pattern.
        matchSet = reg.Match(str1);
        if (matchSet.Success)
        {
            matchPos = matchSet.Index;
            Console.WriteLine("found match at position: " + matchPos);
        }

        // IsMatch only reports whether the pattern occurs at all;
        // call Match afterwards to get the details.
        if (Regex.IsMatch(str1, "the"))
        {
            Match aMatch;
            aMatch = reg.Match(str1);
        }
    }
}

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        Regex reg = new Regex("the");
        string str1 = "the quick brown fox jumped over the lazy dog";
        MatchCollection matchSet;

        // Matches returns every occurrence of the pattern, not just the first.
        matchSet = reg.Matches(str1);
        if (matchSet.Count > 0)
            foreach (Match aMatch in matchSet)
                Console.WriteLine("found a match at: " + aMatch.Index);
        Console.Read();
    }
}

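Quantifiers

(+) matches one or more occurrences of the immediately preceding character.

(*) matches zero or more occurrences of the immediately preceding character. // very hard to use in practice because it tends to match too much

(?) matches zero or one occurrence of the immediately preceding character.

{n} matches exactly n occurrences.

{m,n} specifies the minimum and maximum number of occurrences. {m,} gives only the minimum. To give only the maximum, write {0,n}; .NET does not treat {,n} as a quantifier.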
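
To see the quantifiers side by side, here is a small sketch that runs each one against the same input. The class QuantifierDemo, the helper AllMatches, and the sample string are made up for illustration; only Regex.Matches comes from the library.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: every match of pattern in input, joined with "|".
    public static string AllMatches(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            values[i] = matches[i].Value;
        return string.Join("|", values);
    }

    static void Main()
    {
        string input = "b ba baa baaa";
        Console.WriteLine(AllMatches(input, "ba+"));     // one or more 'a':  ba|baa|baaa
        Console.WriteLine(AllMatches(input, "ba*"));     // zero or more 'a': b|ba|baa|baaa
        Console.WriteLine(AllMatches(input, "ba?"));     // zero or one 'a':  b|ba|ba|ba
        Console.WriteLine(AllMatches(input, "ba{2}"));   // exactly two 'a':  baa|baa
        Console.WriteLine(AllMatches(input, "ba{1,2}")); // one or two 'a':   ba|baa|baa
    }
}
```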

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string[] words = new string[] { "Part", "of", "this", "<b>string</b>", "is", "bold" };

        // Greedy: .* grabs as much as it can, so "<b>string</b>" comes back as one match.
        // Change the pattern to "<.+?>" to match each tag on its own; "+" alone is still greedy.
        // The period (.) matches any character.
        string regExp = "<.*>";
        MatchCollection aMatch;
        foreach (string word in words)
        {
            if (Regex.IsMatch(word, regExp))
            {
                aMatch = Regex.Matches(word, regExp);
                for (int i = 0; i < aMatch.Count; i++)
                    Console.WriteLine(aMatch[i].Value);
            }
        }
    }
}

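The program was expected to return just the two tags, <b> and </b>, but because quantifiers are greedy the regular expression returned <b>string</b>. The lazy quantifier (?) solves this: use <.+?>. Using + alone is not enough; the lazy ? must follow it.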
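
The difference between greedy and lazy matching is easy to check with a short sketch. It assumes the input contains an HTML-style tag pair such as <b>string</b>; the class LazyDemo and the helper FirstMatch are illustrative names, not part of the original program.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyDemo
{
    // Hypothetical helper: text of the first match, or "" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string word = "<b>string</b>";
        Console.WriteLine(FirstMatch(word, "<.*>"));  // greedy:       <b>string</b>
        Console.WriteLine(FirstMatch(word, "<.+>"));  // still greedy: <b>string</b>
        Console.WriteLine(FirstMatch(word, "<.+?>")); // lazy:         <b>

        // With the lazy pattern, Matches finds each tag separately: <b> then </b>.
        foreach (Match m in Regex.Matches(word, "<.+?>"))
            Console.WriteLine(m.Value);
    }
}
```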

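Using Character Classes

The period (.) is typically used to define a range of characters inside a string, bounded by a starting and/or ending character.

The period matches any single character (by default, any character except a newline).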

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string str1 = "the quick brown fox jumped over the lazy dog one time";
        MatchCollection matchSet;

        // "t.e" matches a 't', then any one character, then an 'e'.
        matchSet = Regex.Matches(str1, "t.e");
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Matches at: " + aMatch.Index);
    }
}

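To check for a pattern built from a set of characters, use square brackets ([]). The characters inside the brackets are called a "character class".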

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string str1 = "THE quick BROWN fox JUMPED over THE lazy DOG";
        MatchCollection matchSet;

        // [a-z] matches any single lowercase letter.
        matchSet = Regex.Matches(str1, "[a-z]");
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Matches at: " + aMatch.Index);
    }
}

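[A-Za-z] matches all English letters, upper and lower case.

A caret (^) at the front of a character class negates it. For example, [aeiou] matches vowels, so [^aeiou] matches anything that is not a vowel.

[A-Za-z0-9_] matches word characters and can be written as \w (in .NET, \w also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set). \W is the opposite of \w and matches non-word characters.

[0-9] can be written as \d.

[^0-9] can be written as \D.

\s matches a whitespace character and \S matches a non-whitespace character.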
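
A quick way to get a feel for these classes is to count how many times each one matches in the same string. This is only a sketch; the class CharClassDemo, the helper Count, and the sample input are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int Count(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        string input = "Room 42, Floor 7";             // 16 characters
        Console.WriteLine(Count(input, "[A-Z]"));      // 2  (R, F)
        Console.WriteLine(Count(input, "[aeiou]"));    // 4  (four 'o's)
        Console.WriteLine(Count(input, "[^aeiou]"));   // 12 (everything else)
        Console.WriteLine(Count(input, "\\d"));        // 3  (4, 2, 7)
        Console.WriteLine(Count(input, "\\D"));        // 13
        Console.WriteLine(Count(input, "\\s"));        // 3  (three spaces)
        Console.WriteLine(Count(input, "\\w+"));       // 4  (Room, 42, Floor, 7)
        Console.WriteLine(Count(input, "\\W"));        // 4  (three spaces and a comma)
    }
}
```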

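Assertions

(^) matches at the start of the string.

($) matches at the end of the string.

\b matches at a word boundary, that is, at the start or end of a word.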

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string[] words = new string[] { "heal", "heel", "noah", "techno" };

        // "^h" matches only when the word starts with 'h'.
        string regExp = "^h";
        Match aMatch;
        foreach (string word in words)
            if (Regex.IsMatch(word, regExp))
            {
                aMatch = Regex.Match(word, regExp);
                Console.WriteLine("Matched: " + word + " at position: " + aMatch.Index);
            }
    }
}

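To match words that end in "h" instead, change the pattern:

string regExp = "h$";

To match an "h" at the start of any word inside a sentence, use a word boundary:

string words = "hark, what doth thou say, Harold? ";

string regExp = "\\bh";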
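
The sketch below prints the positions these assertions match at, using the sentence above. The class AnchorDemo and the helper Positions are hypothetical; the IgnoreCase run is an addition to show that the capital "H" in "Harold" is otherwise skipped.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Hypothetical helper: comma-separated start indexes of all matches.
    public static string Positions(string input, string pattern, RegexOptions options)
    {
        MatchCollection matches = Regex.Matches(input, pattern, options);
        string[] indexes = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            indexes[i] = matches[i].Index.ToString();
        return string.Join(",", indexes);
    }

    static void Main()
    {
        string words = "hark, what doth thou say, Harold? ";
        Console.WriteLine(Positions(words, "\\bh", RegexOptions.None));       // 0     (hark)
        Console.WriteLine(Positions(words, "\\bh", RegexOptions.IgnoreCase)); // 0,26  (hark, Harold)
        Console.WriteLine(Positions(words, "h\\b", RegexOptions.None));       // 14    (doth)
        Console.WriteLine(Positions("noah", "h$", RegexOptions.None));        // 3
    }
}
```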

Using Grouping Constructs

1 Anonymous groups

A group is formed by surrounding a regular expression with parentheses.

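using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        // Note the trailing space after "18": without it the two halves join as "1803/12/88".
        string words = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";

        // A two-digit number with whitespace on both sides.
        // The final "13" has no trailing whitespace, so it is not matched.
        string regExp1 = "(\\s\\d{2}\\s)";
        MatchCollection matchSet = Regex.Matches(words, regExp1);
        foreach (Match aMatch in matchSet)
            Console.WriteLine(aMatch.Groups[0].Captures[0]);
    }
}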

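2 Named groups

A named group is formed by prefixing the group's regular expression with a question mark and a name enclosed in angle brackets.

For example, to name a group "ages", the regular expression is:

(?<ages>\\s\\d{2}\\s)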

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string words = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";

        // The group named "dates" captures a dd/dd/dd date that is followed by whitespace.
        string regExp1 = "(?<dates>(\\d{2}/\\d{2}/\\d{2}))\\s";
        MatchCollection matchSet = Regex.Matches(words, regExp1);
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Date: {0}", aMatch.Groups["dates"]);
    }
}

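Zero-Width Lookahead and Lookbehind Assertions

Assertions can also control how far a regular expression looks forward or backward when matching. They can be positive (the subexpression must match) or negative (the subexpression must not match).

A positive lookahead assertion is written (?=reg-exp-char): the match continues only if the subexpression matches to the right of the current position.

string words = "lions lion tigers tiger bears,bear";

string regExp1 = "\\w+(?=\\s)"; // matches only words that are followed by whitespace

A negative lookahead assertion, written (?!reg-exp-char), lets the match continue only if the subexpression does not match to the right of the current position.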

string words = "subroutine routine subprocedure procedure";

string regExp1 = "\\b(?!sub)\\w+\\b";
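
Running both lookahead patterns against their sample strings shows what each one keeps. This is a sketch only; the class LookaheadDemo and the helper AllMatches are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Hypothetical helper: every match of pattern in input, joined with "|".
    public static string AllMatches(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            values[i] = matches[i].Value;
        return string.Join("|", values);
    }

    static void Main()
    {
        // Positive lookahead: keep words followed by whitespace.
        // "bears" (followed by a comma) and the final "bear" are skipped.
        Console.WriteLine(AllMatches("lions lion tigers tiger bears,bear", "\\w+(?=\\s)"));
        // prints: lions|lion|tigers|tiger

        // Negative lookahead: keep whole words that do not start with "sub".
        Console.WriteLine(AllMatches("subroutine routine subprocedure procedure", "\\b(?!sub)\\w+\\b"));
        // prints: routine|procedure
    }
}
```

The lookahead itself consumes no characters, which is why the whitespace after each word is not part of the match.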

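Lookbehind assertions

A positive lookbehind assertion, written (?<=reg-exp-char), lets the match continue only if the subexpression matches to the left of the current position. A negative lookbehind assertion, written (?<!reg-exp-char), lets the match continue only if the subexpression does not match to the left.

string words = "subroutines routine subprocedures procedure";

string regExp1 = "\\b\\w+(?<=s)\\b"; // positive lookbehind: words that end in "s"

To match the words that do not end in "s", use the negative form:

string regExp1 = "\\b\\w+(?<!s)\\b"; // negative lookbehind: words that do not end in "s"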
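
The two lookbehind patterns split the sample words into those that end in "s" and those that do not. The sketch below checks this; the class LookbehindDemo and the helper AllMatches are made-up names for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Hypothetical helper: every match of pattern in input, joined with "|".
    public static string AllMatches(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            values[i] = matches[i].Value;
        return string.Join("|", values);
    }

    static void Main()
    {
        string words = "subroutines routine subprocedures procedure";

        // Positive lookbehind: the character just before the closing \b must be 's'.
        Console.WriteLine(AllMatches(words, "\\b\\w+(?<=s)\\b")); // subroutines|subprocedures

        // Negative lookbehind: the character just before the closing \b must not be 's'.
        Console.WriteLine(AllMatches(words, "\\b\\w+(?<!s)\\b")); // routine|procedure
    }
}
```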

The CaptureCollection Class

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string dates = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";

        // Each match is a date, whitespace, a two-digit age, and whitespace.
        // The last pair is not matched because nothing follows the final "13".
        string regExp = "(?<dates>(\\d{2}/\\d{2}/\\d{2}))\\s(?<ages>(\\d{2}))\\s";
        MatchCollection matchSet;
        matchSet = Regex.Matches(dates, regExp);
        Console.WriteLine();
        foreach (Match aMatch in matchSet)
        {
            // Captures holds every substring captured by the named group in this match.
            foreach (Capture aCapture in aMatch.Groups["dates"].Captures)
                Console.WriteLine("date capture: " + aCapture.ToString());
            foreach (Capture aCapture in aMatch.Groups["ages"].Captures)
                Console.WriteLine("age capture: " + aCapture.ToString());
        }
    }
}
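
In the program above each group captures only once per match, so every CaptureCollection holds a single item. A collection with several items appears when a group sits inside a repeated construct. The following sketch illustrates that case; the class CaptureDemo, the helper CapturesOf, the group name "age", and the input are all assumptions for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Hypothetical helper: all captures of one named group in the first match, joined with ",".
    public static string CapturesOf(string input, string pattern, string groupName)
    {
        Match m = Regex.Match(input, pattern);
        CaptureCollection captures = m.Groups[groupName].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
            values[i] = captures[i].Value;
        return string.Join(",", values);
    }

    static void Main()
    {
        string ages = "46 45 18";
        // The named group is repeated by the outer (?: ... )+, so it captures three times.
        string pattern = "(?:(?<age>\\d{2})\\s?)+";

        Match m = Regex.Match(ages, pattern);
        Console.WriteLine(m.Groups["age"].Value);            // 18 (the group keeps only the last capture)
        Console.WriteLine(CapturesOf(ages, pattern, "age")); // 46,45,18 (Captures keeps all of them)
    }
}
```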

Regular Expression Options

matchSet = Regex.Matches(dates, regExp, RegexOptions.Multiline);
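
As a rough illustration of what an option changes, the sketch below counts matches with and without RegexOptions.Multiline and RegexOptions.IgnoreCase. The class OptionsDemo, the helper Count, and the sample strings are assumptions for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class OptionsDemo
{
    // Hypothetical helper: number of matches of pattern in input under the given options.
    public static int Count(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    static void Main()
    {
        string lines = "heal\nnoah\nheel";

        // Without Multiline, ^ and $ refer to the whole string.
        Console.WriteLine(Count(lines, "^h", RegexOptions.None));      // 1
        Console.WriteLine(Count(lines, "h$", RegexOptions.None));      // 0

        // With Multiline, ^ and $ also match at the start and end of each line.
        Console.WriteLine(Count(lines, "^h", RegexOptions.Multiline)); // 2
        Console.WriteLine(Count(lines, "h$", RegexOptions.Multiline)); // 1

        // Options are flags and can be combined with |.
        Console.WriteLine(Count("Heal\nnoah\nHeel", "^h",
            RegexOptions.Multiline | RegexOptions.IgnoreCase));        // 2
    }
}
```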

Posted 2012-02-11 20:15 by 超薄