Using Regular Expressions in ASP.NET (Part 2)
Regex
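The Regex class represents an immutable (read-only) regular expression. It also contains static methods that let you use the other regular expression classes without explicitly creating instances of them.
The following code example creates an instance of the Regex class and defines a simple regular expression when the object is initialized. Note that the C# version uses an additional backslash as an escape character: in a regular C# string literal, "\\s" yields the single literal backslash of the \s matching character class. Visual Basic string literals do not treat the backslash specially, so the Visual Basic version needs no extra backslash.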
[Visual Basic]
' Declare object variable of type Regex.
Dim r As Regex
' Create a Regex object and define its regular expression.
r = New Regex("\s2000")
[C#]
// Declare object variable of type Regex.
Regex r;
// Create a Regex object and define its regular expression.
r = new Regex("\\s2000");
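The static methods mentioned above can be used without creating a Regex instance at all. The following is a minimal sketch, not a definitive implementation; the StaticRegexDemo class and ContainsYear method are hypothetical names introduced only for illustration. It also shows that a C# verbatim string literal (prefixed with @) avoids the doubled backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // Hypothetical helper: true if the input contains a whitespace
    // character immediately followed by "2000".
    public static bool ContainsYear(string input)
    {
        // Static call: no Regex object is created explicitly.
        // The verbatim literal @"\s2000" is the same pattern as "\\s2000".
        return Regex.IsMatch(input, @"\s2000");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsYear("Windows 2000")); // True
        Console.WriteLine(ContainsYear("Windows2000"));  // False
        Console.WriteLine(@"\s2000" == "\\s2000");       // True
    }
}
```

A static call is convenient for one-off checks; an instance such as r above is the usual choice when the same pattern is applied many times.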
Match
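The Match class represents the result of a regular expression matching operation. The following example uses the Match method of the Regex class to return an object of type Match in order to find the first match in the input string. The example uses the Match.Success property to indicate whether a match was found.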
[Visual Basic]
' Create a new Regex object.
Dim r As New Regex("abc")
' Find a single match in the input string.
Dim m As Match = r.Match("123abc456")
If m.Success Then
    ' Print out the character position where a match was found.
    ' (Character position 3 in this case.)
    Console.WriteLine("Found match at position " & m.Index.ToString())
End If
[C#]
// Create a new Regex object.
Regex r = new Regex("abc");
// Find a single match in the string.
Match m = r.Match("123abc456");
if (m.Success)
{
    // Print out the character position where a match was found.
    // (Character position 3 in this case.)
    Console.WriteLine("Found match at position " + m.Index);
}
MatchCollection
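The MatchCollection class represents a sequence of successful, non-overlapping matches. The collection is immutable (read-only) and has no public constructor. Instances of MatchCollection are returned by the Regex.Matches method.
The following example uses the Matches method of the Regex class to fill a MatchCollection with all the matches found in the input string. The example copies the collection to a string array that holds each match and an integer array that indicates the position of each match.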
[Visual Basic]
Dim mc As MatchCollection
Dim results(20) As String
Dim matchposition(20) As Integer
' Create a new Regex object and define the regular expression.
Dim r As New Regex("abc")
' Use the Matches method to find all matches in the input string.
mc = r.Matches("123abc4abcd")
' Loop through the match collection to retrieve all
' matches and positions.
Dim i As Integer
For i = 0 To mc.Count - 1
    ' Add the match string to the string array.
    results(i) = mc(i).Value
    ' Record the character position where the match was found.
    matchposition(i) = mc(i).Index
Next i
[C#]
MatchCollection mc;
String[] results = new String[20];
int[] matchposition = new int[20];
// Create a new Regex object and define the regular expression.
Regex r = new Regex("abc");
// Use the Matches method to find all matches in the input string.
mc = r.Matches("123abc4abcd");
// Loop through the match collection to retrieve all
// matches and positions.
for (int i = 0; i < mc.Count; i++)
{
    // Add the match string to the string array.
    results[i] = mc[i].Value;
    // Record the character position where the match was found.
    matchposition[i] = mc[i].Index;
}
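As a variation on the fixed-size arrays above, the matches can be collected into a growable list by enumerating the MatchCollection directly. This is a minimal sketch under that assumption; the MatchListDemo class and FindPositions method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchListDemo
{
    // Hypothetical helper: returns the start index of every match
    // of pattern in input.
    public static List<int> FindPositions(string input, string pattern)
    {
        List<int> positions = new List<int>();
        // A MatchCollection can be enumerated; each element is a Match.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        List<int> positions = FindPositions("123abc4abcd", "abc");
        Console.WriteLine(string.Join(", ", positions)); // 3, 7
    }
}
```

A list avoids having to guess an upper bound (20 in the arrays above) on the number of matches.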
GroupCollection
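The GroupCollection class represents a collection of captured groups and returns the set of captured groups in a single match. The collection is immutable (read-only) and has no public constructor. An instance of GroupCollection is returned by the Match.Groups property.
The following console application example finds and prints the number of groups captured by a regular expression. For an example of how to extract the individual captures in each member of a group collection, see the CaptureCollection example in the following section.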
[Visual Basic]
Imports System
Imports System.Text.RegularExpressions

Public Class RegexTest
    Public Shared Sub RunTest()
        ' Define groups "abc", "ab", and "b".
        Dim r As New Regex("(a(b))c")
        Dim m As Match = r.Match("abdabc")
        Console.WriteLine("Number of groups found = " _
            & m.Groups.Count.ToString())
    End Sub

    Public Shared Sub Main()
        RunTest()
    End Sub
End Class
[C#]
using System;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void RunTest()
    {
        // Define groups "abc", "ab", and "b".
        Regex r = new Regex("(a(b))c");
        Match m = r.Match("abdabc");
        Console.WriteLine("Number of groups found = " + m.Groups.Count);
    }

    public static void Main()
    {
        RunTest();
    }
}
This example produces the following output.
[Visual Basic] Number of groups found = 3
[C#] Number of groups found = 3
CaptureCollection
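The CaptureCollection class represents a sequence of captured substrings and returns the set of captures made by a single capturing group. Because of quantifiers, a capturing group can capture more than one string in a single match. The Captures property, an object of the CaptureCollection class, is provided as a member of the Match and Group classes to facilitate access to the set of captured substrings.
For example, if you use the regular expression ((a(b))c)+ (where the + quantifier specifies one or more matches) to capture matches from the string "abcabcabc", the CaptureCollection of each of the three capturing groups contains three members. Group 0, which represents the whole match, contains a single capture.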
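The capture counts for ((a(b))c)+ can be checked with a short program. This is an illustrative sketch only; the CaptureCountDemo class and CaptureCounts method are hypothetical names, not part of the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureCountDemo
{
    // Hypothetical helper: returns the number of captures recorded
    // for each group (index 0 is the whole match) in the first match.
    public static int[] CaptureCounts(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        int[] counts = new int[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            counts[i] = m.Groups[i].Captures.Count;
        }
        return counts;
    }

    public static void Main()
    {
        int[] counts = CaptureCounts("abcabcabc", "((a(b))c)+");
        for (int i = 0; i < counts.Length; i++)
        {
            Console.WriteLine("Group " + i + ": " + counts[i] + " capture(s)");
        }
        // Group 0: 1 capture(s)
        // Group 1: 3 capture(s)
        // Group 2: 3 capture(s)
        // Group 3: 3 capture(s)
    }
}
```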
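The following console application example uses the regular expression (Abc)+ to find one or more matches in the string "XYZAbcAbcAbcXYZAbcAb". The example illustrates the use of the Captures property to return multiple groups of captured substrings.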
[Visual Basic]
Imports System
Imports System.Text.RegularExpressions

Public Class RegexTest
    Public Shared Sub RunTest()
        Dim counter As Integer
        Dim m As Match
        Dim cc As CaptureCollection
        Dim gc As GroupCollection
        ' Look for groupings of "Abc".
        Dim r As New Regex("(Abc)+")
        ' Define the string to search.
        m = r.Match("XYZAbcAbcAbcXYZAbcAb")
        gc = m.Groups
        ' Print the number of groups.
        Console.WriteLine("Captured groups = " & gc.Count.ToString())
        ' Loop through each group.
        Dim i, ii As Integer
        For i = 0 To gc.Count - 1
            cc = gc(i).Captures
            counter = cc.Count
            ' Print number of captures in this group.
            Console.WriteLine("Captures count = " & counter.ToString())
            ' Loop through each capture in group.
            For ii = 0 To counter - 1
                ' Print capture and position.
                Console.WriteLine(cc(ii).ToString() _
                    & " Starts at character " & cc(ii).Index.ToString())
            Next ii
        Next i
    End Sub

    Public Shared Sub Main()
        RunTest()
    End Sub
End Class
[C#]
using System;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void RunTest()
    {
        int counter;
        Match m;
        CaptureCollection cc;
        GroupCollection gc;
        // Look for groupings of "Abc".
        Regex r = new Regex("(Abc)+");
        // Define the string to search.
        m = r.Match("XYZAbcAbcAbcXYZAbcAb");
        gc = m.Groups;
        // Print the number of groups.
        Console.WriteLine("Captured groups = " + gc.Count.ToString());
        // Loop through each group.
        for (int i = 0; i < gc.Count; i++)
        {
            cc = gc[i].Captures;
            counter = cc.Count;
            // Print number of captures in this group.
            Console.WriteLine("Captures count = " + counter.ToString());
            // Loop through each capture in group.
            for (int ii = 0; ii < counter; ii++)
            {
                // Print capture and position.
                Console.WriteLine(cc[ii] + " Starts at character " + cc[ii].Index);
            }
        }
    }

    public static void Main()
    {
        RunTest();
    }
}
This example produces the following output.
[Visual Basic]
Captured groups = 2
Captures count = 1
AbcAbcAbc Starts at character 3
Captures count = 3
Abc Starts at character 3
Abc Starts at character 6
Abc Starts at character 9
[C#]
Captured groups = 2
Captures count = 1
AbcAbcAbc Starts at character 3
Captures count = 3
Abc Starts at character 3
Abc Starts at character 6
Abc Starts at character 9
Group
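The Group class represents the results from a single capturing group. Because a Group can capture zero, one, or more strings in a single match (using quantifiers), it contains a collection of Capture objects. Because Group inherits from Capture, the last substring captured can be accessed directly (the Group instance itself is equivalent to the last item of the collection returned by the Captures property).
Instances of Group are returned by the Match.Groups(groupnum) property or, when the "(?<groupname>)" grouping construct is used, by the Match.Groups("groupname") property.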
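The relationship between a Group and its last Capture can be seen in a small program. This is a sketch for illustration; the LastCaptureDemo class, the FirstGroup method, and the sample input "AbcAbdAbe" are assumptions introduced here, not taken from the original examples.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastCaptureDemo
{
    // Hypothetical helper: returns capturing group 1 of the first match.
    public static Group FirstGroup(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1];
    }

    public static void Main()
    {
        // Group 1 captures "Abc", "Abd", and "Abe" in turn.
        Group g = FirstGroup("AbcAbdAbe", "(Ab[a-z])+");
        Capture last = g.Captures[g.Captures.Count - 1];

        Console.WriteLine(g.Captures.Count);      // 3
        Console.WriteLine(g.Value);               // Abe
        Console.WriteLine(g.Value == last.Value); // True
        Console.WriteLine(g.Index == last.Index); // True
    }
}
```

To see the earlier captures ("Abc" and "Abd"), the Captures collection has to be read; the Group itself exposes only the last one.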
The following code example uses nested grouping constructs to capture substrings into groups.
[Visual Basic]
Dim matchposition(20) As Integer
Dim results(20) As String
' Define substrings abc, ab, b.
Dim r As New Regex("(a(b))c")
Dim m As Match = r.Match("abdabc")
' Bound the loop by the group count rather than testing for an empty value.
Dim i As Integer
For i = 0 To m.Groups.Count - 1
    ' Copy groups to string array.
    results(i) = m.Groups(i).Value
    ' Record character position.
    matchposition(i) = m.Groups(i).Index
Next i
[C#]
int[] matchposition = new int[20];
String[] results = new String[20];
// Define substrings abc, ab, b.
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
// Bound the loop by the group count rather than testing for an empty value.
for (int i = 0; i < m.Groups.Count; i++)
{
    // Copy groups to string array.
    results[i] = m.Groups[i].Value;
    // Record character position.
    matchposition[i] = m.Groups[i].Index;
}
This example produces the following results.
[Visual Basic]
results(0) = "abc"   matchposition(0) = 3
results(1) = "ab"    matchposition(1) = 3
results(2) = "b"     matchposition(2) = 4
[C#]
results[0] = "abc"   matchposition[0] = 3
results[1] = "ab"    matchposition[1] = 3
results[2] = "b"     matchposition[2] = 4
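The following code example uses named grouping constructs to capture substrings from a string containing data in the format "DATANAME:VALUE", which the regular expression splits at the colon (:).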
[Visual Basic]
Dim r As New Regex("^(?<name>\w+):(?<value>\w+)")
Dim m As Match = r.Match("Section1:119900")
[C#]
Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
Match m = r.Match("Section1:119900");
This regular expression returns the following results.
[Visual Basic]
m.Groups("name").Value = "Section1"
m.Groups("value").Value = "119900"
[C#]
m.Groups["name"].Value = "Section1"
m.Groups["value"].Value = "119900"
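The named groups can be wrapped in a small parsing helper that also handles input that does not match. This is a minimal sketch, assuming the same pattern as above; the NameValueDemo class and TryParse method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValueDemo
{
    // Same pattern as above, written as a verbatim string literal.
    private static readonly Regex Pair =
        new Regex(@"^(?<name>\w+):(?<value>\w+)");

    // Hypothetical helper: extracts name and value from "DATANAME:VALUE".
    // Returns false, with empty outputs, when the input does not match.
    public static bool TryParse(string input, out string name, out string value)
    {
        Match m = Pair.Match(input);
        name = m.Success ? m.Groups["name"].Value : string.Empty;
        value = m.Success ? m.Groups["value"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string name, value;
        if (TryParse("Section1:119900", out name, out value))
        {
            Console.WriteLine(name + " = " + value); // Section1 = 119900
        }
        Console.WriteLine(TryParse("no colon here", out name, out value)); // False
    }
}
```

Checking Match.Success before reading the groups keeps a failed match from being mistaken for a pair of empty strings.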
Capture
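The Capture class contains the results from a single subexpression capture.
The following example loops through a Group collection, extracts the Capture collection from each member of Group, and assigns the variables posn and length, respectively, to the character position in the original string where each string was found and to the length of each string.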
[Visual Basic]
Dim r As Regex
Dim m As Match
Dim cc As CaptureCollection
Dim posn, length As Integer
' Use + rather than *: (abc)* succeeds with an empty match at position 0
' of this input, so a loop that stops at an empty value would never run.
r = New Regex("(abc)+")
m = r.Match("bcabcabc")
Dim i, j As Integer
For i = 0 To m.Groups.Count - 1
    ' Grab the Collection for Group(i).
    cc = m.Groups(i).Captures
    For j = 0 To cc.Count - 1
        ' Position of Capture object.
        posn = cc(j).Index
        ' Length of Capture object.
        length = cc(j).Length
    Next j
Next i
[C#]
Regex r;
Match m;
CaptureCollection cc;
int posn, length;
// Use + rather than *: (abc)* succeeds with an empty match at position 0
// of this input, so a loop that stops at an empty value would never run.
r = new Regex("(abc)+");
m = r.Match("bcabcabc");
for (int i = 0; i < m.Groups.Count; i++)
{
    // Grab the Collection for Group(i).
    cc = m.Groups[i].Captures;
    for (int j = 0; j < cc.Count; j++)
    {
        // Position of Capture object.
        posn = cc[j].Index;
        // Length of Capture object.
        length = cc[j].Length;
    }
}
// Requires: using System.Text.RegularExpressions;
private void button1_Click(object sender, System.EventArgs e)
{
    Regex regexTest = new Regex(@"[0-9]H", RegexOptions.IgnoreCase);
    Match match = regexTest.Match(this.textBox1.Text);
    System.Text.StringBuilder strb = new System.Text.StringBuilder();
    while (match.Success)
    {
        if (strb.Length == 0)
        {
            strb.Append("Match Succeeded!");
        }
        // Groups of the current match; index 0 is the whole match.
        GroupCollection groups = match.Groups;
        for (int index = 0; index < groups.Count; index++)
        {
            strb.Append(Environment.NewLine);
            strb.Append(groups[index].Value);
        }
        match = match.NextMatch();
    }
    // Nothing was appended, so no match was found.
    if (strb.Length == 0)
    {
        strb.Append("Match Failed!");
    }
    this.textBox2.Text = strb.ToString();
}