Regular Expressions in C# - 路见不平

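      I am fond of regular expressions. Apart from performance concerns, they have unmatched advantages for string processing. The class library available to C# includes regular expression support, which makes them very convenient to use.
      To use regular expressions in C#, import the namespaces below. Regex lives in System.Text.RegularExpressions; System is only needed for the Console calls in the examples, and System.Text is not required:

      using System;
      using System.Text.RegularExpressions;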

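      The most important class for regular expressions in C# is Regex. The most common argument of the Regex constructor is the pattern string, and another frequently used one is RegexOptions. I will not go into pattern strings and regular expression syntax here; if you are interested, feel free to email me to discuss them. Here is a short description of the main RegexOptions values: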
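      RegexOptions.Compiled         Compiles the regular expression to MSIL in an in-memory assembly instead of interpreting it. Matching becomes faster, but constructing the Regex becomes slower, so this option pays off when one Regex object (for example a static field with a fixed pattern) is reused many times. Building a new compiled Regex from a dynamically generated pattern on every call only adds the compilation cost.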
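      RegexOptions.IgnoreCase       Ignores case when matching.
      RegexOptions.Multiline        Multiline mode: ^ and $ match at the start and end of every line, not only of the whole input.
      RegexOptions.None             No options are set.
      RegexOptions.RightToLeft      Searches from right to left.
      RegexOptions.Singleline       Single-line mode: . matches every character, including \n.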
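
      To make the options above concrete, here is a minimal sketch that runs the same pattern with and without each option. The class name RegexOptionsDemo and its helper methods are made up for this illustration; only the Regex static methods and the RegexOptions values come from the class library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsDemo
{
    // Counts lines that consist of a single word; the result depends on Multiline.
    public static int CountLines(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"^\w+$", options).Count;
    }

    // Tests whether "." is allowed to match the newline in "a\nb".
    public static bool DotMatchesNewline(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // Returns the first number the engine finds; RightToLeft starts at the end.
    public static string FirstNumber(string input, RegexOptions options)
    {
        return Regex.Match(input, @"\d+", options).Value;
    }

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase)); // True
        Console.WriteLine(CountLines("one\ntwo", RegexOptions.None));                // 0
        Console.WriteLine(CountLines("one\ntwo", RegexOptions.Multiline));           // 2
        Console.WriteLine(DotMatchesNewline(RegexOptions.None));                     // False
        Console.WriteLine(DotMatchesNewline(RegexOptions.Singleline));               // True
        Console.WriteLine(FirstNumber("10 20 30", RegexOptions.None));               // 10
        Console.WriteLine(FirstNumber("10 20 30", RegexOptions.RightToLeft));        // 30
    }
}
```

      Options can be combined with the | operator, for example RegexOptions.Compiled | RegexOptions.IgnoreCase.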

      The following shows how to use a regular expression:
       Regex token = new Regex(@"((?<protocol>[a-zA-Z]*?)://)?(?<domain>[^/]*)(?<path>.*)", RegexOptions.Compiled);
       Match matchList = token.Match("http://www.sina.com.cn/index.html");

       if (matchList.Success)
       {
                Console.WriteLine( "Protocol:{0},Domain:{1},Path:{2}" ,
                        matchList.Groups["protocol"] ,
                        matchList.Groups["domain"] ,
                        matchList.Groups["path"]) ;
       }
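
      Because the protocol part of the pattern is optional, the "protocol" group may not take part in a match at all. The sketch below reuses the same pattern and checks Group.Success to tell "no protocol" apart from an empty string. The class UrlGroupsDemo and the helper Describe are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlGroupsDemo
{
    // Same pattern as in the example above, created once and reused.
    private static readonly Regex Token = new Regex(
        @"((?<protocol>[a-zA-Z]*?)://)?(?<domain>[^/]*)(?<path>.*)",
        RegexOptions.Compiled);

    // Returns "protocol|domain|path", with "(none)" when no protocol was matched.
    public static string Describe(string url)
    {
        Match m = Token.Match(url);
        Group protocol = m.Groups["protocol"];
        string shown = protocol.Success ? protocol.Value : "(none)";
        return shown + "|" + m.Groups["domain"].Value + "|" + m.Groups["path"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("http://www.sina.com.cn/index.html")); // http|www.sina.com.cn|/index.html
        Console.WriteLine(Describe("www.sina.com.cn/index.html"));        // (none)|www.sina.com.cn|/index.html
    }
}
```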

      The example above finds a single match. The next example shows how to retrieve multiple matches:

       Regex token = new Regex(@"\s*((?<protocol>\w*?)://)?(?<domain>[^\/\s]+)(?<path>[^\s]*)", RegexOptions.Compiled);
       MatchCollection matches = token.Matches("http://www.sina.com.cn/index.html http://www.microsoft.com ftp://ftp.cav.com/info.zip");

       if (matches.Count != 0)
       {
                foreach (Match matchList in matches)
                {
                    Console.WriteLine("Protocol:{0},Domain:{1},Path:{2}",
                            matchList.Groups["protocol"],
                            matchList.Groups["domain"],
                            matchList.Groups["path"]);

                    foreach (Capture c in matchList.Captures)
                    {
                        Console.WriteLine("Capture:[" + c + "]");
                    }
                }
       }
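
      Match.Captures, as used above, holds just one capture per match: the matched text itself. Captures become more useful on a group that sits inside a repetition, where Group.Value keeps only the last repetition and Group.Captures keeps all of them. A minimal sketch, assuming a simple host name made of letters and digits; the class CapturesDemo, the method Labels, and the pattern are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Returns every text captured by the repeated "label" group, in order.
    public static string[] Labels(string host)
    {
        Match m = Regex.Match(host, @"(?:(?<label>[a-z0-9]+)\.?)+");
        CaptureCollection captures = m.Groups["label"].Captures;
        string[] result = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            result[i] = captures[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Every repetition of the group is kept in Captures.
        Console.WriteLine(string.Join(",", Labels("www.sina.com.cn"))); // www,sina,com,cn
    }
}
```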

      Below is an example of Replace, mainly to demonstrate MatchEvaluator.
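       public static string ReplaceEva(Match match)
       {
            // Keep the "protocol://" prefix only when the input actually had one.
            Group protocol = match.Groups["protocol"];
            string prefix = protocol.Success ? protocol.Value + "://" : "";
            // The path group already starts with "/", so no extra slash is added.
            return prefix + "www.google.com" + match.Groups["path"].Value;
       }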

      Regex token = new Regex(@"\s*((?<protocol>\w*?)://)?(?<domain>[^\/\s]+)(?<path>[^\s]*)", RegexOptions.Compiled);
      string strNew = token.Replace("www.sohu.com/index.htm",new MatchEvaluator(ReplaceEva));
      Console.WriteLine("New Line:{0}", strNew);

      MatchEvaluator is a delegate that is called for every Match found during the replacement. The string it returns is used as the replacement for that match.
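
      Since MatchEvaluator is an ordinary delegate, a lambda can be passed in place of a named method. The sketch below doubles every number in a string; the class EvaluatorDemo and the method DoubleNumbers are made-up names, and it assumes the numbers are small enough to fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Replaces each run of ASCII digits with twice its numeric value.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, "[0-9]+",
            m => (int.Parse(m.Value) * 2).ToString());
    }

    public static void Main()
    {
        Console.WriteLine(DoubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```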

      Finally, here is the simplest case of using a regular expression for Split:

      Regex r = new Regex("(-)");
      string[] s = r.Split("one-two-banana");
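
      The parentheses in "(-)" matter: when the pattern contains a capturing group, Split also returns the captured delimiter text. The sketch below compares both forms; the class SplitDemo and its method names are chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // With a capturing group, the delimiters are included in the result.
    public static string[] SplitKeepingDelimiters(string input)
    {
        return new Regex("(-)").Split(input);
    }

    // Without a capturing group, only the pieces between delimiters are returned.
    public static string[] SplitDroppingDelimiters(string input)
    {
        return new Regex("-").Split(input);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitKeepingDelimiters("one-two-banana")));  // one|-|two|-|banana
        Console.WriteLine(string.Join("|", SplitDroppingDelimiters("one-two-banana"))); // one|two|banana
    }
}
```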

OK, that's all. I hope this is helpful to everyone.

Posted on 2007-04-17 11:26 by 路见不平