曹永思

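ASP.NET regular expression examples

These patterns often come in handy when fetching a web page's document in ASP.NET and pulling pieces out of it.

Edited by: 曹永思, 博客园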

1. Get the tags inside a div with a given class

Get the tags inside <div class="imgList2">****</div>

Method 1:

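            // Requires: using System.Text.RegularExpressions;
            // [^>]*? keeps the attribute scan inside the opening tag; [\s\S]*? lazily spans newlines.
            // The match ends at the first </div>, so a nested <div> cuts the result short.
            string g = "<div[^>]*?class=\"imgList2\">(?<html>[\\s\\S]*?)</div>";
            Regex reg = new Regex(g, RegexOptions.None);
            MatchCollection mc = reg.Matches(strResult);
            string v = "";
            foreach (Match m in mc)
            {
                // The named group "html" holds only the inner tags; m.Value would also include the <div> wrapper.
                v += m.Groups["html"].Value + "\r\n";
            }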
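Method 1 can also be wrapped in a small reusable helper. The sketch below is only an illustration: the class name DivExtractor and the method GetInnerHtml are hypothetical and not part of the original post, and the pattern additionally tolerates other attributes around class, which is an assumption about the markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original post.
public static class DivExtractor
{
    // Returns the inner markup of every <div> whose class attribute is exactly "imgList2".
    public static List<string> GetInnerHtml(string html)
    {
        // [^>]* keeps the attribute scan inside the opening tag;
        // [\s\S]*? lazily matches across newlines up to the first </div>.
        var reg = new Regex(
            @"<div\b[^>]*?\bclass=""imgList2""[^>]*>(?<html>[\s\S]*?)</div>",
            RegexOptions.IgnoreCase);

        var result = new List<string>();
        foreach (Match m in reg.Matches(html))
        {
            // Only the named group, so the <div> wrapper is left out.
            result.Add(m.Groups["html"].Value);
        }
        return result;
    }
}
```

Calling DivExtractor.GetInnerHtml(strResult) returns one string per matching div. As with any regex-based approach, nested div elements are not handled; an HTML parser is the safer choice for that case.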

Method 2 (a general-purpose helper that returns the content between a given start string and end string):

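            // Usage: returns the markup between the opening tag and the first closing </div>.
            string list_a_group_str = GetValue(strResult.Trim(), "<div class=\"imgList2\">", "</div>");

        // start and end are inserted into the pattern as-is, so any regex metacharacters in them
        // are treated as regex syntax; pass them through Regex.Escape() to match literal text.
        public static string GetValue(string str, string start, string end)
        {
            // The lookbehind and lookahead keep start and end out of the returned value;
            // [\s\S]*? lazily matches any characters, including newlines.
            Regex regex = new Regex("(?<=(" + start + "))[\\s\\S]*?(?=(" + end + "))",
                RegexOptions.Multiline | RegexOptions.Singleline);
            // Match.Value is an empty string when nothing matches.
            return regex.Match(str).Value;
        }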
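Because GetValue splices start and end straight into the pattern, delimiters that contain characters such as ( or $ are read as regex syntax. A minimal sketch of a literal-text variant follows, assuming the delimiters should always be matched verbatim; the class name TextBetween and the method GetValueLiteral are hypothetical names introduced here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant for illustration; not from the original post.
public static class TextBetween
{
    // Returns the text between the first occurrence of start and the next occurrence of end,
    // treating both delimiters as literal text. Returns "" when nothing matches.
    public static string GetValueLiteral(string str, string start, string end)
    {
        // Regex.Escape neutralizes metacharacters such as ( ) $ . in the delimiters.
        string pattern = "(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")";
        return Regex.Match(str, pattern).Value;
    }
}
```

For example, GetValueLiteral("price: $(12.50) USD", "$(", ")") yields 12.50, whereas the same delimiters passed unescaped to a pattern would be an invalid or differently interpreted regex.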

2. Get the href and text of every a tag

Get the href and text of every a tag inside <div class="page both"></div>

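            // First narrow the input down to the pager block, using GetValue from Method 2.
            string list_page_group_str = GetValue(strResult.Trim(), "<div class=\"page both\">", "</div>");
            // (?is): ignore case, and let . match newlines.
            // Group 1 captures the optional opening quote and \1 requires the same closing quote.
            // "url" is the href value; "text" is everything up to the next <a or </a tag.
            Regex reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
            MatchCollection mc = reg.Matches(list_page_group_str);
            foreach (Match m in mc)
            {
                string url = m.Groups["url"].Value;
                string text = m.Groups["text"].Value;
                // Use url and text here, for example add them to a list.
            }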
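The loop above can be turned into a method that returns the collected pairs. This is a sketch only: the class name LinkExtractor and the method GetLinks are hypothetical, and the pattern is the one from the post, used unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original post.
public static class LinkExtractor
{
    // Returns one (href, text) pair per <a> tag found in the given markup.
    public static List<KeyValuePair<string, string>> GetLinks(string html)
    {
        // Same pattern as in the post: optional quote in group 1, href in "url", link body in "text".
        var reg = new Regex(
            @"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");

        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in reg.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value,
                m.Groups["text"].Value));
        }
        return links;
    }
}
```

Each entry's Key is the href and its Value is the link text, so the result can be iterated directly, for instance to build the list of pager URLs to fetch next.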

 

Posted on 2015-09-02 10:00 by 曹永思