Regular Expressions: Capturing Groups and Backreferences

I. Capturing Groups

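A capturing group is the regex notion of grouping. To repeat a sequence of characters as a unit, you need a group, which is written with parentheses "()". The text the group matched can then be referenced again later.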

Capturing groups come in two kinds: ordinary (numbered) capturing groups and named capturing groups.

The simple demos below illustrate each kind:

1. Numbered capturing groups

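Scanning the regular expression from left to right, each opening parenthesis "(" starts a new group. Group numbers start at 1, and group 0 stands for the whole expression.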

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public void test8() {
    String DATE_STRING = "2021-07-01";
    // Numbered by opening parenthesis: 1 = year, 2 = month-day, 3 = month, 4 = day
    String P_COMM = "(\\d{4})-((\\d{2})-(\\d{2}))";
    Pattern pattern = Pattern.compile(P_COMM);
    Matcher matcher = pattern.matcher(DATE_STRING);
    // Required: group() throws IllegalStateException until a match has succeeded
    matcher.find();
    System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
    System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
    System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
    System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
    System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
}

Output:

matcher.group(0) value:2021-07-01
matcher.group(1) value:2021
matcher.group(2) value:07-01
matcher.group(3) value:07
matcher.group(4) value:01
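
Captured groups can also be referenced from a replacement string, where Java writes group n as $n. The following is a minimal sketch that reorders the date from the demo above. The class name DateReorder and the method toUsFormat are made up for illustration and are not part of the original demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateReorder {
    // Same pattern as the demo: 1 = year, 2 = month-day, 3 = month, 4 = day
    private static final Pattern DATE =
            Pattern.compile("(\\d{4})-((\\d{2})-(\\d{2}))");

    // Hypothetical helper: rewrites yyyy-MM-dd as MM/dd/yyyy
    static String toUsFormat(String input) {
        Matcher matcher = DATE.matcher(input);
        // In a replacement string, $3 is the month, $4 the day, $1 the year
        return matcher.replaceAll("$3/$4/$1");
    }

    public static void main(String[] args) {
        System.out.println(toUsFormat("2021-07-01"));
    }
}
```

Input that contains no date is returned unchanged, because replaceAll only rewrites the parts that match.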

2. Named capturing groups

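In a named capturing group, the opening parenthesis is immediately followed by "?<name>" and then the sub-expression: (?<name>Expression). Named groups are still numbered like ordinary groups, so their text can be retrieved either by name or by number.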

public void test9() {
    String P_NAMED = "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))";
    String DATE_STRING = "2021-07-01";

    Pattern pattern = Pattern.compile(P_NAMED);
    Matcher matcher = pattern.matcher(DATE_STRING);
    matcher.find();
    System.out.printf("\n=========== By name =============");
    System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
    System.out.printf("\nmatcher.group('year') value:%s", matcher.group("year"));
    System.out.printf("\nmatcher.group('md') value:%s", matcher.group("md"));
    System.out.printf("\nmatcher.group('month') value:%s", matcher.group("month"));
    System.out.printf("\nmatcher.group('date') value:%s", matcher.group("date"));
    // Rewind the matcher so the same match can be found again
    matcher.reset();
    System.out.printf("\n=========== By number =============");
    matcher.find();
    // Named groups keep the number given by their opening parenthesis
    System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
    System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
    System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
    System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
    System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
}

Output:

=========== By name =============
matcher.group(0) value:2021-07-01
matcher.group('year') value:2021
matcher.group('md') value:07-01
matcher.group('month') value:07
matcher.group('date') value:01
=========== By number =============
matcher.group(0) value:2021-07-01
matcher.group(1) value:2021
matcher.group(2) value:07-01
matcher.group(3) value:07
matcher.group(4) value:01
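
Named groups tend to pay off once the match result is turned into data. The sketch below shows one way to do that, and also uses the ${name} form that Java accepts in replacement strings. The class NamedDate and its methods parse and dayFirst are illustrative names, not part of the original demo. Unlike the demo, it checks the result of matches() before reading any group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedDate {
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<date>\\d{2})");

    // Hypothetical helper: returns {year, month, day}, or null if input is not a date
    static int[] parse(String input) {
        Matcher matcher = DATE.matcher(input);
        if (!matcher.matches()) {
            // No match: reading a group here would throw IllegalStateException
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group("year")),
            Integer.parseInt(matcher.group("month")),
            Integer.parseInt(matcher.group("date"))
        };
    }

    // Hypothetical helper: in a replacement string a named group is written ${name}
    static String dayFirst(String input) {
        return DATE.matcher(input).replaceAll("${date}.${month}.${year}");
    }

    public static void main(String[] args) {
        int[] parts = parse("2021-07-01");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
        System.out.println(dayFirst("2021-07-01"));
    }
}
```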

3. Non-capturing groups

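Placing "?:" right after the opening parenthesis, followed by the sub-expression, forms a non-capturing group: (?:Expression). It groups without capturing, so it receives no group number.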

public void test10() {
    // (?:\d{4}) is not numbered, so numbering starts at the next "("
    String P_UNCAP = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
    String DATE_STRING = "2021-07-01";

    Pattern pattern = Pattern.compile(P_UNCAP);
    Matcher matcher = pattern.matcher(DATE_STRING);
    matcher.find();
    System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
    System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
    System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
    System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));

    // Only three capturing groups remain, so this call throws
    // java.lang.IndexOutOfBoundsException: No group 4
    System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
}

Output:

matcher.group(0) value:2021-07-01
matcher.group(1) value:07-01
matcher.group(2) value:07
matcher.group(3) value:01
java.lang.IndexOutOfBoundsException: No group 4
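
The exception above can be avoided by asking the matcher how many capturing groups the pattern has. The sketch below loops up to Matcher.groupCount(), which does not count group 0, and makes the shift in numbering visible by running the capturing and non-capturing patterns side by side. GroupCountDemo and groupsOf are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupCountDemo {
    // Hypothetical helper: collects groups 1..groupCount() of the first match
    static List<String> groupsOf(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (matcher.find()) {
            // groupCount() excludes group 0, so no missing group is requested
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String date = "2021-07-01";
        // Four capturing groups
        System.out.println(groupsOf("(\\d{4})-((\\d{2})-(\\d{2}))", date));
        // The year is non-capturing, so only three groups remain
        System.out.println(groupsOf("(?:\\d{4})-((\\d{2})-(\\d{2}))", date));
    }
}
```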

II. Backreferences

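1. A backreference requires a group. A group is the part of the pattern enclosed in (), treated as a single unit. Groups are numbered from the outside in and from left to right.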

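2. Backreferences are written \1, \2, and so on. In a Java string literal the backslash is doubled, as in "\\1".
     \1: refers to the text captured by group 1, the first ()
     \2: refers to the text captured by group 2, the second ()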
 
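Example: String regex = "^(\\d)\\1$";
       This pattern matches exactly two characters: \d matches one digit, and \1 must then match the same text that (\d) captured, because (\d) is group 1.
       With str = "22", (\\d) captures 2, so \1 must also match 2, and str = "22" matches.
       With str = "23", (\\d) captures 2, so \1 requires 2 but the next character is 3, and str = "23" does not match.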
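
The two cases above can be checked directly. The sketch below also shows the named form of a backreference, \k<name>, which Java supports for named groups. The class DoubleDigit and its method names are chosen for this illustration only.

```java
import java.util.regex.Pattern;

final class DoubleDigit {
    // Numbered backreference: \1 must repeat whatever group 1 captured
    private static final Pattern NUMBERED = Pattern.compile("^(\\d)\\1$");
    // Same rule with a named group: \k<d> refers back to the group named "d"
    private static final Pattern NAMED = Pattern.compile("^(?<d>\\d)\\k<d>$");

    static boolean isDoubleDigit(String s) {
        return NUMBERED.matcher(s).matches();
    }

    static boolean isDoubleDigitNamed(String s) {
        return NAMED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubleDigit("22"));      // true
        System.out.println(isDoubleDigit("23"));      // false
        System.out.println(isDoubleDigitNamed("22")); // true
    }
}
```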
 
The demos below show backreferences in practice:
    @Test
    public void test1() {
        String reg = "([a-z]{3}[1-9]{3})[a-z]{3}[1-9]{3}";
        String str = "asd123asd123";
        // Sub-pattern written out twice, no backreference: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test2() {
        String reg = "([a-z]{3}[1-9]{3})\\1";
        String str = "asd123asd123";
        // Same check using a backreference: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test3() {
        String str = "1234567123123123";
        // Only "123123" can match: three digits followed by the same three digits
        Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
        Matcher m = p.matcher(str);
        // 1
        System.out.println(m.groupCount());
        while (m.find()) {
            String word = m.group();
            // 123123 7 13
            System.out.println(word + " " + m.start() + " " + m.end());
        }
    }

    @Test
    public void test4() {
        // A whole word, then any text, then the same whole word again
        String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        Matcher m = p.matcher(phrase);
        while (m.find()) {
            String val = m.group();
            System.out.println("Matching subsequence is \"" + val + "\"");
            System.out.println("Duplicate word: " + m.group(1) + "\n");
        }
    }

    @Test
    public void test5() {
        String reg = "(\\w)(\\w)\\2\\1";
        String str = "abba";
        // true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test7() {
        String reg = "([a-z]{3})([1-9]{3})\\1\\2";
        String str = "asd123asd123";
        // true
        System.out.println(Pattern.matches(reg, str));

        String reg1 = "([a-z]{3})([1-9]{3})\\2\\1";
        String str1 = "asd123123asd";
        // true
        System.out.println(Pattern.matches(reg1, str1));
    }
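
A common practical use of a backreference is cleaning up accidentally doubled words. The sketch below is one possible approach, not taken from the original demos: it matches a word followed by one or more whitespace-separated repeats of itself and keeps a single copy. RepeatedWords and collapse are hypothetical names.

```java
import java.util.regex.Pattern;

final class RepeatedWords {
    // A whole word, then one or more repeats of that same word.
    // (?:...) keeps the repeated part from getting its own group number.
    private static final Pattern REPEATED =
            Pattern.compile("\\b(\\w+)(?:\\s+\\1\\b)+");

    // Hypothetical helper: replaces each run of repeats with one copy ($1)
    static String collapse(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("this is is a test test"));
    }
}
```

The \b after \1 matters here: without it, "the theme" would be treated as a repeat of "the".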

 

posted @ 2021-06-30 19:17 by 未知的九月