Regular Expressions: Capturing Groups and Backreferences
I. Capturing Groups
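A capturing group is a form of grouping in regular expressions. To repeat a sequence of characters as a unit you need a group, which is written with parentheses "()". The text that a group matched can then be referenced again later in the pattern.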
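To illustrate why a group is needed for repetition, here is a minimal sketch. The class and helper names (GroupRepeatDemo, isRepeatedAb, isAThenBs) are made up for this illustration; only Pattern.matches comes from the JDK.

```java
import java.util.regex.Pattern;

class GroupRepeatDemo {
    // Hypothetical helper: true when s is "ab" repeated one or more times.
    static boolean isRepeatedAb(String s) {
        // The quantifier + applies to the whole group (ab).
        return Pattern.matches("(ab)+", s);
    }

    // Hypothetical helper: without a group, + binds only to the preceding 'b'.
    static boolean isAThenBs(String s) {
        return Pattern.matches("ab+", s);
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedAb("ababab")); // true
        System.out.println(isRepeatedAb("abbb"));   // false
        System.out.println(isAThenBs("abbb"));      // true
        System.out.println(isAThenBs("ababab"));    // false
    }
}
```

The only difference between the two patterns is the pair of parentheses, which changes what the quantifier repeats.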
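Capturing groups fall into two categories: ordinary (numbered) capturing groups and named capturing groups. A group can also be made non-capturing, which is covered in item 3.
The following short demos show how each one behaves: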
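1. Ordinary capturing groups
Reading the regular expression from the left, each opening parenthesis "(" starts a new group. Group numbers start at 1, and 0 denotes the whole expression.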
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public void test8() {
        String DATE_STRING = "2021-07-01";
        String P_COMM = "(\\d{4})-((\\d{2})-(\\d{2}))";
        Pattern pattern = Pattern.compile(P_COMM);
        Matcher matcher = pattern.matcher(DATE_STRING);
        // Required: group() throws IllegalStateException until a match has been attempted.
        matcher.find();
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }
Output:
    matcher.group(0) value:2021-07-01
    matcher.group(1) value:2021
    matcher.group(2) value:07-01
    matcher.group(3) value:07
    matcher.group(4) value:01
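The numbering rule can also be checked without hard-coding the group indexes. The sketch below loops from 0 to Matcher.groupCount(), which reports the number of capturing groups and does not count group 0. The class GroupNumberingDemo and the helper allGroups are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: groups 0..groupCount() of the first match,
    // or an empty list when the input contains no match.
    static List<String> allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            // groupCount() excludes group 0, hence the <= bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("(\\d{4})-((\\d{2})-(\\d{2}))", "2021-07-01"));
        // [2021-07-01, 2021, 07-01, 07, 01]
    }
}
```

The order of the list follows the position of each opening parenthesis: the outer group "07-01" comes before the two groups nested inside it.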
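2. Named capturing groups
In a named capturing group, the opening parenthesis is immediately followed by "?<name>", and only then by the regular expression: (?<name>Expression). Named groups are still numbered, so they can be read by name or by number.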
    public void test9() {
        String P_NAMED = "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))";
        String DATE_STRING = "2021-07-01";
        Pattern pattern = Pattern.compile(P_NAMED);
        Matcher matcher = pattern.matcher(DATE_STRING);
        matcher.find();
        System.out.printf("\n=========== access by name =============");
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group('year') value:%s", matcher.group("year"));
        System.out.printf("\nmatcher.group('md') value:%s", matcher.group("md"));
        System.out.printf("\nmatcher.group('month') value:%s", matcher.group("month"));
        System.out.printf("\nmatcher.group('date') value:%s", matcher.group("date"));
        // Reset the matcher and match again to read the same groups by number.
        matcher.reset();
        System.out.printf("\n=========== access by number =============");
        matcher.find();
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }
Program output:
    =========== access by name =============
    matcher.group(0) value:2021-07-01
    matcher.group('year') value:2021
    matcher.group('md') value:07-01
    matcher.group('month') value:07
    matcher.group('date') value:01
    =========== access by number =============
    matcher.group(0) value:2021-07-01
    matcher.group(1) value:2021
    matcher.group(2) value:07-01
    matcher.group(3) value:07
    matcher.group(4) value:01
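Group names are also usable in replacement strings, where Java's Matcher.replaceAll accepts ${name}. The following is a minimal sketch of that; the class NamedGroupReplaceDemo and the helper toDayFirst are hypothetical, and the pattern is a simplified variant of the one above without the nested md group.

```java
import java.util.regex.Pattern;

class NamedGroupReplaceDemo {
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<date>\\d{2})");

    // Hypothetical helper: rewrites every yyyy-MM-dd occurrence as dd/MM/yyyy.
    static String toDayFirst(String input) {
        // ${name} in the replacement refers to the named group of the current match.
        return ISO_DATE.matcher(input).replaceAll("${date}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("2021-07-01")); // 01/07/2021
    }
}
```

Referring to groups by name keeps the replacement readable and avoids renumbering it when a group is added to the pattern.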
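3. Non-capturing groups
Placing "?:" immediately after the opening parenthesis, followed by the regular expression, creates a non-capturing group: (?:Expression). A non-capturing group still groups its contents but is not assigned a number, so the groups after it shift down by one.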
    public void test10() {
        String P_UNCAP = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
        String DATE_STRING = "2021-07-01";
        Pattern pattern = Pattern.compile(P_UNCAP);
        Matcher matcher = pattern.matcher(DATE_STRING);
        matcher.find();
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
        // Throws java.lang.IndexOutOfBoundsException: No group 4,
        // because the non-capturing group (?:\d{4}) is not numbered.
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }
Output:
    matcher.group(0) value:2021-07-01
    matcher.group(1) value:07-01
    matcher.group(2) value:07
    matcher.group(3) value:01
    java.lang.IndexOutOfBoundsException: No group 4
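One way to avoid the IndexOutOfBoundsException shown above is to check Matcher.groupCount() before asking for a group. This is only a sketch: the class NonCapturingDemo and the helpers countGroups and groupOrNull are hypothetical names, and returning null for a missing group is a design choice made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Hypothetical helper: number of capturing groups the pattern defines.
    static int countGroups(String regex) {
        // groupCount() depends only on the pattern, so no match is needed.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Hypothetical helper: group n of the first match, or null when the
    // input does not match or the pattern has no group n.
    static String groupOrNull(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || n < 0 || n > m.groupCount()) {
            return null;
        }
        return m.group(n);
    }

    public static void main(String[] args) {
        String capturing = "(\\d{4})-((\\d{2})-(\\d{2}))";
        String nonCapturing = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
        System.out.println(countGroups(capturing));                        // 4
        System.out.println(countGroups(nonCapturing));                     // 3
        System.out.println(groupOrNull(nonCapturing, "2021-07-01", 1));    // 07-01
        System.out.println(groupOrNull(nonCapturing, "2021-07-01", 4));    // null
    }
}
```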
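II. Backreferences
1. Backreferences depend on groups. A group treats the part enclosed in () as a single unit, and groups are numbered from the outside in and from left to right.
2. Backreferences are written \1, \2, and so on.
\1: refers to the text captured by group 1, the first parenthesized group
\2: refers to the text captured by group 2, the second parenthesized group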
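Example: String regex = "^(\\d)\\1$";
This pattern matches exactly two characters: \d matches one digit, and \1 must then match the same text that (\d) captured, because (\d) is group 1.
For str = "22", (\\d) matches 2, so \1 must also be 2, and str = "22" matches.
For str = "23", (\\d) matches 2, so \1 requires another 2, but the next character is 3, and str = "23" does not match.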
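The two cases above can be run directly. The sketch below also compares the backreference with the plain pattern \d\d, which accepts any two digits. The class BackrefBasicsDemo and its helper names are hypothetical.

```java
import java.util.regex.Pattern;

class BackrefBasicsDemo {
    // Hypothetical helper: true when s is the same digit written twice.
    static boolean isDoubledDigit(String s) {
        // \1 must repeat exactly what group 1 captured.
        return Pattern.matches("^(\\d)\\1$", s);
    }

    // Hypothetical helper: true when s is any two digits.
    static boolean isTwoDigits(String s) {
        return Pattern.matches("\\d\\d", s);
    }

    public static void main(String[] args) {
        System.out.println(isDoubledDigit("22")); // true
        System.out.println(isDoubledDigit("23")); // false
        System.out.println(isTwoDigits("23"));    // true
    }
}
```

A backreference matches the captured text, not the sub-pattern again, which is why "23" passes \d\d but fails (\d)\1.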
The following demos show backreferences in practice:
    // Requires java.util.regex.Matcher, java.util.regex.Pattern and a JUnit @Test annotation.

    @Test
    public void test1() {
        String reg = "([a-z]{3}[1-9]{3})[a-z]{3}[1-9]{3}";
        String str = "asd123asd123";
        // Plain form, sub-pattern written out twice: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test2() {
        String reg = "([a-z]{3}[1-9]{3})\\1";
        String str = "asd123asd123";
        // Same check using a backreference: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test3() {
        String str = "1234567123123123";
        // Only "123123" matches
        Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
        Matcher m = p.matcher(str);
        // 1
        System.out.println(m.groupCount());
        while (m.find()) {
            String word = m.group();
            // 123123 7 13
            System.out.println(word + " " + m.start() + " " + m.end());
        }
    }

    @Test
    public void test4() {
        String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        Matcher m = p.matcher(phrase);
        while (m.find()) {
            String val = m.group();
            System.out.println("Matching subsequence is \"" + val + "\"");
            System.out.println("Duplicate word: " + m.group(1) + "\n");
        }
    }

    @Test
    public void test5() {
        String reg = "(\\w)(\\w)\\2\\1";
        String str = "abba";
        // true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test7() {
        String reg = "([a-z]{3})([1-9]{3})\\1\\2";
        String str = "asd123asd123";
        // true
        System.out.println(Pattern.matches(reg, str));
        String reg1 = "([a-z]{3})([1-9]{3})\\2\\1";
        String str1 = "asd123123asd";
        // true
        System.out.println(Pattern.matches(reg1, str1));
    }
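Backreferences can also point at named groups. Java writes a named backreference as \k<name>, so the "abba" pattern from the demos can be expressed without group numbers. This is a sketch only; the class NamedBackrefDemo, the helper isMirroredPair, and the group names first and second are chosen for illustration.

```java
import java.util.regex.Pattern;

class NamedBackrefDemo {
    // Same check as (\w)(\w)\2\1, written with named groups.
    private static final Pattern MIRRORED =
            Pattern.compile("(?<first>\\w)(?<second>\\w)\\k<second>\\k<first>");

    // Hypothetical helper: true for four-character strings of the form xyyx.
    static boolean isMirroredPair(String s) {
        return MIRRORED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMirroredPair("abba")); // true
        System.out.println(isMirroredPair("abab")); // false
    }
}
```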