Regular Expressions (Regex) in Java: Examples
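Practical examples
- Find Chinese characters (strictly, any character outside the Latin-1 range `\x00-\xff`): `[^\x00-\xff]`
- Remove extra blank lines so that exactly one blank line remains between two paragraphs: repeatedly replace `\n\n\n` with `\n\n`
- Line breaks in Markdown:
  - Requirement: if two Chinese paragraphs have no blank line between them, add one; English paragraphs are all code, so leave them unchanged
  - Replace `([^\x00-\xff]\n)([^\x00-\xff])` with `$1\n$2`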
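The editor-style replacements above can also be sketched in Java. This is only an illustration: the class and method names are hypothetical, and blank lines are collapsed in a single pass with `\n{3,}` (three or more consecutive newlines become two), which is an assumption about the intended result.

```java
import java.util.regex.Pattern;

class TextCleanupDemo {
    // Any character outside the Latin-1 range \x00-\xff (this includes Chinese characters).
    static final Pattern NON_LATIN1 = Pattern.compile("[^\\x00-\\xff]");

    // Collapse three or more consecutive newlines into two, leaving one blank line.
    static String collapseBlankLines(String text) {
        return text.replaceAll("\n{3,}", "\n\n");
    }

    // Insert a blank line between two adjacent lines when the first ends with,
    // and the second starts with, a non-Latin-1 character.
    static String separateParagraphs(String text) {
        return text.replaceAll("([^\\x00-\\xff]\n)([^\\x00-\\xff])", "$1\n$2");
    }

    public static void main(String[] args) {
        System.out.println(NON_LATIN1.matcher("abc中文").find()); // true
        System.out.println(collapseBlankLines("a\n\n\n\nb"));      // a, one blank line, b
        System.out.println(separateParagraphs("第一段\n第二段"));    // blank line inserted
        System.out.println(separateParagraphs("code\nmore"));      // unchanged
    }
}
```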
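Backslashes in Java
In most other languages, `\\` in a regular expression means: I want a plain (literal) backslash here, so give it no special meaning.
In Java, a regular expression is written as a string literal, so `\\` in source code means: I am inserting one regex backslash, and the character that follows it has a special meaning.
So where a single backslash (`\`) is enough to escape in other languages, Java source code needs two (`\\`) to get the same effect. Put simply, two backslashes (`\\`) in a Java regex string stand for one backslash (`\`) in other languages. For example, the regex for a single digit is written `\\d`, and a plain backslash is written `\\\\`.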
//In a string literal, 【\\】 denotes one plain backslash 【\】; in a regex string, 【\\\\】 is needed to match one escaped, plain backslash 【\】
String string = "a\\b\\c";
System.out.println(string); //【a\b\c】
System.out.println(string.replace("\\", "_\\\\_")); //【a_\\_b_\\_c】
System.out.println(string.replaceAll("\\\\", "_\\\\\\\\_")); //【a_\\_b_\\_c】
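To make the two escaping layers visible, the sketch below prints the runtime length of the regex strings and checks what they match. The class name and the `fullMatch` helper are hypothetical.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String digit = "\\d";       // two backslashes in source; the string holds \d
        String backslash = "\\\\";  // four backslashes in source; the string holds \\
        System.out.println(digit.length() + ", " + backslash.length()); // 2, 2
        System.out.println(fullMatch(digit, "7"));        // true
        System.out.println(fullMatch(backslash, "\\"));   // true: one literal backslash
        // Pattern.quote turns the two characters \d into a literal, not a digit class.
        System.out.println(fullMatch(Pattern.quote("\\d"), "\\d")); // true
    }
}
```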
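Methods in the String class
Summary of commonly used methods:
- `contains`: plain search (no regex)
- `matches`: tests for a full match (regex)
- `split`: splits the string (regex)
- `replace`: replaces all matching substrings (no regex)
- `replaceAll`: replaces all matching substrings (regex)
- `replaceFirst`: replaces the first matching substring (regex)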
contains: plain search
//Returns true if and only if this string contains the specified sequence of char values.
public boolean contains(CharSequence s) {
return indexOf(s.toString()) > -1;
}
String string = "abcd";
System.out.println(string.contains("ab") + ", " + string.contains("ab.*")); //true, false
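Because `contains` never interprets its argument as a regex, a "contains a match" check needs `Matcher#find` instead. A minimal sketch, with a hypothetical helper name:

```java
import java.util.regex.Pattern;

class ContainsRegexDemo {
    // Hypothetical helper: true if any substring of input matches the regex.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("abcd".contains("ab.*"));        // false: literal search
        System.out.println(containsMatch("abcd", "ab.*"));  // true: regex search
        System.out.println(containsMatch("abcd", "^bc"));   // false: ^ anchors at the start
    }
}
```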
matches: full regex match
Purpose: tests whether the current String matches the given regular expression in its entirety.
Source code
//String
//Tells whether or not this string matches the given regular expression
public boolean matches(String regex) {
return Pattern.matches(regex, this);
}
//Pattern
//Compiles the given regular expression and attempts to match the given input against it.
public static boolean matches(String regex, CharSequence input) {
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
return m.matches();
}
//Matcher
//return true if, and only if, the entire region sequence matches this matcher's pattern
public boolean matches() {
return match(from, ENDANCHOR);
}
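As the source shows, `String#matches` tests whether the whole String matches the given regular expression (not whether it contains a substring that matches). It behaves like a direct call to `Pattern#matches` or `Matcher#matches`.
Examples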
String string = "abcd";
System.out.println(string.matches("bc") + ", " + string.matches("bc.*") + ", " + string.matches(".*bc.*")); //false, false, true
//Check whether a mobile phone number is valid
String string = "13512345678";
String regex = "1[358]\\d{9}";//【1[358]\d{9}】
boolean isMatch = string.matches(regex) && Pattern.matches(regex, string) && Pattern.compile(regex).matcher(string).matches();
System.out.println(isMatch + ", " + "12512345678".matches(regex));//true, false
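As the source code above shows, every call to `String#matches` compiles the regex again. When the same check runs many times, a precompiled `Pattern` can be reused. The sketch below assumes a hypothetical `PhoneValidator` class and the same phone regex as the example.

```java
import java.util.regex.Pattern;

class PhoneValidator {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PHONE = Pattern.compile("1[358]\\d{9}");

    static boolean isValid(String number) {
        // A new Matcher is created per call; matches() requires a full match.
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("13512345678"));  // true
        System.out.println(isValid("12512345678"));  // false: second digit not in [358]
        System.out.println(isValid("135123456789")); // false: one digit too many
    }
}
```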
split: regex split
It behaves like a direct call to `Pattern#split`.
Source code
//String
//return the array of strings computed by splitting this string around matches of the given regular expression
public String[] split(String regex) {
return split(regex, 0);
}
public String[] split(String regex, int limit) {
//...
return Pattern.compile(regex).split(this, limit);
}
//Pattern
public String[] split(CharSequence input) {
return split(input, 0);
}
public String[] split(CharSequence input, int limit) {
//ArrayList<String> matchList = new ArrayList<>();
//...
return matchList.subList(0, resultSize).toArray(result);
}
Examples
String string = "000|成功|100";
String regex = "\\|";
String[] splitStrs = Pattern.compile(regex).split(string);
String[] splitStrs2 = string.split(regex);
System.out.println(Arrays.equals(splitStrs, splitStrs2));//true
System.out.println(Arrays.toString(splitStrs));//[000, 成功, 100]
String regex = "(.)\\1{2,}";//【(.)\1{2,}】note that the backreference needs an escaped backslash
String string = "zhanggangtttxiaoqiangmmmmmmzhaoliu";
String[] splitStrs = string.split(regex);
System.out.println(Arrays.toString(splitStrs));//[zhanggang, xiaoqiang, zhaoliu]
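Since `split` treats its argument as a regex, a delimiter such as `|` or `.` must be escaped. One way to avoid hand-written escapes is `Pattern.quote`. The helper name below is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLiteralDemo {
    // Hypothetical helper: split on a delimiter taken literally, not as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String s = "000|成功|100";
        System.out.println(Arrays.toString(splitLiteral(s, "|")));       // [000, 成功, 100]
        System.out.println(Arrays.toString(splitLiteral("a.b.c", "."))); // [a, b, c]
        // A bare | is an empty alternation that matches between every character,
        // so the result is not the three fields one might expect.
        System.out.println(s.split("|").length == 3);                    // false
    }
}
```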
Exploring the limit parameter
How the limit parameter affects the result:
- If the limit n is greater than zero then the pattern will be applied at most n-1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
- If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
- If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
String string = "boo:and:foo";
for (String regex : new String[]{":", "o"}) { //both delimiters shown in the table
    for (int i = -1; i <= string.length(); i++) {
        String[] results = string.split(regex, i);
        String result = new Gson().toJson(results); //Gson is a third-party JSON library
        System.out.println("| " + regex + " | " + i + " | " + results.length + " | " + result + " |");
    }
}
Regex | Limit | Length | Result |
---|---|---|---|
: | -1 | 3 | ["boo","and","foo"] |
: | 0 | 3 | ["boo","and","foo"] |
: | 1 | 1 | ["boo:and:foo"] |
: | 2 | 2 | ["boo","and:foo"] |
: | 3 | 3 | ["boo","and","foo"] |
: | 4+ | 3 | ["boo","and","foo"] |
o | -1 | 5 | ["b","",":and:f","",""] |
o | 0 | 3 | ["b","",":and:f"] |
o | 1 | 1 | ["boo:and:foo"] |
o | 2 | 2 | ["b","o:and:foo"] |
o | 3 | 3 | ["b","",":and:foo"] |
o | 4 | 4 | ["b","",":and:f","o"] |
o | 5 | 5 | ["b","",":and:f","",""] |
o | 6+ | 5 | ["b","",":and:f","",""] |
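A few rows of the table can be reproduced with the JDK alone. The sketch below uses `Arrays.toString` instead of a JSON library, so empty strings show up as gaps between commas; the `describe` helper is hypothetical.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: "<length> <array>" for one (regex, limit) combination.
    static String describe(String input, String regex, int limit) {
        String[] parts = input.split(regex, limit);
        return parts.length + " " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";
        System.out.println(describe(input, "o", -1)); // 5 [b, , :and:f, , ]  trailing empties kept
        System.out.println(describe(input, "o", 0));  // 3 [b, , :and:f]      trailing empties dropped
        System.out.println(describe(input, "o", 2));  // 2 [b, o:and:foo]     pattern applied once
        System.out.println(describe(input, ":", 2));  // 2 [boo, and:foo]
    }
}
```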
replace: plain replace-all
Source code
//Returns a string resulting from replacing all occurrences of oldChar in this string with newChar.
public String replace(char oldChar, char newChar) {}
//Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end.
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL) //uses the Pattern.LITERAL flag
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
//Matcher
//The String produced will match the sequence of characters in s treated as a literal sequence. Slashes('\') and dollar-signs('$') will be given no special meaning.
public static String quoteReplacement(String s) {
    if ((s.indexOf('\\') == -1) && (s.indexOf('$') == -1))
        return s; //returned unchanged when it contains neither \ nor $
    StringBuilder sb = new StringBuilder();
    for (int i=0; i<s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\\' || c == '$') {
            sb.append('\\'); //prefix each \ or $ with a \ so that both are treated as literal characters
        }
        sb.append(c);
    }
    return sb.toString();
}
This uses the `Pattern.LITERAL` flag:
- When this flag is specified then the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning.
- The flags `CASE_INSENSITIVE` and `UNICODE_CASE` retain their impact on matching when used in conjunction with this flag. The other flags become superfluous.
Examples
System.out.println("aa-aaa".replace("aa", "b")); //b-ba
String str = "普通替换\\和$以及\\$和*";//【普通替换\和$以及\$和*】
System.out.println(str.replace("\\", "-")); //【普通替换-和$以及-$和*】
System.out.println(str.replace("$", "-")); //【普通替换\和-以及\-和*】
System.out.println(str.replace("\\$", "-")); //【普通替换\和$以及-和*】
System.out.println(str.replace("*", "$")); //【普通替换\和$以及\$和$】
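`String#replace` has no case-insensitive variant, but the same literal behaviour can be combined with `CASE_INSENSITIVE`, as the flag description states. A minimal sketch with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralFlagDemo {
    // "a.b" is taken literally (the dot is not a wildcard), but case is ignored.
    static final Pattern LITERAL_IGNORE_CASE =
            Pattern.compile("a.b", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);

    static String replaceLiteralIgnoreCase(String input, String replacement) {
        // quoteReplacement keeps \ and $ in the replacement literal as well.
        return LITERAL_IGNORE_CASE.matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteralIgnoreCase("a.b A.B axb", "$")); // $ $ axb
    }
}
```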
replaceAll: regex replace-all
Source code
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
Note that backslashes (`\`) and dollar signs (`$`) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
Dollar signs may be treated as references to captured subsequences, and backslashes are used to escape literal characters in the replacement string.
Examples
//Special characters 【\ and $】
String str = "注意\\和$呀"; //【注意\和$呀】
System.out.println(str.replaceAll("注意", "注\\\\意")); //【注\意\和$呀】
System.out.println(str.replaceAll("\\$", "\\\\")); //【注意\和\呀】
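System.out.println(str.replaceAll("$$$$$$$$$", "\\\\")); //【注意\和$呀\】the regex consists only of end-of-input anchors ($), so it matches the empty string at the end of the input, and one backslash is inserted there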
System.out.println(str.replaceAll("\\\\", "\\$")); //【注意$和$呀】
System.out.println("aabfooaabfooabfoob".replaceAll("a*b", "-"));//-foo-foo-foo-
//Mask a phone number
String regex = "(\\d{3})(\\d{4})(\\d{4})"; //【(\d{3})(\d{4})(\d{4})】$1 refers to the content of the first group
System.out.println("15800001111".replaceAll(regex, "$1$2****"));//1580000****
System.out.println("15800001111".replaceAll(regex, "$1****$3"));//158****1111
//Remove repeated characters
String str = "我我.我我要...要要要要...要.学学学..学学编编...编编..编..程程...程程...";
System.out.println(str.replaceAll("\\.+", "").replaceAll("(.)\\1+", "$1")); //我要学编程
//Pad every segment of an IP address to the same width (useful for sorting later)
String ipString = "192.168.0.11 3.0.25.3";
String ipStringFillZeros = ipString.replaceAll("(\\d+)", "00$1");//first prepend two zeros to every segment
System.out.println(ipStringFillZeros);//00192.00168.000.0011 003.000.0025.003
String ipStringCertainLength = ipStringFillZeros.replaceAll("0*(\\d{3})", "$1"); //then keep three digits per segment
System.out.println(ipStringCertainLength);//192.168.000.011 003.000.025.003
String ipStringResult = ipStringCertainLength.replaceAll("0*(\\d+)", "$1"); //finally strip the extra leading zeros
System.out.println(ipStringResult); //192.168.0.11 3.0.25.3
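When the replacement text comes from outside (user input, file contents), a stray `$` or `\` can change the result or throw. One defensive option is `Matcher.quoteReplacement`. The sketch below uses a hypothetical helper name.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // Hypothetical helper: regex on the search side, literal text on the replacement side.
    static String replaceAllLiterally(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceAllLiterally("price: X", "X", "$5")); // price: $5
        try {
            // Unquoted, $5 is read as a reference to capture group 5, which does not exist.
            "price: X".replaceAll("X", "$5");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
    }
}
```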
replaceFirst: regex replace-first
//Replaces the first substring of this string that matches the given regular expression with the given replacement.
public String replaceFirst(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
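A short illustration of how `replaceFirst` differs from `replaceAll`, reusing inputs from the earlier examples. The class name is hypothetical.

```java
class ReplaceFirstDemo {
    public static void main(String[] args) {
        String input = "aabfooaabfooabfoob";
        System.out.println(input.replaceFirst("a*b", "-")); // -fooaabfooabfoob
        System.out.println(input.replaceAll("a*b", "-"));   // -foo-foo-foo-
        // Group references work in replaceFirst just as in replaceAll.
        System.out.println("15800001111".replaceFirst("(\\d{3})\\d{4}", "$1****")); // 158****1111
    }
}
```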
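Overview of the java.util.regex package
The `java.util.regex` package mainly consists of the following three classes:
- `Pattern`: a compiled representation of a regular expression. The Pattern class has no public constructor; a Pattern object is obtained by calling its public static `compile` method.
- `Matcher`: the engine that interprets the pattern and performs match operations on an input string. The Matcher class has no public constructor either; a Matcher object is obtained by calling the `matcher` method of a Pattern object.
- `PatternSyntaxException`: an unchecked exception that indicates a syntax error in a regular expression pattern.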
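Because `PatternSyntaxException` is unchecked, the compiler does not force callers to handle it; it only surfaces at run time when `Pattern.compile` meets an invalid regex. A minimal sketch, with a hypothetical helper that reports the error instead of propagating it:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexSyntaxDemo {
    // Hypothetical helper: null if the regex compiles, otherwise a short error description.
    static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(syntaxError("(abc)")); // null
        System.out.println(syntaxError("(abc"));  // describes the unclosed group
        System.out.println(syntaxError("[a-z"));  // describes the unclosed character class
    }
}
```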
Example 1: string matching
String regex = "\\b[a-z]{3}\\b";//【\b[a-z]{3}\b】matches words made of exactly three letters
String str = "da jia hao, ming tian bu fang jia!";
Matcher m = Pattern.compile(regex).matcher(str);
while (m.find()) {
System.out.println(m.group());
}
jia
hao
jia
Example 2: groups
String regex = "([a-z]+)(\\d+)"; //【([a-z]+)(\d+)】matches every run of letters followed by digits
String line = "bqt20094哈哈abc789";
Matcher m = Pattern.compile(regex).matcher(line);
System.out.println("【" + regex + "】 groupCount = " + m.groupCount());
while (m.find()) {
    System.out.println("Matched: " + m.group() + ", position: [" + m.start() + "," + m.end()+"]");
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.println("\tgroup " + i + " : " + m.group(i));
    }
}
【([a-z]+)(\d+)】 groupCount = 2
Matched: bqt20094, position: [0,8]
    group 0 : bqt20094
    group 1 : bqt
    group 2 : 20094
Matched: abc789, position: [10,16]
    group 0 : abc789
    group 1 : abc
    group 2 : 789
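The same grouping can be written with named groups, which `Matcher` exposes through `group(String name)` and `start(String name)`. The group names `word` and `num` and the helper below are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Same regex as the numbered-group example, with a name on each group.
    static final Pattern WORD_NUM = Pattern.compile("(?<word>[a-z]+)(?<num>\\d+)");

    // Hypothetical helper: one "word=num@start" entry per match.
    static List<String> describeMatches(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_NUM.matcher(line);
        while (m.find()) {
            out.add(m.group("word") + "=" + m.group("num") + "@" + m.start("word"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("bqt20094哈哈abc789")); // [bqt=20094@0, abc=789@10]
    }
}
```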
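Pattern
Static methods
Pattern compile(String regex) //compiles the given regular expression into a pattern
Pattern compile(String regex, int flags) //compiles the given regular expression into a pattern with the given `flags`
boolean matches(String regex, CharSequence input) //compiles the given regular expression and attempts to match the given input against it
String quote(String s) //returns a literal pattern String for the specified String
Instance methods
Predicate<String> asPredicate() //creates a predicate that tests whether this pattern is found in a given input string
int flags() //returns this pattern's match flags
Matcher matcher(CharSequence input) //creates a matcher that will match the given input against this pattern
String pattern() //returns the regular expression from which this pattern was compiled
String[] split(CharSequence input) //splits the given input sequence around matches of this pattern
String[] split(CharSequence input, int limit) //splits the given input sequence around matches of this pattern
Stream<String> splitAsStream(final CharSequence input) //creates a stream from the given input sequence around matches of this pattern
String toString() //returns the string representation of this pattern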
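A few of the less common `Pattern` methods in use. This is a sketch assuming Java 8 or later (for `asPredicate` and `splitAsStream`); the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PatternMethodsDemo {
    // Keep only the items in which the pattern is found (find semantics, not full match).
    static List<String> keepWithDigits(List<String> items) {
        Predicate<String> hasDigit = Pattern.compile("\\d").asPredicate();
        return items.stream().filter(hasDigit).collect(Collectors.toList());
    }

    // Split lazily into a stream, then collect into a list.
    static List<String> splitToList(String input, String regex) {
        return Pattern.compile(regex).splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepWithDigits(Arrays.asList("abc", "a1", "22"))); // [a1, 22]
        System.out.println(splitToList("a, b,c", "\\s*,\\s*"));               // [a, b, c]
        System.out.println(Pattern.matches(Pattern.quote("1+1"), "1+1"));     // true
        System.out.println(Pattern.compile("a*b").pattern());                 // a*b
    }
}
```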
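Matcher
API
Static methods
String quoteReplacement(String s) //returns a literal replacement String for the specified String
boolean
boolean find() //attempts to find the next subsequence of the input sequence that matches the pattern
boolean find(int start) //resets this matcher and then attempts to find the next subsequence that matches the pattern, starting at the specified index
boolean hasAnchoringBounds() //queries the anchoring of region bounds for this matcher
boolean hasTransparentBounds() //queries the transparency of region bounds for this matcher
boolean hitEnd() //returns true if the end of input was hit by the search engine in the last match operation performed by this matcher
boolean lookingAt() //attempts to match the input sequence, starting at the beginning of the region, against the pattern
boolean matches() //attempts to match the entire region against the pattern
boolean requireEnd() //returns true if more input could change a positive match into a negative one
int
int start() //returns the start index of the previous match
int start(int group) //returns the start index of the subsequence captured by the given group during the previous match operation
int start(String name) //same as start(int group), for a named capturing group
int end() //returns the offset after the last character matched
int end(int group) //returns the offset after the last character of the subsequence captured by the given group during the previous match operation
int end(String name) //same as end(int group), for a named capturing group
int groupCount() //returns the number of capturing groups in this matcher's pattern
int regionStart() //reports the start index of this matcher's region
int regionEnd() //reports the end index (exclusive) of this matcher's region
String
StringBuffer appendTail(StringBuffer sb) //implements a terminal append-and-replace step
String group() //returns the input subsequence matched by the previous match
String group(int group) //returns the input subsequence captured by the given group during the previous match operation
String group(String name) //same as group(int group), for a named capturing group
String replaceAll(String replacement) //replaces every subsequence of the input sequence that matches the pattern with the given replacement string
String replaceFirst(String replacement) //replaces the first subsequence of the input sequence that matches the pattern with the given replacement string
String toString() //returns the string representation of this matcher
Matcher
Matcher appendReplacement(StringBuffer sb, String replacement) //implements a non-terminal append-and-replace step
Matcher region(int start, int end) //sets the limits of this matcher's region
Matcher reset() //resets this matcher
Matcher reset(CharSequence input) //resets this matcher with a new input sequence
Matcher useAnchoringBounds(boolean b) //sets the anchoring of region bounds for this matcher
Matcher usePattern(Pattern newPattern) //changes the Pattern that this Matcher uses to find matches
Matcher useTransparentBounds(boolean b) //sets the transparency of region bounds for this matcher
Other
Pattern pattern() //returns the pattern that is interpreted by this matcher
MatchResult toMatchResult() //returns the match state of this matcher as a MatchResult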
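Several of the state-related methods listed above interact with each other. The sketch below walks through `find(int)`, `toMatchResult`, `region` and `reset(CharSequence)` on one matcher; the class name and the `trace` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStateDemo {
    // Hypothetical helper: records what each step observes.
    static List<String> trace() {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher("a1b22c333");
        if (m.find(3)) out.add(m.group());                   // resets, then searches from index 3: "22"
        MatchResult snapshot = m.toMatchResult();            // copy of the current match state
        if (m.find()) out.add(m.group());                    // continues after "22": "333"
        out.add(snapshot.group() + "@" + snapshot.start());  // snapshot is unaffected: "22@3"
        m.region(6, 9);                                      // restrict matching to "333"
        out.add(String.valueOf(m.matches()));                // the whole region matches: "true"
        m.reset("x9");                                       // new input, region cleared
        if (m.find()) out.add(m.group());                    // "9"
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [22, 333, 22@3, true, 9]
    }
}
```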
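The matches and lookingAt methods
Both matches and lookingAt try to match an input sequence against the pattern. The difference is that matches requires the entire sequence to match, whereas lookingAt does not; it only requires a match that begins at the first character.
Both methods always start at the beginning of the region, which by default is the beginning of the input string.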
String regex = "foo";
String input = "fooo";
String input2 = "ofoo";
Matcher matcher = Pattern.compile(regex).matcher(input);
Matcher matcher2 = Pattern.compile(regex).matcher(input2);
//lookingAt(): matches against the start of the input and returns true as long as a prefix of the input matches
System.out.println(matcher.matches() + ", " + matcher.lookingAt()); //false, true
System.out.println(matcher2.matches() + ", " + matcher2.lookingAt()); //false, false
Preconditions for calling start, end and group
Javadoc of the `find()` method:
- Attempts to find the next subsequence of the input sequence that matches the pattern.
- This method starts at the beginning of this matcher's region, or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.
- If the match succeeds then more information can be obtained via the `start`, `end`, and `group` methods.
- Returns: true if, and only if, a subsequence of the input sequence matches this matcher's pattern
Javadoc of the `matches` method:
- Attempts to match the entire region against the pattern.
- If the match succeeds then more information can be obtained via the `start`, `end`, and `group` methods.
Javadoc of the `lookingAt` method:
- Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
- Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched.
- If the match succeeds then more information can be obtained via the `start`, `end`, and `group` methods.
So before calling `start`, `end` or `group`, make sure that one of `find`, `matches` or `lookingAt` has been called and has returned true; otherwise an `IllegalStateException` (for example `No match available`) is thrown.
String regex = "foo";
String input = "fooo";
Matcher matcher = Pattern.compile(regex).matcher(input);
System.out.println(matcher.find() + ", " + matcher.find() + ", " + matcher.find()); //true, false, false
System.out.println(matcher.matches() + ", " + matcher.lookingAt()); //false, true
System.out.println(matcher.find() + ", " + matcher.find() + ", " + matcher.find()); //false, false, false
if (matcher.lookingAt()) {
System.out.println(matcher.group() + ": " + matcher.start() + "-" + matcher.end());//foo: 0-3
}
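The precondition can be checked directly: calling `group()` on a fresh matcher, or after a failed `find()`, throws. A minimal sketch with hypothetical helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // Hypothetical helper: the first match, or null when there is none.
    static String firstMatchOrNull(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null; // group() is only reached after a successful find()
    }

    // true if group() throws because no successful match precedes it.
    static boolean groupThrowsWithoutMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.group(); // no find/matches/lookingAt has been called yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatchOrNull("foo", "fooo"));        // foo
        System.out.println(firstMatchOrNull("bar", "fooo"));        // null
        System.out.println(groupThrowsWithoutMatch("foo", "fooo")); // true
    }
}
```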
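The append* methods
The Matcher class also provides the `appendReplacement` and `appendTail` methods for text replacement. Used one after the other, they collect the result into an existing string buffer.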
String regex = "a*b";
String input = "aabfooabfoobkkk";
String replace = "-";
Matcher m = Pattern.compile(regex).matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, replace); //non-terminal step: appends the text before the match, then the replacement
    System.out.println(m.group() + ", " + sb.toString()); //the matches are, in order: aab, ab, b
}
m.appendTail(sb); //terminal step: appends the remaining input after the last match
System.out.println(sb.toString()); //-foo-foo-kkk
aab, -
ab, -foo-
b, -foo-foo-
-foo-foo-kkk
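The main advantage of `appendReplacement` over `replaceAll` is that the replacement can be computed per match. The sketch below doubles every number in a string; the class and method names are hypothetical, and `StringBuffer` is used so that it also compiles on Java 8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Replace every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String doubled = String.valueOf(Integer.parseInt(m.group()) * 2);
            // quoteReplacement is not strictly needed for digits; it guards against \ and $ in general.
            m.appendReplacement(sb, Matcher.quoteReplacement(doubled));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1b22c333")); // a2b44c666
    }
}
```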
2020-03-31
This article is from cnblogs (博客园). Author: 白乾涛. When reposting, please cite the original link: https://www.cnblogs.com/baiqiantao/p/12609701.html