Regular Expressions (Java)
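Concept:
A regular expression (often abbreviated in code as regex, regexp or RE) is a concept from computer science.
Regular expressions are typically used to search for, and replace, text that matches a given pattern (rule).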
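Uses:
Regular expressions are typically used in conditions to check whether a string has a particular format (matching), and to find and replace text within strings.
A regular expression is itself a string that contains characters with special meanings; these special characters are called metacharacters.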
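Classes involved:
java.lang.String
java.util.regex.Pattern --- the pattern (a compiled regular expression)
java.util.regex.Matcher --- the matching engine, which also holds the result of a match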
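Before moving on to Pattern and Matcher, here is a small sketch of the regex-aware methods that java.lang.String already provides (matches, replaceAll, split). The class name StringRegexDemo and its helper methods are hypothetical, chosen for illustration. Each of these String methods compiles the pattern internally on every call, so they suit one-off use.

```java
import java.util.Arrays;

// Hypothetical demo class: regex methods available directly on java.lang.String.
class StringRegexDemo {
    // true only if the WHOLE string is exactly three digits
    static boolean isThreeDigits(String s) {
        return s.matches("\\d{3}");
    }

    // replaces every run of whitespace with a single space
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // splits on commas, ignoring whitespace around them
    static String[] splitCsv(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isThreeDigits("123"));                 // true
        System.out.println(isThreeDigits("12a"));                 // false
        System.out.println(collapseSpaces("a   b \t c"));         // a b c
        System.out.println(Arrays.toString(splitCsv("x, y ,z"))); // [x, y, z]
    }
}
```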
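Example: "." stands for any single character, so "abc" is matched by "...".

```java
public class RegExp {
    public static void main(String[] args) {
        // A first look at regular expressions
        System.out.println("abc".matches("...")); // true
    }
}
```

\d matches any single digit 0-9. In a Java string literal the backslash is itself an escape character, so the regex \d has to be written as "\\d" in source code.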
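```java
public class RegExp {
    public static void main(String[] args) {
        // A first look at regular expressions
        p("abc".matches("..."));             // matching: true
        // "\\d" in Java source is the regex \d, which matches one digit
        p("d1234w".replaceAll("\\d", "-"));  // replacing: d----w
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```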
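To see the two layers of escaping concretely, the sketch below checks that the source literal "\\d" really is the two characters \ and d once the Java compiler has processed it. EscapeDemo and countDigits are hypothetical names used only for this illustration.

```java
// Hypothetical demo: the Java literal "\\d" is the two-character regex \d.
class EscapeDemo {
    // counts digit characters by deleting every \d match and comparing lengths
    static int countDigits(String s) {
        return s.length() - s.replaceAll("\\d", "").length();
    }

    public static void main(String[] args) {
        String regex = "\\d";
        System.out.println(regex.length());          // 2: the characters '\' and 'd'
        System.out.println(regex.charAt(0) == '\\'); // true
        System.out.println(countDigits("d1234w"));   // 4
    }
}
```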
Class overview:
Pattern
Definition (from the Pattern Javadoc):
A compiled representation of a regular expression.
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
A typical invocation sequence is thus

```java
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
```
A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement

```java
boolean b = Pattern.matches("a*b", "aaaaab");
```

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.
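So when the same pattern is matched repeatedly, the compile-once form below is more efficient, and Pattern and Matcher also offer many more methods than the one-shot call:

```java
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
```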
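A minimal sketch of the reuse the Javadoc describes: the pattern is compiled once into a static field and a fresh, cheap Matcher is created for each input. The class name ReuseDemo and the field name A_STAR_B are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: compile "a*b" once, then reuse it for many inputs.
class ReuseDemo {
    // compiled a single time when the class is loaded
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    static boolean isAStarB(String input) {
        Matcher m = A_STAR_B.matcher(input); // new Matcher per input; the Pattern is shared
        return m.matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaaaab", "b", "aab", "abc"}) {
            System.out.println(s + " -> " + isAStarB(s)); // true, true, true, false
        }
    }
}
```

Because all match state lives in the Matcher, the shared Pattern can also be used safely from several threads, each creating its own Matcher.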
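[a-z] stands for one letter in the range a to z.
[] defines a character class: it matches exactly one character from the set or range written inside the brackets.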
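Quantifiers (each applies to the element right before it)
?---0 or 1 times
*---0 or more times
+---1 or more times
{n}---exactly n times
{n,}---at least n times
{n,m}---at least n and at most m times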
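The quantifier list above can be checked directly with String.matches, which requires the whole string to match. QuantifierDemo and its helper m are hypothetical names for this sketch.

```java
// Hypothetical demo: one check per quantifier, using whole-string matching.
class QuantifierDemo {
    static boolean m(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(m("", "a?"));          // true: zero times
        System.out.println(m("aa", "a?"));        // false: ? allows at most one
        System.out.println(m("", "a*"));          // true
        System.out.println(m("aaaa", "a*"));      // true
        System.out.println(m("", "a+"));          // false: + needs at least one
        System.out.println(m("aaa", "a{3}"));     // true: exactly three
        System.out.println(m("aaaa", "a{3,}"));   // true: three or more
        System.out.println(m("aaaaa", "a{2,4}")); // false: more than four
    }
}
```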
Ranges
```java
public class RegExp {
    public static void main(String[] args) {
        // Ranges
        p("a".matches("[abc]"));         // true: one of a, b, c
        p("a".matches("[^abc]"));        // false: anything except a, b, c
        p("A".matches("[a-zA-Z]"));      // true: any letter
        p("A".matches("[a-z]|[A-Z]"));   // true: a-z or A-Z, i.e. any letter
        p("A".matches("[a-z[A-Z]]"));    // true: union, same as above
        p("R".matches("[A-Z&&[REG]]"));  // true: in A-Z AND one of R, E, G (&& must sit inside the brackets)
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```
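Predefined character classes
Matching a single backslash takes four backslashes in Java source: "\\".matches("\\\\"). There are two layers of escaping. The Java compiler first turns each \\ in a string literal into one backslash, so "\\" is a one-character string and "\\\\" becomes the two-character regex \\. The regex engine then reads \\ as an escaped, i.e. literal, backslash. Fewer backslashes fail: "\" does not compile because the backslash escapes the closing quote, "\\" compiles but gives the regex engine a lone backslash, which it rejects with a PatternSyntaxException, and "\\\" again does not compile.

```java
public class RegExp {
    public static void main(String[] args) {
        // \s \w \d
        p(" \n\r\t".matches("\\s{4}"));                  // true: four whitespace characters
        p(" ".matches("\\S"));                           // false: a space is whitespace
        p("a_8".matches("\\w{3}"));                      // true: three word characters
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
        p("\\".matches("\\\\"));                         // true: matches one backslash
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```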
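The backslash rule can be verified in code: the sketch counts the characters each literal really contains and confirms that a lone backslash is rejected by Pattern.compile. BackslashDemo and isInvalidRegex are hypothetical names; PatternSyntaxException is the exception Pattern.compile throws for a malformed regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo: Java halves the backslashes, then the regex engine halves them again.
class BackslashDemo {
    // true if the given regex text cannot be compiled
    static boolean isInvalidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String oneBackslash = "\\"; // the string holds a single '\'
        String regex = "\\\\";      // the regex text is \\ (two characters)
        System.out.println(oneBackslash.length());       // 1
        System.out.println(regex.length());              // 2
        System.out.println(oneBackslash.matches(regex)); // true
        System.out.println(isInvalidRegex("\\"));        // true: a lone \ is an unfinished escape
    }
}
```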
Construct | Matches |
---|---|
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\h | A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000] |
\H | A non-horizontal whitespace character: [^\h] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\v | A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029] |
\V | A non-vertical whitespace character: [^\v] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
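A few rows of the table can be spot-checked, especially the upper-case negations and the horizontal/vertical whitespace classes (\h and \v need Java 8 or later). PredefinedClassDemo and its helper is are hypothetical names.

```java
// Hypothetical demo: spot checks for some predefined character classes (Java 8+ for \h, \v).
class PredefinedClassDemo {
    static boolean is(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(is("x", "\\D"));  // true: not a digit
        System.out.println(is("7", "\\D"));  // false
        System.out.println(is("-", "\\W"));  // true: not in [a-zA-Z_0-9]
        System.out.println(is("_", "\\W"));  // false: '_' is a word character
        System.out.println(is("\t", "\\h")); // true: tab is horizontal whitespace
        System.out.println(is("\n", "\\h")); // false: newline is vertical whitespace
        System.out.println(is("\n", "\\v")); // true
    }
}
```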
find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
reset()
Resetting a matcher discards all of its explicit state information and sets its append position to zero.
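```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        // matches() vs find() vs lookingAt()
        Pattern p = Pattern.compile("\\d{3,5}");
        String s = "123-45623-789-00";
        Matcher m = p.matcher(s);
        p(m.matches());                // false: matches() needs the WHOLE input to match
        m.reset();                     // matches() changes the matcher's state; reset() makes find() start again at index 0
        p(m.find());                   // true: "123"
        p(m.start() + "-" + m.end());  // 0-3
        p(m.find());                   // true: "45623"
        p(m.start() + "-" + m.end());  // 4-9
        p(m.find());                   // true: "789"
        p(m.start() + "-" + m.end());  // 10-13
        // lookingAt() always matches from the start of the input, so it is true every time here
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```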
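Calling find() by hand a fixed number of times, as above, is fine for exploring; in practice it is usually called in a loop until it returns false. The sketch below collects every match with its start and end positions. FindAllDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: visit every match by looping on find().
class FindAllDemo {
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // group() is the matched text; start()/end() are its bounds (end is exclusive)
            result.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d{3,5}", "123-45623-789-00"));
        // [123@0-3, 45623@4-9, 789@10-13]  ("00" is too short to match)
    }
}
```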
Find and replace
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        // Replacement: see the description of appendReplacement() in the API docs
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java Java I love Java u hate JAVA sfarwwfr");
        // p(m.replaceAll("JAVA")); // would replace every match with JAVA
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while (m.find()) {     // find the next match
            i++;
            if (i % 2 == 0) {  // even-numbered matches become "java", odd-numbered ones "JAVA"
                m.appendReplacement(buf, "java");
            } else {
                m.appendReplacement(buf, "JAVA");
            }
        }
        m.appendTail(buf);     // after the appendReplacement() calls, copy the rest of the input
        p(buf);                // JAVA java JAVA I love java u hate JAVA sfarwwfr
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```
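For comparison, the commented-out replaceAll line does the whole job in one call when every match gets the same replacement; appendReplacement/appendTail are only needed when the replacement varies per match. ReplaceAllDemo and upperAllJava are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo: Matcher.replaceAll with a case-insensitive pattern.
class ReplaceAllDemo {
    static String upperAllJava(String input) {
        return Pattern.compile("java", Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .replaceAll("JAVA"); // every match, whatever its case, becomes JAVA
    }

    public static void main(String[] args) {
        System.out.println(upperAllJava("java Java Java I love Java u hate JAVA sfarwwfr"));
        // JAVA JAVA JAVA I love JAVA u hate JAVA sfarwwfr
    }
}
```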
Groups
Matcher.group()-----Returns the input subsequence matched by the previous match.
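Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))) there are four groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)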
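Parentheses create groups, and group(n) returns the text captured by group n, e.g. group(1), group(2). Group 0, which is also what plain group() returns, always stands for the entire match.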
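The numbering rule can be checked by matching ((A)(B(C))) against the input "ABC" and printing every group, including group 0. GroupNumberingDemo and groups are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: list group 0..groupCount() for a whole-string match.
class GroupNumberingDemo {
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0]; // no match, no groups
        }
        String[] out = new String[m.groupCount() + 1]; // slot 0 holds the whole match
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]); // 0: ABC, 1: ABC, 2: A, 3: BC, 4: C
        }
    }
}
```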
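```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        // Groups: group 1 = 3 to 5 digits, group 2 = two lowercase letters
        Pattern p = Pattern.compile("(\\d{3,5})|([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group(2)); // null whenever the digit branch matched: null, aa, null, bb, null, cc
        }
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
```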
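Because the two groups sit in different branches of an alternation (|), only one of them is set for each match and the other is null. The sketch below uses that to label each match by which branch produced it. AlternationGroupDemo and labelMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: tell alternation branches apart by checking which group is non-null.
class AlternationGroupDemo {
    static List<String> labelMatches(String input) {
        Matcher m = Pattern.compile("(\\d{3,5})|([a-z]{2})").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("number:" + m.group(1));  // digit branch matched
            } else {
                out.add("letters:" + m.group(2)); // letter branch matched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(labelMatches("123aa-34345bb-234cc-00"));
        // [number:123, letters:aa, number:34345, letters:bb, number:234, letters:cc]
    }
}
```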
A few key points to sum up: