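Regular Expressions (Java)

Concept:

A regular expression (often abbreviated in code as regex, regexp, or RE) is a concept from computer science: a pattern that describes a set of strings.

Regular expressions are typically used to search for, and replace, text that matches a given pattern (rule).

Uses:

Regular expressions usually appear in conditional statements that check whether a string matches a given format, and in string search and replace operations.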

 

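A regular expression is a string in which some characters carry a special meaning; these special characters are called metacharacters.

Classes involved

java.lang.String

java.util.regex.Pattern: the compiled pattern

java.util.regex.Matcher: the engine that performs the match and holds its result

Example: "." stands for any single character, so "abc" is matched by "...".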

public class RegExp {
    public static void main(String[] args){
        // A first look at regular expressions
        System.out.println("abc".matches("..."));
    }
}

"\d" matches any single digit 0-9. In a Java string literal the backslash is itself an escape character, so the regex \d has to be written as "\\d".

public class RegExp {
    public static void main(String[] args){
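        // A first look at regular expressions
        p("abc".matches("..."));// match: prints true
        // "\d" matches a digit
        p("d1234w".replaceAll("\\d", "-"));// replace: each digit becomes "-"; note the doubled backslash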
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
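
To make the two uses above (checking a format and replacing text) more concrete, here is a minimal sketch. The class name BasicRegexDemo, the helper names, and the "six-digit code" format are made up for illustration; only String.matches and String.replaceAll come from the JDK.

```java
class BasicRegexDemo {
    // Returns true only if the WHOLE input is exactly six digits
    // ({6} means "exactly six times"; the format itself is hypothetical).
    static boolean isSixDigitCode(String input) {
        return input.matches("\\d{6}");
    }

    // Replaces every single digit with '*'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d", "*");
    }

    public static void main(String[] args) {
        System.out.println(isSixDigitCode("100084"));   // true
        System.out.println(isSixDigitCode("10008a"));   // false
        System.out.println(maskDigits("tel 555-0199")); // tel ***-****
    }
}
```

Note that String.matches succeeds only when the entire string matches, not when a part of it does.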

 

Class overview:

Pattern

Definition:

A compiled representation of a regular expression.

A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.

A typical invocation sequence is thus

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();

A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement

 boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.

 

The form below is more efficient when a pattern is used repeatedly, and Pattern and Matcher also provide many more methods.

Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
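
The efficiency argument is easiest to see when one pattern is checked against many inputs. The sketch below compiles "a*b" once and creates a fresh Matcher per input; the class name PatternReuseDemo, the constant A_STAR_B, and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once and shared: Pattern instances are immutable and thread-safe.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts the inputs that match the whole pattern, using one Matcher per input.
    static int countMatches(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            if (A_STAR_B.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "ab", "ba", "abc"};
        System.out.println(countMatches(inputs)); // 3
    }
}
```

Calling Pattern.matches("a*b", input) inside the loop would give the same answer but recompile the expression on every iteration.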

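[a-z] stands for one letter in the range a-z.

[] defines a character class (a set or range of characters).

Quantifiers

?---zero or one time

*---zero or more times

+---one or more times

{n}---exactly n times

{n,}---at least n times

{n,m}---between n and m times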
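
The quantifiers listed above can be tried one by one. This is a small sketch; the class name QuantifierDemo and the helper m are made up for illustration, and each call checks the whole input with String.matches.

```java
class QuantifierDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean m(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(m("a?", ""));        // true:  zero times
        System.out.println(m("a?", "aa"));      // false: more than once
        System.out.println(m("a*", ""));        // true:  zero times
        System.out.println(m("a*", "aaa"));     // true:  many times
        System.out.println(m("a+", ""));        // false: needs at least one
        System.out.println(m("a+", "aaa"));     // true
        System.out.println(m("a{2}", "aa"));    // true:  exactly two
        System.out.println(m("a{2}", "aaa"));   // false
        System.out.println(m("a{2,}", "a"));    // false: fewer than two
        System.out.println(m("a{2,}", "aaaa")); // true
        System.out.println(m("a{2,3}", "aaa")); // true:  between two and three
        System.out.println(m("a{2,3}", "aaaa"));// false
    }
}
```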

 

// Ranges

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // Ranges
        p("a".matches("[abc]"));
        p("a".matches("[^abc]"));// any character except a, b, or c
        p("A".matches("[a-zA-Z]"));// any letter
        p("A".matches("[a-z]|[A-Z]"));// a-z or A-Z: any letter
        p("A".matches("[a-z[A-Z]]"));// union, same as above
        p("A".matches("[A-Z&&[REG]]"));// intersection: in A-Z and also one of R, E, G (false for "A")
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
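
Intersection is the character-class operation that is easiest to get wrong, because && only has its special meaning inside a single pair of brackets. The sketch below contrasts the two spellings; the class name CharClassDemo and the helper m are assumptions for illustration.

```java
class CharClassDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean m(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Intersection written inside one character class
        System.out.println(m("[A-Z&&[REG]]", "R"));    // true:  R is in both sets
        System.out.println(m("[A-Z&&[REG]]", "A"));    // false: A is not one of R, E, G
        // Intersection with a negated class acts as subtraction
        System.out.println(m("[a-z&&[^aeiou]]", "b")); // true:  a consonant
        System.out.println(m("[a-z&&[^aeiou]]", "a")); // false: a vowel is excluded
        // Outside brackets, && is just two literal '&' characters
        System.out.println(m("[A-Z]&&[REG]", "A"));    // false
        System.out.println(m("[A-Z]&&[REG]", "A&&R")); // true
    }
}
```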

 

//Predefined character classes

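"\\".matches("\\\\"): matching one backslash takes four backslashes in the source code. The Java compiler first turns the literal "\\\\" into a two-character string containing two backslashes, and the regex engine then reads those two as one escaped, literal backslash. With one or three backslashes the string literal itself is malformed (the last backslash escapes the closing quote), and with two the regex engine receives a lone backslash, which is an incomplete escape.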
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
    
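        // Getting to know \s, \w, and \d
        p(" \n\r\t".matches("\\s{4}"));// true: four whitespace characters
        p(" ".matches("\\S"));// false: a space is whitespace
        p("a_8".matches("\\w{3}"));// true: letters, underscore, and digits are word characters
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));// true
        p("\\".matches("\\\\"));// true: one literal backslash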
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
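
The two layers of escaping can be made visible by printing string lengths. This is a minimal sketch; the class name BackslashDemo and the helper matchesLiterally are made up, while Pattern.quote is the JDK method that builds a regex matching its argument literally.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Hypothetical helper: treats 'text' as plain characters, not as a regex.
    static boolean matchesLiterally(String text, String input) {
        return input.matches(Pattern.quote(text));
    }

    public static void main(String[] args) {
        String regexSource = "\\\\";                    // four backslashes in the source
        System.out.println(regexSource.length());       // 2: the compiler halves them
        System.out.println("\\".length());              // 1: the input is one backslash
        System.out.println("\\".matches(regexSource));  // true: the regex engine halves them again
        // Pattern.quote avoids counting backslashes by hand
        System.out.println(matchesLiterally("a\\b", "a\\b")); // true
        System.out.println(matchesLiterally(".", "x"));        // false: "." is literal here
    }
}
```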
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\h A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H A non-horizontal whitespace character: [^\h]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\v A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V A non-vertical whitespace character: [^\v]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

 find()

Attempts to find the next subsequence of the input sequence that matches the pattern.

reset()

Resetting a matcher discards all of its explicit state information and sets its append position to zero.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
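        // matches(), find(), and lookingAt()
        Pattern p = Pattern.compile("\\d{3,5}");
        String s = "123-45623-789-00";
        Matcher m = p.matcher(s);
        p(m.matches());// false: the whole string is not 3 to 5 digits
        m.reset();// matches() leaves state behind that shifts where find() starts, so reset first
        p(m.find());// true
        p(m.start()+"-"+ m.end());// 0-3
        p(m.find());// true
        p(m.start()+"-"+ m.end());// 4-9
        p(m.find());// true
        p(m.start()+"-"+ m.end());// 10-13
        p(m.lookingAt());// true: lookingAt() always starts at the beginning of the input
        p(m.lookingAt());// true
        p(m.lookingAt());// true
        p(m.lookingAt());// true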
        
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
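
In practice find() is usually called in a loop until it returns false. The following sketch collects every match together with its start and end offsets; the class name FindAllDemo and the method findAll are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Returns each match as "text@start-end" (end is exclusive).
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {                 // each call continues after the previous match
            result.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "123-45623-789-00";
        System.out.println(findAll("\\d{3,5}", s)); // [123@0-3, 45623@4-9, 789@10-13]

        Matcher m = Pattern.compile("\\d{3,5}").matcher(s);
        System.out.println(m.matches());   // false: the entire input must match
        System.out.println(m.lookingAt()); // true:  only a prefix has to match
    }
}
```

The trailing "00" is not reported because it has fewer than three digits.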

Find and replace

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // replacement: see the description of appendReplacement() in the API documentation
        Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java Java I love Java  u hate JAVA sfarwwfr");
       // p(m.replaceAll("JAVA"));// replace every match with JAVA
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while(m.find()){  // find the next match
            i++;
            if (i%2 == 0) { // even-numbered matches become "java", odd-numbered ones "JAVA"
                m.appendReplacement(buf, "java");
            } else {
                m.appendReplacement(buf, "JAVA");
            }
        }
        m.appendTail(buf);// after the appendReplacement() calls, append the rest of the input
       p(buf);     
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
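
One detail of appendReplacement() is worth a sketch of its own: in the replacement string, "$" followed by a number refers to a capturing group and "\" is an escape character. If the replacement is meant as plain text, Matcher.quoteReplacement makes that safe. The class name ReplacementDemo, the method replaceLiterally, and the sample sentence are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    // Replaces every case-insensitive match of 'regex' with 'replacement' taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$5" would be read as a reference to group 5.
            m.appendReplacement(buf, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(buf); // copy whatever follows the last match
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("java costs less than Java EE", "java", "$5"));
        // $5 costs less than $5 EE
    }
}
```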

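Groups

Matcher.group()-----Returns the input subsequence matched by the previous match.

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))) there are four such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Parentheses split a match into groups that can be read individually, e.g. group(1), group(2).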

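import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){

        // groups in a regex
        Pattern p = Pattern.compile("(\\d{3,5})|([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group(2));// group 2 is the letter alternative; prints null when the digit alternative matched
        }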
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
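
The numbering rule for ((A)(B(C))) can be checked directly, and the same group numbers can be used in a replacement string as $1, $2, and so on. This is a small sketch; the class name GroupDemo, the methods groups and reorderDate, and the date format are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by groups 1..groupCount(),
    // or an empty array if the input does not match as a whole.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    // Rewrites yyyy-MM-dd as dd/MM/yyyy using group references in the replacement.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // [ABC, ABC, A, BC, C]
        System.out.println(reorderDate("2017-06-12")); // 12/06/2017
    }
}
```

Group 0 is always the entire match, which is why the array has one more element than groupCount() reports.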

Summary of key points:

 

posted @ 2017-06-12 21:07  lamsey16