Regular Expressions (Java)

Concept:

A regular expression (often abbreviated in code as regex, regexp, or RE), sometimes called a rule expression, is a concept from computer science.

Regular expressions are typically used to search for, and replace, text that matches a given pattern (rule).

Uses:

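They usually appear in conditional statements that check whether a string conforms to some format (matching), and in string search and replace operations.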

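A regular expression is a string that contains characters with special meaning; those special characters are called the metacharacters of the regular expression.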

Classes involved

java.lang.String

java.util.regex.Pattern: the compiled pattern

java.util.regex.Matcher: matches input against a pattern and holds the result

Example: "." stands for any single character, so "abc" is matched by "...".

public class RegExp {
    public static void main(String[] args){
        // A quick first look at regular expressions
        System.out.println("abc".matches("..."));
    }
}

"\d" matches any digit 0-9. Inside a Java string literal the backslash itself has to be escaped, so the regex \d is written as "\\d".

public class RegExp {
    public static void main(String[] args){
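        // A quick first look at regular expressions
        p("abc".matches("..."));            // whole-string match
        // "\d" matches a digit
        p("d1234w".replaceAll("\\d", "-")); // replace every digit; note the doubled backslash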
    }
    public static void p(Object o){
        System.out.println(o);
    }
}

Class overview:

Pattern

Definition:

A compiled representation of a regular expression.

A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.

A typical invocation sequence is thus

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();

A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement

 boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.

The form below is more efficient when the same pattern is used repeatedly, and Pattern and Matcher also offer many more methods.

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
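
To make the point about reuse concrete, here is a minimal sketch. The class name PatternReuse, the helper countMatches, and the sample inputs are made up for illustration; only Pattern.compile, matcher, and matches come from the API quoted above. The pattern is compiled once and every input gets its own Matcher.

```java
import java.util.regex.Pattern;

class PatternReuse {
    // Compiled once and shared; Pattern instances are immutable.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts how many inputs match the whole pattern, reusing the compiled Pattern.
    static int countMatches(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            // A new Matcher per input; the Pattern itself is not recompiled.
            if (A_STAR_B.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "ab", "ba", "abc"};
        System.out.println(countMatches(inputs)); // 3
    }
}
```

Calling Pattern.matches("a*b", input) inside the loop would give the same count, but would compile the expression again on every iteration.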

[a-z] matches one letter in the range a to z

[] denotes a character class: a set or range of allowed characters;

Quantifiers

? : zero or one time

* : zero or more times

+ : one or more times

{n} : exactly n times

{n,} : at least n times

{n,m} : between n and m times
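
The quantifiers can be checked one by one with String.matches, which requires the whole input to match. This is a small sketch; the class QuantifierDemo and the helper fullMatch are hypothetical names used only here.

```java
class QuantifierDemo {
    // Returns true when the whole input matches the regex.
    static boolean fullMatch(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("", "a?"));         // true: zero occurrences are allowed
        System.out.println(fullMatch("aa", "a?"));       // false: ? allows at most one
        System.out.println(fullMatch("", "a*"));         // true
        System.out.println(fullMatch("aaaa", "a*"));     // true
        System.out.println(fullMatch("", "a+"));         // false: + needs at least one
        System.out.println(fullMatch("aaa", "a{3}"));    // true
        System.out.println(fullMatch("aa", "a{3,}"));    // false: fewer than three
        System.out.println(fullMatch("aaaa", "a{2,3}")); // false: more than three
    }
}
```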

// Ranges

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // Ranges
        p("a".matches("[abc]"));
        p("a".matches("[^abc]"));        // any character except a, b, or c
        p("A".matches("[a-zA-Z]"));      // any letter
        p("A".matches("[a-z]|[A-Z]"));   // a-z or A-Z: any letter
        p("A".matches("[a-z[A-Z]]"));    // same thing, written as a union
        p("A".matches("[A-Z&&[REG]]"));  // in A-Z and also one of R, E, G
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
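
The intersection operator && only works inside the brackets of a character class. A short sketch of intersection and subtraction follows; the class CharClassDemo and the helper isLowerConsonant are illustrative names, not part of the original example.

```java
class CharClassDemo {
    // Subtraction: a-z intersected with "not a vowel" leaves the lowercase consonants.
    static boolean isLowerConsonant(char c) {
        return String.valueOf(c).matches("[a-z&&[^aeiou]]");
    }

    public static void main(String[] args) {
        System.out.println("R".matches("[A-Z&&[REG]]")); // true: R is in both sets
        System.out.println("A".matches("[A-Z&&[REG]]")); // false: A is not one of R, E, G
        System.out.println(isLowerConsonant('b'));       // true
        System.out.println(isLowerConsonant('e'));       // false
    }
}
```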

//Predefined character classes

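"\\".matches("\\\\"): matching one literal backslash takes four backslashes in the pattern literal, because there are two layers of escaping. The Java compiler first turns the string literal "\\\\" into the two characters \\, and the regex engine then reads \\ as one escaped, literal backslash. With one or three backslashes the last one escapes the closing quote, so the string literal is unterminated and the code does not compile. With two, the regex is a single dangling \ and compiling it throws a PatternSyntaxException.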
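
The two layers of escaping are easier to see by printing string lengths. This is a sketch with a hypothetical class BackslashDemo; Pattern.quote is a standard method that builds a literal pattern, so the backslashes do not need to be counted by hand.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    static final String REGEX = "\\\\"; // four backslashes in source, two characters in the string
    static final String INPUT = "\\";   // two backslashes in source, one character in the string

    public static void main(String[] args) {
        System.out.println(REGEX.length());       // 2
        System.out.println(INPUT.length());       // 1
        System.out.println(INPUT.matches(REGEX)); // true

        // Pattern.quote returns a regex that matches its argument literally.
        System.out.println(INPUT.matches(Pattern.quote("\\"))); // true
    }
}
```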

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
    
        // Getting to know \s, \w, and \d
        p(" \n\r\t".matches("\\s{4}"));
        p(" ".matches("\\S"));
        p("a_8".matches("\\w{3}"));
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));
        p("\\".matches("\\\\"));
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}

Predefined character classes
`.`	Any character (may or may not match line terminators)
`\d`	A digit: `[0-9]`
`\D`	A non-digit: `[^0-9]`
`\h`	A horizontal whitespace character: `[ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]`
`\H`	A non-horizontal whitespace character: `[^\h]`
`\s`	A whitespace character: `[ \t\n\x0B\f\r]`
`\S`	A non-whitespace character: `[^\s]`
`\v`	A vertical whitespace character: `[\n\x0B\f\r\x85\u2028\u2029]`
`\V`	A non-vertical whitespace character: `[^\v]`
`\w`	A word character: `[a-zA-Z_0-9]`
`\W`	A non-word character: `[^\w]`
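
One way to see what each predefined class covers is to replace every matching character with a marker. The sketch below assumes a hypothetical helper mask and a made-up sample string; the last two lines show the "may or may not match line terminators" note for ".", which depends on the Pattern.DOTALL flag.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Replaces every character matched by regex with '#', to make the class visible.
    static String mask(String input, String regex) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String s = "a1 _-";
        System.out.println(mask(s, "\\d")); // a# _-
        System.out.println(mask(s, "\\D")); // #1###
        System.out.println(mask(s, "\\w")); // ## #-
        System.out.println(mask(s, "\\W")); // a1#_#
        System.out.println(mask(s, "\\s")); // a1#_-

        // "." skips line terminators unless Pattern.DOTALL is set.
        System.out.println(Pattern.compile(".").matcher("\n").matches());                 // false
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```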

find()

Attempts to find the next subsequence of the input sequence that matches the pattern.

reset()

Resetting a matcher discards all of its explicit state information and sets its append position to zero.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // matches(), find(), lookingAt()
        Pattern p = Pattern.compile("\\d{3,5}");
        String s = "123-45623-789-00";
        Matcher m = p.matcher(s);
        p(m.matches());
        m.reset(); // matches() changes the matcher's state, which would affect find(); call reset() first
        p(m.find());
        p(m.start()+"-"+ m.end());
        p(m.find());
        p(m.start()+"-"+ m.end());
        p(m.find());
        p(m.start()+"-"+ m.end());
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
        
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
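
A common use of find() is a loop that collects every match. The following is a minimal sketch with a hypothetical helper findAll; it also contrasts matches(), which needs the whole input to match, with lookingAt(), which only needs a match at the start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {          // each call resumes after the previous match
            result.add(m.group());  // text of the current match
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "123-45623-789-00";
        System.out.println(findAll("\\d{3,5}", s)); // [123, 45623, 789]

        Matcher m = Pattern.compile("\\d{3,5}").matcher(s);
        System.out.println(m.matches());   // false: the whole input must match
        System.out.println(m.lookingAt()); // true: only the beginning must match
    }
}
```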

Find and replace

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // Replacement; see the description of appendReplacement() in the API documentation
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java Java I love Java  u hate JAVA sfarwwfr");
        // p(m.replaceAll("JAVA")); // would replace every match with JAVA
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while (m.find()) { // find the next match
            i++;
            if (i % 2 == 0) { // even-numbered matches become "java", odd-numbered become "JAVA"
                m.appendReplacement(buf, "java");
            } else {
                m.appendReplacement(buf, "JAVA");
            }
        }
        m.appendTail(buf); // after the appendReplacement() calls, append the rest of the input
        p(buf);
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
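
One detail worth knowing about appendReplacement(): in the replacement string, '$' and '\' are special ('$1' refers to group 1, for example). When the replacement is plain text, Matcher.quoteReplacement escapes them. The sketch below uses a hypothetical helper replaceLiteral and made-up sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    // Replaces each match of regex with the literal text, even if it contains '$' or '\'.
    static String replaceLiteral(String input, String regex, String literal) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            // quoteReplacement escapes '$' and '\' so they are not read as group references.
            m.appendReplacement(buf, Matcher.quoteReplacement(literal));
        }
        m.appendTail(buf); // copy whatever follows the last match
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("price: X, total: X", "X", "$5"));
        // price: $5, total: $5
    }
}
```

Without the quoting step, "$5" would be read as a reference to group 5, which this pattern does not have.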

Groups

Matcher.group(): Returns the input subsequence matched by the previous match.

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

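Parentheses define capturing groups, numbered by the position of their opening parenthesis from left to right, as in the list above. Retrieve them by number, e.g. group(1), group(2).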
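
The numbering can be verified directly by matching "ABC" against ((A)(B(C))). This is a small sketch; the class GroupNumberDemo and the helper groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Returns group 0 (the whole match) followed by each numbered group,
    // or null if the input does not match the whole pattern.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // 0: ABC
        // 1: ABC
        // 2: A
        // 3: BC
        // 4: C
    }
}
```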

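import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){

        // Groups: group 1 is the digit run, group 2 is the letter pair
        Pattern p = Pattern.compile("(\\d{3,5})|([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group(2)); // null whenever the digit alternative (group 1) matched
        }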
    }
    public static void p(Object o){
        System.out.println(o);
    }
}

Summary of the key points:

Posted 2017-06-12 21:07 by lamsey16
