A Brief Look at Regular Expressions: Matching Characters, Quantities, and Boundaries

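Regular expressions are easy to forget after learning them, and I often have to look up the syntax again when I need them, so I decided to go through them once more: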

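1. Single-character matching

Pattern	Meaning
.	Matches any single character (except \n)
[]	Matches any one of the characters listed inside []
\d	Matches a digit, i.e. 0-9
\D	Matches a non-digit
\s	Matches whitespace, e.g. space, tab, \n
\S	Matches non-whitespace
\w	Matches a word character, i.e. a-z, A-Z, 0-9, _
\W	Matches a non-word character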

1) . matches a single character

let res = /../
'a'.match(res)
// null
let res = /./
'a'.match(res)
// ["a", index: 0, input: "a", groups: undefined]
let res = /.../
'abcd'.match(res)
// ["abc", index: 0, input: "abcd", groups: undefined]
let res = /.../
'\n'.match(res)
// null

Two dots mean two characters, so 'a', a single character, does not match; three dots mean three characters, which matches 'abc'; . cannot match '\n'.

2) \d and \D

let res = /\d/
'1a3'.match(res)
// ["1", index: 0, input: "1a3", groups: undefined]
let res = /\d\d/
'1a3'.match(res)
// null
let res = /\d\d/
'123a3'.match(res)
// ["12", index: 0, input: "123a3", groups: undefined]
let res = /\d\D/
'123a3'.match(res)
// ["3a", index: 2, input: "123a3", groups: undefined]

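One digit: scanning left to right, it matches 1. Two consecutive digits: none found scanning left to right, so no match. Two consecutive digits: matches 12. One digit followed by one non-digit: matches 3a.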

3) \s and \S

let res = /\s/
'12 3a3'.match(res)
// [" ", index: 2, input: "12 3a3", groups: undefined]
let res = /\s/
'12\t3a3'.match(res)
// ["    ", index: 2, input: "12    3a3", groups: undefined]
let res = /\s/
'12\n3a3'.match(res)
// ["↵", index: 2, input: "12↵3a3", groups: undefined]
let res = /\S/
'12\n3a3'.match(res)
// ["1", index: 0, input: "12↵3a3", groups: undefined]

4) \w and \W: word characters

let res = /\w\w/
'12_3a3'.match(res)
// ["12", index: 0, input: "12_3a3", groups: undefined]
let res = /\w\w/
'2_3a3'.match(res)
// ["2_", index: 0, input: "2_3a3", groups: undefined]
let res = /\w\W/
'2_3a3'.match(res)
// null

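Problem 1: suppose that in some city a mobile number must start with 1 and its second digit can only be 0-3. \d is no longer precise enough; this is where [] comes in.

5) []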

let res = /1[0-3]/
'137'.match(res)
// ["13", index: 0, input: "137", groups: undefined]
let res = /1[0-3]/
'147'.match(res)
// null
let res = /1[^0-3]/
'147'.match(res)
// ["14", index: 0, input: "147", groups: undefined]
let res = /1[^0-3]/
'127'.match(res)
// null
let res = /1[0-3a-z]/
'1f7'.match(res)
// ["1f", index: 0, input: "1f7", groups: undefined]

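The first character is 1 and the second is 0-3; ^ inside [] negates the set; the second character is 0-3 or a-z.

Note this use of ^ as negation. It follows that: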

\d == [0-9]
\D == [^0-9]

\w == [a-zA-Z0-9_]
\W == [^a-zA-Z0-9_]
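
These equivalences can be checked mechanically. The sketch below is only an illustration: the helper name classesAgree and the choice to test just the 128 ASCII characters are assumptions made for this example.

```javascript
// Each pair holds a shorthand class and the bracket form it should equal.
const pairs = [
  [/^\d$/, /^[0-9]$/],
  [/^\D$/, /^[^0-9]$/],
  [/^\w$/, /^[a-zA-Z0-9_]$/],
  [/^\W$/, /^[^a-zA-Z0-9_]$/],
];

// Hypothetical helper: true if both patterns accept or reject
// every single ASCII character in the same way.
function classesAgree(shorthand, bracket) {
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (shorthand.test(ch) !== bracket.test(ch)) return false;
  }
  return true;
}

const allAgree = pairs.every(([shorthand, bracket]) => classesAgree(shorthand, bracket));
console.log(allAgree); // true
```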

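Problem 2: back to validating mobile numbers. A number has 11 digits, so do we have to write '13758265698'.match(/1[0-3]\d\d\d\d\d\d\d\d\d/), repeating \d over and over? It would be perfect if there were a way to express a count. There is, of course; read on.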

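2. Multi-character matching (quantifiers)

Syntax for matching multiple characters:

Pattern	Meaning
*	Matches the preceding character zero or more times, i.e. it is optional
+	Matches the preceding character one or more times, i.e. at least once
?	Matches the preceding character zero or one time, i.e. at most once
{m}	Matches the preceding character exactly m times
{m,}	Matches the preceding character at least m times
{m,n}	Matches the preceding character from m to n times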

1) * optional: zero or any number of times

let res = /\d*/
''.match(res)
// ["", index: 0, input: "", groups: undefined]
let res = /\d*/
'adds'.match(res)
// ["", index: 0, input: "adds", groups: undefined]
let res = /\d*/
'12323adds'.match(res)
// ["12323", index: 0, input: "12323adds", groups: undefined]

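* means the digits may be absent or appear any number of times. For '' and 'adds' it matches zero digits and returns the empty string, while '12323adds' yields '12323'.
'adds' contains no digits, so the pattern matches zero digits at index 0; the returned '' can be read as 'adds' == '' + 'adds'.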
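
One consequence worth spelling out: because \d* is satisfied by zero digits, match never returns null for it, and it reports the empty string at index 0 even when digits appear later in the text. A minimal sketch, where leadingDigits is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the run of digits at the very start of text.
// /\d*/ always succeeds at index 0, so the match object is never null here.
function leadingDigits(text) {
  const m = text.match(/\d*/);
  return m[0];
}

console.log(JSON.stringify(leadingDigits('12323adds'))); // "12323"
console.log(JSON.stringify(leadingDigits('adds')));      // ""
console.log(JSON.stringify(leadingDigits('ab12')));      // "" (the digits are not at index 0)
```

If an empty result should count as "no match", require at least one digit instead of allowing zero.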

2) + at least once

let res = /\d+/
'123'.match(res)
// ["123", index: 0, input: "123", groups: undefined]
let res = /\d+/
'abc'.match(res)
// null
let res = /\d+/
'123abc'.match(res)
// ["123", index: 0, input: "123abc", groups: undefined]

At least one digit: '123' has three digits, so it matches; 'abc' has no digits, so it does not; '123abc' has three digits, so it matches.

3) ? at most once

let res = /\d?/
'abc'.match(res)
// ["", index: 0, input: "abc", groups: undefined]
let res = /\d?/
'1abc'.match(res)
// ["1", index: 0, input: "1abc", groups: undefined]
let res = /\d?/
'12abc'.match(res)
// ["1", index: 0, input: "12abc", groups: undefined]

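'abc' has no digit at the start, which still matches (with an empty string); '1abc' has one digit, which matches; for '12abc' only the first digit is matched. Note that \d? describes just one position; the 2 after it is not constrained.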

let res = /\d?[a-z]/
'12abc'.match(res)
// ["2a", index: 1, input: "12abc", groups: undefined]
let res = /\d*[a-z]/
'12abc'.match(res)
// ["12a", index: 0, input: "12abc", groups: undefined]
let res = /\d+[a-z]/
'12345abc'.match(res)
// ["12345a", index: 0, input: "12345abc", groups: undefined]

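At most one digit followed by [a-z]: the match is 2a, starting at index 1, because 12 is two digits; any number of digits followed by [a-z]; at least one digit followed by [a-z].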

4) {m} and {m,}

let res = /\d{4}[a-z]/
'1234abc'.match(res)
// ["1234a", index: 0, input: "1234abc", groups: undefined]
let res = /\d{5}[a-z]/
'1234abc'.match(res)
// null
let res = /\d{3}[a-z]/
'1234abc'.match(res)
// ["234a", index: 1, input: "1234abc", groups: undefined]
let res = /\d{3}[a-z]/
'12abc'.match(res)
// null
let res = /\d{3,}[a-z]/
'12456abc'.match(res)
// ["12456a", index: 0, input: "12456abc", groups: undefined]

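Four digits followed by [a-z]: match. Five digits followed by [a-z]: no match. Three digits followed by [a-z]: match (234a, starting at index 1). The next one does not match, since '12abc' has only two digits. Three or more digits followed by [a-z]: match.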

So {} can express every quantifier:

{1,} == +

{0,} == *

{0,1} == ?
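
As a quick sanity check of these three equivalences, the sketch below runs each symbol form and its brace form against a few sample strings and compares the first match. The sample strings and the helper name firstMatch are assumptions chosen for illustration.

```javascript
const samples = ['', 'abc', '1abc', '12abc', '12345abc'];

// Each pair: the symbol quantifier and its {m,n} counterpart.
const quantifierPairs = [
  [/\d+[a-z]/, /\d{1,}[a-z]/],
  [/\d*[a-z]/, /\d{0,}[a-z]/],
  [/\d?[a-z]/, /\d{0,1}[a-z]/],
];

// Hypothetical helper: the matched text, or null when there is no match.
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}

const sameResults = quantifierPairs.every(([symbolForm, braceForm]) =>
  samples.every((s) => firstMatch(symbolForm, s) === firstMatch(braceForm, s))
);
console.log(sameResults); // true
```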

The mobile number check can then be written as: '18155825579'.match(/1[3-8]\d{9}/)

Problem 3: boundaries. Consider this:

let res = /1[3-8]\d{9}/
'18155825579abcd'.match(res)
// ["18155825579", index: 0, input: "18155825579abcd", groups: undefined]

This still reports a match even though the trailing 'abcd' has not been ruled out. We have not dealt with boundaries yet, so let's continue.

3. Boundaries

Pattern	Meaning
^	Matches the start of the string
$	Matches the end of the string
\b	Matches a word boundary
\B	Matches a non-word boundary

1) $ matches the end

Let's return to the mobile number with the extra 'abcd' at the end.

let res = /1[3-8]\d{9}$/
'18155825579abcd'.match(res)
// null
let res = /1[3-8]\d{9}$/
'18155825579'.match(res)
// ["18155825579", index: 0, input: "18155825579", groups: undefined]

Adding $ pins the end of the match, so nothing may follow the 11 digits.

2) ^ matches the start

let res = /^1[3-8]\d{9}$/
'18155825579'.match(res)
// ["18155825579", index: 0, input: "18155825579", groups: undefined]
let res = /^1[3-8]\d{9}$/
'28155825579'.match(res)
// null
let res = /^[12][3-8]\d{9}$/
'28155825579'.match(res)
// ["28155825579", index: 0, input: "28155825579", groups: undefined]

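Starts with 1: match. Starts with 2: no match. Starts with 1 or 2: match.
Without ^, match searches for the pattern anywhere in the string, so 'x18155825579'.match(/1[3-8]\d{9}$/) would still succeed (at index 1); ^ pins the match to the start of the string.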
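
Putting ^ and $ together gives a whole-string check. The sketch below wraps the pattern from this section in a function; the name isMobileNumber is hypothetical, and the rule itself (1, then 3-8, then nine digits) is just the one used in this article, not a real numbering plan.

```javascript
// Hypothetical validator using this article's pattern: 1, then 3-8, then nine more digits.
// ^ and $ together force the whole string to be exactly those 11 digits.
function isMobileNumber(text) {
  return /^1[3-8]\d{9}$/.test(text);
}

console.log(isMobileNumber('18155825579'));     // true
console.log(isMobileNumber('18155825579abcd')); // false
console.log(isMobileNumber('x18155825579'));    // false
console.log(isMobileNumber('28155825579'));     // false
```

When only a yes/no answer is needed, test is a convenient alternative to match because it returns a boolean directly.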

3) \b word boundary

let res = /^\w+ve/
'hover'.match(res)
// ["hove", index: 0, input: "hover", groups: undefined]
let res = /^\w+ve/
'hver'.match(res)
// ["hve", index: 0, input: "hver", groups: undefined]
let res = /^\w+ve/
'ver'.match(res)
// null

At least one word character at the start, followed by ve.

let res = /^\w+ve\b/
'hover'.match(res)
// null

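This requires the word to end right after ve; in 'hover' an r follows, so there is no match.

Note: \b does not stand for a character, and it does not stand for a space. To match a space, write \s explicitly.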

let res = /^\w+\bve\b/
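// \w+ as one word followed by ve as another word cannot match here: words are separated by non-word characters such as spaces, and 'hover' has none, so it is a single word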
'hover'.match(res)
// null

let res = /^\w+\b\sve\b/
// Add a space so there are two words; the space must be written explicitly as \s
'hover ve'.match(res)
// ["hover ve", index: 0, input: "hover ve", groups: undefined]

let res = /^\w+\bve\b/
'ho ve r'.match(res)
// null
let res = /^\w+\s\bve\b/  // likewise, the space has to be written explicitly
'ho ve r'.match(res)
// ["ho ve", index: 0, input: "ho ve r", groups: undefined]

// Compare the following
let res = /^.+\b\sve\b/
'ho ve r'.match(res)
// ["ho ve", index: 0, input: "ho ve r", groups: undefined]
// .+ matches any characters; with the explicit \s, .+ matches ho, and the result parses as two words: ho and ve

let res = /^.+\bve\b/
'ho ve r'.match(res)
// ["ho ve", index: 0, input: "ho ve r", groups: undefined]
// .+ matches any characters; with no \s in the pattern, .+ itself consumes the space (ho plus a space), and ve is a word on its own

let res = /^.+\b\sve\b/
'hove r'.match(res)
// null

4) \B non-word boundary

let res = /^.+ve\B/
'ho ve r'.match(res)
// null   a word boundary follows ve, so no match
let res = /^.+ve\B/
'ho ver'.match(res)
// ["ho ve", index: 0, input: "ho ver", groups: undefined]   no word boundary after ve, so it matches
let res = /^.+\Bve\B/
'ho ver'.match(res)
// null   a word boundary precedes ve (after the space), so no match
let res = /^.+\Bve\B/
'hover'.match(res)
// ["hove", index: 0, input: "hover", groups: undefined]   no word boundary on either side of ve, so it matches

To summarize:

^ and $ describe the boundaries of the whole string

\b and \B describe word boundaries inside the string
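
To make the contrast between \b and \B concrete, here is a small sketch built on the same 've' examples. The two function names are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helpers contrasting \b and \B around the literal "ve".
function hasWholeWordVe(text) {
  return /\bve\b/.test(text); // ve must stand alone as a word
}

function hasInnerVe(text) {
  return /\Bve\B/.test(text); // ve must have word characters on both sides
}

console.log(hasWholeWordVe('ho ve r')); // true
console.log(hasWholeWordVe('hover'));   // false
console.log(hasInnerVe('hover'));       // true
console.log(hasInnerVe('ho ver'));      // false
```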

posted @ 2020-11-18 18:57 古兰精
