Python: Regular Expressions

Regular expression patterns:

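Pattern	Description
^	Matches the start of the string (with re.MULTILINE, also the start of each line).
$	Matches the end of the string or just before a trailing newline (with re.MULTILINE, also the end of each line).
.	Matches any single character except a newline. With the re.DOTALL flag it matches a newline as well.
[...]	Matches any single character listed inside the brackets.
[^...]	Matches any single character not listed inside the brackets.
*	Matches 0 or more occurrences of the preceding expression.
+	Matches 1 or more occurrences of the preceding expression.
?	Matches 0 or 1 occurrence of the preceding expression.
{n}	Matches exactly n occurrences of the preceding expression.
{n,}	Matches n or more occurrences of the preceding expression.
{n,m}	Matches at least n and at most m occurrences of the preceding expression.
a|b	Matches either a or b.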
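(re)	Groups a regular expression and remembers the matched text.
(?imx)	Turns on the i, m, or x option. In Python's re module this form applies to the whole pattern and must appear at its start.
(?-imx)	Turns off the i, m, or x option. Python's re module only offers the scoped form (?-imx:re).
(?:re)	Groups a regular expression without remembering the matched text.
(?imx:re)	Temporarily turns on the i, m, or x option within the parentheses.
(?-imx:re)	Temporarily turns off the i, m, or x option within the parentheses.
(?#...)	Comment.
(?=re)	Positive lookahead: requires re to match at this position without consuming any characters.
(?!re)	Negative lookahead: requires re not to match at this position, without consuming any characters.
(?>re)	Matches an independent pattern without backtracking (atomic group, available in Python's re module from version 3.11).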
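\w	Matches a word character.
\W	Matches a non-word character.
\s	Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
\S	Matches non-whitespace.
\d	Matches a digit. For ASCII text, equivalent to [0-9].
\D	Matches a non-digit.
\A	Matches the start of the string.
\Z	Matches the end of the string. In Perl and Ruby it also matches just before a trailing newline; in Python's re module it matches only at the very end.
\z	Matches the end of the string (Perl and Ruby notation; Python's re module traditionally spells this \Z).
\G	Matches the point where the last match finished (not supported by Python's re module).
\b	Matches a word boundary when outside brackets. Matches a backspace (0x08) when inside brackets.
\B	Matches a non-word boundary.
\n, \t, etc.	Matches newlines, carriage returns, tabs, etc.
\1...\9	Matches the text of the nth grouped subexpression.
\10	Matches the text of the nth grouped subexpression if that group has already matched. Otherwise refers to the octal representation of a character code.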
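A few of the table entries are easier to read in action. The following is a minimal sketch, not taken from the original post; the sample strings and variable names are made up for illustration.

```python
import re

# {n,m}: between n and m repetitions of the preceding expression (greedy)
year = re.search(r'\d{2,4}', 'built in 1998').group()
print(year)                      # 1998

# (?:re) groups without remembering, so only (\w+) shows up in groups()
title = re.match(r'(?:Mr|Ms)\. (\w+)', 'Ms. Lovelace')
print(title.groups())            # ('Lovelace',)

# (?=re) requires re to follow, (?!re) forbids it; neither consumes text
py_stems = re.findall(r'\w+(?=\.py)', 'a.py b.txt c.py')
print(py_stems)                  # ['a', 'c']
not_bar = re.search(r'foo(?!bar)', 'foobar foobaz')
print(not_bar.start())           # 7

# \1 matches again whatever group 1 matched
doubled = re.search(r'\b(\w+) \1\b', 'this is is a test')
print(doubled.group())           # is is
```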

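Regular expression examples

Matching at the start of a string, the end of a string, and word boundaries

the Any string containing the

\bthe Any word starting with the

\bthe\b Matches only the word the

\Bthe Any string that contains the, but not at the start of a word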
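The four boundary patterns above can be compared side by side. This is a small sketch with a hypothetical word list and a hypothetical helper named matching; neither comes from the original post.

```python
import re

samples = ['the', 'other', 'then', 'bathe']   # made-up test words

def matching(pattern):
    """Return the sample words in which the pattern is found."""
    return [s for s in samples if re.search(pattern, s)]

print(matching(r'the'))       # ['the', 'other', 'then', 'bathe']
print(matching(r'\bthe'))     # ['the', 'then']
print(matching(r'\bthe\b'))   # ['the']
print(matching(r'\Bthe'))     # ['other', 'bathe']
```

The patterns are written as raw strings so that \b reaches the regex engine as a word boundary instead of being read by Python as a backspace character.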

Creating character classes with []

b[ae]t bat, bet

[ac][ef] ae, af, ce, cf

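Specifying ranges or negation

z.[0-9] The character z, followed by any single character, then one decimal digit

[^aeiou] One non-vowel character

[^\t\n] Any single character other than a tab or a newline

[0-9]{15,16} 15 or 16 digits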
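As a rough check of the range and negation patterns above, the sketch below runs them against a few made-up inputs. It uses re.fullmatch so that the whole string has to fit the pattern; that choice is mine, not the original author's.

```python
import re

# z, then any character, then one digit
z_ok = bool(re.fullmatch(r'z.[0-9]', 'zx7'))
z_bad = bool(re.fullmatch(r'z.[0-9]', 'zxy'))
print(z_ok, z_bad)                        # True False

# every character that is not a lowercase vowel
consonants = re.findall(r'[^aeiou]', 'regex')
print(consonants)                         # ['r', 'g', 'x']

# runs of characters containing neither a tab nor a newline
fields = re.findall(r'[^\t\n]+', 'a b\tc\nd')
print(fields)                             # ['a b', 'c', 'd']

# 15 or 16 digits, and nothing else
def is_15_or_16_digits(text):
    return re.fullmatch(r'[0-9]{15,16}', text) is not None

print(is_15_or_16_digits('1' * 15))       # True
print(is_15_or_16_digits('1' * 17))       # False
```

Without fullmatch (or explicit anchors), [0-9]{15,16} would also be found inside a longer run of digits.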

Repetition (*, +, ?, {})

[dn]ot? The character d or n, followed by an o, then at most one t: do, no, dot, not

0?[1-9] Any digit from 1 to 9, optionally preceded by a 0
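The two repetition patterns can be tried out like this. The word list and the variable names are invented for the example, and re.fullmatch is used so that trailing characters count as a mismatch.

```python
import re

words = ['do', 'no', 'dot', 'not', 'dote', 'to']
hits = [w for w in words if re.fullmatch(r'[dn]ot?', w)]
print(hits)                               # ['do', 'no', 'dot', 'not']

# one digit from 1 to 9 with an optional leading zero
numbers = ['7', '07', '00', '10']
valid = [n for n in numbers if re.fullmatch(r'0?[1-9]', n)]
print(valid)                              # ['7', '07']
```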

Special characters

Creating groups with parentheses ()

1. Grouping a regular expression

2. Matching subgroups

match and search

>>> import re    # import the re module
>>> m = re.match('foo','afood')     # match tries to match from the start of the string and returns None on failure
>>> if m is not None: m.group()

>>> m = re.search('foo','afood')    # search finds the first occurrence of the pattern anywhere in the string, not only at the start
>>> if m is not None: m.group()

'foo'
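To make the difference concrete, here is a short sketch using the same 'afood' string. The variable names are my own, and the \A variant is added only to show that anchoring search at the start gives the same answer as match here.

```python
import re

text = 'afood'

at_start = re.match('foo', text)          # must match at position 0
anywhere = re.search('foo', text)         # scans forward for the first match
anchored = re.search(r'\Afoo', text)      # search anchored to the start of the string

print(at_start)                           # None
print(anywhere.span())                    # (1, 4)
print(anchored)                           # None
print(re.match('foo', 'food').group())    # foo
```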

Matching multiple strings (|)

>>> bt = 'bat|bet|bit'       # regular expression pattern: bat, bet, or bit
>>> m = re.match(bt,'bat')
>>> if m is not None: m.group()

'bat'
>>> m = re.match(bt,'bit')
>>> if m is not None: m.group()

'bit'
>>> m = re.search(bt,'bit')    # search finds 'bit'
>>> if m is not None: m.group()

'bit'

Matching any single character (.)

>>> anyend = '.end'
>>> m = re.match(anyend,'bend')    # the dot matches 'b'
>>> if m is not None: m.group()

'bend'
>>> m = re.match(anyend,'end')     # no character for the dot to match
>>> if m is not None: m.group()

>>> m = re.match(anyend,'\nend')   # the dot matches any character except \n, so no match
>>> if m is not None: m.group()

>>> m = re.search('.end','The end.')   # matches: the dot takes the space before 'end'
>>> if m is not None: m.group()

' end'
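Two follow-ups on the dot, sketched with my own variable names: the re.DOTALL flag lets it match a newline, and a backslash turns it into a literal period.

```python
import re

no_flag = re.match('.end', '\nend')                 # dot refuses the newline
with_flag = re.match('.end', '\nend', re.DOTALL)    # dot now accepts the newline
print(no_flag)                                      # None
print(repr(with_flag.group()))                      # '\nend'

# \. matches only a real period
before = re.search(r'\.end', 'The end.')            # no period before 'end'
after = re.search(r'end\.', 'The end.')
print(before)                                       # None
print(after.group())                                # end.
```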

Creating character sets ([])

>>> m = re.match('[cr][23][dp]','c3p')    # matches 'c3p'
>>> if m is not None: m.group()

'c3p'
>>> m = re.match('[cr][23][dp]','r2d')   # matches 'r2d'
>>> if m is not None: m.group()

'r2d'
>>> m = re.match('c3p|r2d','c2d')     # does not match 'c2d'
>>> if m is not None: m.group()

>>> m = re.match('c3p|r2d','r2d')    # regular expression pattern: 'c3p' or 'r2d'
>>> if m is not None: m.group()      # matches 'r2d'

'r2d'
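The character-set pattern is looser than the alternation: it accepts every combination of its three positions, while 'c3p|r2d' accepts exactly two strings. A small sketch, with a made-up candidate list:

```python
import re

candidates = ['c3p', 'r2d', 'c2d', 'r3p', 'c3d']

by_class = [s for s in candidates if re.match('[cr][23][dp]', s)]
by_alternation = [s for s in candidates if re.match('c3p|r2d', s)]

print(by_class)         # ['c3p', 'r2d', 'c2d', 'r3p', 'c3d']
print(by_alternation)   # ['c3p', 'r2d']
```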

Repetition, special characters, and subgroups

>>> import re
>>> patt = r'\w+@(\w+\.)?\w+\.com'    # raw string, so the backslashes reach the regex engine unchanged
>>> re.match(patt,'nobody@xxx.com').group()
'nobody@xxx.com'
>>> 
>>> re.match(patt,'nobody@www.xxx.com').group()
'nobody@www.xxx.com'
>>> 
>>> patt = r'\w+@(\w+\.)*\w+\.com'
>>> re.match(patt,'nobody@www.xxx.yyy.zzz.com').group()
'nobody@www.xxx.yyy.zzz.com'
>>> m = re.match(r'\w\w\w-\d\d\d','abc-123')
>>> if m is not None: m.group()

'abc-123'

>>> m = re.match(r'(\w\w\w)-(\d\d\d)','abc-123')
>>> m.group()    # the entire match
'abc-123'
>>> m.group(1)   # subgroup 1
'abc'
>>> m.group(2)   # subgroup 2
'123'
>>> m.groups()   # all subgroups
('abc', '123')

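>>> m = re.match('(a)(b)(c)','abc')   # 3 subgroups
>>> m.group()
'abc'
>>> m.group(1)   # subgroup 1
'a'
>>> m.groups()   # tuple of all subgroups
('a', 'b', 'c')

group() is usually used to show the entire match, and it can also return an individual subgroup.

groups() returns a tuple containing all matched subgroups.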
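One detail worth knowing is what these methods return when an optional subgroup did not take part in the match. The sketch below adapts the e-mail pattern from earlier by adding extra parentheses; that variation and the variable names are mine, not the original author's.

```python
import re

patt = r'(\w+)@(\w+\.)?(\w+)\.com'

short = re.match(patt, 'nobody@xxx.com')
print(short.groups())               # ('nobody', None, 'xxx')
print(short.group(2))               # None: the optional group was skipped
print(short.groups(default=''))     # ('nobody', '', 'xxx')
print(short.group(1, 3))            # ('nobody', 'xxx')

full = re.match(patt, 'nobody@www.xxx.com')
print(full.groups())                # ('nobody', 'www.', 'xxx')
```

Passing several numbers to group() returns a tuple of just those subgroups, and groups(default=...) replaces None for groups that did not participate.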

Matching at the start of a string, the end of a string, and word boundaries

>>> m = re.search('^The','The end.')    # matches
>>> if m is not None: m.group()   

'The'
>>> m = re.search('^The','end.The')    # not at the start
>>> if m is not None: m.group()

>>> m = re.search(r'\bthe', 'bite the dog')   # at a word boundary
>>> if m is not None: m.group()

'the'
>>> m = re.search(r'\bthe', 'bitethe dog')    # not at a word boundary
>>> if m is not None: m.group()
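Two variations on the examples above, as a hedged sketch: with re.MULTILINE, ^ also matches right after a newline, and \B finds the 'the' that \b rejected. The two-line sample text is an assumption of mine.

```python
import re

text = 'end.\nThe end.'      # made-up two-line string

plain = re.search('^The', text)                    # ^ only at the start of the string
multi = re.search('^The', text, re.MULTILINE)      # ^ at the start of every line
print(plain)                                       # None
print(multi.group(), multi.start())                # The 5

inside = re.search(r'\Bthe', 'bitethe dog')        # 'the' not at a word boundary
print(inside.group(), inside.start())              # the 4
```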

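Searching and replacing with sub() and subn()

>>> re.sub('[abc]','o','rock')
'rook'
>>> re.sub('[abc]','o','Mark')  # find a, b, or c and replace it with o, so Mark becomes Mork
'Mork'

>>> re.subn('[abc]','o','Mark')
('Mork', 1)
>>> re.subn('[abcr]','o','Mark')  # both a and r are replaced with o, 2 replacements in total
('Mook', 2)

sub() and subn() both replace every part of a string that matches the regular expression pattern.
subn() additionally returns the number of replacements: the new string and the count come back together as a tuple.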
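Beyond a fixed replacement string, sub() and subn() also accept a count limit, backreferences in the replacement, and a function as the replacement. A brief sketch; the helper name shout and the sample strings are invented for illustration.

```python
import re

# count limits how many matches are replaced
first_only = re.sub('[abcr]', 'o', 'Mark', count=1)
print(first_only)                    # Mork

# \1 and \2 in the replacement refer to the matched subgroups
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')
print(swapped)                       # host@user

# a function receives each match object and returns the replacement text
def shout(match):
    return match.group().upper()

print(re.sub('[abc]', shout, 'Mark'))          # MArk

# subn reports how many replacements were made
squeezed, n = re.subn(r'\s+', ' ', 'too   many    spaces')
print(squeezed, n)                   # too many spaces 2
```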

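Splitting with split()

re.split() splits a string at every match of the given pattern.

>>> re.split(':','str1:str2:str3')
['str1', 'str2', 'str3']
>>> 
>>> re.split('|','name:wakey,age:20|name:ethon,age:22')    # an unescaped | is alternation, not a literal '|'; write r'\|' to split on the character itself
['name:wakey,age:20|name:ethon,age:22']

The unsplit result above comes from an older Python release. Since Python 3.7, re.split() also splits on empty matches, so the same call breaks the string apart at every position instead.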
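To actually break that string into records, the pipe has to be escaped or placed in a character set. The sketch below is one possible way to do it, not the original author's code; the variable names are assumptions, and the dict parsing step is an extra illustration.

```python
import re

data = 'name:wakey,age:20|name:ethon,age:22'

# escape | so it is treated as a literal character
records = re.split(r'\|', data)
print(records)       # ['name:wakey,age:20', 'name:ethon,age:22']

# a character set splits on any of several separators at once
pieces = re.split(r'[|,]', data)
print(pieces)        # ['name:wakey', 'age:20', 'name:ethon', 'age:22']

# maxsplit stops after the given number of splits
head_tail = re.split(r'[|,]', data, maxsplit=1)
print(head_tail)     # ['name:wakey', 'age:20|name:ethon,age:22']

# turn each record into a dict of its key:value fields
parsed = [dict(field.split(':') for field in record.split(','))
          for record in records]
print(parsed)        # [{'name': 'wakey', 'age': '20'}, {'name': 'ethon', 'age': '22'}]
```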

posted @ 2017-11-16 20:24 ccdh
