Python regular expressions (extensions of match, search, and findall)
Extensions of match, search, and findall
Printing the matched content
>>> import re
>>> re.match(r"\d","123").group()
'1'
>>> re.match(r"\D","a123").group()
'a'
>>> re.match(r"\D+","a123").group()
'a'
>>> re.match(r"\D+","abc123").group()
'abc'
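The calls above all use re.match, which only tries the pattern at the very start of the string. As a rough sketch of how the three functions in the title differ, the snippet below runs them against one sample string; the variable names and the sample text are illustrative assumptions, not part of the original notes.

```python
import re

text = "abc123"  # sample input chosen for illustration

# re.match only tries the pattern at position 0, so a string that
# starts with letters gives None for a digit pattern.
at_start = re.match(r"\d+", text)

# re.search scans forward and returns the first match anywhere.
first_digits = re.search(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings.
all_runs = re.findall(r"\D+|\d+", text)

# A failed match returns None, so check before calling .group().
print(at_start.group() if at_start else None)
print(first_digits.group())
print(all_runs)
```

Calling .group() directly on the result, as the REPL lines do, raises AttributeError when nothing matches, which is why the sketch checks for None first.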
Matching whitespace / non-whitespace
\s (lowercase): matches whitespace
\S (uppercase): matches non-whitespace
>>> re.search(r"\s","ab cd")  # matches the space
<_sre.SRE_Match object; span=(2, 3), match=' '>
>>> re.search(r"\s+","ab\t \r\ncd")
<_sre.SRE_Match object; span=(2, 6), match='\t \r\n'>
>>> re.findall(r"\S+","ab cd\t ef\nhi")
['ab', 'cd', 'ef', 'hi']
>>> "".join(re.findall(r"\S+","ab cd\t ef\nhi"))
'abcdefhi'
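A common use of \s and \S is cleaning up irregular spacing. The following is a minimal sketch on the same sample string; the variable names are assumptions made for the example, and str.split() is shown only as a comparison that happens to agree on this input.

```python
import re

raw = "ab cd\t ef\nhi"

# Runs of non-whitespace characters, as in the findall call above.
tokens = re.findall(r"\S+", raw)

# str.split() with no argument also splits on runs of whitespace,
# and gives the same result for this input.
same_tokens = raw.split()

# Replace every run of whitespace with a single space.
collapsed = re.sub(r"\s+", " ", raw)

print(tokens)
print(collapsed)
```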
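\w: matches word characters (uppercase and lowercase letters, digits, and _); \W matches anything else
>>> re.search(r"\w+","aaaZAW0123_")  # matches letters in either case, digits, and _
<_sre.SRE_Match object; span=(0, 11), match='aaaZAW0123_'>
>>> re.search(r"\w+","aaaZAW0123_").group()
'aaaZAW0123_'
>>> re.search(r"\W+","aaaZAW0123_-").group()
'-'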
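To see \w and \W side by side, the sketch below splits one sample string into word runs and non-word runs. The sample strings are made up for illustration. The last two lines rely on documented behavior of Python 3 str patterns: \w also matches non-ASCII letters unless the re.ASCII flag is given.

```python
import re

sample = "user_name-42 is ok!"  # illustrative input

# Runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)

# Everything between those runs: punctuation and spaces.
non_words = re.findall(r"\W+", sample)

# For str patterns, \w is Unicode-aware by default; re.ASCII
# restricts it to [a-zA-Z0-9_].
unicode_default = re.findall(r"\w+", "caf\u00e9")
ascii_only = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(words)
print(non_words)
```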
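Quantifiers and limiting greediness
>>> import re
>>> re.search(r"\d?","a7").group()  # \d? matches zero digits at position 0
''
>>> re.search(r"\d?","7").group()
'7'
>>> re.search(r"\d+","a7").group()  # \d+ needs at least one digit, so the search moves on to "7"
'7'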
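The empty result for \d? on "a7" can be surprising: the pattern is allowed to match nothing, so it succeeds immediately at position 0. The sketch below restates that behavior with named results and adds one practical use of an optional element, an optional minus sign; the inputs and names are illustrative assumptions.

```python
import re

# \d? always succeeds: one digit if present at the current position,
# otherwise the empty string.
optional_at_letter = re.search(r"\d?", "a7").group()
optional_at_digit = re.search(r"\d?", "7").group()

# \d+ requires at least one digit, so search keeps scanning.
one_or_more = re.search(r"\d+", "a7").group()

# With no digit anywhere, \d+ fails and search returns None.
no_digits = re.search(r"\d+", "abc")

# Practical use of ?: an optional leading minus sign.
signed = re.findall(r"-?\d+", "3 -14 15")

print(repr(optional_at_letter), optional_at_digit, one_or_more, signed)
```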
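Counted repetition and non-greedy matching
>>> re.search(r"\d{3}","123456789").group()  # exactly 3 digits
'123'
>>> re.search(r"\d{1,3}","123456789").group()  # 1 to 3 digits, greedy: takes as many as possible
'123'
>>> re.search(r"\d{1,3}?","123456789").group()  # non-greedy: takes the minimum, 1 digit
'1'
>>> re.search(r"\d{0,3}?","123456789").group()  # non-greedy with a minimum of 0: empty match
''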
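The difference between greedy and non-greedy quantifiers matters most when a delimiter appears more than once. The sketch below repeats the digit example and then applies the same idea to a small tag-like string; the html value and variable names are assumptions for illustration only.

```python
import re

digits = "123456789"

# Greedy: take as many digits as the upper bound allows.
greedy = re.search(r"\d{1,3}", digits).group()

# Non-greedy: stop as soon as the lower bound is satisfied.
lazy = re.search(r"\d{1,3}?", digits).group()

html = "<b>one</b><b>two</b>"  # illustrative input

# Greedy .* runs to the last </b>, swallowing both elements.
greedy_tags = re.findall(r"<b>.*</b>", html)

# Non-greedy .*? stops at the first </b> each time.
lazy_tags = re.findall(r"<b>.*?</b>", html)

print(greedy, lazy)
print(greedy_tags)
print(lazy_tags)
```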
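^: matches at the start of the string
>>> re.search(r"^abc","dddabc")  # no match, returns None
>>> re.search(r"\d*?","7")  # non-greedy: matches zero digits
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.search(r"^abc","abcdddabc")
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>> re.search(r"^\d+","133dddabc")
<_sre.SRE_Match object; span=(0, 3), match='133'>
Adding $ anchors the pattern to the end of the string, here matching the trailing digits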
>>> re.search(r"\d+$","133dddabc5555")
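<_sre.SRE_Match object; span=(9, 13), match='5555'>
"^XXX$": anchoring at both ends
# With "^123$" the match is always 123, and the string itself must be "123" (apart from an optional trailing newline, which $ tolerates)
>>> re.search(r"^123$","123")
<_sre.SRE_Match object; span=(0, 3), match='123'>
>>> re.search(r"^123$","123sss")  # no match, returns None
>>> re.search(r"^123$","ss123")  # no match, returns None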
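The anchors can be tried together in a short script. This is a sketch rather than a recommended validation routine: the helper name is_exactly_123 is hypothetical, and the last three lines illustrate a detail of Python's re module, namely that $ also matches just before a trailing newline while \Z and re.fullmatch do not.

```python
import re

line = "133dddabc5555"

leading = re.search(r"^\d+", line).group()   # digits at the start
trailing = re.search(r"\d+$", line).group()  # digits at the end
misplaced = re.search(r"^abc", "dddabc")     # None: "abc" is not at the start


def is_exactly_123(s):
    """Hypothetical helper: True when ^123$ matches s."""
    return re.search(r"^123$", s) is not None


results = [is_exactly_123(s) for s in ("123", "123sss", "ss123")]

# $ tolerates one trailing newline; \Z and re.fullmatch do not.
dollar_newline = is_exactly_123("123\n")
strict_z = re.search(r"\A123\Z", "123\n")
strict_full = re.fullmatch(r"123", "123\n")

print(leading, trailing, results, dollar_newline)
```

When the whole string must match and a trailing newline should be rejected, re.fullmatch is probably the least error-prone of the three spellings.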
# \A...\Z: equivalent to anchoring at both ends (except that \Z does not tolerate a trailing newline)
>>> re.search(r"\A123\Z","123")
<_sre.SRE_Match object; span=(0, 3), match='123'>
Group matching
>>> re.search(r"\d(\D+)\d","1abc3").group(1)
'abc'
>>> re.search(r"(\d)(\D+)(\d)","1abc3").group(1)
'1'
>>> re.search(r"(\d)(\D+)(\d)","1abc3").group(2)
'abc'
>>> re.search(r"(\d)(\D+)(\d)","1abc3").group(3)
'3'
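Rather than repeating the search for each group number, the match object can be kept and queried several times. The sketch below does that for the same pattern and, as an optional extension not covered in the notes, shows the named-group syntax (?P<name>...); the group names first, middle, and last are invented for the example.

```python
import re

m = re.search(r"(\d)(\D+)(\d)", "1abc3")

whole = m.group(0)        # group 0 is the entire match
first = m.group(1)
middle = m.group(2)
last = m.group(3)
all_groups = m.groups()   # every numbered group as a tuple

# Named groups: the names here are illustrative assumptions.
named = re.search(r"(?P<first>\d)(?P<middle>\D+)(?P<last>\d)", "1abc3")
by_name = named.group("middle")
as_dict = named.groupdict()

print(whole, all_groups, by_name)
```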
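Detailed rules
re.I: ignore case
re.M: treat the string as multiple lines, so that ^ matches at the start of each line and $ matches at the end of each line
re.S: extends "." (without the surrounding double quotes, likewise below) to every character in the string, including "\n"
?: zero or one; placed after another quantifier, it limits greediness
*: zero or more
+: one or more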
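The three flags are easiest to compare by running the same pattern with and without each one. The sketch below does so on a two-line sample string; the text and variable names are assumptions for illustration, and the final line shows * matching zero characters.

```python
import re

text = "Hello\nworld"  # illustrative two-line input

# re.I: case-insensitive matching.
without_i = re.search(r"hello", text)
with_i = re.search(r"hello", text, re.I).group()

# re.M: ^ (and $) apply at every line boundary, not just the string ends.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# re.S: "." also matches the newline character.
without_s = re.search(r"Hello.world", text)
with_s = re.search(r"Hello.world", text, re.S).group()

# * allows zero repetitions, so it matches the empty string here.
star_empty = re.search(r"\d*", "abc").group()

print(with_i, with_m, repr(with_s))
```

Flags can be combined with the | operator, for example re.I | re.M.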