re.match(r"hello","hello")
re.match(pattern, string_to_process)
ipython3 (速度与激情 in the sample strings is the Chinese title of the Fast & Furious films):

In [12]: re.match(r"hello","hello world")
Out[12]: <_sre.SRE_Match object; span=(0, 5), match='hello'>

In [14]: re.match(r"[hH]ello","Hello world")
Out[14]: <_sre.SRE_Match object; span=(0, 5), match='Hello'>

In [15]: re.match(r"[hH]ello","hello world")
Out[15]: <_sre.SRE_Match object; span=(0, 5), match='hello'>

In [17]: re.match(r"速度与激情\d","速度与激情1")
Out[17]: <_sre.SRE_Match object; span=(0, 6), match='速度与激情1'>

In [18]: re.match(r"速度与激情\d","速度与激情2")
Out[18]: <_sre.SRE_Match object; span=(0, 6), match='速度与激情2'>

\d matches a single digit, 0-9.

In [19]: ret = re.match(r"速度与激情[12345678]","速度与激情2")

In [20]: ret.group()
Out[20]: '速度与激情2'

[] matches any one of the characters listed inside it.

In [22]: re.match(r"速度与激情[1-8]","速度与激情2").group()
Out[22]: '速度与激情2'

In [23]: re.match(r"速度与激情[1-8]","速度与激情9").group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-9c4f3f67ebee> in <module>()
----> 1 re.match(r"速度与激情[1-8]","速度与激情9").group()

AttributeError: 'NoneType' object has no attribute 'group'

When nothing matches, re.match returns None, so calling .group() on the result raises AttributeError.

In [25]: re.match(r"速度与激情[123678]","速度与激情3").group()
Out[25]: '速度与激情3'

In [26]: re.match(r"速度与激情[1-36-8]","速度与激情3").group()
Out[26]: '速度与激情3'
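The AttributeError in the transcript comes from calling .group() on None. A minimal sketch of guarding against that follows; the helper name first_match is made up for illustration and is not part of the re module.

```python
import re


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match.

    Hypothetical helper for illustration only; not part of the re module.
    """
    m = re.match(pattern, text)
    # re.match returns None on failure, so check before calling .group()
    return m.group() if m else None


hit = first_match(r"速度与激情[1-8]", "速度与激情2")
miss = first_match(r"速度与激情[1-8]", "速度与激情9")
print(hit)
print(miss)
```

Checking the match object before using it keeps a failed match from turning into an exception.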
In [28]: re.match(r"速度与激情[1-8abcd]","速度与激情a").group()
Out[28]: '速度与激情a'

In [29]: re.match(r"速度与激情[1-8abcd]","速度与激情e").group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-27b5df124b44> in <module>()
----> 1 re.match(r"速度与激情[1-8abcd]","速度与激情e").group()

AttributeError: 'NoneType' object has no attribute 'group'

In [30]: re.match(r"速度与激情[1-8abcdA-Z]","速度与激情F").group()
Out[30]: '速度与激情F'

In [31]: re.match(r"速度与激情\w","速度与激情F").group()
Out[31]: '速度与激情F'

In [32]: re.match(r"速度与激情\w","速度与激情f").group()
Out[32]: '速度与激情f'

In [33]: re.match(r"速度与激情\w","速度与激情3").group()
Out[33]: '速度与激情3'

In [34]: re.match(r"速度与激情\w","速度与激情哈").group()
Out[34]: '速度与激情哈'

\w matches a word character: a-z, A-Z, 0-9, underscore, and (for str patterns in Python 3) Chinese characters and other Unicode letters.
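The fact that \w accepts a Chinese character depends on Unicode matching, which is the default for str patterns in Python 3. As a small sketch, the re.ASCII flag narrows \w to [a-zA-Z0-9_]; the variable names here are only illustrative.

```python
import re

# Default for str patterns: \w is Unicode-aware, so a Chinese character matches.
unicode_hit = re.match(r"速度与激情\w", "速度与激情哈")

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the same input no longer matches.
ascii_miss = re.match(r"速度与激情\w", "速度与激情哈", re.ASCII)

# An ASCII letter still matches under re.ASCII.
ascii_hit = re.match(r"速度与激情\w", "速度与激情F", re.ASCII)

print(unicode_hit.group())
print(ascii_miss)
print(ascii_hit.group())
```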
In [37]: re.match(r"速度与激情\s\d","速度与激情 3").group()
Out[37]: '速度与激情 3'

In [38]: re.match(r"速度与激情\s\d","速度与激情\t1").group()
Out[38]: '速度与激情\t1'

In [39]: re.match(r"速度与激情\s\d","速度与激情\n1").group()
Out[39]: '速度与激情\n1'

\s matches whitespace: space, tab, \n.
The uppercase forms (\D, \W, \S) match exactly the opposite of their lowercase counterparts.
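A short sketch of the uppercase classes, reusing the sample strings from these notes. The variable names are illustrative only.

```python
import re

title = "速度与激情"

# \D: anything that is not a digit
non_digit = re.match(title + r"\D", title + "a")
digit_rejected = re.match(title + r"\D", title + "1")

# \S: anything that is not whitespace
non_space = re.match(title + r"\S", title + "3")
space_rejected = re.match(title + r"\S", title + " 3")

# \W: anything that is not a word character
non_word = re.match(title + r"\W", title + "!")
word_rejected = re.match(title + r"\W", title + "_")

print(non_digit.group(), digit_rejected)
print(non_space.group(), space_rejected)
print(non_word.group(), word_rejected)
```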
In [40]: re.match(r"速度与激情.","速度与激情1").group()
Out[40]: '速度与激情1'

In [41]: re.match(r"速度与激情.","速度与激情a").group()
Out[41]: '速度与激情a'

In [42]: re.match(r"速度与激情.","速度与激情A").group()
Out[42]: '速度与激情A'

In [43]: re.match(r"速度与激情.","速度与激情_").group()
Out[43]: '速度与激情_'

In [44]: re.match(r"速度与激情.","速度与激情哈").group()
Out[44]: '速度与激情哈'

In [45]: re.match(r"速度与激情.","速度与激情!").group()
Out[45]: '速度与激情!'

In [46]: re.match(r"速度与激情.","速度与激情#").group()
Out[46]: '速度与激情#'

. matches any single character (except \n).
Matching multiple characters
In [47]: re.match(r"速度与激情\d{1,2}","速度与激情1").group()
Out[47]: '速度与激情1'

In [48]: re.match(r"速度与激情\d{1,2}","速度与激情12").group()
Out[48]: '速度与激情12'

In [49]: re.match(r"速度与激情\d{1,3}","速度与激情123").group()
Out[49]: '速度与激情123'

In [50]: re.match(r"速度与激情\d{1,3}","速度与激情12").group()
Out[50]: '速度与激情12'

In [5]: re.match(r"\d{11}","123456789012").group()
Out[5]: '12345678901'

In [6]: re.match(r"\d{11}","12345678901").group()
Out[6]: '12345678901'

In [7]: re.match(r"\d{11}","1234567890").group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-24536d62af88> in <module>()
----> 1 re.match(r"\d{11}","1234567890").group()

AttributeError: 'NoneType' object has no attribute 'group'

In [8]: re.match(r"021-\d{8}","021-12345678").group()
Out[8]: '021-12345678'

In [9]: re.match(r"021-?\d{8}","021-12345678").group()
Out[9]: '021-12345678'

In [10]: re.match(r"021-?\d{8}","02112345678").group()
Out[10]: '02112345678'

The item before ? may appear once or not at all.

In [11]: re.match(r"\d{3,4}-?\d{8}","02112345678").group()
Out[11]: '02112345678'

In [12]: re.match(r"\d{3,4}-?\d{8}","0310-12345678").group()
Out[12]: '0310-12345678'

In [13]: re.match(r"\d{3,4}-?\d{7,8}","0310-1234567").group()
Out[13]: '0310-1234567'
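In the transcript, \d{11} against a 12-digit string still matches, because re.match only checks the start and ignores whatever follows. If the intent is "exactly 11 digits", one option is re.fullmatch, sketched below. The function name is_eleven_digits is made up for this example.

```python
import re


def is_eleven_digits(text):
    """Return True only if the whole string is exactly 11 digits.

    Hypothetical helper for illustration.
    """
    # fullmatch succeeds only when the pattern consumes the entire string
    return re.fullmatch(r"\d{11}", text) is not None


prefix_only = re.match(r"\d{11}", "123456789012").group()  # matches the first 11 digits

print(prefix_only)
print(is_eleven_digits("12345678901"))
print(is_eleven_digits("123456789012"))
print(is_eleven_digits("1234567890"))
```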
In [14]: html_content = """abcd
   ....: kasdjfdskjfdksj
   ....: fjdkslfs
   ....: fdjsk"""

In [15]: re.match(r".*",html_content).group()
Out[15]: 'abcd'

In [16]: re.match(r".*",html_content,re.S).group()
Out[16]: 'abcd\nkasdjfdskjfdksj\nfjdkslfs\nfdjsk'

With re.S, . can also match \n.

In [17]: re.match(r".*","fdsklfjdklfjekwljfwe").group()
Out[17]: 'fdsklfjdklfjekwljfwe'

In [18]: re.match(r".*","").group()
Out[18]: ''

In [19]: re.match(r".+","fdsfsdfsfew").group()
Out[19]: 'fdsfsdfsfew'

In [20]: re.match(r".+","").group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-7831e7fb364c> in <module>()
----> 1 re.match(r".+","").group()

AttributeError: 'NoneType' object has no attribute 'group'

* allows zero or more repetitions, so it matches the empty string; + requires at least one, so it does not.
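The two behaviours in this transcript can be put side by side in a short script. re.DOTALL is the long name of re.S; the variable names are illustrative.

```python
import re

text = "abcd\nefgh"

# Without a flag, . stops at the first newline.
one_line = re.match(r".*", text).group()

# With re.S (also spelled re.DOTALL), . matches newlines too.
all_lines = re.match(r".*", text, re.DOTALL).group()

# * accepts zero characters, + needs at least one.
star_on_empty = re.match(r".*", "")
plus_on_empty = re.match(r".+", "")

print(repr(one_line))
print(repr(all_lines))
print(star_on_empty is not None, plus_on_empty is not None)
```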
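import re


def main():
    names = ["age","_age","1age","age1","a_age","age_1_","age!","a#123"]
    for name in names:
        # No end anchor: a valid prefix is enough for a match
        ret = re.match(r"[a-zA-Z_][a-zA-Z0-9_]*",name)
        if ret:
            print("Variable name %s is valid... matched text: %s"%(name,ret.group()))
        else:
            print("Variable name %s is invalid..."%name)


if __name__ == "__main__":
    main()

Output:

Variable name age is valid... matched text: age
Variable name _age is valid... matched text: _age
Variable name 1age is invalid...
Variable name age1 is valid... matched text: age1
Variable name a_age is valid... matched text: a_age
Variable name age_1_ is valid... matched text: age_1_
Variable name age! is valid... matched text: age
Variable name a#123 is valid... matched text: a

The last two names are wrongly accepted: only the leading part of each one matched.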
import re


def main():
    names = ["age","_age","1age","age1","a_age","age_1_","age!","a#123"]
    for name in names:
        # $ requires the match to reach the end of the string
        ret = re.match(r"[a-zA-Z_][a-zA-Z0-9_]*$",name)
        if ret:
            print("Variable name %s is valid... matched text: %s"%(name,ret.group()))
        else:
            print("Variable name %s is invalid..."%name)


if __name__ == "__main__":
    main()

Output:

Variable name age is valid... matched text: age
Variable name _age is valid... matched text: _age
Variable name 1age is invalid...
Variable name age1 is valid... matched text: age1
Variable name a_age is valid... matched text: a_age
Variable name age_1_ is valid... matched text: age_1_
Variable name age! is invalid...
Variable name a#123 is invalid...
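One detail about $ is worth a sketch: it also matches just before a newline at the very end of the string, so a name followed by "\n" still passes. If that matters, re.fullmatch is a stricter alternative. The names below are illustrative.

```python
import re

NAME = r"[a-zA-Z_][a-zA-Z0-9_]*"

# $ matches at the end of the string OR just before a trailing newline.
dollar_with_newline = re.match(NAME + "$", "age\n")

# fullmatch requires the pattern to cover the whole string, newline included.
full_with_newline = re.fullmatch(NAME, "age\n")
full_clean = re.fullmatch(NAME, "age_1_")
full_bad = re.fullmatch(NAME, "age!")

print(dollar_with_newline is not None)
print(full_with_newline is not None)
print(full_clean.group(), full_bad)
```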
import re


def main():
    email = input("Enter an email address: ")
    # If the regex needs to match certain characters literally, such as . or ?,
    # just put a backslash in front of them to escape them
    ret = re.match(r"[a-zA-Z0-9_]{4,20}@163\.com$",email)
    if ret:
        print("%s is valid...."%email)
    else:
        print("%s is invalid...."%email)


if __name__ == "__main__":
    main()
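When the literal part of a pattern comes from a variable, escaping by hand is easy to forget. A minimal sketch using re.escape, assuming the domain is supplied as a plain string (the variable names are made up for this example):

```python
import re

domain = "163.com"  # plain text, not a regex
user = r"[a-zA-Z0-9_]{4,20}@"

# Unescaped: the . in the domain means "any character", so a wrong domain slips through.
unescaped = re.compile(user + domain + "$")

# re.escape puts a backslash before characters that are special in a regex.
escaped = re.compile(user + re.escape(domain) + "$")

print(re.escape(domain))
print(unescaped.match("laowang@163xcom") is not None)
print(escaped.match("laowang@163xcom") is not None)
print(escaped.match("laowang@163.com") is not None)
```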
In [7]: re.match(r"[a-zA-Z0-9]{4,20}@(163|126)\.com$","laowang@126.com").group()
Out[7]: 'laowang@126.com'
Groups:
In [11]: re.match(r"([a-zA-Z0-9]{4,20})@(163|126)\.com$","laowang@126.com").group(1)
Out[11]: 'laowang'

In [12]: re.match(r"([a-zA-Z0-9]{4,20})@(163|126)\.com$","laowang@126.com").group(2)
Out[12]: '126'
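Beyond group(1) and group(2), a match object exposes a few other ways to read the groups. A short sketch with the same pattern and address:

```python
import re

m = re.match(r"([a-zA-Z0-9]{4,20})@(163|126)\.com$", "laowang@126.com")

whole = m.group(0)      # group(0) is the entire match, same as group()
both = m.groups()       # all groups as a tuple
pair = m.group(1, 2)    # several groups at once, also a tuple

print(whole)
print(both)
print(pair)
```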
In [13]: html_str = "<h1>hahahah</h1>"

In [14]: re.match(r"<\w*>.*</\w*>",html_str)
Out[14]: <_sre.SRE_Match object; span=(0, 16), match='<h1>hahahah</h1>'>

In [15]: re.match(r"<(\w*)>.*</\1>",html_str)
Out[15]: <_sre.SRE_Match object; span=(0, 16), match='<h1>hahahah</h1>'>

In [16]: re.match(r"<(\w*)>.*</\1>",html_str).group()
Out[16]: '<h1>hahahah</h1>'

\1 refers back to the text matched by group 1, so the closing tag must have the same name as the opening tag.

In [19]: html_str = "<body><h1>hahahah</h1></body>"

In [20]: re.match(r"<(\w*)><(\w*)>.*</\2></\1>",html_str).group()
Out[20]: '<body><h1>hahahah</h1></body>'
(?P<name>...) gives a group a name
(?P=name) refers back to the string matched by the group named name
In [21]: html_str = "<body><h1>hahahah</h1></body>"

In [22]: re.match(r"<(?P<p1>\w*)><(?P<p2>\w*)>.*</(?P=p2)></(?P=p1)>",html_str).group()
Out[22]: '<body><h1>hahahah</h1></body>'
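Named groups can also be read back by name after the match. A small sketch with the same pattern, plus a mismatched closing tag to show the backreference rejecting it:

```python
import re

TAGS = r"<(?P<p1>\w*)><(?P<p2>\w*)>.*</(?P=p2)></(?P=p1)>"

m = re.match(TAGS, "<body><h1>hahahah</h1></body>")
outer = m.group("p1")    # read a group by its name
names = m.groupdict()    # all named groups as a dict

# The inner closing tag is h2, not h1, so (?P=p2) cannot match.
mismatch = re.match(TAGS, "<body><h1>hahahah</h2></body>")

print(outer)
print(names)
print(mismatch)
```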
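search does not require the match to start at the beginning of the string; it scans for the first position where the pattern matches.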
In [23]: re.search(r"\d+","阅读次数为:9999").group()
Out[23]: '9999'

In [24]: re.search(r"\d+","阅读次数为:9999,点赞数为:100").group()
Out[24]: '9999'

(The sample text means "read count: 9999, likes: 100". search returns only the first match.)
In [25]: re.search(r"^\d+","阅读次数为:9999,点赞数为:100").group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-46f1edc190fb> in <module>()
----> 1 re.search(r"^\d+","阅读次数为:9999,点赞数为:100").group()

AttributeError: 'NoneType' object has no attribute 'group'

Adding ^ makes search behave like match: matching starts at the beginning of the string.
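The difference between match, search, and an anchored search can be seen on one input. A brief sketch; the variable names are illustrative.

```python
import re

text = "阅读次数为:9999,点赞数为:100"

at_start = re.match(r"\d+", text)     # None: the text does not begin with a digit
anywhere = re.search(r"\d+", text)    # first run of digits, wherever it is
anchored = re.search(r"^\d+", text)   # ^ pins search to the start, like match

print(at_start)
print(anywhere.group(), anywhere.span())
print(anchored)
```

The span shows where in the string the first run of digits was found.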
findall returns every non-overlapping match as a list:

In [27]: re.findall(r"\d+","阅读次数为:9999,点赞数为:100")
Out[27]: ['9999', '100']

In [28]: re.findall(r"\d+","python = 9999,c = 7890,c++ = 12345")
Out[28]: ['9999', '7890', '12345']
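When the pattern contains groups, findall returns the groups instead of the whole match, one tuple per match. A sketch on the same sample string; the pattern [\w+]+ is chosen here only so that "c++" counts as a name.

```python
import re

text = "python = 9999,c = 7890,c++ = 12345"

# Two groups in the pattern: findall yields (name, number) tuples.
pairs = re.findall(r"([\w+]+) = (\d+)", text)

# The tuples convert directly into a dict with integer values.
counts = {name: int(number) for name, number in pairs}

print(pairs)
print(counts)
```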
Substitution:
In [30]: re.sub(r"\d+","1024","python = 997,c++ = 12345")
Out[30]: 'python = 1024,c++ = 1024'
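sub also accepts a function as the replacement: it is called with each match object, and its return value becomes the replacement text.

import re


def add(temp):
    # temp is the match object for one run of digits
    strNum = temp.group()
    num = int(strNum) + 1
    return str(num)


ret = re.sub(r"\d+",add,"python = 997")
print(ret)

ret = re.sub(r"\d+",add,"python = 99")
print(ret)

Output:

python = 998
python = 100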
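A few other forms of substitution that the re module provides, sketched on the same sample strings: a backreference in the replacement text, the count argument, and re.subn.

```python
import re

text = "python = 997,c++ = 12345"

# \1 in the replacement inserts the text captured by group 1.
wrapped = re.sub(r"(\d+)", r"<\1>", text)

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "1024", text, count=1)

# subn returns the new string together with the number of replacements.
replaced, n = re.subn(r"\d+", "1024", text)

print(wrapped)
print(first_only)
print(replaced, n)
```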
split: splitting a string
import re

# Split on either a colon or a space
ret = re.split(r":| ","info:xiaoZhang 33 shandong")
print(ret)

Output:

['info', 'xiaoZhang', '33', 'shandong']
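Two variations on the split above, as a brief sketch: maxsplit limits the number of cuts, and putting the separator in a capturing group keeps the separators in the result.

```python
import re

text = "info:xiaoZhang 33 shandong"

# A character class is an equivalent way to write ":| ".
same_as_above = re.split(r"[: ]", text)

# maxsplit=1 cuts only at the first separator.
head_and_rest = re.split(r":| ", text, maxsplit=1)

# A capturing group makes split return the separators as well.
with_separators = re.split(r"(:| )", "info:xiaoZhang 33")

print(same_as_above)
print(head_and_rest)
print(with_separators)
```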