zhenggaoxiong

Python Regular Expression Syntax

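Matching a single character

Pattern    Matches
.          any character except \n
[...]      any one character from the set
\d / \D    a digit / a non-digit
\s / \S    a whitespace character / a non-whitespace character
\w / \W    a word character [a-zA-Z0-9_] / a non-word character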

In [8]: ma = re.match(r'.','b')

In [9]: ma.group()
Out[9]: 'b'

In [10]: ma = re.match(r'.','0')

In [11]: ma.group()
Out[11]: '0'

In [13]: ma = re.match(r'{.}','{a}')

In [14]: ma.group()
Out[14]: '{a}'

In [15]: ma = re.match(r'{.}','{0}')

In [16]: ma.group()
Out[16]: '{0}'

In [17]: ma = re.match(r'{..}','{01}')

In [18]: ma.group()
Out[18]: '{01}'

In [19]: ma = re.match(r'{[abc]}','{a}')

In [20]: ma.group()
Out[20]: '{a}'

In [21]: ma = re.match(r'{[a-z]}','{d}')

In [22]: ma.group()
Out[22]: '{d}'

In [23]: ma = re.match(r'{[a-zA-Z]}','{A}')

In [24]: ma.group()
Out[24]: '{A}'

In [25]: ma = re.match(r'{[a-zA-Z0-9]}','{0}')

In [26]: ma.group()
Out[26]: '{0}'

In [27]: ma = re.match(r'{[\w]}','{ }')

In [28]: ma

In [29]: ma = re.match(r'{[\W]}','{ }')

In [30]: ma
Out[30]: <_sre.SRE_Match object; span=(0, 3), match='{ }'>

In [31]: ma.group()
Out[31]: '{ }'

In [32]: ma = re.match(r'{[\W]}','{9}')

In [33]: ma.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-33-7c62fc675aee> in <module>()
----> 1 ma.group()

AttributeError: 'NoneType' object has no attribute 'group'

In [34]: ma
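
The AttributeError above appears because re.match returns None when the pattern does not match, so there is no .group() to call. One way to guard against that is sketched below; the helper name matched_text is made up for this illustration and is not part of the re module.

```python
import re

def matched_text(pattern, text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    ma = re.match(pattern, text)
    # Check for None before touching .group() to avoid the AttributeError
    return ma.group() if ma is not None else None

print(matched_text(r'{[\W]}', '{ }'))   # { }
print(matched_text(r'{[\W]}', '{9}'))   # None
```

Returning None keeps the "no match" case explicit for the caller instead of raising deep inside the helper.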

In [35]: ma = re.match(r'[[\w]]','[a]')

In [36]: ma

In [37]: ma = re.match(r'\[[\w]\]','[a]')

In [38]: ma.group()
Out[38]: '[a]'

In [39]: ma = re.match(r'\[[\w]\]','[0]')

In [40]: ma.group()
Out[40]: '[0]'
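
The session above exercises ., character sets and \w/\W, but not \d, \s or the newline exception for the dot. A minimal sketch of those cases follows. The helper name matches_at_start is an assumption made for this example; re.DOTALL is the standard-library flag that lets . match a newline.

```python
import re

def matches_at_start(pattern, text, flags=0):
    """Hypothetical helper: True if pattern matches at the start of text."""
    return re.match(pattern, text, flags) is not None

print(matches_at_start(r'\d', '7'))    # True: '7' is a digit
print(matches_at_start(r'\D', '7'))    # False: \D rejects digits
print(matches_at_start(r'\s', '\t'))   # True: a tab counts as whitespace
print(matches_at_start(r'\S', ' '))    # False: \S rejects whitespace
print(matches_at_start(r'\w', '_'))    # True: \w also accepts the underscore
print(matches_at_start(r'.', '\n'))                   # False: . skips \n by default
print(matches_at_start(r'.', '\n', flags=re.DOTALL))  # True: DOTALL lifts that limit
```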


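Matching multiple characters

Pattern         Matches
*               the preceding character zero or more times
+               the preceding character one or more times
?               the preceding character zero times or once
{m} / {m,n}     the preceding character exactly m times / from m to n times
*? / +? / ??    non-greedy versions of the above (match as few characters as possible)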

In [1]: import re

In [2]: ma = re.match(r'[A-Z][a-z]','Aa')

In [3]: ma.group()
Out[3]: 'Aa'

In [4]: ma = re.match(r'[A-Z][a-z]','A')

In [6]: ma

In [8]: ma = re.match(r'[A-Z][a-z]*','A')

In [9]: ma
Out[9]: <_sre.SRE_Match object; span=(0, 1), match='A'>

In [10]: ma.group()
Out[10]: 'A'

In [12]: ma = re.match(r'[A-Z][a-z]*','Asdsdwqass')

In [14]: ma.group()
Out[14]: 'Asdsdwqass'

In [15]: ma = re.match(r'[A-Z][a-z]*','1Asdsdwqass')

In [16]: ma

In [17]: ma = re.match(r'[A-Z][a-z]*','Asd1sdwqass')

In [18]: ma.group()
Out[18]: 'Asd'

In [19]: ma = re.match(r'[_a-zA-Z]+[_\w]*','10')

In [20]: ma

In [21]: ma = re.match(r'[_a-zA-Z]+[_\w]*','_ht11')

In [22]: ma.group()
Out[22]: '_ht11'

In [23]: ma = re.match(r'[1-9]?[0-9]','99')

In [24]: ma.group()
Out[24]: '99'

In [25]: ma = re.match(r'[1-9]?[0-9]','90')

In [26]: ma.group()
Out[26]: '90'

In [27]: ma = re.match(r'[1-9]?[0-9]','9')

In [28]: ma.group()
Out[28]: '9'

In [29]: ma = re.match(r'[1-9]?[0-9]','0')

In [30]: ma.group()
Out[30]: '0'

In [31]: ma = re.match(r'[1-9]?[0-9]','09')

In [32]: ma.group()
Out[32]: '0'
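
The '09' case returns '0' because re.match only has to match at the start of the string and may leave the rest unconsumed. If the goal is to validate the whole string, one option is re.fullmatch from the standard library (Python 3.4+). The helper name is_0_to_99 below is hypothetical and only serves this sketch.

```python
import re

def is_0_to_99(text):
    """Hypothetical helper: True only if the whole string is 0-99 with no leading zero."""
    # fullmatch requires the pattern to consume the entire string
    return re.fullmatch(r'[1-9]?[0-9]', text) is not None

print(is_0_to_99('99'))   # True
print(is_0_to_99('0'))    # True
print(is_0_to_99('09'))   # False: the trailing '9' is left over
print(is_0_to_99('100'))  # False: three digits
```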

In [33]: ma = re.match(r'[a-zA-Z0-9]{6}','abc123')

In [34]: ma.group()
Out[34]: 'abc123'

In [35]: ma = re.match(r'[a-zA-Z0-9]{6}','abc1234')

In [36]: ma.group()
Out[36]: 'abc123'

In [37]: ma = re.match(r'[a-zA-Z0-9]{6}','abc1__')

In [38]: ma

In [39]: ma = re.match(r'[a-zA-Z0-9]{6}@163.com','abc123@163.com')

In [40]: ma.group()
Out[40]: 'abc123@163.com'

In [41]: ma = re.match(r'[a-zA-Z0-9]{6,10}@163.com','abc1234@163.com')

In [42]: ma.group()
Out[42]: 'abc1234@163.com'

In [43]: ma = re.match(r'[0-9][a-z]*?','1bc')

In [44]: ma.group()
Out[44]: '1'

In [45]: ma = re.match(r'[0-9][a-z]*','1bc')

In [46]: ma.group()
Out[46]: '1bc'
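
The last two inputs show greedy versus non-greedy matching with *. The same idea applies to +? and ??, and the difference is easiest to see when a later part of the pattern has to match as well. A small sketch, with sample strings chosen only for illustration:

```python
import re

text = '<b>bold</b>'

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'
greedy = re.match(r'<.+>', text).group()
# Non-greedy: .+? stops at the first '>' that lets the pattern succeed
lazy = re.match(r'<.+?>', text).group()

print(greedy)  # <b>bold</b>
print(lazy)    # <b>

# +? needs at least one repetition; ?? prefers zero repetitions
plus_lazy = re.match(r'[0-9][a-z]+?', '1bc').group()
optional_lazy = re.match(r'ab??', 'ab').group()

print(plus_lazy)      # 1b
print(optional_lazy)  # a
```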

 

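Boundary matching

Pattern    Matches
^          the start of the string
$          the end of the string
\A / \Z    the pattern must occur at the very start / very end of the string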

In [48]: ma = re.match(r'[a-zA-Z0-9]{6,10}@163.com','abc1234@163.comabc')

In [49]: ma.group()
Out[49]: 'abc1234@163.com'

In [50]: ma = re.match(r'[a-zA-Z0-9]{6,10}@163.com$','abc1234@163.comabc')

In [51]: ma

In [52]: ma = re.match(r'^[a-zA-Z0-9]{6,10}@163.com$','abc1234@163.com')

In [53]: ma.group()
Out[53]: 'abc1234@163.com'

In [54]: ma = re.match(r'\Aimooc[\w]*','imoocpython')

In [55]: ma.group()
Out[55]: 'imoocpython'

In [56]: ma = re.match(r'\Aimooc[\w]*','iimooc')

In [57]: ma.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-57-7c62fc675aee> in <module>()
----> 1 ma.group()

AttributeError: 'NoneType' object has no attribute 'group'
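
In the examples above ^ and \A behave the same, as do $ and \Z. They differ in two situations: $ also matches just before a trailing newline, and with the re.MULTILINE flag ^ and $ match at every line boundary while \A and \Z still refer to the whole string. The sketch below uses re.search and re.findall, standard-library functions that this post does not otherwise use, and sample strings made up for the example.

```python
import re

text = 'abc\n'

# $ tolerates one trailing newline, \Z does not
dollar_hit = re.search(r'abc$', text) is not None
z_hit = re.search(r'abc\Z', text) is not None
print(dollar_hit)  # True
print(z_hit)       # False

lines = 'first\nsecond'

# With MULTILINE, ^ matches at the start of every line; \A only at the start of the string
caret_words = re.findall(r'^\w+', lines, flags=re.MULTILINE)
a_words = re.findall(r'\A\w+', lines, flags=re.MULTILINE)
print(caret_words)  # ['first', 'second']
print(a_words)      # ['first']
```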

 

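Group matching

Pattern          Matches
|                either the expression on its left or the one on its right
(ab)             the expression inside the parentheses, as a group
\<number>        the string matched by the group numbered <number>
(?P<name>...)    a group with the alias name
(?P=name)        the string matched by the group aliased name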

In [59]: ma = re.match(r'abc|d','abc')

In [60]: ma.group()
Out[60]: 'abc'

In [61]: ma = re.match(r'abc|d','d')

In [62]: ma.group()
Out[62]: 'd'

In [63]: ma = re.match(r'[1-9]?\d$','9')

In [64]: ma.group()
Out[64]: '9'

In [65]: ma = re.match(r'[1-9]?\d$','99')

In [66]: ma.group()
Out[66]: '99'

In [67]: ma = re.match(r'[1-9]?\d$','09')

In [68]: ma

In [69]: ma = re.match(r'[1-9]?\d$','100')

In [70]: ma.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-70-7c62fc675aee> in <module>()
----> 1 ma.group()

AttributeError: 'NoneType' object has no attribute 'group'

In [71]: ma = re.match(r'[1-9]?\d$|100','100')

In [72]: ma.group()
Out[72]: '100'

In [73]: ma = re.match(r'[1-9]?\d$|100','99')

In [74]: ma.group()
Out[74]: '99'
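
One caveat about the pattern above: | has the lowest precedence, so in [1-9]?\d$|100 the $ belongs only to the left alternative and the right one is just "starts with 100". Wrapping the alternatives in a group makes $ apply to both. This is a sketch of that idea, not a change to the session; the helper name accepts is hypothetical.

```python
import re

def accepts(pattern, text):
    """Hypothetical helper: True if pattern matches at the start of text."""
    return re.match(pattern, text) is not None

loose = r'[1-9]?\d$|100'     # $ applies only to the left alternative
strict = r'([1-9]?\d|100)$'  # the group makes $ apply to both alternatives

print(accepts(loose, '1000'))   # True: the bare 100 alternative matches the prefix
print(accepts(strict, '1000'))  # False
print(accepts(strict, '100'))   # True
print(accepts(strict, '99'))    # True
print(accepts(strict, '09'))    # False
```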

In [75]: ma = re.match(r'[\w]{4,6}@163.com','imooc@163.com')

In [76]: ma.group()
Out[76]: 'imooc@163.com'

In [77]: ma = re.match(r'[\w]{4,6}@(163,123).com','imooc@163.com')

In [78]: ma = re.match(r'[\w]{4,6}@(163,123).com','imooc@123.com')

In [79]: ma.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-79-7c62fc675aee> in <module>()
----> 1 ma.group()

AttributeError: 'NoneType' object has no attribute 'group'

In [80]: ma = re.match(r'[\w]{4,6}@(163|123).com','imooc@123.com')

In [81]: ma.group()
Out[81]: 'imooc@123.com'
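
Note that the . before com in these patterns is unescaped, so it matches any character and not only a literal dot. For stricter address checks the dot can be escaped as \. The sketch below compares the two; the test address 'imooc@163xcom' is made up for the illustration.

```python
import re

unescaped = r'[\w]{4,6}@(163|123).com'
escaped = r'[\w]{4,6}@(163|123)\.com'

# The unescaped dot accepts any character in that position
loose_hit = re.match(unescaped, 'imooc@163xcom') is not None
strict_hit = re.match(escaped, 'imooc@163xcom') is not None
print(loose_hit)   # True
print(strict_hit)  # False

# A real dot still matches, and the group records which domain was chosen
ma = re.match(escaped, 'imooc@123.com')
print(ma.group())   # imooc@123.com
print(ma.group(1))  # 123
```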

In [82]: ma = re.match(r'<[\w]+>','<book>')

In [83]: ma.group()
Out[83]: '<book>'

In [84]: ma = re.match(r'<([\w]+>)','<book>')

In [85]: ma.group()
Out[85]: '<book>'

In [86]: ma = re.match(r'<([\w]+>)\1','<book>')

In [87]: ma.groups()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-87-f4e4ca66607d> in <module>()
----> 1 ma.groups()

AttributeError: 'NoneType' object has no attribute 'groups'

In [88]: ma = re.match(r'<([\w]+>)\1','<book>book>')

In [89]: ma.groups()
Out[89]: ('book>',)

In [90]: ma.group()
Out[90]: '<book>book>'


In [3]: ma = re.match(r'<([\w]+>)[\w]+</\1','<book>python</book>')

In [4]: ma.group()
Out[4]: '<book>python</book>'

In [5]: ma = re.match(r'<([\w]+>)[\w]+</\1','<book>python</book1>')

In [6]: ma


In [9]: ma = re.match(r'<(?P<mark>[\w]+>)[\w]+</(?P=mark)','<book>python</book>')

In [10]: ma.group()
Out[10]: '<book>python</book>'
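
Besides .group(), a match object exposes the captured groups by number and by alias. The first part of the sketch reuses the pattern from the session; the second part is an assumed variant that keeps the > outside the group so the captured tag name is cleaner. The group names tag and body are made up for this example.

```python
import re

ma = re.match(r'<(?P<mark>[\w]+>)[\w]+</(?P=mark)', '<book>python</book>')

print(ma.group('mark'))  # book>  (lookup by alias)
print(ma.group(1))       # book>  (the same group by number)
print(ma.groupdict())    # {'mark': 'book>'}

# Variant with '>' outside the groups and a second named group for the content
ma2 = re.match(r'<(?P<tag>\w+)>(?P<body>\w+)</(?P=tag)>', '<book>python</book>')
print(ma2.groupdict())   # {'tag': 'book', 'body': 'python'}
```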

 

Posted 2018-07-29 09:15 by zhenggaoxiong