Python's re module: simple, good-enough regular expressions
1. References
https://docs.python.org/2/library/re.html
https://docs.python.org/2/howto/regex.html
https://docs.python.org/3/library/re.html
| string | re | Notes |
| --- | --- | --- |
| | re.match(pattern, string, flags=0) | Matches only at the start of the string |
| S.find(sub [,start [,end]]) -> int | re.search(pattern, string, flags=0) | Scan through string looking for a match |
| S.replace(old, new[, count]) -> string | re.findall(pattern, string, flags=0) | re.finditer |
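The correspondence in the table can be checked with a short script. This is only a minimal sketch: the sample string and the variable names are chosen for illustration and are not part of the original notes.

```python
import re

s = 'pro---gram-files'  # sample string, chosen for illustration

# re.match only succeeds if the pattern matches at position 0.
at_start = re.match(r'pro', s)        # a match object
not_at_start = re.match(r'gram', s)   # None: 'gram' is not at the start

# re.search scans the string, much like S.find, but returns a match object.
found = re.search(r'gram', s)
same_position = found.start() == s.find('gram')

# re.findall returns all non-overlapping matches as a list;
# re.finditer yields the corresponding match objects one by one.
dashes = re.findall(r'-+', s)
spans = [m.span() for m in re.finditer(r'-+', s)]
print(dashes, spans)
```

The practical difference is that the string methods return positions or new strings, while the re functions return match objects (or lists of strings) that also carry group information.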
2. Groups: m.group()
    In [560]: m.group?
    Docstring:
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.
    Type:      builtin_function_or_method

    In [542]: m=re.search(r'(-{1,2}(gr))','pro---gram-files')

    In [543]: m.group()  # group 0 is implicit
    Out[543]: '--gr'

    In [544]: m.group(0)  # implicit; returns the entire matched string. Note that m.string is the full original text that was searched.
    Out[544]: '--gr'

    In [545]: m.group(1)
    Out[545]: '--gr'

    In [546]: m.group(2)
    Out[546]: 'gr'

    In [547]: m.group(3)  # numbered groups come from the ( added to the pattern; asking for one that does not exist raises an error
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-547-71a2c7935517> in <module>()
    ----> 1 m.group(3)

    IndexError: no such group

    In [548]: m.group(1,2)  # select several groups; returns a tuple
    Out[548]: ('--gr', 'gr')

    In [549]: m.groups()  # select all groups
    Out[549]: ('--gr', 'gr')
m.groupdict is used for named groups.

    In [557]: m.groupdict?
    Docstring:
    groupdict([default=None]) -> dict.
    Return a dictionary containing all the named subgroups of the match,
    keyed by the subgroup name. The default argument is used for groups
    that did not participate in the match
    Type:      builtin_function_or_method

    In [558]: m=re.search(r'(-{1,2}(?P<GR>gr))','pro---gram-files')

    In [559]: m.groupdict()
    Out[559]: {'GR': 'gr'}
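The default argument mentioned in the docstring only matters when a named group is optional and takes no part in the match. A small sketch of that case follows; the pattern and the group names sign and digits are hypothetical, picked only to show the behaviour.

```python
import re

# Hypothetical pattern: the 'sign' group is optional and may not participate.
pattern = re.compile(r'(?P<sign>-)?(?P<digits>\d+)')

m = pattern.search('value: 42')

by_name = m.group('digits')   # a named group can also be read with m.group(name)
plain = m.groupdict()         # groups that did not participate map to None
filled = m.groupdict('')      # ... or to the default passed in
print(by_name, plain, filled)
```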
3. Extraction: re.findall()
re.findall(pattern, string, flags=0)
    In [97]: text = "He was carefully disguised but captured quickly by police."

    In [98]: re.findall(r"\w+ly", text)  # no groups: equivalent to m.group(0)
    Out[98]: ['carefully', 'quickly']

    In [99]: re.findall(r"(\w+)ly", text)  # one pair of parentheses limits what is returned: equivalent to m.group(1)
    Out[99]: ['careful', 'quick']

    In [100]: re.findall(r"((\w+)(ly))", text)  # several groups, numbered by counting ( from left to right: equivalent to m.groups()
    Out[100]: [('carefully', 'careful', 'ly'), ('quickly', 'quick', 'ly')]

    In [102]: re.findall(r"((1\w+)(ly))", text)  # no match: an empty list
    Out[102]: []
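Because re.findall changes its return shape as soon as the pattern contains capturing groups, two alternatives are often handy: a non-capturing group (?:...) when the parentheses are only there for grouping, and re.finditer when the match positions are needed as well. A minimal sketch, reusing the sentence from above (the variable names are illustrative):

```python
import re

text = "He was carefully disguised but captured quickly by police."

# (?:...) groups without capturing, so findall still returns whole matches.
whole = re.findall(r"(?:\w+)ly", text)

# finditer yields match objects, giving access to the match and its position.
located = [(m.group(0), m.start()) for m in re.finditer(r"(\w+)ly", text)]
print(whole, located)
```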
4. Substitution: re.sub()
re.sub(pattern, repl, string, count=0, flags=0)
Backreferences in repl, such as \6, are replaced with the substring matched by group 6 in the pattern. The same result can be achieved by passing a function as repl.
Note that MySQL REGEXP does not support backreferences such as \1.
https://stackoverflow.com/questions/4122393/negative-backreferences-in-mysql-regexp mentions that this is not possible unless you can install/use LIB_MYSQLUDF_PREG.
https://stackoverflow.com/questions/7058209/reference-to-groups-in-a-mysql-regex
    In [158]: def func(m):
         ...:     return m.group('DEF')+' '+m.group(2)  # group referenced by its name
         ...:

    In [159]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', func, 'def func(): def f():')
    Out[159]: 'def func def f'

    In [160]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', r'\1 \2', 'def func(): def f():')  # a bare \name is not supported in repl
    Out[160]: 'def func def f'
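Although a bare \name is not accepted in repl, the re documentation describes the \g<name> syntax for referring to a named group there, with \g<2> as the explicit form of \2. The sketch below reuses the pattern from the session above; the variable names and the upper-casing variant are additions for illustration only.

```python
import re

pattern = r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):'
source = 'def func(): def f():'

# \g<name> refers to a named group; \g<2> is the explicit form of \2.
by_name = re.sub(pattern, r'\g<DEF> \g<2>', source)

# A callable repl gives full control, e.g. upper-casing the function name.
shouted = re.sub(pattern,
                 lambda m: m.group('DEF') + ' ' + m.group(2).upper(),
                 source)
print(by_name, '|', shouted)
```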
5. Backreferences in the pattern
5.1 Finding a pair in a hand of cards
    In [204]: re.search(r'(.).*\1','ab123')  # no character occurs twice: returns None

    In [205]: re.search(r'(.).*\1','ab121')
    Out[205]: <_sre.SRE_Match at 0x71ca120>

    In [206]: _.group()
    Out[206]: '121'
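The pair search can be wrapped in a small helper. This is a sketch under the assumption that each card is written as a single character; the function name find_pair is hypothetical.

```python
import re

def find_pair(hand):
    """Return the first character that occurs at least twice in hand, or None."""
    # (.) captures one character; .* skips anything; \1 requires the same
    # character to appear again later in the string.
    m = re.search(r'(.).*\1', hand)
    return m.group(1) if m else None

print(find_pair('ab123'))  # no repeated character
print(find_pair('ab121'))  # '1' occurs twice
```

Returning m.group(1) rather than m.group() gives the repeated character itself instead of the whole span between the two occurrences.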
5.2 Several identical characters in a row
    In [207]: re.search(r'.{3}','1122')  # wrong: matches any three characters
    Out[207]: <_sre.SRE_Match at 0x71b94a8>

    In [208]: re.search(r'(.){3}','1122')  # wrong: the group is repeated, not its captured text
    Out[208]: <_sre.SRE_Match at 0x71ca198>

    In [209]: re.search(r'(.)\1\1','1122')  # correct: no match, no character appears three times in a row

    In [210]: re.search(r'(.)\1\1','1112')
    Out[210]: <_sre.SRE_Match at 0x71ca210>

    In [211]: re.search(r'(.)\1{2}','1112')
    Out[211]: <_sre.SRE_Match at 0x71ca288>

    In [212]: _.group()
    Out[212]: '111'
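The same idea generalises to runs of any minimum length. The helper below is a sketch, not part of the original notes: the name runs and its parameter n are hypothetical, and it builds the \1{n-1,} quantifier from n.

```python
import re

def runs(s, n=3):
    """Return every maximal run of at least n identical characters in s."""
    # (.) captures one character; \1{n-1,} requires that same character
    # to repeat at least n-1 more times.
    pattern = re.compile(r'(.)\1{%d,}' % (n - 1))
    return [m.group(0) for m in pattern.finditer(s)]

print(runs('1112'))        # one run of three
print(runs('1122'))        # no run of three
print(runs('1122', n=2))   # two runs of two
```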