Getting Started with Python (06) -- Regular Expressions
1 Atoms
(1) Ordinary characters as atoms
import re
pattern = "baidu"
string = "www.baidu.com"
result = re.search(pattern, string)
print(result)
Output:
<_sre.SRE_Match object; span=(4, 9), match='baidu'>
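The printed value is a match object. Here is a minimal sketch of reading the matched text and its position from it, using the same pattern and string as above. The variable names `matched_text`, `start` and `end` are illustrative. On Python 3.7 and later the object prints as `<re.Match object; ...>` rather than `<_sre.SRE_Match object; ...>`.

```python
import re

pattern = "baidu"
string = "www.baidu.com"

result = re.search(pattern, string)
if result is not None:              # re.search returns None when nothing matches
    matched_text = result.group()   # the substring that matched
    start, end = result.span()      # start index and one-past-the-end index
    print(matched_text, start, end)  # baidu 4 9
```

Checking for `None` before calling `group()` avoids an `AttributeError` when the pattern is not found.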
(2) Non-printing characters as atoms
import re
pattern = '\n'
string = """www.baidu.com
2017-12-16
"""
result = re.search(pattern, string)
print(result)
Output:
<_sre.SRE_Match object; span=(13, 14), match='\n'>
(3) Generic character classes as atoms
import re
pattern = r"\w\dpython\w"
string = "abc333python_py"
result = re.search(pattern, string)
print(result)
Output:
<_sre.SRE_Match object; span=(4, 13), match='33python_'>
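Notes:
Character | Meaning |
---|---|
\w | Matches a letter, digit, or underscore |
\W | Matches any character that is not a letter, digit, or underscore |
\s | Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v] |
\S | Matches any non-whitespace character |
\d | Matches any digit; for ASCII text, equivalent to [0-9] |
\D | Matches any non-digit |
\A | Matches at the start of the string |
\Z | Matches at the end of the string. In Python's re it matches only at the very end; in Perl-style engines it also matches just before a trailing newline |
\z | Matches at the very end of the string in Perl-style engines; older versions of Python's re do not accept it, so use \Z there |
\G | Matches at the position where the previous match ended (Perl-style engines; not supported by Python's re) |
\b | Matches a word boundary, i.e. the position between a word character and a non-word character, or the start or end of the string |
\B | Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never" |
\n, \t, etc. | Match the corresponding non-printing character (newline, tab, and so on) |
\1…\9 | Matches the content of the nth group |
\10 | Matches the content of group 10 if that group exists. Some engines otherwise read it as an octal character code; Python's re treats it as a group reference |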
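A few of the table entries can be tried out directly. The sketch below uses a made-up sample string and illustrative variable names to exercise `\d`, `\B`, `\b`, `\s` and a `\1` backreference.

```python
import re

text = "never verb 2017-12-16 go go"   # made-up sample text

# \d{4}: four consecutive digits
year = re.search(r"\d{4}", text).group()

# er\B: "er" that is NOT at the end of a word, so "verb" matches, "never" does not
inside_word = re.search(r"er\B", text)

# \b(\w+)\s\1\b: a whole word, one whitespace character, then the same word again
repeated = re.search(r"\b(\w+)\s\1\b", text)

print(year)                  # 2017
print(inside_word.span())    # (7, 9)
print(repeated.group())      # go go
```

The patterns are written as raw strings (`r"..."`) so that Python passes the backslashes to the regex engine unchanged.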
(4) Atom tables (character sets)
import re
string = "abc123pythonp_py"
pattern1 = r"\w\dpython[a-z]\w"
pattern2 = r"\w\dpython[^a-z]\w"
pattern3 = r"\w\dpython[a-z]\W"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
print(result1)
print(result2)
print(result3)
Output:
<_sre.SRE_Match object; span=(4, 14), match='23pythonp_'>
None
None
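In the example above, `[a-z]` accepts any one lowercase letter and `[^a-z]` accepts any one character that is not a lowercase letter. A small sketch on the same string, with illustrative variable names, shows both forms plus a set that mixes a range with a single character.

```python
import re

string = "abc123pythonp_py"

# [a-z]: one lowercase letter must follow "python"
lower = re.search(r"python[a-z]", string)

# [^a-z]: one character that is NOT a lowercase letter must follow "python"
not_lower = re.search(r"python[^a-z]", string)

# A set can mix ranges and single characters: any digit or an underscore
digit_or_underscore = re.findall(r"[0-9_]", string)

print(lower.group())          # pythonp
print(not_lower)              # None
print(digit_or_underscore)    # ['1', '2', '3', '_']
```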
2 Metacharacters
(1) The any-character metacharacter
import re
pattern = "...Python."
string = "ILove123Python_py"
print(re.search(pattern, string))
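A quick check of what this pattern matches: each `.` consumes exactly one character, and by default that character cannot be a newline. The sketch reuses the same pattern and string; the `"a.b"` check at the end is an added illustration.

```python
import re

pattern = "...Python."
string = "ILove123Python_py"

result = re.search(pattern, string)
# Three arbitrary characters, the literal "Python", then one more character
print(result.group())   # 123Python_
print(result.span())    # (5, 15)

# By default "." does not match a newline character
newline_case = re.search("a.b", "a\nb")
print(newline_case)     # None
```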
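(2) Boundary (anchor) metacharacters
import re
# Match a string that starts with "ILove"
pattern1 = "^ILove"
# Match a string that starts with "Love"
pattern2 = "^Love"
# Match a string that ends with "py"
pattern3 = "py$"
# Match a string that ends with "ny"
pattern4 = "ny$"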
string = "ILove123Python_py"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
result4 = re.search(pattern4, string)
print(result1)
print(result2)
print(result3)
print(result4)
Output:
<_sre.SRE_Match object; span=(0, 5), match='ILove'>
None
<_sre.SRE_Match object; span=(15, 17), match='py'>
None
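Combining `^` and `$` in one pattern is a common way to require that the whole string has a given shape, rather than just some part of it. The sketch below assumes a hypothetical helper `is_date` that checks only the digit layout YYYY-MM-DD and does no range checking on month or day.

```python
import re

def is_date(text):
    """Return True if text has the layout YYYY-MM-DD (digits only, no range checks)."""
    # ^ anchors the match to the start, $ to the end of the string
    return re.search(r"^\d{4}-\d{2}-\d{2}$", text) is not None

print(is_date("2017-12-16"))       # True
print(is_date("on 2017-12-16"))    # False: extra text before the date
print(is_date("2017-12-160"))      # False: extra digit after the date
```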
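(3) Quantifiers
import re
string = "ILoveAndccccc123Python_py"
# Match "Py", then any run of characters, then "n"
pattern1 = "Py.*n"
# Match "d" followed by exactly two "c"
pattern2 = "dc{2}"
# Match "d" followed by exactly three "c"
pattern3 = "dc{3}"
# Match "d" followed by at least two "c"
pattern4 = "dc{2,}"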
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
result4 = re.search(pattern4, string)
print(result1)
print(result2)
print(result3)
print(result4)
Output:
<_sre.SRE_Match object; span=(16, 22), match='Python'>
<_sre.SRE_Match object; span=(7, 10), match='dcc'>
<_sre.SRE_Match object; span=(7, 11), match='dccc'>
<_sre.SRE_Match object; span=(7, 13), match='dccccc'>
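Besides `{n}` and `{n,}`, the standard quantifiers `{m,n}`, `+`, `*` and `?` follow the same idea. Here is a short sketch on the same string; the patterns and variable names are illustrative additions, not part of the original example.

```python
import re

string = "ILoveAndccccc123Python_py"

bounded = re.search(r"dc{2,3}", string).group()    # between two and three "c"
one_or_more = re.search(r"dc+", string).group()    # + means one or more
zero_or_more = re.search(r"dc*1", string).group()  # * means zero or more
optional = re.search(r"dx?c", string).group()      # ? means zero or one "x"

print(bounded)        # dccc
print(one_or_more)    # dccccc
print(zero_or_more)   # dccccc1
print(optional)       # dc
```

All of these quantifiers are greedy by default: `c{2,3}` takes three `c` characters here because three are available.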
(4) Alternation
import re
string = "ILoveAndccccc123Python_py"
pattern = "Love|Python"
print(re.search(pattern, string))
Output:
<_sre.SRE_Match object; span=(1, 5), match='Love'>
(5) Grouping
import re
pattern1 = "(cd){1,}"
pattern2 = "cd{1,}"
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
print(result1)
print(result2)
Output:
<_sre.SRE_Match object; span=(2, 8), match='cdcdcd'>
<_sre.SRE_Match object; span=(2, 4), match='cd'>
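Parentheses do two things: they make the quantifier apply to the whole unit, as in the first result above, and they capture the text they matched. The sketch below combines a repeated group with an alternation inside a second group. The pattern is an illustrative one built for the same string.

```python
import re

string = "abcdcdcdePython_py"

# (cd){1,} repeats the unit "cd"; (Py|Ja) accepts either "Py" or "Ja"
m = re.search(r"(cd){1,}e(Py|Ja)", string)

whole = m.group(0)    # the entire match
unit = m.group(1)     # text from the last repetition of the first group
choice = m.group(2)   # whichever alternative matched

print(whole)    # cdcdcdePy
print(unit)     # cd
print(choice)   # Py
```

Note that a repeated group keeps only its last repetition, so `group(1)` is `cd` and not `cdcdcd`.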
3 Pattern modifiers (flags)
import re
pattern1 = "python"
pattern2 = "python"
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string, re.I)
print(result1)
print(result2)
Output:
None
<_sre.SRE_Match object; span=(9, 15), match='Python'>
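The same case-insensitive match can be written in two other ways: by passing the flag to `re.compile`, or by using the inline form `(?i)` at the start of the pattern. This is a sketch on the same string with illustrative variable names.

```python
import re

string = "abcdcdcdePython_py"

# re.IGNORECASE is the long name of re.I; a compiled pattern can be reused
compiled = re.compile("python", re.IGNORECASE)
by_flag = compiled.search(string)

# The inline flag (?i) at the start of the pattern has the same effect
by_inline = re.search("(?i)python", string)

print(by_flag.group(), by_flag.span())       # Python (9, 15)
print(by_inline.group(), by_inline.span())   # Python (9, 15)
```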
4 Greedy and lazy modes
1) Greedy mode: match as much as possible, extending to the last y
2) Lazy mode: match as little as possible, stopping at the first y
import re
pattern1 = "P.*y"   # greedy
pattern2 = "P.*?y"  # lazy
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
print(result1)
print(result2)
Output:
<_sre.SRE_Match object; span=(9, 18), match='Python_py'>
<_sre.SRE_Match object; span=(9, 11), match='Py'>
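The difference matters most when a delimiter occurs several times in the text. In the sketch below, on a made-up sample string, the greedy pattern swallows everything from the first `<` to the last `>`, while the lazy pattern stops at each closing `>`.

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # made-up sample text

greedy = re.findall(r"<.*>", html)    # one match spanning the whole string
lazy = re.findall(r"<.*?>", html)     # each tag matched separately

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```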