Regular Expressions

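A regular expression (often abbreviated in code as regex, regexp, or RE) is a concept in computer science. A regular expression uses a single string to describe and match a set of strings that conform to some syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.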

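Many programming languages support string manipulation with regular expressions. Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated as "regex"; the singular forms are regexp and regex, and the plural forms are regexps, regexes, and regexen.
Quoted from Wikipedia: https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F

The above is taken from https://www.cnblogs.com/chuxiuhong/p/5885073.html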

Regular expressions are used to match strings.

The match() function of the re module matches only at the start of the string:

import re
d = re.match('abc', 'abcdfaff')
print(d)
# Output: <_sre.SRE_Match object; span=(0, 3), match='abc'>
# (Python 3.7 and later print <re.Match object; span=(0, 3), match='abc'> instead)

# To see what was matched, call .group(0) on the returned match object
import re
d = re.match('abc', 'abcdfaff')
print(d.group(0))
# Output: abc
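
Because match() only tries the pattern at index 0, it returns None when the pattern occurs later in the string, and calling .group() on None raises an AttributeError. A minimal sketch of guarding against that (the variable names here are illustrative, not from the original post):

```python
import re

text = 'abcdfaff'

# 'abc' sits at index 0, so match() returns a match object.
hit = re.match('abc', text)
# 'dfa' occurs later in the string, but match() only tries index 0.
miss = re.match('dfa', text)

# Check for None before calling .group().
matched_text = hit.group(0) if hit else None
missed_text = miss.group(0) if miss else None

print(matched_text)  # abc
print(missed_text)   # None
```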

The findall() function of re finds matches at any position in the string:

# Match runs of 0 to 10 digits, then runs of 1 to 10 digits:
import re
d = re.findall('[0-9]{0,10}', '123456ab789cdfGFFDaff')
# d = re.findall('[0-9]{1,10}', '123456ab789cdfGFFDaff')
if d:
    print(d)
# Result with {0,10}: ['123456', '', '', '789', '', '', '', '', '', '', '', '', '', '', '']
# Result with {1,10}: ['123456', '789']
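
The empty strings appear because a quantifier that allows zero repetitions also matches the empty string at every position where no digit is found. As a small sketch (variable names are illustrative), filtering out the empty matches gives the same list as requiring at least one digit:

```python
import re

text = '123456ab789cdfGFFDaff'

# {0,10} allows zero digits, so empty matches are included.
with_empty = re.findall('[0-9]{0,10}', text)
# Dropping the empty strings leaves only the real digit runs.
non_empty = [m for m in with_empty if m]
# {1,10} requires at least one digit, so no empty matches are produced.
at_least_one = re.findall('[0-9]{1,10}', text)

print(non_empty)     # ['123456', '789']
print(at_least_one)  # ['123456', '789']
```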

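# Match runs of 0 to 10 letters (lowercase or uppercase), then runs of 1 to 10 letters:
import re
# d = re.findall('[a-zA-Z]{0,10}', '123456ab789cdfGFFDaff')
d = re.findall('[a-zA-Z]{1,10}', '123456ab789cdfGFFDaff')
if d:
    print(d)
# Result with {0,10}: ['', '', '', '', '', '', 'ab', '', '', '', 'cdfGFFDaff', '']
# Result with {1,10}: ['ab', 'cdfGFFDaff']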

# Match runs of one or more letters:
import re
d = re.findall('[a-zA-Z]+', '123_456ab7.89c~dfGFFDaff')
if d:
    print(d)
# Result: ['ab', 'c', 'dfGFFDaff']

The search() function of re:

# Match one or more digits: scan from the start of the string and stop at the first match
import re
d = re.search(r'\d+', 'def123_456ab7.89c~dfGFFDaff')
if d:
    print(d.group())
# Result: 123
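
To make the difference between the three functions concrete, here is a small sketch that runs the same pattern through match(), search(), and findall() on one string (variable names are illustrative):

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

at_start = re.match(r'\d+', text)   # only tries index 0, where 'd' is not a digit
first = re.search(r'\d+', text)     # scans forward and stops at the first match
every = re.findall(r'\d+', text)    # collects every non-overlapping match

print(at_start)       # None
print(first.group())  # 123
print(first.span())   # (3, 6)
print(every)          # ['123', '456', '7', '89']
```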

The sub() function of re performs replacement:

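# Replace all digits with '<'; the results for the patterns '\d' and '\d+' are shown below:
import re
# d = re.sub(r'\d', '<', 'def123_456ab7.89c~dfGFFDaff')
d = re.sub(r'\d+', '<', 'def123_456ab7.89c~dfGFFDaff')
if d:
    print(d)
# Result with \d:  def<<<_<<<ab<.<<c~dfGFFDaff
# Result with \d+: def<_<ab<.<c~dfGFFDaff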

The sub() function of re can also replace only some of the matches:

# Replace only the first two runs of digits:
import re
d = re.sub(r'\d+', '<', 'def123_456ab7.89c~dfGFFDaff', count=2)
if d:
    print(d)
# Result: def<_<ab7.89c~dfGFFDaff
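
If you also want to know how many replacements were made, the standard library offers re.subn(), which takes the same arguments as sub() and returns a tuple of the new string and the replacement count. A short sketch (variable names are illustrative):

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

# Replace every run of digits and report how many runs were replaced.
replaced_all, n_all = re.subn(r'\d+', '<', text)
# Limit the replacement to the first two runs.
replaced_two, n_two = re.subn(r'\d+', '<', text, count=2)

print(replaced_all, n_all)  # def<_<ab<.<c~dfGFFDaff 4
print(replaced_two, n_two)  # def<_<ab7.89c~dfGFFDaff 2
```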

Finding strings that start with a digit and end with a digit:

# Find a single digit at the start of the string; returns that digit:
import re
d = re.search(r'^\d', '987654321ABCdef123_456ab7.89c~dfGFFDaff555')
if d:
    print(d)
# Result: <_sre.SRE_Match object; span=(0, 1), match='9'>



# Find a run of digits at the start of the string; returns the whole run:
import re
d = re.search(r'^\d+', '987654321ABCdef123_456ab7.89c~dfGFFDaff555')
if d:
    print(d)
# Result: <_sre.SRE_Match object; span=(0, 9), match='987654321'>



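# Find a run of digits that spans from the start to the end of the string:
import re
d = re.search(r'^\d+$', '987654321ABCdef123_456ab7.89c~dfGFFDaff555')
print(d)
# Result: None
# The result is None because ^\d+$ requires every character between the start
# and the end of the string to be a digit, and this string contains non-digits.
# With d = re.search(r'^\d+$', '987654321')
# the result is: <_sre.SRE_Match object; span=(0, 9), match='987654321'>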
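
The standard library also provides re.fullmatch() (Python 3.4+), which succeeds only if the whole string matches, so the ^ and $ anchors are not needed. The sketch below additionally shows one possible way to capture the leading and trailing digit runs of a mixed string; the grouping pattern is an illustration, not something from the original post:

```python
import re

mixed = '987654321ABCdef123_456ab7.89c~dfGFFDaff555'
digits = '987654321'

# fullmatch() requires the entire string to match the pattern.
whole_mixed = re.fullmatch(r'\d+', mixed)
whole_digits = re.fullmatch(r'\d+', digits)
print(whole_mixed)           # None
print(whole_digits.group())  # 987654321

# Capture the digit run at the start and the digit run at the end.
ends = re.search(r'^(\d+).*?(\d+)$', mixed)
print(ends.group(1))  # 987654321
print(ends.group(2))  # 555
```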

More on the findall() function:

# findall() returns every match of the pattern as a list of strings
import re
s1 = re.findall('org', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s1)
# Result: ['org']

# If ^ is placed before the pattern, findall() matches only at the start
# of the string and returns the matched text in a list
import re
s = re.findall('^https', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s)
# Result: ['https']

# If $ is placed after the pattern, findall() matches only at the end
# of the string and returns the matched text in a list
import re
s = re.findall("html$", "https://docs.python.org/3/whatsnew/3.6.html")
print(s)
# Result: ['html']

# [...] matches any single character listed inside the brackets;
# findall() returns every match as a list of strings.
# Characters are listed without separators: a comma inside [...] would match a literal comma.
import re
s = re.findall('[tw]h', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s)
# Result: ['th', 'wh']
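
Inside [...] almost every character is taken literally, so a comma is a member of the set rather than a separator. A small sketch on a made-up sample string that shows the difference:

```python
import re

# Hypothetical sample text containing a comma directly before 'h'.
text = 'a,h th wh'

# The comma is part of the set, so ',h' also matches.
with_comma = re.findall('[t,w]h', text)
# Listing only the intended characters avoids the stray match.
without_comma = re.findall('[tw]h', text)

print(with_comma)     # [',h', 'th', 'wh']
print(without_comma)  # ['th', 'wh']
```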

# \d matches a single digit;
# findall() returns each matched digit as an element of the list.
# If \d is repeated n times, each match is a string of
# n consecutive digits.
import re
s1 = re.findall(r"\d", "https://docs.python.org/3/whatsnew/3.6.html")
s2 = re.findall(r"\d\d\d", "https://docs.python.org/3/whatsnew/3.6.html/1234")
print(s1)
print(s2)
# Result: ['3', '3', '6']
# Result: ['123']

# \D matches any character that is not a digit, so all digits are skipped;
# findall() returns each matched character as an element of the list
import re
s = re.findall(r'\D', 'good 123_ mornin_g!')
print(s)
# Result: ['g', 'o', 'o', 'd', ' ', '_', ' ', 'm', 'o', 'r', 'n', 'i', 'n', '_', 'g', '!']
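
For ASCII text such as this sample, \D behaves like the negated class [^0-9], and adding + groups the non-digit characters into runs instead of returning them one by one. A brief sketch (variable names are illustrative):

```python
import re

text = 'good 123_ mornin_g!'

singles = re.findall(r'\D', text)            # one non-digit character per match
same_as_class = re.findall(r'[^0-9]', text)  # negated character class
runs = re.findall(r'\D+', text)              # consecutive non-digits grouped together

print(singles == same_as_class)  # True
print(runs)                      # ['good ', '_ mornin_g!']
```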

Short exercises:

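import re
print(re.match('Liudehua', 'Liudehua演戏很好！').group())  # a literal pattern matches itself
print(re.match('.', 'Liudehua演戏很好！').group())  # . matches any single character except a newline
print(re.match('.*', 'Liudehua演戏很好！').group())  # * matches the preceding item zero or more times
print(re.match(r'\\', '\\Liudehua演戏很好！').group())  # a backslash before a metacharacter (here \) removes its special meaning
print(re.match('的+', '的的的LLLLLiudehua演戏很好！').group())  # + matches one or more times
print(re.match('的?', '的的的iudehua演戏很好！').group())  # ? matches zero or one time
print(re.match('^开头', '开头Hiudehua演戏很好！').group())  # ^ matches the start of the string
print(re.search('！末尾$', 'Hiudehua演戏很好！末尾').group())  # $ matches the end of the string; search() is needed because match() only tries index 0
print(re.match('的|H', 'Hiudehua演戏很好！').group())  # | matches either of the expressions on its two sides
print(re.match('P{3}', 'PPPPPPiudehua演戏很好！').group())  # {3} matches exactly three times
print(re.match('.*P{3}', 'uuu(PPPPPP)dehua演戏很好！').group())  # greedy .* followed by three P's: uuu(PPPPPP
print(re.match(r'\d+', '123nihao').group(0))  # \d matches a digit, like [0-9]
print(re.match(r'\D', '飞雪123nihao').group())  # \D matches a non-digit, like [^0-9]
print(re.match(r'\D*\s\d', '月下舞   123nihao').group(0))  # \s matches any whitespace character
print(re.match(r'\S', '月下舞   123nihao').group(0))  # \S matches any non-whitespace character, i.e. [^\s]
print(re.match(r'\w*', '月下舞_987   123nihao').group(0))  # \w matches letters, digits, and the underscore
print(re.match(r'\W*', '***** &&月下舞_987   123nihao').group(0))  # \W matches any character that \w does not
print(re.match(r'\Aqin', 'qin月下舞_987   123nihao').group(0))  # \A matches only at the start of the string, like ^
print(re.search(r'hao\Z', 'qin月下舞_987  123nihao').group())  # \Z matches only at the end of the string, like $
print(re.findall('\btina', 'tian tinaaaa'))  # without the r prefix, '\b' is a backspace character, so this prints []
print(re.findall(r'\btina', 'tian tinaaaa'))  # \b matches a word boundary: ['tina']
print(re.findall(r'\btina', 'tian#tinaaaa'))  # '#' is not a word character, so a boundary precedes 'tina': ['tina']
print(re.findall(r'\btina\b', 'tian#tina@aaa'))  # boundaries on both sides: ['tina']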
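
The \b lines show why regex patterns are usually written as raw strings. In an ordinary string literal Python itself turns '\b' into a single backspace character before re ever sees it, while a raw string keeps the backslash and the letter b. A minimal sketch (variable names are illustrative):

```python
import re

plain = '\btina'   # Python converts '\b' into one backspace character
raw = r'\btina'    # the raw string keeps the two characters '\' and 'b'

print(len(plain), len(raw))  # 5 6

# The plain pattern looks for a backspace followed by 'tina', which is absent.
plain_result = re.findall(plain, 'tian tinaaaa')
# The raw pattern looks for 'tina' at a word boundary.
raw_result = re.findall(raw, 'tian tinaaaa')

print(plain_result)  # []
print(raw_result)    # ['tina']
```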

Posted on 2018-07-19 11:21 by 一杯明月