Python built-in module - re

To use regular expressions in Python, you first import the re module. Regular expressions are a powerful feature that can save a great deal of work.

I. Metacharacters

Metacharacters are symbols with special meanings that stand for particular kinds of characters or positions.

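. matches any character except a newline

\w matches a letter, digit, underscore, or Chinese character (in Python 3, any Unicode word character)

\W matches any character that is not a letter, digit, underscore, or Chinese character

\s matches any whitespace character

\d matches a digit

\D matches any non-digit character

\b matches the start or end of a word (a word boundary)

^ matches the start of the string; when it is the first character inside square brackets, as in [^0-9], it negates the character class instead

$ matches the end of the string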
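A minimal sketch exercising each metacharacter above with re.findall, re.match and re.search. The sample strings are made up for illustration; the patterns are raw strings so the backslashes reach the regex engine unchanged.

```python
import re

text = "Hello world_1\nsecond line 42"

# '.' matches any character except a newline
dot = re.findall(r"h.llo", "hello hallo h\nllo")
# \w+ matches runs of word characters; in Python 3 this includes Chinese characters
words = re.findall(r"\w+", text)
chinese = re.findall(r"\w+", "正则 regex_1")
# \W matches everything that is not a word character (here: the space and '!')
non_word = re.findall(r"\W", "a b!")
# \s matches whitespace: spaces and the newline
spaces = re.findall(r"\s", text)
# \d matches single digits, \D+ a run of non-digits
digits = re.findall(r"\d", text)
leading = re.match(r"\D+", text).group()
# \b only matches 'line' as a whole word, not inside 'lines' or 'outline'
whole = re.findall(r"\bline\b", "line lines outline line")
# ^ and $ anchor the pattern to the start and end of the string
starts = re.search(r"^Hello", text) is not None
ends = re.search(r"42$", text) is not None

print(dot)       # ['hello', 'hallo']
print(words)     # ['Hello', 'world_1', 'second', 'line', '42']
print(chinese)   # ['正则', 'regex_1']
print(non_word)  # [' ', '!']
print(len(spaces), digits, repr(leading))  # 4 ['1', '4', '2'] 'Hello world_'
print(whole, starts, ends)  # ['line', 'line'] True True
```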

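Repetition

* matches the preceding item zero or more times

+ matches it one or more times

? matches it zero times or once

{n} matches it exactly n times

{n,} matches it n or more times

{n,m} matches it n to m times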
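A small sketch of the quantifiers, applied to the letter b after an a in a made-up sample string. Note that {2} alone does not stop more b's from following; it just matches the first two, because the pattern is not anchored.

```python
import re

s = "a ab abb abbb abbbb"

star = re.findall(r"ab*", s)        # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", s)        # at least one 'b'
quest = re.findall(r"ab?", s)       # zero or one 'b'
exact = re.findall(r"ab{2}", s)     # exactly two 'b' (extra b's are left over)
at_least = re.findall(r"ab{3,}", s) # three or more 'b'
between = re.findall(r"ab{1,2}", s) # one to two 'b'

print(star)      # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(plus)      # ['ab', 'abb', 'abbb', 'abbbb']
print(quest)     # ['a', 'ab', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb', 'abb']
print(at_least)  # ['abbb', 'abbbb']
print(between)   # ['ab', 'abb', 'abb', 'abb']
```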

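Ranges (character classes)

[] matches one character from a specified character class, that is, the set of characters you want to match; the characters in the set are alternatives (an OR relationship).

[0-9] matches a digit from 0 to 9; same as \d for ASCII text (in Python 3, \d on str patterns also matches other Unicode decimal digits unless re.ASCII is set)

[a-z] matches any lowercase letter

[A-Z] matches any uppercase letter

[a-zA-Z] matches any letter

[a-zA-Z0-9_] is equivalent to \w for ASCII text (note that \w also includes the underscore)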
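A sketch of the character classes on a made-up string, including a negated class with ^ as its first character, and the Unicode difference between \d and [0-9] (U+0663 is the Arabic-Indic digit three, chosen here only as an example of a non-ASCII digit).

```python
import re

s = "Item_7, item-8; ITEM 9!"

digits = re.findall(r"[0-9]", s)          # one character from the range 0-9
lower = re.findall(r"[a-z]+", s)          # runs of lowercase letters
upper = re.findall(r"[A-Z]+", s)          # runs of uppercase letters
letters = re.findall(r"[a-zA-Z]+", s)     # runs of letters of either case
word_ascii = re.findall(r"[a-zA-Z0-9_]+", s)
word = re.findall(r"\w+", s)              # same result as the class above for ASCII text
# ^ as the first character inside [] negates the class
others = re.findall(r"[^a-zA-Z0-9_ ]", s)

# \d also matches non-ASCII decimal digits; [0-9] does not
arabic_three = "\u0663"
print(re.findall(r"\d", arabic_three), re.findall(r"[0-9]", arabic_three))

print(digits)   # ['7', '8', '9']
print(lower)    # ['tem', 'item']
print(upper)    # ['I', 'ITEM']
print(letters)  # ['Item', 'item', 'ITEM']
print(word == word_ascii, others)  # True [',', '-', ';', '!']
```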

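Escaping

To match a metacharacter or another special regex character literally, escape it with \. For example, use \* to match the character *, and \\ to match a backslash \.

Characters that need escaping: $, (, ), *, +, ., [, ], ?, \, ^, {, }, |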
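A short sketch contrasting unescaped and escaped metacharacters; the price string is made up. re.escape() is a standard re function that adds the backslashes for a literal piece of text.

```python
import re

price = "Total: $3.50 (approx.)"

# Unescaped, '.' matches any character, so this also matches "3x50"
loose = re.search(r"3.50", "3x50") is not None
# Escaped: \$ and \. match the literal characters '$' and '.'
money = re.search(r"\$3\.50", price).group()
strict = re.search(r"\$3\.50", "$3x50") is not None
# \\ in the pattern matches a single literal backslash
backslashes = re.findall(r"\\", "C:\\temp\\file")
# re.escape() inserts the backslashes for you when the text is literal
note = re.search(re.escape("(approx.)"), price).group()

print(loose, strict, money, note)  # True False $3.50 (approx.)
print(len(backslashes))  # 2
```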

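To avoid piling up backslashes, Python offers raw strings: put an r in front of the string literal, and every \ in it is passed to the regular expression as-is instead of having to be escaped a second time. Make it a habit to write Python regex patterns as raw strings.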
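A sketch of why raw strings matter. The "\b" case is a common trap: in a normal string literal Python turns it into a backspace character before the regex engine ever sees it, so the pattern silently fails to match.

```python
import re

# In a normal literal Python handles backslashes first, so "\\d" is needed
# to hand \d to the regex engine; the raw string r"\d" is the same pattern
same = "\\d" == r"\d"

# "\b" in a normal literal is the backspace character, not a word boundary
broken = re.search("\bline\b", "one line")
fixed = re.search(r"\bline\b", "one line")

# Matching one literal backslash: the regex needs \\, which a normal literal
# must write as "\\\\" but a raw string writes as r"\\"
path = r"C:\new\table"
print(re.findall("\\\\", path) == re.findall(r"\\", path))  # True

print(same, broken, fixed.group())  # True None line
```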

II. Methods of the re module

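1. match

re.match(pattern, string) matches only at the beginning of the string and returns a single match object, or None if the string does not start with a match.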
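A minimal sketch of re.match; the date strings are illustrative.

```python
import re

m = re.match(r"\d+", "2016-07-01 19:02")
print(m.group(), m.span())  # 2016 (0, 4)

# match() only tries position 0, so a digit run later in the string is not found
miss = re.match(r"\d+", "posted @ 2016-07-01")
print(miss)  # None
```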

2. search

re.search(pattern, string) scans through the string and returns the first match as a match object, or None if there is no match anywhere.
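A sketch contrasting search with match on the same illustrative string: search scans forward and stops at the first match, so 2016 is returned and 07 and 01 are not.

```python
import re

s = "posted @ 2016-07-01"
print(re.match(r"\d+", s))   # None: the string does not start with a digit
m = re.search(r"\d+", s)     # scans forward to the first match
print(m.group(), m.start())  # 2016 9
```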

3. findall

re.findall(pattern, string) searches the whole string and returns all non-overlapping matches as a list.

>>> re.findall(r'\d+','dsg2335dhreh54623grh46fdh57')

['2335', '54623', '46', '57']
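One detail of findall worth knowing: when the pattern contains capturing groups, the list holds the groups' text instead of the whole match. A sketch with a made-up key=value string:

```python
import re

s = "a=1, b=22, c=333"

whole = re.findall(r"\w=\d+", s)     # no groups: the whole matches
values = re.findall(r"\w=(\d+)", s)  # one group: that group's text
pairs = re.findall(r"(\w)=(\d+)", s) # several groups: a list of tuples

print(whole)   # ['a=1', 'b=22', 'c=333']
print(values)  # ['1', '22', '333']
print(pairs)   # [('a', '1'), ('b', '22'), ('c', '333')]
```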

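4. group, groups

group() returns the text of the whole match (group(0) is the same); group(n) returns the text captured by the n-th pair of parentheses; groups() returns all captured groups as a tuple.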

import re

a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)
print(m.group())   # whole match: 123abc456
print(m.group(0))  # same as group(): 123abc456
print(m.group(1))  # first group: 123
print(m.group(2))  # second group: abc
print(m.groups())  # all groups as a tuple: ('123', 'abc', '456')

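5. sub

re.sub(pattern, repl, string, count=0, flags=0) replaces the matched substrings with repl and returns the new string; count limits the number of replacements (the default 0 replaces every match).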

>>> import re
>>> a = 'sfgwg323dgw13'
>>> b = re.sub(r'\d+','111',a)
>>> b
'sfgwg111dgw111'
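A sketch of the count and flags parameters from the signature above, reusing the same sample string. Both are passed by keyword, which recent Python versions prefer over positional use.

```python
import re

a = 'sfgwg323dgw13'

# count=1 replaces only the first match
first_only = re.sub(r'\d+', '111', a, count=1)
# flags=re.IGNORECASE lets the upper-case pattern match 'dgw'
no_case = re.sub(r'DGW', '-', a, flags=re.IGNORECASE)

print(first_only)  # sfgwg111dgw13
print(no_case)     # sfgwg323-13
```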

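6. split

re.split(pattern, string, maxsplit=0, flags=0) splits the string at every match of the pattern and returns the pieces as a list; maxsplit limits the number of splits (the default 0 means no limit).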

import re

# Split on every '*'
content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
# new_content = re.split(r'\*', content, maxsplit=1)  # split at the first '*' only
print(new_content)

# Split on runs of the operators + - * /
new_content = re.split(r'[+\-*/]+', content)
print(new_content)

# Remove all whitespace, then split once at the first parenthesized expression
# of two numbers, such as (-40-5); the capturing group keeps it in the result
inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub(r'\s+', '', inpp)
new_content = re.split(r'\(([+\-*/]?\d+[+\-*/]?\d+)\)', inpp, maxsplit=1)
print(new_content)
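The expressions above are long; a shorter sketch on a made-up expression isolates the three behaviors of split: plain splitting, maxsplit, and a capturing group that keeps the separators.

```python
import re

expr = "60-30+1*9"

parts = re.split(r"[+\-*/]", expr)               # split at every operator
first = re.split(r"[+\-*/]", expr, maxsplit=1)   # split at the first operator only
with_ops = re.split(r"([+\-*/])", expr)          # group: operators are kept

print(parts)     # ['60', '30', '1', '9']
print(first)     # ['60', '30+1*9']
print(with_ops)  # ['60', '-', '30', '+', '1', '*', '9']
```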

posted @ 2016-07-01 19:02 张瑞东
