Python Regular Expressions: the re Module

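In essence, a regular expression (or RE) is a small, highly specialized programming language embedded in Python and made available through the re module. Regular expression patterns are compiled into a series of bytecodes, which are then executed by a matching engine written in C.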

(1) Metacharacters

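. : matches any character except a newline; with the re.S (re.DOTALL) flag, . also matches newlines

^ : matches at the start of the string

$ : matches at the end of the string (or just before a trailing newline)

* : matches 0 or more repetitions of the preceding expression; greedy by default

+ : matches 1 or more repetitions of the preceding expression; greedy by default

? : matches 0 or 1 occurrence of the preceding expression; greedy by default. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy

{n,m} : matches from n to m repetitions of the preceding expression; greedy

[ ] : character set, matches any one of the listed characters; [^...] negates the set

| : matches either the expression on its left or the one on its right

( ) : matches the expression inside the parentheses and marks it as a group

\ : escape character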
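
The greedy and non-greedy behaviour of the quantifiers listed above, and the effect of re.S on the dot, can be sketched as follows. This is only a small illustration; the sample strings are made up for the example.

```python
import re

text = "<b>one</b><b>two</b>"  # made-up sample string

# Greedy: .* grabs as much as possible, so the match runs to the last </b>.
greedy = re.findall(r"<b>(.*)</b>", text)

# Non-greedy: a ? after the quantifier makes it stop at the first </b>.
lazy = re.findall(r"<b>(.*?)</b>", text)

# ? on its own is a greedy "0 or 1" quantifier: the optional "u" is taken when present.
optional = re.findall(r"colou?r", "color colour")

# Without re.S the dot stops at a newline; with re.S it matches across it.
no_flag = re.findall(r"a.b", "a\nb")
with_flag = re.findall(r"a.b", "a\nb", re.S)

print(greedy)     # ['one</b><b>two']
print(lazy)       # ['one', 'two']
print(optional)   # ['color', 'colour']
print(no_flag)    # []
print(with_flag)  # ['a\nb']
```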

import re

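# (1) . ^ $
ret = re.findall("hello world", "hello world")
print(ret)  # ['hello world']

ret = re.findall("^hello world$", "hello python,hello world,hello re")
print(ret)  # [] : ^ and $ anchor to the whole string, which is longer than the pattern

ret = re.findall("^hello .....$", "hello world")
print(ret)  # ['hello world'] : the five dots match the five characters of "world"

# (2) * + ?
ret = re.findall("^hello .*", "hello ")
print(ret)  # ['hello '] : .* may match zero characters
ret = re.findall("^hello .+", "hello ")
print(ret)  # [] : .+ needs at least one character after the space
ret = re.findall("^hello .?", "hello abc")
print(ret)  # ['hello a'] : .? matches at most one character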

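# (3) {} ()
ret = re.findall("hello .{5}", "hello python,hello world,hello re,hello yuan")
print(ret)  # ['hello pytho', 'hello world', 'hello re,he']
ret = re.findall("hello .{2,5}", "hello python,hello world,hello re")
print(ret)  # ['hello pytho', 'hello world', 'hello re']
ret = re.findall("hello .{5},", "hello python,hello world,hello re")
print(ret)  # ['hello world,']
# With a group, findall returns only the captured part; .*? is non-greedy
ret = re.findall("hello (.*?),", "hello python,hello world,hello re,hello yuan,")
print(ret)  # ['python', 'world', 're', 'yuan']
# Variant for input without a trailing comma: (?:,|$) is a non-capturing group
# ret = re.findall("hello (.*?)(?:,|$)", "hello python,hello world,hello re,hello yuan")
# print(ret)

# (4) [] |
ret = re.findall("a[bcd]e", "abeabaeacdeace")
print(ret)  # ['abe', 'ace']
ret = re.findall("[a-z]", "123a45bcd678")
print(ret)  # ['a', 'b', 'c', 'd']
ret = re.findall("[^a-z]", "123a45bcd678")
print(ret)  # ['1', '2', '3', '4', '5', '6', '7', '8']
# Raw string, so that \. reaches the regex engine unchanged
ret = re.findall(r"www\.([a-z]+)\.(?:com|cn)", "www.baidu.com,www.jd.com")
print(ret)  # ['baidu', 'jd']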

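# (5) \
r'''
1. A backslash before a metacharacter removes its special meaning, e.g. \.
2. A backslash before certain ordinary characters gives them a special meaning, e.g. \d

    \d  matches any decimal digit;             equivalent to the class [0-9].
    \D  matches any non-digit character;       equivalent to the class [^0-9].
    \s  matches any whitespace character;      equivalent to the class [ \t\n\r\f\v].
    \S  matches any non-whitespace character;  equivalent to the class [^ \t\n\r\f\v].
    \w  matches any alphanumeric character;    equivalent to the class [a-zA-Z0-9_].
    \W  matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_].
    \b  matches a word boundary: the zero-width position between a \w and a \W character
        (or the start/end of the string), e.g. next to a space, & or #.

    These equivalences hold in ASCII mode (re.ASCII); for str patterns the classes
    also cover Unicode digits, whitespace and word characters.
'''

# Raw strings keep the backslashes intact for the regex engine
ret = re.findall(r"\d+", "123a45bcd678")
print(ret)  # ['123', '45', '678']
ret = re.findall(r"(?:\d+)|(?:[a-z]+)", "123a45bcd678")
print(ret)  # ['123', 'a', '45', 'bcd', '678']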
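
Since \b is easy to misread, here is a small sketch of how it behaves, and of why a raw string matters for it. The sample text is made up for the example.

```python
import re

text = "cat concat cat&dog category"  # made-up sample string

# \b is zero-width: it matches the position between a \w and a \W character
# (or the start/end of the string), not the space or symbol itself.
whole_word = re.findall(r"\bcat\b", text)
prefix_only = re.findall(r"\bcat", text)

# In a normal (non-raw) string, "\b" is the backspace character, so this
# pattern looks for backspaces around "cat" and finds nothing here.
not_raw = re.findall("\bcat\b", text)

print(whole_word)   # ['cat', 'cat']
print(prefix_only)  # ['cat', 'cat', 'cat']
print(not_raw)      # []
```

The "cat" inside "concat" is never matched because it is preceded by a word character, and "category" only matches when the closing \b is dropped.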

(2) Regex functions

import re

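# Find all matches
# re.findall()  # returns a list
# Find the first match; returns a match object, or None if nothing matches
s = re.search(r"\d+", "a45bcd678")
print(s)          # <re.Match object; span=(1, 3), match='45'>
print(s.group())  # 45

# match works like search, but only matches at the start of the string
s = re.match(r"\d+", "a45bcd678")
print(s)  # None, because the string starts with "a"
# s.group() would raise AttributeError here, since s is None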

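# Splitting with a regex
ret = re.split('[ab]', 'abcd')
print(ret)  # ['', '', 'cd']

# Substitution with a regex
# A replacement function receives each match object and returns the replacement text
# (not used below; pass it as the second argument of re.sub to try it)
def func(match):
    name = match.group()
    print("name", name)
    return "xxx"

# \1 refers to the text matched by group 1, \2 to group 2, and so on.
# Exercise: how would you replace every name with its upper-case form?
ret = re.sub("(hello )(.*?)(,)", r"\1yuan\3", "hello python,hello world,hello re,")
print("ccc", ret)  # ccc hello yuan,hello yuan,hello yuan,

# Compile once, then reuse
obj = re.compile(r'\d{3}')
ret = obj.search('abc123ee45ff')
print(ret.group())  # 123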
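
One possible answer to the upper-case exercise above is to pass a function to re.sub instead of a replacement string. This is only a sketch; the helper name upper_name is made up for the example.

```python
import re

def upper_name(match):
    # group(1) is "hello ", group(2) is the name, group(3) is the comma.
    return match.group(1) + match.group(2).upper() + match.group(3)

text = "hello python,hello world,hello re,"
result = re.sub(r"(hello )(.*?)(,)", upper_name, text)
print(result)  # hello PYTHON,hello WORLD,hello RE,
```

re.sub calls the function once per match, so any per-match logic (case changes, lookups, counters) can live there, which a plain \1-style replacement string cannot express.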

Exercise: scraping Douban

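import re

# Named groups (?P<name>...) label each captured field; re.S lets . span newlines
com = re.compile(
    r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+).*?<span class="title">(?P<title>.*?)</span>'
    r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>.*?<span>(?P<comment_num>.*?)评价</span>',
    re.S)

# s is assumed to hold the HTML source of the page (fetching it is not shown here)
com.findall(s)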
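
To see what the pattern above returns without fetching anything, it can be run against a small hand-written snippet. The HTML below is invented for this sketch and only mimics the tags the pattern looks for; the real page markup may differ.

```python
import re

# Invented sample markup, not copied from the real site.
sample_html = '''
<div class="item">
  <div class="pic">
    <em class="">1</em>
  </div>
  <span class="title">Example Movie A</span>
  <span class="rating_num" property="v:average">9.7</span>
  <span>12345人评价</span>
</div>
<div class="item">
  <div class="pic">
    <em class="">2</em>
  </div>
  <span class="title">Example Movie B</span>
  <span class="rating_num" property="v:average">9.6</span>
  <span>6789人评价</span>
</div>
'''

pattern = re.compile(
    r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+).*?<span class="title">(?P<title>.*?)</span>'
    r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>.*?<span>(?P<comment_num>.*?)评价</span>',
    re.S)

# finditer yields match objects; groupdict() maps each group name to its text.
movies = [m.groupdict() for m in pattern.finditer(sample_html)]
for movie in movies:
    print(movie)

# findall returns one tuple per match, in group order.
rows = pattern.findall(sample_html)
print(rows[0])  # ('1', 'Example Movie A', '9.7', '12345人')
```

With named groups, finditer plus groupdict() gives self-describing records, while findall is shorter when positional tuples are enough.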

Posted 2022-04-07 23:56 by 呼长喜
