007. 【byhy】Common Syntax: Greedy and Non-Greedy Matching

1. Common Syntax: Greedy and Non-Greedy Matching

Suppose we want to extract every HTML tag from the following string:

source = '<html><head><title>Title</title>'

and get a list like this:

['<html>', '<head>', '<title>', '</title>']

The obvious choice is the regular expression <.*>

which leads to the following code:

import re

source = '<html><head><title>Title</title>'

p = re.compile(r'<.*>')

print(p.findall(source))

The output is:

['<html><head><title>Title</title>']


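What went wrong? In regular expressions, the quantifiers '*', '+', and '?' are greedy: they match as much text as possible.

So the asterisk in <.*> (which means "repeat any number of times") lets . run all the way to the end of the string, then gives back characters only until the final > in the pattern can match. As a result, .* stops at the e in the last </title>, and the entire string comes back as a single match.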
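To make the claim about all three quantifiers concrete, here is a small sketch (the test string 'aaa' is just an illustrative choice) comparing each greedy quantifier with its non-greedy counterpart, formed by appending ?. Using re.match anchors the match at the start, so the result shows how much each quantifier consumes.

```python
import re

text = 'aaa'

# (greedy pattern, lazy counterpart): appending ? makes a quantifier non-greedy
pairs = [
    (r'a*', r'a*?'),  # zero or more
    (r'a+', r'a+?'),  # one or more
    (r'a?', r'a??'),  # zero or one
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at position 0; .group() is the text actually consumed
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()

for pattern, matched in results.items():
    print(f'{pattern:4} matched {matched!r}')

# a*  -> 'aaa'   a*? -> ''
# a+  -> 'aaa'   a+? -> 'a'
# a?  -> 'a'     a?? -> ''
```

Each greedy form takes as many characters as it is allowed; each lazy form takes the minimum that still lets the overall match succeed.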

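The fix is non-greedy (lazy) mode: add a ? after the asterisk to get <.*?>. A lazy .*? matches as little as possible, so each match ends at the first > it reaches.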

The code becomes:

import re

source = '<html><head><title>Title</title>'
# Note the extra ? compared with the previous pattern
p = re.compile(r'<.*?>')

print(p.findall(source))

The output is:

['<html>', '<head>', '<title>', '</title>']
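To see where each lazy match starts and stops, the sketch below uses re.finditer to print the span of every match. It also shows a commonly used alternative that is not part of the original post: the negated character class [^>]*, which cannot cross a >, so it produces the same tags without a lazy quantifier.

```python
import re

source = '<html><head><title>Title</title>'

# Non-greedy pattern from the text: each match ends at the first '>'
lazy = re.findall(r'<.*?>', source)

# Record (start, end) offsets of each lazy match
spans = [m.span() for m in re.finditer(r'<.*?>', source)]
for m in re.finditer(r'<.*?>', source):
    print(m.span(), m.group())

# Alternative (not from the original post): [^>]* matches any run of
# characters except '>', so a match can never extend past a closing '>'
negated = re.findall(r'<[^>]*>', source)

print(lazy)
print(negated)
```

The spans come out as (0, 6), (6, 12), (12, 19) and (24, 32); the text Title between positions 19 and 24 is skipped because it is not enclosed in < and >.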


Posted @ 2021-09-08 09:05 by 空-山-新-雨
