
Python re regular expressions: the difference between findall and finditer / how to see all match information, including groups

finditer yields re.Match objects, while findall returns only the matched text.

For example, take the following content:

> **Theorem**
> Strategy game and stackelberg game in zero-sum are essentially identical.

- Competitive state
  - Stays at a suboptimal Nash equilibrium
- Cooperative state
  - Ensure that cooperation can be sustained

> **Folk Theorem**
> 
...
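> In an infinitely repeated game, suppose there is a single-stage NE $a^{*}$ and a better joint strategy $\hat{a}$
> Then there exists some value of $\delta$ for which $(\hat{a},\hat{a},\dots,\hat{a})$ is an SPNE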

- There exists a strategy that gives every player a better payoff than the competitive NE

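findall returns only the results, and when the pattern contains capturing groups it returns only the text captured by those groups, not the full match.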

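finditer returns re.Match objects, whose representation shows key attributes such as span=(0, 92) and match='\n> Theorem\n> Strategy game and stackelberg, so the complete match information, including the groups, is available.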

Calling match.group() (equivalent to match.group(0)) recovers the full matched text, including the parts captured by groups.
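The difference can be sketched as follows. The sample text is a shortened stand-in for the content above and the pattern is hypothetical; neither is taken from the original post, which does not show its pattern.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "\n> Theorem\n> Strategy game\n\n> Folk Theorem\n> Repeated game\n"
pattern = re.compile(r"\n> (.*Theorem)\n> .*")

# findall: with one capturing group, only the group's text is returned.
titles = pattern.findall(text)

# finditer: each Match object keeps the full match, its span, and the groups.
matches = list(pattern.finditer(text))
full_texts = [m.group() for m in matches]  # group() is group(0): the whole match
spans = [m.span() for m in matches]        # (start, end) offsets into text
groups = [m.groups() for m in matches]     # tuple of captured groups per match

print(titles)
print(full_texts)
print(spans)
print(groups)
```

Here `titles` holds only the theorem names, while `full_texts` holds each whole matched block, and slicing `text` with a span gives back the same block.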


https://stackoverflow.com/questions/3765024/different-behavior-between-re-finditer-and-re-findall

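import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Read the whole message body; the context manager closes the file.
with open("test.txt") as f:
    mailbody = f.read()

# finditer yields one Match object per match.
for match in pattern.finditer(mailbody):
    print(match)
print()
# findall yields a tuple of the captured groups per match.
for match in pattern.findall(mailbody):
    print(match)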

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(match.groups())  # groups() already returns a tuple
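The equivalence can be checked on a small inline sample. The HTML string below is a hypothetical stand-in for test.txt, built from the first two rows of the printed output; the actual file contents are not shown in the post, so the surrounding `<tr>` markup is an assumption.

```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX)

# Hypothetical stand-in for the contents of test.txt.
mailbody = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>\n"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>\n"
)

# findall returns the group tuples directly.
from_findall = pattern.findall(mailbody)

# finditer returns Match objects; groups() extracts the same tuples.
from_finditer = [m.groups() for m in pattern.finditer(mailbody)]

print(from_findall == from_finditer)
```

This holds for patterns with two or more capturing groups. With exactly one group the shapes differ: findall yields plain strings, while groups() yields 1-tuples.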

posted @ 2022-06-11 12:11  ZXYFrank