The Python re module

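1. Introduction to Python's re module

re is Python's built-in regular expression module. It provides functions for matching, searching, replacing, and splitting strings with regular expressions. The commonly used functions, covered below, are: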

Matching functions
1. re.match()
2. re.fullmatch()
Search functions
1. re.search()
2. re.findall()
3. re.finditer()
Substitution functions
1. re.sub()
2. re.subn()
Splitting functions
1. re.split()

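2. Matching functions

2.1 re.match(): match a regular expression at the start of a string

re.match(pattern, string, flags=0) tries to match the pattern at the start of the string. It returns a match object on success and None otherwise.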

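pattern: the regular expression pattern.
string: the string to match against.
flags: matching options, such as ignoring case or multi-line matching. The common flags are:
- re.I # case-insensitive matching
- re.M # multi-line mode: ^ also matches at the start of each line, and $ at the end of each line
- re.S # . matches every character, including newline
- re.L # \w, \b, etc. follow the current locale (bytes patterns only in Python 3)
- re.U # \w, \b, etc. follow the Unicode character set (already the default for str patterns in Python 3)
- re.X # ignore whitespace in the pattern and comments after #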
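The flags above can be tried out with a short sketch like the following. The sample text and variable names are illustrative assumptions, not part of the original post.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching
ignore_case = re.match(r"hello", text, re.I)

# re.M: ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: . also matches the newline character
dot_all = re.match(r"Hello.world", text, re.S)
no_dot_all = re.match(r"Hello.world", text)

# re.X: whitespace and # comments inside the pattern are ignored
verbose = re.match(r"""
    (\d{3})   # first number
    -         # separator
    (\d{4})   # second number
""", "200-1234", re.X)

# Flags can be combined with |
combined = re.findall(r"^h\w+|^w\w+", text, re.I | re.M)

print(ignore_case.group())                      # Hello
print(line_starts)                              # ['Hello', 'world']
print(dot_all is not None, no_dot_all is None)  # True True
print(verbose.groups())                         # ('200', '1234')
print(combined)                                 # ['Hello', 'world']
```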

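To get the matched text, call group() or group(index) on the match object. group() with no argument (equivalent to group(0)) returns the entire match; group(index) returns the text captured by the group with that index. Note: index is the number of a capture group in the regular expression, counting from 1.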

import re

pattern = r"^Hello"
string = "Hello, world!"
match = re.match(pattern, string)
if match:
    print("Match Found!") # the match succeeded
else:
    print("Match Not Found!")

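# group()
# re.match only matches at the start of the string, so the pattern must accept the uppercase "H"
pattern = r"[A-Za-z]+"
string = "Hello, world"
match = re.match(pattern, string)
if match:
    print("Match Found!")
    print(match.group()) # 'Hello'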
else:
    print("Match Not Found!")

# group(index)
# A repeated group keeps only its last repetition, so (\d)+ captures the final digit of "200"
pattern = r"(\d)+-\d+"
string = "200-1234"
match = re.match(pattern, string)
if match:
    print("Match Found!")
    print(match.group(1)) # '0'
else:
    print("Match Not Found!")

Note: group() accepts several indexes at once and then returns a tuple. To get every captured group as a tuple, use the groups() method.

import re

pattern = r"(\d+)-(\d+)"
string = "200-1234"
match = re.match(pattern, string)
if match:
    print(match.group())        # 200-1234
    print(match.group(1,2))     # ('200', '1234')
    print(match.groups())       # ('200', '1234')
else:
    print("Match Not Found!")

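2.2 re.fullmatch(): match the whole string

re.fullmatch(pattern, string, flags=0) tries to match the entire string against the pattern. It returns a match object if the whole string matches and None otherwise.

pattern: the regular expression pattern.
string: the string to match against.
flags: matching options.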

import re

pattern = r"\d+"
string = "1234"
match = re.fullmatch(pattern, string)
if match:
    print("Match Found!")
else:
    print("Match Not Found!")
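The difference between re.match() and re.fullmatch() shows up when the string has extra text after the part that matches. A minimal sketch, with an illustrative input string:

```python
import re

pattern = r"\d+"
string = "1234abc"

# re.match only requires the pattern to match at the start of the string
prefix_match = re.match(pattern, string)

# re.fullmatch requires the pattern to consume the whole string
whole_match = re.fullmatch(pattern, string)

print(prefix_match.group())  # 1234
print(whole_match)           # None
```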

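3. Search functions

3.1 re.search()

re.search(pattern, string, flags=0) scans the string for the first location where the pattern matches. It returns a match object on success and None otherwise.

pattern: the regular expression pattern.
string: the string to search.
flags: matching options.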

import re

pattern = r"world"
string = "hello world"
match = re.search(pattern, string)
if match:
    print("Match Found!")
else:
    print("Match Not Found!")

# Get the start and end positions of the match (match is not None here)
print(match.start(), match.end())    # 6 11
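To contrast re.search() with re.match() on the same input, a small sketch (the variable names are illustrative):

```python
import re

string = "hello world"

# re.match fails because "world" is not at the start of the string
at_start = re.match(r"world", string)

# re.search scans forward and finds the first occurrence
found = re.search(r"world", string)

print(at_start)                            # None
print(found.span())                        # (6, 11)
print(string[found.start():found.end()])   # world
```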

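3.2 re.findall()

re.findall(pattern, string, flags=0) finds all non-overlapping substrings that match the pattern and returns them as a list.

pattern: the regular expression pattern.
string: the string to search.
flags: matching options.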

import re

pattern = r"\d+"
string = "hello 123 world 456"
match = re.findall(pattern, string)
print(match) # ['123', '456']
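One detail worth knowing is that the shape of the list returned by re.findall() depends on the capture groups in the pattern. The sketch below uses a made-up input string to show the three cases:

```python
import re

string = "x=1, y=22"

# No groups: each item is the whole match
no_group = re.findall(r"\w=\d+", string)

# One group: each item is the text of that group
one_group = re.findall(r"\w=(\d+)", string)

# Several groups: each item is a tuple with one entry per group
two_groups = re.findall(r"(\w)=(\d+)", string)

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```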

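3.3 re.finditer()

re.finditer(pattern, string, flags=0) finds all non-overlapping substrings that match the pattern and returns an iterator that yields a match object for each one.

pattern: the regular expression pattern.
string: the string to search.
flags: matching options.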

import re

pattern = r"\d+"
string = "hello 123 world 456"
match_iter = re.finditer(pattern, string)
for match in match_iter:
    print(match.group())    # prints 123, then 456

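A match object (MatchObject, shown as re.Match in the output below) holds the result of a successful match: where the match was found, the text of each group, and so on. A match object is always truthy. When a match fails, functions such as re.match() and re.search() return None instead of a match object.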

import re

pattern = r"\d+"
string = "hello 123 world 456"
match_iter = re.finditer(pattern, string)
for match in match_iter:
    print(match, type(match))  # <re.Match object; span=(6, 9), match='123'>  <class 're.Match'>
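Because a failed match yields None, code that uses the result usually checks for it first. A minimal sketch, where first_number is a hypothetical helper written for this illustration:

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    if match is None:
        return None
    return int(match.group())

m = re.search(r"\d+", "hello 123 world 456")
print(m.span(), m.group())                  # (6, 9) 123
print(first_number("hello 123 world 456"))  # 123
print(first_number("no digits here"))       # None
```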

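4. Substitution functions

4.1 re.sub()

re.sub(pattern, repl, string, count=0, flags=0) finds matches of the pattern in the string and replaces them with repl. repl can be a string or a function. A function is called with each match object and must return the replacement string. count is the maximum number of replacements; if count is 0, all matches are replaced.

pattern: the regular expression pattern.
repl: the replacement string, or a function that returns it.
string: the string to search.
count: the maximum number of replacements; defaults to 0 (replace all).
flags: matching options.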

import re

pattern = r"\s+"
string = "hello    world"
replace_string = re.sub(pattern, "_", string)
print(replace_string) # hello_world

Using a function as the replacement:

import re

def to_upper(match_obj):
    return match_obj.group().upper()

pattern = r"\b\w+\b"
string = "hello world"
replace_string = re.sub(pattern, to_upper, string)
print(replace_string) # HELLO WORLD
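The count parameter and group references in a replacement string can be sketched as follows. The inputs are illustrative; \1 and \2 in repl refer to the first and second capture groups.

```python
import re

# count=1 replaces only the first match; the second run of spaces is left alone
first_only = re.sub(r"\s+", "_", "a  b  c", count=1)

# \1 and \2 in the replacement insert the text of the captured groups
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "200-1234")

print(first_only)  # a_b  c
print(swapped)     # 1234-200
```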

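4.2 re.subn()

re.subn(pattern, repl, string, count=0, flags=0) finds matches of the pattern in the string and replaces them with repl. It works like re.sub() but returns a tuple whose first element is the new string and whose second element is the number of replacements made.

pattern: the regular expression pattern.
repl: the replacement string, or a function that returns it.
string: the string to search.
count: the maximum number of replacements; defaults to 0 (replace all).
flags: matching options.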

import re

pattern = r"\s+"
string = "hello    world"
replace_string, count = re.subn(pattern, "_", string)
print(replace_string) # hello_world
print(count) # 1, because the whole run of spaces is a single match of \s+
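The number returned by re.subn() counts matches, not characters. A short sketch with several separate runs of whitespace (the input is illustrative):

```python
import re

# Three separate runs of whitespace, so three replacements
all_replaced, n_all = re.subn(r"\s+", "_", "a b   c d")

# With count=2 only the first two runs are replaced
some_replaced, n_some = re.subn(r"\s+", "_", "a b   c d", count=2)

print(all_replaced, n_all)    # a_b_c_d 3
print(some_replaced, n_some)  # a_b_c d 2
```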

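5. Splitting functions

5.1 re.split()

re.split(pattern, string, maxsplit=0, flags=0) splits the string at each match of the pattern and returns the pieces as a list. maxsplit is the maximum number of splits; if it is 0, the string is split at every match.

pattern: the regular expression pattern.
string: the string to split.
maxsplit: optional; the maximum number of splits. If omitted or 0, the string is split at every match.
flags: matching options.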

import re

pattern = r"\s+"
string = "hello   world"
match = re.split(pattern, string)
print(match) # ['hello', 'world']
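A sketch of maxsplit in action, together with the related behavior that a capture group in the pattern keeps the separators in the result. The input strings are illustrative.

```python
import re

string = "a, b;c  d"

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r"[,;\s]+", string)

# maxsplit=1 performs one split and leaves the rest of the string untouched
limited = re.split(r"[,;\s]+", string, maxsplit=1)

# A capture group in the pattern keeps the separator in the result
with_separator = re.split(r"(-)", "200-1234")

print(parts)           # ['a', 'b', 'c', 'd']
print(limited)         # ['a', 'b;c  d']
print(with_separator)  # ['200', '-', '1234']
```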

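6. Regular expression syntax

The commonly used regular expression symbols are:

. matches any character except newline
^ matches the start of the string
$ matches the end of the string (or just before a trailing newline)
* matches the preceding element 0 or more times
+ matches the preceding element 1 or more times
? matches the preceding element 0 or 1 times, which makes it optional
{n} matches the preceding element exactly n times
{n,m} matches the preceding element at least n and at most m times
[] matches any one of the characters listed inside the brackets
| alternation: matches any one of the expressions it separates
() groups the enclosed expression
\d a digit; equivalent to [0-9] in ASCII mode (re.A)
\w a word character (letter, digit, or underscore); equivalent to [a-zA-Z0-9_] in ASCII mode (re.A)
\s any one whitespace character, such as space, tab, or newline
\D any non-digit character
\W any non-word character
\S any non-whitespace character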
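A few of the symbols above, tried on small made-up inputs as a quick sketch:

```python
import re

optional = re.findall(r"colou?r", "color colour")        # ? makes the "u" optional
star = re.findall(r"ab*", "a ab abbb")                   # * allows zero or more "b"
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")        # 2 to 3 digits, greedy
char_set = re.findall(r"[aeiou]", "regex")               # any one listed character
either = re.findall(r"cat|dog", "cat, dog, bird")        # alternation
anchors = re.findall(r"^\w+|\w+$", "hello big world")    # first and last word
non_space = re.findall(r"\S+", "a b\tc\nd")              # runs of non-whitespace
dot_newline = re.fullmatch(r"a.c", "a\nc")               # . does not match newline

print(optional)     # ['color', 'colour']
print(star)         # ['a', 'ab', 'abbb']
print(bounded)      # ['22', '333', '444']
print(char_set)     # ['e', 'e']
print(either)       # ['cat', 'dog']
print(anchors)      # ['hello', 'world']
print(non_space)    # ['a', 'b', 'c', 'd']
print(dot_newline)  # None
```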

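Parentheses treat a sequence of characters as a single unit, so a quantifier applies to the whole group, and they also capture the matched substring for later extraction. For example, (abc)+ matches one or more consecutive repetitions of "abc"; as a whole-string match it does not match "ab" or "ababc". The match result exposes both the text matched by the whole expression and the text matched by each parenthesized subexpression. Note that some symbols, such as ., *, |, and $, have a special meaning in regular expressions; to match them literally, escape them with a backslash \. For example, \| matches a vertical bar |.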
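The grouping and escaping rules can be checked with a sketch like the one below. The inputs are illustrative; re.escape() is a standard-library helper that escapes special characters in a literal string.

```python
import re

# (abc)+ as a whole-string match: only full repetitions of "abc" are accepted
repeated = re.fullmatch(r"(abc)+", "abcabc")
too_short = re.fullmatch(r"(abc)+", "ab")
mixed = re.fullmatch(r"(abc)+", "ababc")

# re.search still finds the "abc" contained inside "ababc"
inside = re.search(r"(abc)+", "ababc")

# An escaped | is a literal vertical bar rather than alternation
fields = re.split(r"\|", "a|b|c")

# re.escape builds a pattern that matches a literal string containing "+"
literal = re.search(re.escape("1+1"), "1+1=2")

print(repeated.group())   # abcabc
print(too_short, mixed)   # None None
print(inside.group())     # abc
print(fields)             # ['a', 'b', 'c']
print(literal.group())    # 1+1
```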

posted @ 2021-11-29 04:26 我不知道取什么名字好
