Using Regular Expressions in Python

Regular expressions:
the re module
import re

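re.match(pattern, string): matches only at the start of the string; returns None if the pattern does not match there
re.search(pattern, string): scans the string from left to right and returns the first match, then stops
re.findall(pattern, string): scans from left to right and returns a list of all non-overlapping matches
re.sub(pattern, repl, string): replaces every match of pattern with repl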
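
The four functions above can be compared side by side on one input. This is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

text = 'id: 42, backup id: 7'  # hypothetical sample input

# match only succeeds if the pattern matches at position 0
at_start = re.match(r'\d+', text)      # None, the text starts with 'i'
# search scans forward and stops at the first hit
first = re.search(r'\d+', text)        # Match object for '42'
# findall collects every non-overlapping hit as a list of strings
every = re.findall(r'\d+', text)       # ['42', '7']
# sub replaces every hit with the replacement string
masked = re.sub(r'\d+', '#', text)     # 'id: #, backup id: #'

print(at_start, first.group(), every, masked)
```

Note that match and search return a Match object (or None), while findall returns plain strings.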

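Basics:
[]: character class (a set or range of characters)
.: any single character except a newline
|: alternation (or)
(): a group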
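
A short sketch of the four basic constructs listed above, each on a made-up input string:

```python
import re

# []: one character out of the listed set or range
vowels = re.findall(r'[aeiou]', 'regex')            # ['e', 'e']

# .: any single character except a newline
dots = re.findall(r'a.c', 'abc a-c a\nc')           # ['abc', 'a-c']

# |: either the left or the right alternative
pets = re.findall(r'cat|dog', 'cat, dog, cow')      # ['cat', 'dog']

# (): capture part of the match as a group
bounds = re.match(r'(\d+)-(\d+)', '10-20').groups() # ('10', '20')

print(vowels, dots, pets, bounds)
```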

Quantifiers:
*: 0 or more times
+: 1 or more times
?: 0 or 1 time
{m}: exactly m times
{m,}: at least m times
{m,n}: between m and n times, inclusive
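
One way to check the quantifier table is to test each pattern against strings of increasing length. The sketch below uses re.fullmatch so that the whole string has to match; the sample strings are arbitrary.

```python
import re

samples = ['', 'a', 'aa', 'aaa', 'aaaa']
patterns = [r'a*', r'a+', r'a?', r'a{2}', r'a{2,}', r'a{2,3}']

# For each pattern, keep the samples that it matches in full
table = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}

for pattern, accepted in table.items():
    print(pattern, accepted)
```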

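Predefined classes:
\s whitespace
\S non-whitespace
\d digit
\D non-digit
\w word character [0-9a-zA-Z_] (for str patterns this also includes Unicode letters and digits unless re.ASCII is set)
\W non-word character [^0-9a-zA-Z_]
\b word boundary
\B not a word boundary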
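
The predefined classes and the two boundary assertions can be tried on a single string. This is an illustrative sketch with a made-up sentence.

```python
import re

text = 'cat concat cat_1 9 lives'  # hypothetical sample input

digits = re.findall(r'\d', text)         # ['1', '9']
words = re.findall(r'\w+', text)         # runs of word characters
pieces = re.split(r'\s+', text)          # split on runs of whitespace

# \b requires a word boundary on both sides, so only the standalone 'cat' matches
whole_word = re.findall(r'\bcat\b', text)   # ['cat']
# \B requires that there is no boundary, so only the 'cat' inside 'concat' matches
inside_word = re.findall(r'\Bcat', text)    # ['cat']

print(digits, words, pieces, whole_word, inside_word)
```

'cat_1' does not count as a whole-word hit because the underscore is itself a word character.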

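Grouping:
() ----> group(1)

By number:
(\w+)(\d) ----> group(1) group(2)
Back-references:
(\w+)(\d) \1 \2: \1 and \2 refer back to the text matched by groups 1 and 2

By name:
(?P<name>\w+) (?P=name): (?P<name>...) names a group; (?P=name) refers back to the text it matched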
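
As a small illustration of numbered groups and back-references, the sketch below first reads two groups from a match and then uses \1 to find words that are repeated twice in a row. The input strings are made up.

```python
import re

# Numbered groups: \w+ backs off by one character so that \d can match
m = re.match(r'(\w+)(\d)', 'abc1')
first, second = m.group(1), m.group(2)   # 'abc', '1'

# Back-reference: \1 must repeat exactly what group 1 matched
doubled = re.findall(r'\b(\w+) \1\b', 'this is is a test test')

print(first, second, doubled)
```

With a single capturing group in the pattern, findall returns the text of that group rather than the whole match.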
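
Named groups work the same way as numbered ones, except that they are read by name. A minimal sketch follows; the group names year, month and quote are chosen only for this example.

```python
import re

# (?P<name>...) defines a named group
m = re.match(r'(?P<year>\d{4})-(?P<month>\d{2})', '2021-02')
year = m.group('year')        # '2021'
parts = m.groupdict()         # {'year': '2021', 'month': '02'}

# (?P=name) must repeat what the named group matched (here: the same quote character)
quoted = re.search(r'(?P<quote>[\'"]).*?(?P=quote)', 'say "hi" now').group()

print(year, parts, quoted)
```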

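Greedy matching:
In Python, quantifiers are greedy by default (a few languages may default to non-greedy): they always try to match as many characters as possible.
Non-greedy quantifiers do the opposite and always try to match as few characters as possible.
Append ? to "*", "?", "+", or "{m,n}" to turn a greedy quantifier into a non-greedy one.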
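
The difference shows up most clearly when a delimiter occurs more than once. A sketch with a made-up HTML fragment:

```python
import re

html = '<b>one</b><b>two</b>'  # hypothetical sample input

# Greedy: .* runs to the last </b> it can still reach
greedy = re.findall(r'<b>(.*)</b>', html)     # ['one</b><b>two']

# Non-greedy: .*? stops at the first </b>
lazy = re.findall(r'<b>(.*?)</b>', html)      # ['one', 'two']

print(greedy, lazy)
```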

# Uppercase letters [A-Z]
msg = 'FKRITOFLSDKFWWPGVL'
result = re.match(r'[A-Z]+', msg)
print(result)

# Lowercase letters [a-z]
msg = 'sdfwsdfsfsf'
result = re.match(r'[a-z]+', msg)
print(result)

# Digits: [0-9] or \d
msg = '334322341098'
result = re.match(r'\d+', msg)
print(result)

# Landline number with area code: the number is 5 to 11 digits long and must not start with 0
msg = '020-43948574'
result = re.match(r'(\d{3}|\d{4})-([1-9]\d{4,10})', msg)
print(result)
area_num = result.group(1)
phone_num = result.group(2)
print('Area code: {}, phone: {}'.format(area_num, phone_num))

# Mobile number: starts with 1, second digit is 3, 5, 7 or 8, 11 digits in total
msg = '18665028070'
result = re.match(r'1[3578]\d{9}$', msg)
print(result)
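
One detail about the trailing $ in the mobile-number pattern: in Python, $ also matches just before a newline at the very end of the string, so input with a trailing newline still passes. If that matters, re.fullmatch is a stricter alternative. This is a sketch; the variable names are hypothetical.

```python
import re

raw = '18665028070\n'  # e.g. a line read from a file, newline not stripped

# $ matches before the final newline, so this still succeeds
with_dollar = re.match(r'1[3578]\d{9}$', raw)

# fullmatch requires the pattern to cover the entire string
strict = re.fullmatch(r'1[3578]\d{9}', raw)

print(with_dollar is not None, strict is not None)
```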

# Email: domains qq, 126, 163, 139, e.g. 4lkjl2lj234l@qq.com
msg = '4223lsds2l_42@139.cn'
result = re.match(r'\w{5,15}@(qq|126|163|139)\.(com|cn)', msg)
print(result)

# HTML tags
# Named groups: (?P<name>...) defines a group, (?P=name) refers back to it
msg = '<html><div><a>百度一下就知道了</a></div></html>'
result = re.match(r'(<(?P<tag1>[0-9a-zA-Z]+)>(.*)</(?P=tag1)>)', msg)
print(result)
print(result.group())
print('0---', result.group(0))
print('1---', result.group(1))
print('2---', result.group(2))
print('3---', result.group(3))

# sub: add 1 to every score
msg = '001:91,002:99,003:95'


def add_one(match):
    # re.sub calls this with a Match object for each occurrence
    score = int(match.group(1)) + 1
    return ':' + str(score)


result = re.sub(r':(\d+)', add_one, msg)
print(result)

# split: split the string on ':' or ','
msg = '001:91,002:99,003:95'
result = re.split(r'[:,]', msg)
print(result)

# Greedy vs. non-greedy

msg = 'abc1234abc'
result = re.match(r'abc(\d+)', msg)  # greedy: matches 'abc1234'
result2 = re.match(r'abc(\d+?)', msg)  # non-greedy: matches 'abc1'
print(result)
print(result2)

posted @ 2021-02-23 10:54  kevin.l