Regular Expressions (re)

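1. re.match(pattern, string, flags=0) only tries to match at the very beginning of string. If the pattern occurs somewhere else in string but not at the start, the match fails and None is returned.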
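A minimal sketch of the "start of string only" rule, using a made-up string "hello world":

```python
import re

text = "hello world"

# match() only attempts a match at position 0 of the string
no_match = re.match(r"world", text)   # "world" is in text, but not at the start
at_start = re.match(r"hello", text)   # succeeds because text begins with "hello"

print(no_match)            # None
print(at_start.group())    # hello
print(at_start.span())     # (0, 5)
```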

2. Greedy vs. non-greedy matching (?)

Greedy matching: quantifiers such as * try to match as many characters as possible.
>>> import re
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*)"', sentence)
['why?" and I say "I don\'t know']
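The intent was to pull out what each person said, but because .* is greedy, it runs from the first double quote all the way to the last one, so the match is wrong.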

Non-greedy matching: adding ? after a quantifier (for example .*?) makes it match as **few** characters as possible.
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*?)"', sentence)
['why?', "I don't know"]

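3. re.search(pattern, string, flags=0) scans the entire string and returns a match object for the first position where the pattern matches, or None if it matches nowhere.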
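A small comparison with re.match(), assuming an arbitrary example string that does not start with a digit:

```python
import re

text = "order 66, room 101"

# search() scans left to right and returns the first place the pattern matches
first = re.search(r"\d+", text)
print(first.group())   # 66
print(first.start())   # 6

# match() is anchored at position 0, so it fails on the same string
anchored = re.match(r"\d+", text)
print(anchored)        # None
```

Note that search() stops at the first hit ("66"); to get every number, re.findall() is the better fit.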

4. re.sub(pattern, repl, string) replaces every match of pattern in string with repl and returns the new string.
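A minimal sketch of re.sub() on a date string (the input is just an example); it also shows the count argument and a group backreference in the replacement:

```python
import re

text = "2018-11-04"

# Replace every match of the pattern with the replacement string
all_replaced = re.sub(r"-", "/", text)
print(all_replaced)    # 2018/11/04

# count limits how many replacements are made (left to right)
first_only = re.sub(r"-", "/", text, count=1)
print(first_only)      # 2018/11-04

# \1, \2, \3 in the replacement refer to the groups captured by the pattern
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3.\2.\1", text)
print(reordered)       # 04.11.2018
```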

5. re.compile() compiles a regex string into a pattern object, which makes it easy to reuse the same pattern many times.
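6. re.S (alias re.DOTALL) makes "." also match the newline character \n, so one pattern can span several lines; by default "." matches any character except \n. It is not the same as re.M (re.MULTILINE), which only changes how ^ and $ behave.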
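7. re.findall(pattern, html) returns a list of all non-overlapping matches of pattern in html. If the pattern has more than one group, each list item is a tuple of the group values.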

import re
pattern = re.compile('<li.*?title="(.*?)".*?href="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>', re.S)
results = re.findall(pattern, html)  # html: page source fetched earlier; each result is a (title, href, author, year) tuple
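To try the pattern above without fetching a real page, here is a sketch that runs it on a small hand-written HTML snippet. The snippet (titles, links, authors, years) is hypothetical and only shaped so the pattern can match it. It also shows why re.S is needed: each <li> item spans several lines.

```python
import re

# Hypothetical HTML snippet shaped like the list items the pattern targets
html = """
<ul>
<li>
  <a title="Book A" href="/book/1"></a>
  <p class="more-meta">
    <span class="author">Author A</span>
    <span class="year">2001</span>
  </p>
</li>
<li>
  <a title="Book B" href="/book/2"></a>
  <p class="more-meta">
    <span class="author">Author B</span>
    <span class="year">2015</span>
  </p>
</li>
</ul>
"""

# Same pattern as above, compiled once with re.S so "." can cross newlines
pattern = re.compile(
    r'<li.*?title="(.*?)".*?href="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>',
    re.S,
)

# With four groups, findall returns one 4-tuple of group values per match
results = re.findall(pattern, html)
for title, href, author, year in results:
    print(title, href, author, year)
# Book A /book/1 Author A 2001
# Book B /book/2 Author B 2015

# Reusing the same pattern text without re.S: "." stops at "\n", so nothing matches
plain = re.findall(pattern.pattern, html)
print(plain)   # []
```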

posted @ 2018-11-04 15:09 喜喜睡吧

