Regular expressions (re)
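1. re.match(pattern, string, flags=0) tries to match only at the start of string. If the pattern occurs somewhere inside the string but not at the beginning, there is no match and None is returned.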
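A minimal sketch of the start-of-string behaviour described above. The sample string and variable names are made up for illustration.

```python
import re

text = "hello world"  # hypothetical sample string

# "hello" sits at the very start of the string, so re.match succeeds.
at_start = re.match(r"hello", text)

# "world" is inside the string but not at the start, so re.match returns None.
not_at_start = re.match(r"world", text)

print(at_start.group())  # hello
print(not_at_start)      # None
```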
2. Greedy vs. non-greedy matching (?)
Greedy matching: tries to match as many characters as possible
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*)"', sentence)
['why?" and I say "I don\'t know']
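The intent was to pick out what each speaker said, but because .* is greedy it ran from the first quotation mark to the last one and captured too much.
Non-greedy matching: tries to match as **few** characters as possible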
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*?)"', sentence)
['why?', "I don't know"]
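3. re.search() scans the whole string and returns a match object for the first place the pattern matches, or None if it matches nowhere.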
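As a rough illustration, re.search finds a match anywhere in the string but stops at the first one. The sample text below is invented for this sketch.

```python
import re

text = "order 66 shipped, order 77 pending"  # hypothetical sample string

# The first run of digits is found even though it is not at the start.
first = re.search(r"\d+", text)

print(first.group())  # 66
print(first.span())   # (6, 8)
```

Only the first match is reported; the later "77" is ignored.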
4. re.sub(pattern, repl, string) replaces the matches of pattern in string with repl and returns the new string.
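A small sketch of typical re.sub usage, assuming a made-up date string. It shows replacing every match, limiting the number of replacements with count, and reusing captured groups in the replacement.

```python
import re

date = "2021-03-04"  # hypothetical sample string

# Replace every hyphen with a slash.
slashed = re.sub(r"-", "/", date)

# count=1 limits the substitution to the first match only.
first_only = re.sub(r"-", "/", date, count=1)

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", date)

print(slashed)     # 2021/03/04
print(first_only)  # 2021/03-04
print(reordered)   # 04/03/2021
```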
5. re.compile() compiles a regex string into a pattern object, which makes it easy to reuse the same pattern.
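One way the reuse might look in practice: compile once, then call the pattern object's own methods on several inputs. The names number and lines are illustrative only.

```python
import re

# Compile once; the resulting pattern object has findall, search, sub, etc.
number = re.compile(r"\d+")

lines = ["a1b22", "no digits", "333"]  # hypothetical sample inputs

# Reuse the same compiled pattern on every line.
found = [number.findall(line) for line in lines]

print(found)  # [['1', '22'], [], ['333']]
```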
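6. re.S (an alias of re.DOTALL) makes . match newline characters as well, so a pattern can span several lines. It is not the same as re.M (re.MULTILINE), which changes how ^ and $ behave.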
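A short sketch of the difference re.S makes, using an invented HTML fragment that contains a line break.

```python
import re

html = "<p>first\nsecond</p>"  # hypothetical fragment with a newline inside

# Without re.S, "." stops at the newline, so nothing matches.
without_flag = re.findall(r"<p>(.*?)</p>", html)

# With re.S, "." also matches "\n", so the whole paragraph is captured.
with_flag = re.findall(r"<p>(.*?)</p>", html, re.S)

print(without_flag)  # []
print(with_flag)     # ['first\nsecond']
```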
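7. re.findall(pattern, html) returns a list of all matches of pattern in html; when the pattern contains several groups, each result is a tuple of the captured groups.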
import re
# html is assumed to already hold the page source as a string
pattern = re.compile(r'<li.*?title="(.*?)".*?href="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>', re.S)
results = re.findall(pattern, html)
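The snippet above needs a page source to run against. The sketch below applies the same pattern to a small hand-written HTML sample. The markup, titles, URLs, and names are all invented for illustration and are not taken from any real page.

```python
import re

# Hypothetical markup shaped to fit the pattern; a real page will differ.
html = """
<li class="item">
  <a class="cover" title="Book One" href="https://example.com/book/1"></a>
  <div class="more-meta">
    <span class="author">Alice</span>
    <span class="year">2001</span>
  </div>
</li>
<li class="item">
  <a class="cover" title="Book Two" href="https://example.com/book/2"></a>
  <div class="more-meta">
    <span class="author">Bob</span>
    <span class="year">2002</span>
  </div>
</li>
"""

pattern = re.compile(
    r'<li.*?title="(.*?)".*?href="(.*?)".*?more-meta'
    r'.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>',
    re.S,  # let "." cross the line breaks inside each <li>
)

# Four groups in the pattern, so each result is a 4-tuple.
results = re.findall(pattern, html)
for title, href, author, year in results:
    print(title, href, author, year)
```

On real pages the captured text often carries surrounding whitespace, so calling .strip() on each field is a common follow-up step.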