Single-line (re.S) and multi-line (re.M) modes in Python regular expressions


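Almost every function in Python's re module takes a flags argument, and several flags can be combined with the bitwise OR operator (|). Two of these flags select single-line mode (re.DOTALL, or re.S) and multi-line mode (re.MULTILINE, or re.M). They are hard to grasp at first glance, but they are sometimes very useful. Both modes also exist in PHP and JavaScript.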
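The following is a small sketch of how the flags argument works. The sample text and the variable names combined and matches are illustrative only. re.S and re.M are aliases of the long names. Because each flag is a separate bit, the idiomatic way to combine them is the bitwise OR operator |.

```python
import re

# re.S / re.M are just short aliases of re.DOTALL / re.MULTILINE.
assert re.S == re.DOTALL
assert re.M == re.MULTILINE

# Each flag is a separate bit, so several flags are combined with bitwise OR (|).
combined = re.DOTALL | re.MULTILINE
print(bool(combined & re.DOTALL), bool(combined & re.MULTILINE))  # True True

# Illustrative two-line text; with re.MULTILINE, ^ and $ work per line.
text = 'first line.\nsecond line.'
matches = re.findall(r'^\w+ line\.$', text, flags=combined)
print(matches)  # ['first line.', 'second line.']
```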

Single-line mode: re.DOTALL

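In single-line mode, the text is forced to be matched as if it were a single line. What kind of text would otherwise not be treated as a single line? Text that contains newline characters, for example: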

This is the first line.\nThis is the second line.\nThis is the third line.

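The dot (.) matches any character except the newline. Suppose we want to match the entire string. If we apply a dot pattern to the string above, matching stops at the first newline. For example: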

>>> a = 'This is the first line.\nThis is the second line.\nThis is the third line.'
>>> print(a)
This is the first line.
This is the second line.
This is the third line.
>>> import re
>>> p = re.match(r'This.*line\.', a)
>>> p.group(0)
'This is the first line.'
>>>

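In the example above, .* is greedy by default, yet matching still stops at the end of the first line. In single-line mode, the newline is treated as an ordinary character and is matched by the dot (.):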

>>> q = re.match(r'This.*line\.', a, flags=re.DOTALL)
>>> q.group(0)
'This is the first line.\nThis is the second line.\nThis is the third line.'

The dot (.) now matches every character, including the newline. Put more precisely:

Single-line mode changes what the dot (.) matches.
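Below is a self-contained Python 3 script version of the session above. It also shows the inline flag (?s), which Python's re module accepts at the start of a pattern as an equivalent of re.DOTALL. The variable names are illustrative.

```python
import re

a = 'This is the first line.\nThis is the second line.\nThis is the third line.'

# Default: '.' stops at '\n', so the match ends with the first line.
default = re.match(r'This.*line\.', a).group(0)

# re.DOTALL (re.S): '.' also matches '\n', so the greedy '.*' runs to the last "line.".
dotall = re.match(r'This.*line\.', a, flags=re.DOTALL).group(0)

# The inline flag (?s) at the start of the pattern has the same effect as re.DOTALL.
inline = re.match(r'(?s)This.*line\.', a).group(0)

print(repr(default))     # 'This is the first line.'
print(dotall == a)       # True
print(inline == dotall)  # True
```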

Multi-line mode: re.MULTILINE

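In multi-line mode, the text is forced to be matched as multiple lines. As the single-line section showed, a string containing newlines is already treated as multiple lines by default, in the sense that the dot stops at each newline. However, by default the anchors ^ and $ match only at the start and end of the whole string ($ also matches just before a newline at the very end of the string). In that respect, a string containing newlines seems to be treated as a single line.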
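The sketch below shows where ^ and $ can match. It lists the zero-width match positions of each anchor with re.finditer, with and without re.MULTILINE. The helper anchor_positions is a hypothetical name introduced only for this illustration. The indices refer to the three-line string a used throughout, where the newlines sit at indices 23 and 48 and the string has length 72.

```python
import re

a = 'This is the first line.\nThis is the second line.\nThis is the third line.'

def anchor_positions(anchor, flags=0):
    """Return every index at which the zero-width anchor matches in a."""
    return [m.start() for m in re.finditer(anchor, a, flags)]

print(anchor_positions(r'^'))                # [0]  start of the string only
print(anchor_positions(r'$'))                # [72] end of the string only
print(anchor_positions(r'^', re.MULTILINE))  # [0, 24, 49]  also right after each '\n'
print(anchor_positions(r'$', re.MULTILINE))  # [23, 48, 72] also right before each '\n'
```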

In the example below, we want to match each of the three sentences separately, using re.findall() to list all matches:

>>> a = 'This is the first line.\nThis is the second line.\nThis is the third line.'
>>> print(a)
This is the first line.
This is the second line.
This is the third line.
>>> import re
>>> re.findall(r'^This.*line\.$', a)
[]
>>>

By default the dot does not match newlines, so we set re.DOTALL:

>>> re.findall(r'^This.*line\.$', a, flags=re.DOTALL)
['This is the first line.\nThis is the second line.\nThis is the third line.']
>>>

This matches the entire text, because matching is greedy by default. Adding a question mark switches .* to non-greedy matching:

>>> re.findall(r'^This.*?line\.$', a, flags=re.DOTALL)
['This is the first line.\nThis is the second line.\nThis is the third line.']
>>>

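The result is still the entire text, because ^ and $ match only at the start and end of the whole string. In multi-line mode, ^ matches at the start of the string and also right after each newline. Likewise, $ matches at the end of the string and also right before each newline: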

>>> re.findall(r'^This.*?line\.$', a, flags=re.DOTALL | re.MULTILINE)
['This is the first line.', 'This is the second line.', 'This is the third line.']
>>>
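In the last example, re.DOTALL is not actually needed. Without it the dot cannot cross a newline, so even a greedy .* stays within one line, and re.MULTILINE alone yields the three sentences. Conversely, combining both flags with a greedy .* runs to the last "line." and returns a single match. That is why the non-greedy .*? was needed above. Here is a sketch (the variable names are illustrative):

```python
import re

a = 'This is the first line.\nThis is the second line.\nThis is the third line.'

# MULTILINE alone: '.' still stops at '\n', so even the greedy '.*' stays on one line.
m_only = re.findall(r'^This.*line\.$', a, flags=re.MULTILINE)

# DOTALL | MULTILINE with a greedy '.*': '.' may cross newlines, so the match
# runs to the last "line." before the final '$' and swallows all three lines.
greedy_both = re.findall(r'^This.*line\.$', a, flags=re.DOTALL | re.MULTILINE)

print(m_only)            # ['This is the first line.', 'This is the second line.', 'This is the third line.']
print(len(greedy_both))  # 1
```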

Put more precisely:

Multi-line mode changes what ^ and $ match.
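Both conclusions can be summarized in one small printout. For each flag combination, it reports whether the dot matches a newline and how many positions ^ matches in the three-line string. This is an illustrative sketch; the dictionary labels and the results name are arbitrary.

```python
import re

a = 'This is the first line.\nThis is the second line.\nThis is the third line.'

combos = {
    'default': 0,
    'S': re.S,
    'M': re.M,
    'S|M': re.S | re.M,
}

results = {}
for name, flags in combos.items():
    # re.S only affects '.', re.M only affects ^ and $.
    dot_matches_newline = re.search(r'.', '\n', flags) is not None
    caret_count = len(re.findall(r'^', a, flags))
    results[name] = (dot_matches_newline, caret_count)
    print(f'{name:8} dot matches newline: {dot_matches_newline!s:5}  ^ matches: {caret_count}')
```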

Reposted from:

https://www.lfhacks.com/tech/python-re-single-multiline

posted @ 2018-12-15 14:16 SolidMango
