『BASH』 Hadex's brief analysis of "Lookahead and Lookbehind Zero-Length Assertions"

/* To save time, this article was originally written in Chinese */


 ~Preface~


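  Studying regular expressions in depth is a good way to train rigorous logical thinking. Because almost every high-level programming language supports them, their importance goes without saying: they are the inner discipline every practitioner in the trade needs to master.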

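  Regular expression overview (object: PCRE)

  「1」Matching direction

  • Horizontal view, i.e. within a line: left to right
  • Vertical view, i.e. from line to line: top to bottom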

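  「2」Basic unit of movement while matching

    By default the engine advances one character at a time. Writing a pattern as \bcontents\b makes it match only as a whole word: \b is a zero-width boundary between a word character (\w: a letter, digit or underscore) and a non-word character or the start/end of the text. In practice the word is therefore delimited by a blank (space), a Tab, \n (Linux newline), \r (MS carriage return) or punctuation.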
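
    A minimal sketch of whole-word matching with \b, assuming GNU grep built with PCRE support (the -P option used throughout this article). The sample sentence and the variable names are made up for illustration.

```shell
# Hypothetical sample text, not taken from the article.
sample='cat concat cat, category'

# Without \b, "cat" is also found inside longer words.
loose=$(printf '%s\n' "$sample" | grep -oP 'cat' | wc -l | tr -d ' ')

# With \b on both sides, only the standalone word matches.
strict=$(printf '%s\n' "$sample" | grep -oP '\bcat\b' | wc -l | tr -d ' ')

printf 'plain pattern: %s matches, word-bounded pattern: %s matches\n' "$loose" "$strict"
# plain pattern: 4 matches, word-bounded pattern: 2 matches
```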

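  「3」Matching range

    Quantifiers are greedy by default, i.e. they match the longest span that satisfies the pattern; appending a "?" to a quantifier switches it to lazy mode, which matches the shortest span instead.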
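
    The difference is easiest to see side by side. The sketch below assumes GNU grep with -P; the input string is a made-up example.

```shell
# Hypothetical input: two bracketed tokens on one line.
input='<a><b>'

# Greedy: .+ runs to the last ">" on the line, giving a single match.
greedy=$(printf '%s\n' "$input" | grep -oP '<.+>')

# Lazy: .+? stops at the first ">" it can, giving two matches.
lazy=$(printf '%s\n' "$input" | grep -oP '<.+?>')

printf 'greedy:\n%s\n' "$greedy"   # <a><b>
printf 'lazy:\n%s\n' "$lazy"       # <a> and <b> on separate lines
```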


 ~Main topic~    Zero-Length Assertions


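  Usually rendered in Chinese as "零宽断言" (zero-width assertions), these originated in Perl 5 and are very powerful and flexible! To make them easier to grasp, put them in the same class as ^, $ and \b: virtual dividing lines that occupy no character position at all, which is exactly what "Zero-Length" in the English name says.

  By the direction in which they look relative to the current position they divide into Lookahead and Lookbehind; by their logical polarity (True/False) they divide into positive and negative.

  That is: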

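  • Positive Lookahead Zero-Length Assertion, expression (?=exp): at each position in turn, test whether exp matches the text immediately to the right; where it does, the virtual dividing line sits just before that text.
  • Positive Lookbehind Zero-Length Assertion, expression (?<=exp): at each position in turn, test whether exp matches the text immediately to the left; where it does, the virtual dividing line sits just after that text. In a lookbehind, PCRE has traditionally required exp to have a known length, so unbounded quantifiers such as {1,}, * and + are rejected. Alternatives of different lengths, such as (ab)|(bcde), are accepted by PCRE only at the top level of the assertion and are rejected by some other engines.
  • Negative Lookahead Zero-Length Assertion, expression (?!exp): at each position in turn, test whether exp fails to match the text immediately to the right; where it fails, the virtual dividing line is placed at that position.
  • Negative Lookbehind Zero-Length Assertion, expression (?<!exp): at each position in turn, test whether exp fails to match the text immediately to the left; where it fails, the virtual dividing line is placed at that position. The same length restriction applies to exp: variable-length quantifiers such as {3,100}, + and * have traditionally been rejected, and alternations such as (\d)|(\s\w) are accepted by PCRE only at the top level.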
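
  The four forms can be compared on one short string. This is a sketch assuming GNU grep with -P; the sample string and the helper function matches are hypothetical names introduced only for this illustration.

```shell
# Hypothetical sample: three letter+digit pairs.
sample='a1 b2 a3'

# Hypothetical helper: print every match of a PCRE pattern, space-separated.
matches() {
  printf '%s\n' "$sample" | grep -oP "$1" | paste -sd ' ' -
}

pos_ahead=$(matches '[a-z](?=3)')    # letter followed by 3       -> a
neg_ahead=$(matches '[a-z](?!3)')    # letter not followed by 3   -> a b
pos_behind=$(matches '(?<=a)\d')     # digit preceded by a        -> 1 3
neg_behind=$(matches '(?<!a)\d')     # digit not preceded by a    -> 2

printf '(?=3):  %s\n(?!3):  %s\n(?<=a): %s\n(?<!a): %s\n' \
  "$pos_ahead" "$neg_ahead" "$pos_behind" "$neg_behind"
```

  In every case only the part of the pattern outside the parentheses (the letter or the digit) is printed.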

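  Note in particular: the condition "exp" inside a Zero-Length Assertion only fixes where the virtual dividing line falls. It neither selects nor excludes any characters; its purpose is to narrow down where a match may occur. What ends up in the match is determined by the pattern outside the assertion.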
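
  A small sketch of that point, assuming GNU grep with -P and a made-up input string: the text tested by the assertion is required to be present but is not part of the result.

```shell
# Hypothetical input, not from the article.
sample='100USD 200EUR'

# "USD" is only checked by the lookahead, so it is not printed.
with_assert=$(printf '%s\n' "$sample" | grep -oP '\d+(?=USD)')

# Written as an ordinary pattern, "USD" is consumed and printed.
without_assert=$(printf '%s\n' "$sample" | grep -oP '\d+USD')

printf '%s\n' "$with_assert"      # 100
printf '%s\n' "$without_assert"   # 100USD
```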

 


 The experiments below use the following "ip addr" output as their sample:


 

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::428d:5cff:fee2:872f/64 scope link 
       valid_lft forever preferred_lft forever

Experiment 1: extract the name and MTU value of every interface

f@z ~ $ ip addr | grep -oP '(\w+(?=:+\s+<+))|(?<=\smtu\s)\d+'
lo
65536
eth0
1500
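
The same pattern can be tried without a live interface by feeding grep a saved excerpt. The sketch below assumes GNU grep with -P; the excerpt is copied from the sample output above, while the variable names and the extra IPv4 pattern are illustrations that are not part of the original experiment.

```shell
# Excerpt of the "ip addr" sample shown earlier in this article.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0'

# Pattern from Experiment 1: a name followed by ": <", or digits preceded by " mtu ".
names_mtu=$(printf '%s\n' "$sample" | grep -oP '(\w+(?=:+\s+<+))|(?<=\smtu\s)\d+')

# Illustrative extra: IPv4 address between "inet " (lookbehind) and "/" (lookahead).
ipv4=$(printf '%s\n' "$sample" | grep -oP '(?<=inet\s)[\d.]+(?=/)')

printf '%s\n' "$names_mtu"   # lo, 65536, eth0, 1500 (one per line)
printf '%s\n' "$ipv4"        # 127.0.0.1, 172.18.21.244 (one per line)
```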

Experiment 2 「00」: drop every line containing "lft" from the "ip addr" output

f@z ~ $ ip addr | grep -oP '^(?!.*lft).*$'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
    inet6 fe80::428d:5cff:fee2:872f/64 scope link 

Experiment 2 「01」: incorrect attempts

f@z ~ $ ip addr | grep -oP '(?!.*lft).*'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
ft forever
    inet6 ::1/128 scope host 
ft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
ft forever
    inet6 fe80::428d:5cff:fee2:872f/64 scope link 
ft forever
f@z ~ $ ip addr | grep -oP '\b(?!.*lft).*\b'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
 forever
inet6 ::1/128 scope host
 forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
 forever
inet6 fe80::428d:5cff:fee2:872f/64 scope link
 forever
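 Error analysis: without anchors, the engine retries the pattern at every position in the line. Just past the last "lft" the negative lookahead finally succeeds, so the tail of the line ("ft forever" or " forever") is printed. Anchoring with "^" and "$" makes the whole line the unit of matching, so a single false result from the assertion at the start of the line excludes the entire line.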
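
 The failure can be reproduced on a single line. The sketch below assumes GNU grep with -P; the line is copied from the sample output and the variable names are hypothetical.

```shell
# One "lft" line from the sample output, leading spaces included.
line='       valid_lft forever preferred_lft forever'

# Unanchored: the first position with no "lft" ahead is the "f" of the last "lft".
unanchored=$(printf '%s\n' "$line" | grep -oP '(?!.*lft).*' || true)

# Anchored: the assertion is only tried at the start of the line, where it fails.
# grep exits with status 1 when nothing matches, hence the "|| true".
anchored=$(printf '%s\n' "$line" | grep -oP '^(?!.*lft).*$' || true)

printf 'unanchored: [%s]\n' "$unanchored"   # unanchored: [ft forever]
printf 'anchored:   [%s]\n' "$anchored"     # anchored:   []
```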

REFERENCE: 

http://www.regular-expressions.info/lookaround.html

posted @ 2013-07-28 14:07  范辉