Deciphering Regular Expressions
Excerpted from Mastering Perl, Chapter 2, "Advanced Regular Expressions"
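When we are trying to figure out a regular expression, whether it comes from someone else's code or from our own (perhaps written long ago), we can turn on Perl's regex debugging mode. Perl's -D switch turns on debugging options for the Perl interpreter itself (not for your program; for that, see Chapter 4). The switch takes a series of letters and numbers that indicate which features to turn on. The -Dr option turns on debugging for regex parsing and execution.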
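The -D switch only works when the perl binary itself was compiled with -DDEBUGGING, so it can help to check that first. The following is a rough sketch of one way to do it, assuming that a debugging build records -DDEBUGGING in the ccflags entry of the standard Config module. The has_debugging_flag helper is a hypothetical name used only for illustration; running perl -V and looking for DEBUGGING among the compile-time options is the more direct check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;   # standard module describing how this perl was built

# Hypothetical helper: true if a compiler-flags string contains -DDEBUGGING.
# Assumption: a debugging perl records that flag in $Config{ccflags}.
sub has_debugging_flag {
    my( $ccflags ) = @_;
    return 0 unless defined $ccflags;
    # Match the flag only as a whole, whitespace-delimited word
    return $ccflags =~ /(?<!\S)-DDEBUGGING(?!\S)/ ? 1 : 0;
}

if( has_debugging_flag( $Config{ccflags} ) ) {
    print "This perl looks like a debugging build; -Dr should work\n";
}
else {
    print "No -DDEBUGGING in ccflags; -Dr is probably unavailable\n";
}
```

If the check comes back negative, the re pragma described later is the alternative that needs no special build.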
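We can use a short program to examine a regex. The program's first argument is the string to match against, and the second is the regex. We save this program as explain-regex: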
#!/usr/bin/perl
# Match the first argument (the target string) against the second (the pattern)
$ARGV[0] =~ /$ARGV[1]/;
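We try this program with the string Just another Perl hacker, and the regex Just another (\S+) hacker, (the trailing comma is part of both). The output has two major sections, which the perldebguts documentation describes in detail. First, Perl compiles the regex, and -Dr shows how Perl parsed it. The output lists the regex nodes, such as EXACT and NSPACE, along with any optimizations, such as the anchored "Just another ". Second, Perl tries to match the target string and shows its progress node by node. The output contains a lot of information, and it tells us exactly what Perl is doing: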
$ perl -Dr explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'
Omitting $` $& $' support.
EXECUTING...
Compiling REx `Just another (\S+) hacker,'
size 15 Got 124 bytes for offset annotations.
first at 1
rarest char k at 4
rarest char J at 0
1: EXACT <Just another >(6)
6: OPEN1(8)
8: PLUS(10)
9: NSPACE(0)
10: CLOSE1(12)
12: EXACT < hacker,>(15)
15: END(0)
anchored "Just another " at 0 floating " hacker," at 14..2147483647 (checking anchored) minlen 22
Offsets: [15]
1[13] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 17[1] 15[2] 18[1] 0[0] 19[8] 0[0] 0[0] 27[0]
Guessing start of match, REx "Just another (\S+) hacker," against "Just another Perl hacker,"...
Found anchored substr "Just another " at offset 0...
Found floating substr " hacker," at offset 17...
Guessed: match at offset 0
Matching REx "Just another (\S+) hacker," against "Just another Perl hacker,"
Setting an EVAL scope, savestack=3
0 <> <Just another> | 1: EXACT <Just another >
13 <ther > <Perl ha> | 6: OPEN1
13 <ther > <Perl ha> | 8: PLUS
NSPACE can match 4 times out of 2147483647...
Setting an EVAL scope, savestack=3
17 < Perl> < hacker> | 10: CLOSE1
17 < Perl> < hacker> | 12: EXACT < hacker,>
25 <Perl hacker,> <> | 15: END
Match successful!
Freeing REx: `"Just another (\\S+) hacker,"'
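The offsets in the left column of the trace can be cross-checked from inside a program. The sketch below uses Perl's built-in @- and @+ arrays, which hold the start and end offsets of the last successful match and of each capture group. The match_offsets helper is a hypothetical name for illustration, not something from the book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the offsets of the whole match and of
# capture group 1, or an empty list if the pattern does not match.
sub match_offsets {
    my( $string, $pattern ) = @_;
    return unless $string =~ /$pattern/;
    return (
        match_start => $-[0],   # where the whole match begins
        match_end   => $+[0],   # one past the last matched character
        group_start => $-[1],   # where capture group 1 begins
        group_end   => $+[1],   # one past the end of capture group 1
        captured    => $1,
    );
}

my %offsets = match_offsets(
    'Just another Perl hacker,',
    'Just another (\S+) hacker,',
);

print "group 1 '$offsets{captured}' spans ",
    "$offsets{group_start}..$offsets{group_end}; ",
    "match ends at $offsets{match_end}\n";
```

For the string and regex used above, group 1 starts at offset 13 and ends at 17, and the match ends at 25. These are the same positions at which the trace shows OPEN1, CLOSE1, and END.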
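The re pragma, which comes with Perl, has a debugging mode that does not require an interpreter built with -DDEBUGGING. Once we turn on the debugging mode with use re 'debug', it applies to the entire program in older Perls, where it is not lexically scoped the way most pragmas are (since Perl 5.9.5 it is lexically scoped). We modify the previous program to use the re pragma instead of the command-line switch: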
#!/usr/bin/perl
use re 'debug';
$ARGV[0] =~ /$ARGV[1]/;
We don't have to modify the program to use re, either. We can load it from the command line:
$ perl -Mre=debug explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'
When we pass a regex to this program as an argument, we get almost exactly the same output as with the earlier -Dr example.
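The anchored and floating substrings that the compile phase reports can also be retrieved programmatically through the same pragma. As a hedged sketch, assuming Perl 5.10 or later (where the re pragma exports regmust), the following asks the optimizer for the fixed strings it found in a compiled pattern. The required_substrings wrapper is a hypothetical name for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re qw(regmust);   # assumption: Perl 5.10 or later

# Hypothetical wrapper: compile a pattern string and return the longest
# anchored and floating fixed strings the optimizer found.
# Either value may be undef if the optimizer found nothing.
sub required_substrings {
    my( $pattern ) = @_;
    my $regex = qr/$pattern/;
    my( $anchored, $floating ) = regmust( $regex );
    return ( $anchored, $floating );
}

my( $anchored, $floating ) =
    required_substrings( 'Just another (\S+) hacker,' );

for my $pair ( [ anchored => $anchored ], [ floating => $floating ] ) {
    my( $label, $value ) = @$pair;
    print "$label: ", ( defined $value ? "'$value'" : '(none)' ), "\n";
}
```

For the regex used above, this should report the same two strings that the compile section of the debugging output labels anchored and floating.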
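Although it is a bit dated, the YAPE::Regex::Explain module can explain a regex in very plain English. It parses the regex and describes what each part does. It can't explain the semantic purpose of the regex, but we can't ask for everything. A tiny program is enough to explain a regex given on the command line: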
#!/usr/bin/perl
use YAPE::Regex::Explain;
# Parse the regex given as the first argument and print its explanation
print YAPE::Regex::Explain->new( $ARGV[0] )->explain;
Even for a short, simple regex, we get a lot of output:
$ perl yape-explain 'Just another (\S+) hacker,'
The regular expression:
(?-imsx:Just another (\S+) hacker,)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and # normally):
----------------------------------------------------------------------
Just another             'Just another '
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\S+                      non-whitespace (all but \n, \r, \t, \f,
                         and " ") (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
 hacker,                 ' hacker,'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------