Deciphering Regular Expressions

Excerpted from Chapter 2, "Advanced Regular Expressions," of Mastering Perl (《精通Perl》)

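When we need to make sense of a regular expression, whether it comes from someone else's code or from something we wrote ourselves (perhaps long ago), we can turn on Perl's regex debugging mode. Perl's -D switch enables debugging options for the Perl interpreter itself (not for our program; see Chapter 4). The switch takes a series of letters or numbers that indicate which features to turn on, and it is only available when perl was compiled with -DDEBUGGING. The -Dr option turns on debugging for regex parsing and execution.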

A short program is enough to examine a regex. Its first argument is the string to match and its second argument is the regex. We save the program as explain-regex:

#!/usr/bin/perl

$ARGV[0] =~ /$ARGV[1]/;
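A slightly more defensive variant of the same idea, offered as a sketch rather than as part of the original program: it compiles the pattern inside eval so that a malformed regex produces a readable warning, and it reports whether the match succeeded. The helper name try_match and the messages are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# try_match is a hypothetical helper: it returns 1 on a match, 0 on no
# match, and undef (after a warning) if the pattern does not compile.
sub try_match {
    my ( $string, $pattern ) = @_;

    # A malformed pattern makes qr// die; eval traps that error in $@
    my $regex = eval { qr/$pattern/ };
    unless ( defined $regex ) {
        warn "Invalid pattern: $@";
        return;
    }

    return $string =~ $regex ? 1 : 0;
}

# Run the match only when both command-line arguments are present
if ( @ARGV == 2 ) {
    my $result = try_match(@ARGV);
    if ( defined $result ) {
        my $message = $result ? "Matched\n" : "Did not match\n";
        print $message;
    }
}
else {
    warn "Usage: $0 STRING PATTERN\n";
}
```

Run under perl -Dr, this variant should produce the same kind of trace as the two-line script, which is the version used in the rest of this section.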

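We try the program with the target string Just another Perl hacker, and the regex Just another (\S+) hacker, (the trailing comma is part of both). The output has two major sections, which the perldebguts documentation describes in detail. First, Perl compiles the regex, and the -Dr output shows how Perl parsed it: the regex nodes, such as EXACT and NSPACE, and any optimizations, such as the anchored substring "Just another ". Second, Perl tries to match the target string and shows its progress node by node. It is a lot of information, but it shows exactly what Perl is doing: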

$ perl -Dr explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'

Omitting $` $& $' support.

EXECUTING...

Compiling REx `Just another (\S+) hacker,'

size 15 Got 124 bytes for offset annotations.

first at 1

rarest char k at 4

rarest char J at 0

1: EXACT <Just another >(6)

6: OPEN1(8)

8: PLUS(10)

9: NSPACE(0)

10: CLOSE1(12)

12: EXACT < hacker,>(15)

15: END(0)

anchored "Just another " at 0 floating " hacker," at 14..2147483647 (checking anchored) minlen 22

Offsets: [15]

1[13] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 17[1] 15[2] 18[1] 0[0] 19[8] 0[0] 0[0] 27[0]

Guessing start of match, REx "Just another (\S+) hacker," against "Just another Perl hacker,"...

Found anchored substr "Just another " at offset 0...

Found floating substr " hacker," at offset 17...

Guessed: match at offset 0

Matching REx "Just another (\S+) hacker," against "Just another Perl hacker,"

Setting an EVAL scope, savestack=3

0 <> <Just another> | 1: EXACT <Just another >

13 <ther > <Perl ha> | 6: OPEN1

13 <ther > <Perl ha> | 8: PLUS

NSPACE can match 4 times out of 2147483647...

Setting an EVAL scope, savestack=3

17 < Perl> < hacker> | 10: CLOSE1

17 < Perl> < hacker> | 12: EXACT < hacker,>

25 <Perl hacker,> <> | 15: END

Match successful!

Freeing REx: `"Just another (\\S+) hacker,"'
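The offsets in the trace (0 at the start, 13 at OPEN1, 17 at CLOSE1, 25 at END) can be cross-checked from inside a program with Perl's @- and @+ arrays. The following is a minimal sketch; the helper name match_offsets and the hash keys are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# match_offsets is a hypothetical helper that returns the offsets Perl
# recorded for the whole match and for capture group 1, or nothing if
# the match fails.
sub match_offsets {
    my ( $string, $pattern ) = @_;
    return unless $string =~ /$pattern/;

    # $-[N] and $+[N] are the start and end offsets of group N;
    # N = 0 refers to the whole match
    return {
        match_start => $-[0],
        match_end   => $+[0],
        group_start => $-[1],
        group_end   => $+[1],
        group_text  => $1,
    };
}

my $offsets = match_offsets( 'Just another Perl hacker,', 'Just another (\S+) hacker,' );
printf "whole match: %d..%d\n", @{$offsets}{qw(match_start match_end)};
printf "group 1:     %d..%d (%s)\n", @{$offsets}{qw(group_start group_end group_text)};
```

For this string and regex the whole match spans offsets 0 to 25 and group 1 spans 13 to 17, which lines up with the positions shown beside the OPEN1, CLOSE1, and END nodes in the trace.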

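Perl's re pragma has a debugging mode that does not require an interpreter compiled with -DDEBUGGING. In the Perl versions this text was written for, turning it on with use re 'debug' affects the entire program, because it is not lexically scoped the way most pragmas are; the re documentation states that as of Perl 5.9.5 it is lexically scoped. We modify the earlier program to use the re pragma instead of the command-line switch: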

#!/usr/bin/perl

use re 'debug';

$ARGV[0] =~ /$ARGV[1]/;

We don't have to modify the program to use re, either. We can load it from the command line:

$ perl -Mre=debug explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'

When we pass the regex to the program this way, the output is almost exactly the same as what the earlier -Dr run produced.
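On newer perls, where the re documentation describes the debug mode as lexically scoped (as of 5.9.5), the pragma can be confined to the one regex we care about. This is a sketch under that assumption; on older perls the debugging output would cover the whole program instead. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'Just another Perl hacker,';
my $captured;

{
    # Debugging output (written to STDERR) is requested only for the
    # regex inside this block
    use re 'debug';
    ($captured) = $string =~ /Just another (\S+) hacker,/;
}

# Outside the block, on perls where the pragma is lexically scoped,
# this match is not traced
my $plain = $string =~ /Perl/ ? 'matched' : 'did not match';

print "captured: $captured\n";
print "second regex $plain\n";
```

Keeping the block small limits the trace to one pattern, which helps when the rest of the program uses many regexes.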

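Although it is somewhat dated, the YAPE::Regex::Explain module can explain a regex in plain English. It parses the regular expression and describes what each part does. It can't explain the semantic purpose of the pattern, but we can't ask for everything. A short program is all it takes to explain a regex given on the command line: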

#!/usr/bin/perl

use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new( $ARGV[0] )->explain;

Running it (saved here as yape-explain) on even a short, simple regex produces a lot of output:

$ perl yape-explain 'Just another (\S+) hacker,'

The regular expression:

(?-imsx:Just another (\S+) hacker,)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
Just another             'Just another '
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\S+                      non-whitespace (all but \n, \r, \t, \f,
                         and " ") (1 or more times (matching the
                         most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
hacker,                  ' hacker,'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
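The node-by-node explanation maps naturally onto a commented pattern written with the /x flag. The following sketch rewrites the same regex that way by hand; it is not output from YAPE::Regex::Explain, and the comments paraphrase the explanations above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With /x, literal spaces in the pattern are ignored, so each
# significant space is written as [ ]
my $regex = qr/
    Just[ ]another[ ]   # the literal text 'Just another '
    (                   # start of capture group 1
        \S+             # one or more non-whitespace characters, as many as possible
    )                   # end of capture group 1
    [ ]hacker,          # the literal text ' hacker,'
/x;

my $string = 'Just another Perl hacker,';
print "Group 1 holds: $1\n" if $string =~ $regex;
```

Writing the pattern this way keeps the explanation next to the regex itself, which is useful when the code will be read again long after it was written.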


Posted 2009-02-24 16:59 by 博文视点
