Deciphering Regular Expressions

Excerpted from Chapter 2, "Advanced Regular Expressions," of Mastering Perl

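When trying to figure out a regex, whether it comes from someone else's code or is one we wrote ourselves (perhaps long ago), we can turn on Perl's regex debugging mode. Perl's -D switch turns on debugging options for the Perl interpreter (not for your program; see Chapter 4). The switch takes a series of letters or numbers that indicate which features to turn on. The -Dr option turns on debugging for regex parsing and execution.

We can use a short program to examine a regex. The first argument is the string to match and the second is the regex. We save this program as explain-regex: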

#!/usr/bin/perl

 

$ARGV[0] =~ /$ARGV[1]/;

 

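We try the program with the target string Just another Perl hacker, and the regex Just another (\S+) hacker, (the trailing comma is part of both). The output has two main sections, which the perldebguts documentation explains in detail. First, Perl compiles the regex, and -Dr shows how Perl parsed it. It shows the regex nodes, such as EXACT and NSPACE, along with any optimizations, such as the anchored "Just another ". Second, Perl tries to match the target string and shows its progress node by node. The output contains a lot of information, and it tells us exactly what Perl is doing: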

$ perl -Dr explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'

Omitting $` $& $' support.

 

EXECUTING...

 

Compiling REx `Just another (\S+) hacker,'

size 15 Got 124 bytes for offset annotations.

first at 1

rarest char k at 4

rarest char J at 0

   1: EXACT <Just another >(6)

   6: OPEN1(8)

   8:   PLUS(10)

   9:     NSPACE(0)

  10: CLOSE1(12)

  12: EXACT < hacker,>(15)

  15: END(0)

anchored "Just another " at 0 floating " hacker," at 14..2147483647 (checking anchored) minlen 22

Offsets: [15]

                1[13] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 17[1] 15[2] 18[1] 0[0] 19[8] 0[0] 0[0] 27[0]

Guessing start of match, REx "Just another (\S+) hacker," against "Just another Perl hacker,"...

Found anchored substr "Just another " at offset 0...

Found floating substr " hacker," at offset 17...

Guessed: match at offset 0

Matching REx "Just another (\S+) hacker," against "Just another Perl hacker,"

  Setting an EVAL scope, savestack=3

   0 <> <Just another>    |  1:  EXACT <Just another >

  13 <ther > <Perl ha>    |  6:  OPEN1

  13 <ther > <Perl ha>    |  8:  PLUS

                                  NSPACE can match 4 times out of 2147483647...

  Setting an EVAL scope, savestack=3

  17 < Perl> < hacker>    | 10:    CLOSE1

  17 < Perl> < hacker>    | 12:    EXACT < hacker,>

  25 <Perl hacker,> <>    | 15:    END

Match successful!

Freeing REx: `"Just another (\\S+) hacker,"'
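The trace reports that NSPACE can match 4 times starting at offset 13, and that CLOSE1 is reached at offset 17. The following is a minimal sketch, not part of the original excerpt, that checks those numbers from inside a program using Perl's built-in @- and @+ offset arrays. The variable names are chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'Just another Perl hacker,';
my $regex  = 'Just another (\S+) hacker,';

# In list context the match returns the captured groups
my ($word) = $string =~ /$regex/;

# $-[1] and $+[1] hold the start and end offsets of the first capture
my ( $start, $end ) = ( $-[1], $+[1] );

printf "captured '%s' from offset %d to %d\n", $word, $start, $end;
# prints: captured 'Perl' from offset 13 to 17
```

The offsets line up with the left-hand column of the match trace, which is the position in the target string at each node.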

 

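Perl's re pragma has a debugging mode that does not require an interpreter compiled with -DDEBUGGING. In the perl used for the output above, once we turn it on with use re 'debug', it applies to the entire program; it is not lexically scoped like most pragmas. (In Perl 5.10 and later, the re documentation describes use re 'debug' as lexically scoped.) We modify the previous program to use the re pragma instead of the command-line switch: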

 

#!/usr/bin/perl

 

use re 'debug';

 

$ARGV[0] =~ /$ARGV[1]/;

 

We don't even have to modify the program to use re. We can load it from the command line:

$ perl -Mre=debug explain-regex 'Just another Perl hacker,' 'Just another (\S+) hacker,'

 

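When we run the program with the regex as an argument, we get almost exactly the same output as with the earlier -Dr run.

Although it is a bit dated, the YAPE::Regex::Explain module can explain a regex in plain English. It parses the regex and explains what each part does. It can't explain the semantic intent, but we can't ask for everything. A short program is enough to explain a regex given on the command line: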
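The full debugging output can be overwhelming. On newer perls (5.10 and later, which is an assumption about the reader's installation and not something the excerpt covers), the re pragma also accepts Debug followed by a list of categories. The sketch below limits the output to the compile phase, and keeps the pragma inside a block so that only that one regex is affected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'Just another Perl hacker,';
my $matched;

{
    # Only compile-phase details (node list, optimizations) go to STDERR;
    # the node-by-node match trace is left out.
    use re qw(Debug COMPILE);
    $matched = $string =~ /Just another (\S+) hacker,/ ? 1 : 0;
}

print "matched: $matched\n";
```

Swapping COMPILE for EXECUTE should show the matching phase instead; the re documentation lists the other available categories.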

 

#!/usr/bin/perl

 

use YAPE::Regex::Explain;

 

print YAPE::Regex::Explain->new( $ARGV[0] )->explain;

 

Even for a short, simple regex, we get a lot of output:

 

$ perl yape-explain 'Just another (\S+) hacker,'

The regular expression:

 

(?-imsx:Just another (\S+) hacker,)

 

matches as follows:

 

NODE                      EXPLANATION

----------------------------------------------------------------------

(?-imsx:                group, but do not capture (case-sensitive)

                      (with ^ and $ matching normally) (with . not

                     matching \n) (matching whitespace and # normally):

----------------------------------------------------------------------

  Just another             'Just another '

----------------------------------------------------------------------

  (                        group and capture to \1:

----------------------------------------------------------------------

    \S+                      non-whitespace (all but \n, \r, \t, \f,

                             and " ") (1 or more times (matching the most amount possible))

----------------------------------------------------------------------

  )                        end of \1

----------------------------------------------------------------------

   hacker,                 ' hacker,'

----------------------------------------------------------------------

)                        end of grouping

----------------------------------------------------------------------
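The node-by-node explanation maps naturally onto the /x flag, which lets a pattern carry its own whitespace and comments. As an illustrative sketch (written by hand, not produced by YAPE::Regex::Explain), the same regex can be annotated inline. Because /x ignores literal whitespace, each space is written as [ ].

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regex = qr/
    Just[ ]another[ ]   # the literal text 'Just another '
    (                   # start of the first capture group
        \S+             # one or more non-whitespace characters
    )                   # end of the first capture group
    [ ]hacker,          # the literal text ' hacker,'
/x;

# In list context the match returns the captured groups
my ($word) = 'Just another Perl hacker,' =~ $regex;

print "$word\n";    # prints: Perl
```

The comments avoid the $ and @ sigils and the / delimiter, since Perl would interpolate variables or end the pattern even inside a /x comment.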

 


Posted 2009-02-24 16:59 by 博文视点