[LeetCode] Regular Expression Matching

(Version 1.1)

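(Regular Expression Matching is tagged DP on LeetCode, but plain recursion without DP is also accepted; the test cases for this problem are not very demanding.)

I could not solve this problem on my own, and I think the difficulty was finding the right way to look at it. One workable approach, in my view, is to start from Code Ganker's recursive perspective (http://blog.csdn.net/linhuanmars/article/details/21145563), write a working recursive solution, and then think about how DP can store intermediate results that may be needed more than once, which leads to 喜刷刷's solution (http://bangbingsyb.blogspot.com/2014/11/leetcode-regular-expression-matching.html).

(1) The most straightforward brute-force solution, using recursion

First, the recursive brute-force solution: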

public class Solution {
    public boolean isMatch(String s, String p) {
        return helper(s, p, 0, 0);
    }

    // Returns whether s starting at index i matches p starting at index j.
    private boolean helper(String s, String p, int i, int j) {
        /* first consider the case where p is used up */
        if (j == p.length()) { // 1. p is used up
            return i == s.length();
        }
        /* then consider the case where p is not used up */
        if (j == p.length() - 1 || p.charAt(j + 1) != '*') { // 2. compare only the current char
            // 2 possible situations: p[j] is the last char of p, or p[j + 1] != '*'
            if (i == s.length() || (p.charAt(j) != s.charAt(i) && p.charAt(j) != '.')) {
                return false;
            }
            return helper(s, p, i + 1, j + 1);
        }
        // 3. p[j + 1] == '*', so every possible number of repetitions must be considered
        while (i < s.length() && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i))) {
            if (helper(s, p, i, j + 2)) { // the first iteration is "x*" matching nothing
                return true;
            }
            i++;
        }
        return helper(s, p, i, j + 2); // everything that p[j..j+1] can consume has been consumed
    }
}

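Start by asking what the base case is. The base case should be the point where s or p is exhausted, because the answer is then available immediately. In the code the check is on p (the regular expression): once p is used up, the match succeeds only if s is used up too, so if p is exhausted while s still has chars left, the match must fail. This is the simplest situation.

When p is not exhausted we are in the normal situation and must consider the possibilities one by one. In the first, only the current char of p matters: either it is the last char of p (realizing that this belongs together with the next condition is the tricky part) or the char after it is not '*'. In the second, the current char of p and the '*' that follows it must be considered together. In the first situation, if s is exhausted or the current chars of the two strings do not match, we return false; if they match, the result depends on the rest of both strings, so we call helper recursively. In the second situation, the combination of the current char in p plus the following '*' can match any number of chars in s, all identical or, when the combination is ".*", possibly different. So we need a while loop that enumerates every position in s that this combination could reach. As soon as one of them lets the remaining substrings of s and p match, we return true, since helper only decides whether a match can run to the end from a given pair of indices. Finally, when the while loop exits, i has stopped at the first char of s that the combination cannot consume (or at the end of s). We keep that index fixed, skip the x+'*' combination in p, and call helper to see whether the rest of p can match the substring of s starting at that index.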
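The result of helper depends only on the pair (i, j), so the same pair can be recomputed many times by the recursion described above. A minimal sketch of caching those results follows. The class name MemoizedRegexMatcher and the memo array are hypothetical names introduced here, and the while loop is swapped for the equivalent "skip x*, or consume one char and stay on x*" recursion; this is one possible variant, not the code from either referenced blog.

```java
class MemoizedRegexMatcher {
    // memo[i][j] == null means helper(i, j) has not been computed yet;
    // otherwise it holds the cached answer. (Hypothetical helper field.)
    private Boolean[][] memo;

    public boolean isMatch(String s, String p) {
        memo = new Boolean[s.length() + 1][p.length() + 1];
        return helper(s, p, 0, 0);
    }

    // Does s starting at i match p starting at j?
    private boolean helper(String s, String p, int i, int j) {
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        boolean result;
        if (j == p.length()) {
            // p is used up: success only if s is used up too
            result = (i == s.length());
        } else {
            boolean firstMatches = i < s.length()
                    && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i));
            if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
                // "x*" matches nothing, or consumes s[i] and stays on the same "x*"
                result = helper(s, p, i, j + 2)
                        || (firstMatches && helper(s, p, i + 1, j));
            } else {
                // plain char or '.': consume one char from both strings
                result = firstMatches && helper(s, p, i + 1, j + 1);
            }
        }
        memo[i][j] = result;
        return result;
    }
}
```

With the cache, each (i, j) pair is evaluated at most once, which is the same idea the bottom-up table in the next section makes explicit.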

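(2) Optimizing the recursive brute-force solution with DP

For the DP solution, I did not find Code Ganker's version very appealing and did not dig into it (it may well be more efficient). Instead I first wrote the version below, which combines Yu's Garden (http://www.cnblogs.com/yuzhangcmu/p/4105529.html) and 喜刷刷.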

public class Solution {
    public boolean isMatch(String s, String p) {
        boolean[][] match = new boolean[s.length() + 1][p.length() + 1];
        match[0][0] = true;
        for (int i = 0; i <= s.length(); i++) {
            for (int j = 0; j <= p.length(); j++) {
                if (j == 0) {
                    // an empty pattern matches only an empty string
                    match[i][j] = i == 0;
                } else if (j >= 2 && p.charAt(j - 1) == '*') {
                    char c = p.charAt(j - 2);
                    if (i == 0 || isSame(s.charAt(i - 1), c)) {
                        /*
                         * Most of my errors here came from not realizing that the lengths
                         * must be taken into account when matching 2 strings.
                         */
                        if (i <= 1) {
                            match[i][j] = match[i][j - 2] || match[i][j - 1];
                        } else {
                            match[i][j] = match[i][j - 2] || match[i][j - 1] || (match[i - 1][j] && isSame(s.charAt(i - 2), c));
                        }
                    } else {
                        // c cannot match s[i - 1], so "c*" must match zero chars
                        match[i][j] = match[i][j - 2];
                    }
                } else {
                    // Only the current char in p matters. i >= 1 is important: to match the
                    // current char in p, the prefix of s must contain at least one char.
                    // This is easy to forget!
                    match[i][j] = (i >= 1) && match[i - 1][j - 1] && isSame(s.charAt(i - 1), p.charAt(j - 1));
                }
            }
        }
        return match[s.length()][p.length()];
    }

    // a is a char from s, b is a char from p
    private boolean isSame(char a, char b) {
        return b == '.' || a == b;
    }
}

The element match[i][j] of the 2D array records whether the prefix of s of length i matches the prefix of p of length j, which is why the dimensions are s.length() + 1 and p.length() + 1.
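To make the meaning of each cell concrete, here is a small sketch that fills the table directly from that definition, using java.util.regex.Pattern as a reference oracle instead of the DP recurrence. The class and method names (PrefixTableDemo, prefixTable) are made up for illustration, and the sketch assumes s contains only lowercase letters and p is a well-formed pattern of lowercase letters, '.' and '*' (no leading '*', no "**"). It compiles a regex per cell, so it is only meant for inspecting small examples.

```java
import java.util.regex.Pattern;

class PrefixTableDemo {
    // table[i][j] is true when the first i chars of s match the first j chars of p.
    static boolean[][] prefixTable(String s, String p) {
        boolean[][] table = new boolean[s.length() + 1][p.length() + 1];
        for (int i = 0; i <= s.length(); i++) {
            for (int j = 0; j <= p.length(); j++) {
                // Pattern.matches requires the whole input to match the regex
                table[i][j] = Pattern.matches(p.substring(0, j), s.substring(0, i));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Print the table for one small example, one row per prefix of s.
        boolean[][] table = prefixTable("aab", "c*a*b");
        for (boolean[] row : table) {
            StringBuilder line = new StringBuilder();
            for (boolean cell : row) {
                line.append(cell ? 'T' : 'F').append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

For s = "aab" and p = "c*a*b", the cell table[2][4] asks whether "aa" matches "c*a*", and the bottom-right cell table[3][5] is the answer to the whole problem.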

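The way into the problem, which is what I kept missing, is to fix the point of view on p and use the current char of p to match chars in s. It then follows naturally that only two cases need considering: the current char of p is '*', or it is not. If it is not '*', we only need to check whether it matches the current char of s (they are equal, or the char in p is '.'), given that the two shorter prefixes already match. If it is '*', we need to consider how many chars of s the combination of '*' and the char before it can match.

For a DP beginner like me, there is another habit of thought to learn here: take "what value should the current array element get" as the starting point rather than thinking about the whole problem, otherwise it is easy to get confused. Deal only with the problem at hand at each moment. The problem at hand is that the combination may match 0, 1, or several chars of s, and if any one of these three cases works, the current element of match should be true.

The zero-char case is match[i][j - 2]: whether the part of p before the combination already matches the prefix of s of length i. If it does, the prefixes of length i and j still match even though the combination consumes nothing. The one-char case is match[i][j - 1]: the prefix of p that ends at the char before '*' matches the prefix of s of length i, which means that char matches the current char of s. The case of two or more chars is the hardest to get exactly right. First, the prefix of p of length j must match the prefix of s of length i - 1, that is, match[i - 1][j]; the intent is that the combination matches at least two chars of s, and if match[i - 1][j] does not hold, the combination can match at most one char of s. Once this first condition holds, the char that the combination stands for must also match the current char of s, so the term is match[i - 1][j] && isSame(s.charAt(i - 1), p.charAt(j - 2)). (Strictly speaking, match[i - 1][j] is also true when the combination has consumed nothing so far, so this term covers the one-char case as well; keeping match[i][j - 1] is harmless.)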
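The case analysis for a '*' cell can be spelled out with one named boolean per case. The following is a sketch under the table definition used in this post; the class name NamedCasesMatcher and the variable names zeroChars, oneChar and moreChars are hypothetical labels chosen for readability, not identifiers from the referenced solutions.

```java
class NamedCasesMatcher {
    static boolean isMatch(String s, String p) {
        int m = s.length();
        int n = p.length();
        // match[i][j]: does the prefix of s of length i match the prefix of p of length j?
        boolean[][] match = new boolean[m + 1][n + 1];
        match[0][0] = true;
        for (int i = 0; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                char pc = p.charAt(j - 1);
                if (pc == '*' && j >= 2) {
                    char x = p.charAt(j - 2); // the char that '*' repeats
                    // "x*" consumes nothing: drop the whole combination from p
                    boolean zeroChars = match[i][j - 2];
                    // "x*" acts as a single x: drop only the '*'
                    boolean oneChar = match[i][j - 1];
                    // "x*" already covers the shorter prefix of s and also takes s[i - 1]
                    boolean moreChars = i >= 1 && match[i - 1][j]
                            && isSame(s.charAt(i - 1), x);
                    match[i][j] = zeroChars || oneChar || moreChars;
                } else {
                    // a single char of p needs at least one char of s
                    match[i][j] = i >= 1 && match[i - 1][j - 1]
                            && isSame(s.charAt(i - 1), pc);
                }
            }
        }
        return match[m][n];
    }

    // sc is a char from s, pc is a char from p
    private static boolean isSame(char sc, char pc) {
        return pc == '.' || sc == pc;
    }
}
```

Each cell only reads cells to its left in the same row or cells in the previous row, so filling the table row by row, left to right, guarantees that every dependency is ready when it is needed.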

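Of course, the code above, my first attempt that mixed two people's answers, is much messier than it needs to be. Below is the DP code I rewrote after understanding what the problem really requires: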

public class Solution {
    public boolean isMatch(String s, String p) {
        boolean[][] match = new boolean[s.length() + 1][p.length() + 1];
        match[0][0] = true;
        for (int i = 0; i <= s.length(); i++) {
            for (int j = 1; j <= p.length(); j++) {
                if (i != 0) {
                    char c = p.charAt(j - 1);
                    if (j > 1 && c == '*') { // consider 2 chars in p
                        match[i][j] = match[i][j - 2] || match[i][j - 1] || (match[i - 1][j] && isSame(s.charAt(i - 1), p.charAt(j - 2)));
                    } else if (match[i - 1][j - 1] && isSame(s.charAt(i - 1), c)) { // consider 1 char in p
                        match[i][j] = true;
                    }
                } else if (j >= 2 && p.charAt(j - 1) == '*' && match[0][j - 2]) { // handle the empty prefix of s
                    match[0][j] = true;
                }
            }
        }
        return match[s.length()][p.length()];
    }

    // s is a char from the string, p is a char from the pattern
    private boolean isSame(char s, char p) {
        return p == '.' || s == p;
    }
}

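One small change of approach here, or rather something that brings out the essence of the problem more clearly, is handling i == 0 separately. I cannot say whether writing the code this way is good or bad, but I think it highlights the line of thought: consider which substring of s is trying to match which substring of p. If i == 0, the empty prefix of s is trying to match a prefix of p, and the insight is that this prefix of p must consist only of "X*" combinations. Its length must therefore be even, and there is no need to look at the char before each '*' at all; it is enough that every second char is a '*'.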
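That observation about the empty string can be written as a standalone check. This is only a sketch: the class and method names (EmptyStringCheck, matchesEmpty) are hypothetical, and it assumes p is a well-formed pattern in which every '*' follows a letter or '.'.

```java
class EmptyStringCheck {
    // True when p can match the empty string, i.e. p is a sequence of "X*" pairs.
    static boolean matchesEmpty(String p) {
        if (p.length() % 2 != 0) {
            return false; // an odd length leaves one char that must consume input
        }
        // Only the odd positions (0-based) are inspected; the char before
        // each '*' is irrelevant, as noted above.
        for (int j = 1; j < p.length(); j += 2) {
            if (p.charAt(j) != '*') {
                return false;
            }
        }
        return true;
    }
}
```

Under the same assumption, matchesEmpty(p.substring(0, j)) should agree with match[0][j] in the DP table, since both require a '*' at every second position.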

posted on 2015-03-06 09:00 _icecream