10. Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

回溯算法 Backtracking

乍看之下，该问题很像一个字符串匹配问题。那么我们怎么样去匹配 ' * ' ?

通用的解决算法是，使用贪心算法：【尽可能多的匹配 ' * ' 之前的字符】。下面我们分析这种算法可能存在的问题。

s = “abbbc”, p = “ab*c”

假设我们先匹配了字符 'a'，然后我们开始匹配 ‘b*’，我们忽略所有的s中的 'b'，到最后，我们遇到字符 'c' ，p和s匹配。

s = “ac”, p = “ab*c”

在匹配‘a’后，因为s中没有任何字符'b'，所以略过 'b*'，直接匹配最后一个字符 'c'，p和s匹配。

到目前为止，贪心算法表现良好。接着看：

s = “abbc”, p = “ab*bbc”
当匹配‘b*’，我们会略过s中的所有字符'b'；之后所有的'b'就无法再被'bb'匹配，使用贪心算法，这个匹配的返回值将会是false，但我们预计的是 true。
有些人可能会对此提出改进，在s中，计算连续的字符 'b' 的个数，如果个数比在p中‘b*’后的连续b个数小或相等，那么我们认为匹配。

看起来似乎是解决了该问题。我们再看下一个例子：

s = “abcbcd”, p = “a.*c.*d”
这里，p中的“.*” 意味着‘.’ 重复0 or 随意次。因为 ‘.’ 可以匹配任意字符，所以无法确定 ‘.’ 到底应该重复多少次。p中的 ‘c’ 到底去匹配第一个，还是第二个'c' ，不得而知。

所以我们需要使用回溯backtracking 当匹配失败的时候。我们返回到上一次匹配成功的状态，并用‘*‘ 匹配更多的s中的字符。该算法需要使用到递归recursion。

递归recursion 在以下两种情况下停止:

如果p中的下一个字符不是 ‘*’, 则它必须匹配当前s中的字符。并继续用下一个字符，去匹配s中的字符。
如果p中的下一个字符是 ‘*’, 那我们必须使用 暴力搜索brute force 去匹配s中的当前字符，不断重复，直到不再匹配

代码

120ms

class Solution {
public:
    bool isMatch(string s, string p) {
        if (p.size() == 0) return s.size() == 0;//一定不能写反
        if (p[1] != '*')
            return (s[0] == p[0] || p[0] == '.'  && !s.empty()) && isMatch(s.substr(1), p.substr(1));
        int i = 0;
        while (s[i] == p[0] || (p[0] == '.' && i < s.size())) {
            if (isMatch(s.substr(i), p.substr(2))) return true;
            ++i;
        }
        return isMatch(s.substr(i), p.substr(2));
    }
};

现实应用中，实际上是使用正则表达式 grep tool 工具来匹配字符的。

来源： http://articles.leetcode.com/regular-expression-matching

动态规划 Dynamic Programming

首先说明一下规则，dp[m][n] 数组返回值boolean.。它告诉我们长度n的正则表达式，是否匹配长度m的字符串。

dp[m][n]: if s[0..m-1] matches p[0..n-1]

下面看算法的base case:

dp[0][0] 永远是 true
dp[m][0] 当 m > 0, 永远false。空的正则表达式不会匹配任何字符串。
dp[0][n] 可以是 true 或 false。如果p[j - 1] 是'*' 并且p[0..j - 3] 匹配空字符串，那么p[0.., j - 3, j - 2, j - 1] 也匹配空字符串s，

1

2

3

4

5

6

dp[0][0] = true;
for (int i = 1; i <= m; i++)
    dp[i][0] = false;
// p[0.., j - 3, j - 2, j - 1] matches empty if p[j - 1] is '*' and p[0..j - 3] matches empty
for (int j = 1; j <= n; j++)
    dp[0][j] = j > 1 && '*' == p[j - 1] && dp[0][j - 2];

下面来关注递推关系 recurrence relationship

 * dp[i][j]: if s[0..i-1] matches p[0..j-1]
 * if p[j - 1] != '*'
 *      dp[i][j] = dp[i - 1][j - 1] && s[i - 1] == p[j - 1]
 * if p[j - 1] == '*', denote p[j - 2] with x
 *      dp[i][j] is true if any of the following is true
 *      1) "x*" repeats 0 time and matches empty: dp[i][j - 2]
 *      2) "x*" repeats >= 1 times and matches "x*x": s[i - 1] == x && dp[i - 1][j]
 * '.' matches any single character

从而可以得到下面的代码：12ms

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.size(), n = p.size();
        vector<vector<bool> > dp(m + 1, vector<bool>(n + 1, false));
 
        dp[0][0] = true;
        for (int i = 1; i <= m; ++i) dp[i][0] = false;
        for (int j = 1; j <= n; ++j) dp[0][j] = j > 1 && p[j - 1] == '*' && dp[0][j - 2];
 
        for (int i = 1; i <= m; ++i)
            for (int j = 1; j <= n; ++j) {
                if (p[j - 1] != '*')
                    dp[i][j] = dp[i - 1][j - 1] && (s[i - 1] == p[j - 1] || p[j-1] == '.');
                else
                    dp[i][j] = dp[i][j - 2] //如果* 代表不重复，即空字符
                    || dp[i - 1][j] && (s[i - 1] == p[j - 2] || '.' == p[j - 2]);//* 代表重复>=1次
            }
        return dp[m][n];
    }
};

参考： https://discuss.leetcode.com/topic/64627/concise-recursive-dp-solution-with-recurrence-relation-defined

参考： https://discuss.leetcode.com/topic/6183/my-concise-recursive-and-dp-solutions-with-full-explanation-in-c/2

来自为知笔记(Wiz)

posted @ 2016-11-29 21:21 copperface 阅读(225) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

10. Regular Expression Matching

回溯算法 Backtracking

代码

动态规划 Dynamic Programming

公告