Leetcode OJ: Wildcard Matching

Wildcard Matching

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

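The idea is similar to the regular expression matching problem. I expected this one to be a bit easier than the regex problem, but once I started it turned out to be a trap.

Still, with the regex problem as groundwork it is doable. I simply carried over the approach from that problem (see http://www.cnblogs.com/flowerkzj/p/3726667.html),

tweaked it slightly and submitted it, starting with the recursive solution of course. The following code does not pass: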

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        if (*p == 0) return *s == 0;
        if (*p != '*') {
            return (*p == *s || (*p == '?' && *s != 0)) && isMatch(s+1, p+1);
        }
        while (*s != 0) {
            if (isMatch(s, p+1))
                return true;
            s++;
        }
        return isMatch(s, p+1);
    }
};

TLE, unsurprisingly. The failing case reported is:

Last executed input:"aaabbbaabaaaaababaabaaabbabbbbbbbbaabababbabbbaaaaba", "a*******b"
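
To get a feel for why an input like this times out, here is a small sketch that wraps the same recursion with a call counter. The names isMatchCounted and countCalls are my own, added only for illustration; the matching logic is unchanged.

```cpp
// Hypothetical instrumentation: the recursion from above plus a global call counter.
static long long calls = 0;

bool isMatchCounted(const char *s, const char *p) {
    ++calls;
    if (*p == 0) return *s == 0;
    if (*p != '*') {
        return (*p == *s || (*p == '?' && *s != 0)) && isMatchCounted(s + 1, p + 1);
    }
    // Each '*' retries every suffix of s, so nested '*' multiply the work.
    while (*s != 0) {
        if (isMatchCounted(s, p + 1)) return true;
        ++s;
    }
    return isMatchCounted(s, p + 1);
}

// Resets the counter, runs one match, and reports how many calls it took.
long long countCalls(const char *s, const char *p) {
    calls = 0;
    isMatchCounted(s, p);
    return calls;
}
```

Calling countCalls with a string of sixteen 'a' characters and the patterns "a*b" and "a***b" shows the second needing far more calls than the first, even though both patterns mean the same thing. Every extra '*' adds another level of nested retries.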

Seeing all those '*', the obvious move is to merge them. Here is another version that does not pass:

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        if (*p == 0) return *s == 0;
        if (*p != '*') {
            return (*p == *s || (*p == '?' && *s != 0)) && isMatch(s+1, p+1);
        }
        while (p[1] == '*')
            ++p;
        while (*s != 0) {
            if (isMatch(s, p+1))
                return true;
            ++s;
        }
        return isMatch(s, p+1);
    }
};

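Still TLE, and this time the failing case is a very long s and p. Fine, time to bring out the big gun: dynamic programming!

Again I lightly adapted the DP code from the regex problem, but it still does not pass. Here is yet another failing version: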

#include <cstring>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatchSingle(char s, char p) {
        return (s == p || p == '?');
    }
    bool isMatch(const char *s, const char *p) {
        int slen = strlen(s);

        // Two rolling rows: (*pre)[i] says whether the pattern consumed so far matches s[0..i)
        vector<int> dp1(slen + 1, false), dp2(slen + 1, false);
        vector<int> *pre = &dp1, *cur = &dp2;
        dp1[0] = true;

        while (*p != 0) {
            cur->assign(slen + 1, false);
            if (*p != '*') {
                for (int i = 0; i < slen; ++i) {
                    (*cur)[i + 1] = ((*pre)[i] && isMatchSingle(s[i], p[0]));
                }
            } else {
                (*cur)[0] = (*pre)[0];
                // Merge consecutive '*'
                while (p[1] == '*')
                    ++p;
                for (int i = 0; i < slen; ++i) {
                    (*cur)[i + 1] = (*pre)[i + 1] || (*pre)[i] || (*cur)[i];
                }
            }
            ++p;
            swap(cur, pre);
        }
        return (*pre)[slen];
    }
};

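Still TLE. In the failing case, s is a very long string of a's and p is a very long mix of a's and *'s.

This made me rethink things. Wildcards have no construct like the regex a*: to match some number of a's, the pattern has to repeat 'a' that many times! That is what makes p so long!

But the defining feature of this test case is endless repetition, so I wondered whether p could be compressed: the matching unit of p would no longer be a single character but a run of consecutive repeated characters.

For example, with p = "*aaaabbb*" the matching units are {"*", 4 × "a", 3 × "b", "*"}, so the original 9 units become 4. Following this idea I modified the previous code and ended up with the following.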

#include <cstring>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        int slen = strlen(s);
        int plen = strlen(p);

        // For each position, how many times the character repeats consecutively from here on
        vector<int> scount(slen, 1);
        vector<int> pcount(plen, 1);

        for (int i = slen - 2; i >= 0; --i) {
            if (s[i] == s[i + 1])
                scount[i] = scount[i + 1] + 1;
        }

        for (int i = plen - 2; i >= 0; --i) {
            if (p[i] == p[i + 1])
                pcount[i] = pcount[i + 1] + 1;
        }

        vector<int> dp1(slen + 1, false), dp2(slen + 1, false);
        vector<int> *pre = &dp1, *cur = &dp2;
        dp1[0] = true;
        int pi = 0;
        while (p[pi] != 0) {
            cur->assign(slen + 1, false);
            if (p[pi] != '*') {
                // Every update here writes (*cur)[i + pcount[pi]]
                if (p[pi] != '?') { // check that s has pcount[pi] consecutive copies of p[pi]
                    for (int i = 0; i + pcount[pi] <= slen; ++i) {
                        (*cur)[i + pcount[pi]] = (*pre)[i] && p[pi] == s[i] && scount[i] >= pcount[pi];
                    }
                } else { // for '?', s only needs enough remaining length to cover pcount[pi] characters
                    for (int i = 0; i + pcount[pi] <= slen; ++i) {
                        (*cur)[i + pcount[pi]] = (*pre)[i] && s[i + pcount[pi] - 1] != 0;
                    }
                }
            } else {
                // For '*', the idea is the same as in the regex problem
                (*cur)[0] = (*pre)[0];
                for (int i = 0; i < slen; ++i) {
                    (*cur)[i + 1] = (*pre)[i + 1] || (*pre)[i] || (*cur)[i];
                }
            }
            // Note that pi advances by pcount[pi]
            pi += pcount[pi];
            // Don't forget to swap
            swap(cur, pre);
        }
        return (*pre)[slen];
    }
};
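
To check the grouping idea in isolation, here is a minimal sketch that collapses a pattern into (character, run length) units. runLengthUnits is a hypothetical helper of mine and is not part of the submitted solution, which stores the same information in the pcount array instead.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collapse runs of equal characters into (char, length) pairs.
std::vector<std::pair<char, int>> runLengthUnits(const std::string &p) {
    std::vector<std::pair<char, int>> units;
    for (char c : p) {
        if (!units.empty() && units.back().first == c)
            ++units.back().second;      // same character as the current run: extend it
        else
            units.emplace_back(c, 1);   // different character: start a new run
    }
    return units;
}
```

For p = "*aaaabbb*" this yields four units: ('*', 1), ('a', 4), ('b', 3), ('*', 1), so the DP loop runs four times instead of nine.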

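This code passed in 290ms, which still feels a bit slow. Let me first go over its weaknesses.

The worst-case time is still unavoidably O(mn) and the space complexity is O(n), where m is the length of p and n is the length of s, because every layer traverses s once.

1. For '*', once the first true in the previous layer has been found, every later position is necessarily true and needs no further computation.

2. For '?' or any other non-'*' character, there is no need to traverse the whole string; only the positions that are true in the previous layer matter.

To address these, I introduced a queue that records the true positions of the previous layer so the current layer can be computed from them. I submitted that and it passed in 268ms. Hardly any improvement.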
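
The queue version is not shown in this post, so the following is only a sketch of the two points above and not the code I submitted. It works one pattern character at a time (no run-length grouping) and keeps just the list of reachable prefix lengths of s. The name isMatchSparse is made up for this example.

```cpp
#include <cstring>
#include <vector>

// Sketch only: row-by-row DP that stores just the reachable prefix lengths of s.
bool isMatchSparse(const char *s, const char *p) {
    const int slen = static_cast<int>(std::strlen(s));
    std::vector<int> pre{0}, cur;   // pre: prefix lengths matched so far, in ascending order
    for (; *p != 0 && !pre.empty(); ++p) {
        cur.clear();
        if (*p == '*') {
            // Point 1: from the first reachable position onward, everything is reachable.
            for (int j = pre.front(); j <= slen; ++j) cur.push_back(j);
        } else {
            // Point 2: only extend positions that were reachable in the previous layer.
            for (int j : pre)
                if (j < slen && (*p == '?' || *p == s[j])) cur.push_back(j + 1);
        }
        pre.swap(cur);
    }
    // Match only if the whole pattern was consumed and the full length of s is reachable.
    return *p == 0 && !pre.empty() && pre.back() == slen;
}
```

The worst case is still O(mn), since a '*' can make every position reachable again, which fits the small gain I saw.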

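I was not ready to give up, but I was out of ideas. So...

I searched for other solutions and found Lei's post at http://fisherlei.blogspot.com/2013/01/leetcode-wildcard-matching.html (may require a proxy to access).

The code is very short, so I just formatted it and pasted it here. When I ran it myself it passed in 120ms.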

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        bool star = false;
        const char *str, *ptr;
        for (str = s, ptr = p; *str != '\0'; str++, ptr++) {
            switch (*ptr) {
                case '?': break;
                case '*':
                    star = true;
                    s = str, p = ptr;
                    while (*p == '*')
                        p++;
                    if (*p == '\0') // if the pattern is empty after the '*', return true directly
                        return true;
                    str = s - 1;
                    ptr = p - 1;
                    break;
                default: {
                    if (*str != *ptr) {
                        // no earlier '*' to fall back on, so the match fails
                        if (!star)
                            return false;
                        s++;
                        str = s - 1;
                        ptr = p - 1;
                    }
                }
            }
        }
        while (*ptr == '*')
            ptr++;
        return (*ptr == '\0');
    }
};

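At first glance I really did not get it. Only after stepping through it in a debugger did it click: the idea is greedy.

First, note the biggest difference between this wildcard matching and the regex problem: here only '*' matches a variable length, and everything else matches a single character. So the pattern really has only two kinds of elements:

1. '*': matches any length

2. '?' or any non-'*' character: matches a single character.

Tying this back to my final DP approach: it did use block matching, but only for runs of one repeated character. Can that be extended?

Given the classification above, a run of consecutive single-character elements can be matched as one whole block.

With that understood, here is my reading of the code above.

The core idea of the code is greedy, and there are two kinds of matching units:

1. A run of consecutive '*', which is equivalent to a single '*'

2. A run of consecutive single-character elements

So how is it greedy?

Let's look at an example:

s="aaabcaaabcaaabc", p="*bc*bc"

I will still analyze it the DP way. It just happens to fit a greedy strategy, so the full DP is not needed.

The matching units of p are {"*", "bc", "*", "bc\0"} (don't forget the final '\0').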
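
The decomposition above can be reproduced with a few lines. splitUnits is a hypothetical helper written just for this illustration; Lei's code never builds such a list explicitly. Because it works on std::string, the trailing '\0' mentioned above is implicit: the last unit is simply the one that must end where s ends.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split p into maximal runs of '*' and maximal runs of non-'*'.
std::vector<std::string> splitUnits(const std::string &p) {
    std::vector<std::string> units;
    for (char c : p) {
        const bool isStar = (c == '*');
        if (!units.empty() && (units.back()[0] == '*') == isStar) {
            // Same kind as the current unit: extend it, except that a run of '*'
            // stays a single "*".
            if (!isStar) units.back() += c;
        } else {
            units.push_back(std::string(1, c));
        }
    }
    return units;
}
```

For p = "*bc*bc" this gives {"*", "bc", "*", "bc"}, and a pattern such as "a**b" gives {"a", "*", "b"}.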

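p[1]: "*". In the DP array, the entire row is true.

p[2]: "bc". Following the DP, three positions match (marked with brackets):

aaa[bc]aaa[bc]aaa[bc]

Now let's consider whether we can be greedy here.

Greedy here means: once we have matched the first "bc", do we still need to try the next "bc"?

Because we split p into runs of '*' and runs of non-'*', the unit after "bc", if there is one, must be "*".

With that premise, consider the two cases:

1. The first "bc" is not the one a full match would use, and the right one is the second or third. Taking the first still works, because the "*" that follows can absorb the extra characters.

2. The first "bc" is the right one. No problem, we just move on to the next unit.

Good Job. So greedy can be applied directly!

So at this point we only take the first match, and the rest of p is only matched against what comes after it.

aaa[bc]aaabcaaabc

p[3]: "*". It matches anything, so set it aside for now.

p[4]: "bc\0". There is only one match, because the '\0' pins it to the end of s.

aaabcaaabcaaa[bc]

At this point p has reached its end and so has s: success!

Throughout this process, as soon as one unit cannot be matched, we can return false right away.

perfect!

Lei's code is nicely written: it does not traverse all of s every time, substring matching only walks part of it, and the space complexity is O(1).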
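
For comparison, here is how I would restate the same greedy idea with two explicitly saved positions. This is my own sketch and not Lei's code: isMatchGreedy, starPos and retry are names I made up, and I check '*' before the literal comparison so that a '*' in the pattern is never treated as a plain character.

```cpp
// Sketch of the greedy approach: remember the last '*' and where it currently stops.
bool isMatchGreedy(const char *s, const char *p) {
    const char *starPos = nullptr; // position of the most recent '*' in p
    const char *retry = nullptr;   // position in s where that '*' currently stops absorbing
    while (*s != '\0') {
        if (*p == '*') {
            // Let the '*' match nothing for now, but remember how to come back.
            starPos = p++;
            retry = s;
        } else if (*p == '?' || *p == *s) {
            ++s;
            ++p;
        } else if (starPos != nullptr) {
            // Mismatch: let the last '*' absorb one more character and try again.
            p = starPos + 1;
            s = ++retry;
        } else {
            return false;          // mismatch with no '*' to fall back on
        }
    }
    while (*p == '*') ++p;         // trailing '*' can match the empty sequence
    return *p == '\0';
}
```

Only the most recent '*' is ever revisited, which is exactly the greedy argument above: an earlier '*' never needs to absorb more, because the later one can take up the slack.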

posted @ 2014-05-15 13:18 flowerkzj
