Regular Number: the Shift-And string matching algorithm

Using a regular expression to describe a numeric string is very common. Such expressions generally take the following shape:
(0|9|7) (5|6) (2) (4|5)
The regular expression above matches 4 digits: the first is one of 0, 9 and 7; the second is one of 5 and 6; the third is 2; and the fourth is one of 4 and 5. It matches 0525, but it does not match 9634.
Now, given a regular expression of the form above and a long string of digits, find all substrings of the long string that match the regular expression.
Input
The input contains a single test case. The first line is a positive integer N (1 ≤ N ≤ 1000), the number of positions in the regular expression. In each of the next N lines, the first integer of the i-th line is ai (1 ≤ ai ≤ 10), meaning that the i-th position of the regular expression has ai digits to choose from; it is followed by those ai digit characters. The last line contains a numeric string whose length is at most 5 * 10^6.
Output
Output all substrings that can be matched by the regular expression. Each substring occupies one line.
Sample Input
4
3 0 9 7
2 5 7
2 2 5
2 4 5
09755420524
Sample Output
9755
7554
0524

 

 

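Shift-And is suited to cases where the pattern string t[] is short; because it relies on bitwise operations, it is generally more than twice as fast as the KMP algorithm.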

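D records which prefixes of the pattern are currently matched. To use the Shift-And algorithm we also need an auxiliary table B. B is a dictionary: each key is a character of the problem's alphabet, and each value is an n-bit unsigned integer recording the positions at which that character occurs in the pattern T.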

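D[j] states whether T[0..j] is a suffix of S[0..i], so D[j] becomes 1 only when D[j-1] == 1 and S[i] == T[j]. At the same time the lowest bit is set to 1 before the comparison, which opens a new candidate match whose first character is the current one.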
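The update rule just described can be written out as a recurrence. This is only a sketch in the notation of the surrounding text (S is the text, T the pattern); the subscript on D, marking the state after reading S[i], and the boundary convention D_{i-1}[-1] = 1 are assumptions introduced here to express "a match may start at any position".

```latex
\[
D_i[j] \;=\; D_{i-1}[j-1] \,\land\, \bigl(S[i] = T[j]\bigr),
\qquad D_{i-1}[-1] = 1, \quad D_{-1}[j] = 0 \ \ (j \ge 0)
\]
\[
\text{as a bit vector:}\qquad
D_i \;=\; \bigl((D_{i-1} \ll 1) \mid 1\bigr) \;\mathbin{\&}\; B\bigl[S[i]\bigr]
\]
```

For Regular Number each pattern position is a set of digits rather than a single character, so the test S[i] = T[j] becomes "S[i] is one of the digits allowed at position j"; the bit-vector form is unchanged because that membership is exactly what B stores.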

2, Shift-And algorithm implementation
Code for the Shift-And matching process:
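A minimal sketch of such a matching loop, assuming the pattern has at most 64 positions so that D fits in one machine word. The function name `shiftAndSearch`, its signature, and the choice to return start indices are illustrative assumptions, not taken from the original post; B is expected to be filled in beforehand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): scans `text` with the
// Shift-And update and returns the start index of every match.
// Assumes 1 <= n <= 64, and that B[c] has bit j set exactly when
// character c is allowed at pattern position j.
std::vector<std::size_t> shiftAndSearch(const std::string& text,
                                        const std::array<std::uint64_t, 256>& B,
                                        int n) {
    std::vector<std::size_t> starts;
    std::uint64_t D = 0;                           // no prefix matched yet
    const std::uint64_t accept = 1ULL << (n - 1);  // bit n-1: whole pattern matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        // Shift extends every partial match by one position, OR 1 opens a
        // new match at i, AND keeps only states consistent with text[i].
        D = ((D << 1) | 1ULL) & B[c];
        if (D & accept) {
            starts.push_back(i + 1 - n);           // the match ends at index i
        }
    }
    return starts;
}
```

With B filled in for the sample pattern and the sample text 09755420524, the loop reports matches starting at indices 1, 2 and 7, which are the substrings 9755, 7554 and 0524 from the sample output.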

 


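Because bitwise operations on a machine word are carried out in parallel, each loop iteration takes constant time, so the complexity of the code above is O(m), where m is the number of text characters scanned. When the pattern is longer than one machine word, as with the bitset<1009> used in the full program below, each iteration costs on the order of n/w word operations for word size w.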

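3, Auxiliary table B
The text above did not say how to obtain the auxiliary table B. It is simple: just record the positions at which each character occurs in the pattern T.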
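As a sketch of that construction, again assuming a pattern of at most 64 positions stored in a 64-bit word: `buildTable` handles a plain pattern string, and `buildClassTable` handles the Regular Number case where each position allows a set of characters. Both names are hypothetical helpers introduced for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: B[c] gets bit j set for every j with T[j] == c.
// Assumes T.size() <= 64.
std::array<std::uint64_t, 256> buildTable(const std::string& T) {
    std::array<std::uint64_t, 256> B{};  // all entries start at zero
    for (std::size_t j = 0; j < T.size(); ++j) {
        B[static_cast<unsigned char>(T[j])] |= 1ULL << j;
    }
    return B;
}

// Hypothetical helper for patterns made of character sets: classes[j]
// lists the characters allowed at position j. Assumes classes.size() <= 64.
std::array<std::uint64_t, 256> buildClassTable(const std::vector<std::string>& classes) {
    std::array<std::uint64_t, 256> B{};
    for (std::size_t j = 0; j < classes.size(); ++j) {
        for (unsigned char c : classes[j]) {
            B[c] |= 1ULL << j;           // c is acceptable at position j
        }
    }
    return B;
}
```

The full program below does the same thing with `B[t].set(i)` while reading the input, using a bitset instead of a single word because N can reach 1000.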

 
#include <bitset>
#include <cstdio>
#include <cstring>

using namespace std;
const int MAXN = 5e6 + 9; // maximum text length plus slack
// B[d]: bit i is set when digit d is allowed at pattern position i.
// D: bit j is set when pattern positions 0..j match the text ending at the current character.
bitset<1009> B[256], D;
char str[MAXN];
int main()
{
    int n, tmp, t;
    scanf("%d", &n);
    for (int i = 0; i < n; i++)
    {
        scanf("%d", &tmp);
        while (tmp--)
        {
            scanf("%d", &t);
            B[t].set(i);
        }
    }
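    // gets() was removed in C++14; %s skips the pending newline and reads the digit string.
    scanf("%s", str);
    int l = strlen(str);
    for (int i = 0; i < l; i++)
    {
        // Shift-And update: extend partial matches, open a new one at bit 0,
        // then keep only the positions that accept the current digit.
        D = (D << 1).set(0) & B[str[i] - '0'];
        if (D[n - 1])
        {
            // Temporarily terminate the string after index i to print the match.
            char ch = str[i + 1];
            str[i + 1] = '\0';
            puts(str + i - n + 1);
            str[i + 1] = ch;
        }
    }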
}

 

posted @ 2017-08-16 14:47  joeylee97