Regular Number: the Shift-And string matching algorithm
Using a regular expression to define a numeric string is very common. A typical form is:

(0|9|7)(5|6)(2)(4|5)

This regular expression matches 4 digits: the first is one of 0, 9 and 7; the second is one of 5 and 6; the third is 2; and the fourth is one of 4 and 5. It matches 0525, but it does not match 9634.

Given a regular expression of this form and a long string of digits, find all substrings of the long string that match the regular expression.

Input
There is one test case. The first line is a positive integer N (1 ≤ N ≤ 1000), the number of digit positions described by the regular expression. Each of the next N lines describes one position: the first integer on the i-th line is a_i (1 ≤ a_i ≤ 10), the number of digits allowed at position i, followed by those a_i digits. The last line is a string of digits whose length does not exceed 5 × 10^6.

Output
Print every substring that matches the regular expression, one per line.

Sample Input
4
3 0 9 7
2 5 7
2 2 5
2 4 5
09755420524

Sample Output
9755
7554
0524
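To pin down exactly what the output must contain, here is a naive reference check. The helper `naiveMatches` is hypothetical and not part of the original solution: it tries every start position and compares each of the N positions with its set of allowed digits, so it costs O(m·N) for a text of length m. It is only meant as a specification to test a faster matcher against.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference helper, not part of the original solution.
// allowed[j] lists the digits accepted at pattern position j.
// Returns every matching substring of text, ordered by start position.
std::vector<std::string> naiveMatches(const std::vector<std::string>& allowed,
                                      const std::string& text) {
    assert(!allowed.empty());  // the problem guarantees N >= 1
    const std::size_t n = allowed.size();
    std::vector<std::string> result;
    for (std::size_t start = 0; start + n <= text.size(); ++start) {
        bool ok = true;
        // Compare each position with its allowed digits; stop at the first mismatch.
        for (std::size_t j = 0; j < n && ok; ++j) {
            ok = allowed[j].find(text[start + j]) != std::string::npos;
        }
        if (ok) {
            result.push_back(text.substr(start, n));
        }
    }
    return result;
}
```

For the sample, `naiveMatches({"097", "57", "25", "45"}, "09755420524")` returns 9755, 7554 and 0524, in that order.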
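Shift-And works best when the pattern string t[] is short (ideally no longer than a machine word). Because it relies on bit-parallel operations, it is often more than twice as fast as KMP in practice.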
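The state D records which prefixes of the pattern currently match. Shift-And also needs an auxiliary table B. B is a dictionary whose keys are the characters of the alphabet and whose values are n-bit unsigned integers (n being the length of the pattern T); the value for a character records the positions at which that character occurs in T.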
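Since D[j] indicates whether T[0..j] is a suffix of S[0..i], D[j] becomes 1 after reading S[i] only if D[j-1] was 1 after reading S[i-1] and S[i] == T[j]. In addition, the lowest bit is set to 1 before this check, so that a new match may start at the current character S[i].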
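Written as a recurrence, with D_i denoting the state after reading S[i] and D_{-1} = 0 (notation introduced here for illustration), the rule above reads as follows. The small trace uses T = aba and S = ababa, writing bit n-1 on the left, so B[a] = 101 and B[b] = 010.

```latex
D_i = \bigl((D_{i-1} \ll 1) \mid 1\bigr) \,\&\, B[S[i]],
\qquad
D_i[j] = 1 \iff \bigl(j = 0 \;\lor\; D_{i-1}[j-1] = 1\bigr) \land S[i] = T[j]

\text{A match of } T \text{ ends at position } i \iff D_i[n-1] = 1.

\begin{array}{c|c|c}
i & S[i] & D_i \\ \hline
0 & a & 001 \\
1 & b & 010 \\
2 & a & 101 \\
3 & b & 010 \\
4 & a & 101
\end{array}
```

Bit 2 is set at i = 2 and i = 4, so matches end there, i.e. they start at positions 0 and 2.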
2. Implementing Shift-And
The matching loop updates D once per text character using the rule above.
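A minimal sketch of the matching loop for an ordinary string pattern, assuming the pattern has at most 64 characters so that D fits in one `std::uint64_t`. The function name `shiftAndSearch` is hypothetical; it builds B inline and reports the start index of every occurrence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical Shift-And sketch: assumes 1 <= pattern.size() <= 64.
// Returns the start index of every occurrence of pattern in text.
std::vector<std::size_t> shiftAndSearch(const std::string& pattern,
                                        const std::string& text) {
    const std::size_t n = pattern.size();
    assert(n >= 1 && n <= 64);

    // Auxiliary table: bit j of B[c] is set iff pattern[j] == c.
    std::array<std::uint64_t, 256> B{};
    for (std::size_t j = 0; j < n; ++j) {
        B[static_cast<unsigned char>(pattern[j])] |= std::uint64_t{1} << j;
    }

    const std::uint64_t accept = std::uint64_t{1} << (n - 1);  // bit n-1: whole pattern matched
    std::uint64_t D = 0;  // bit j set iff pattern[0..j] is a suffix of text[0..i]
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Extend every live prefix by one, start a new prefix at bit 0,
        // and keep only the prefixes whose next pattern character equals text[i].
        D = ((D << 1) | 1) & B[static_cast<unsigned char>(text[i])];
        if (D & accept) {
            starts.push_back(i + 1 - n);  // the match ends at i, so it starts at i - n + 1
        }
    }
    return starts;
}
```

For example, `shiftAndSearch("aba", "ababa")` returns {0, 2}: overlapping matches are found because every live prefix is carried forward independently.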
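Because a single word-level bit operation updates every pattern position at once, each iteration of the loop takes constant time, so matching runs in O(m), where m is the length of the text S. This holds when n fits in one machine word; with an n-bit std::bitset, as in the complete program below, each step costs O(n/w) word operations for a word size w, giving O(m·n/w) overall.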
3. The auxiliary table B
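So far we have not said how to build the auxiliary table B. It is simple: for each character, record the positions at which it occurs in the pattern T. In this problem each pattern position accepts a set of digits, so bit i is set in B[d] for every digit d allowed at position i.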
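A sketch of both ways to fill B, using std::bitset so that patterns up to the problem's limit of 1000 positions fit. The names `buildTable`, `buildDigitTable` and `kMaxLen` are illustrative assumptions: the first handles a plain pattern T, the second the character-class pattern of this problem, where `allowed[j]` lists the digits accepted at position j.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxLen = 1000;  // assumption: N <= 1000, as in the problem
using Mask = std::bitset<kMaxLen>;

// Plain pattern T: bit j of B[c] is set iff T[j] == c.
std::array<Mask, 256> buildTable(const std::string& T) {
    assert(T.size() <= kMaxLen);
    std::array<Mask, 256> B{};  // all bits start at 0
    for (std::size_t j = 0; j < T.size(); ++j) {
        B[static_cast<unsigned char>(T[j])].set(j);
    }
    return B;
}

// "Regular Number" variant: position j accepts any digit in allowed[j],
// so bit j is set in the mask of every accepted digit.
std::array<Mask, 10> buildDigitTable(const std::vector<std::string>& allowed) {
    assert(allowed.size() <= kMaxLen);
    std::array<Mask, 10> B{};
    for (std::size_t j = 0; j < allowed.size(); ++j) {
        for (char d : allowed[j]) {
            assert(d >= '0' && d <= '9');
            B[d - '0'].set(j);
        }
    }
    return B;
}
```

For the sample pattern, digit 5 is accepted at positions 1, 2 and 3, so bits 1, 2 and 3 of B[5] are set, while B[0] has only bit 0 set.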
```cpp
#include <bitset>
#include <cstdio>
#include <cstring>
using namespace std;

const int MAXN = 5e6 + 9;  // the text has at most 5 * 10^6 digits
const int MAXP = 1009;     // the pattern has at most 1000 positions

bitset<MAXP> B[10];  // B[d]: pattern positions that accept digit d
bitset<MAXP> D;      // D[j] == 1 iff pattern[0..j] matches a suffix of the text read so far
char str[MAXN];

int main() {
    int n, cnt, digit;
    scanf("%d", &n);
    for (int i = 0; i < n; i++) {
        scanf("%d", &cnt);
        while (cnt--) {
            scanf("%d", &digit);
            B[digit].set(i);
        }
    }
    // %s skips leading whitespace, including the newline after the last pattern line
    scanf("%s", str);
    int len = strlen(str);
    for (int i = 0; i < len; i++) {
        // Shift-And step: extend every live prefix, start a new one at bit 0,
        // then keep only the positions that accept the current digit
        D = (D << 1).set(0) & B[str[i] - '0'];
        if (D[n - 1]) {  // the whole pattern matches, ending at position i
            char saved = str[i + 1];
            str[i + 1] = '\0';  // temporarily terminate the string after the match
            puts(str + i - n + 1);
            str[i + 1] = saved;
        }
    }
    return 0;
}
```