HDU 3695 / POJ 3987 Computer Virus on Planet Pandora (AC automaton) (2010 Asia Fuzhou Regional Contest)
Description
Aliens on planet Pandora also write computer programs like us. Their programs consist only of capital letters (‘A’ to ‘Z’), which they learned from the Earth. On planet Pandora, hackers make computer viruses, so they also have anti-virus software. Of course they learned the virus scanning algorithm from the Earth. Every virus has a pattern string which consists of only capital letters. If a virus’s pattern string is a substring of a program, or the pattern string is a substring of the reverse of that program, they can say the program is infected by that virus. Given a program and a list of virus pattern strings, please write a program to figure out how many viruses the program is infected by.
Input
There are multiple test cases. The first line in the input is an integer T ( T <= 10) indicating the number of test cases.
For each test case:
The first line is an integer n (0 < n <= 250) indicating the number of virus pattern strings.
Then n lines follow, each representing a virus pattern string. Every pattern string stands for a virus. It’s guaranteed that those n pattern strings are all different, so there are n different viruses. The length of a pattern string is no more than 1,000, and a pattern string consists of at least one letter.
The last line of a test case is the program. The program may be described in a compressed format. A compressed program consists of capital letters and “compressors”. A “compressor” is in the following format:
[qx]
q is a number (0 < q <= 5,000,000) and x is a capital letter. It means q consecutive copies of the letter x in the original uncompressed program. For example, [6K] means ‘KKKKKK’ in the original program. So, if a compressed program is like:
AB[2D]E[7K]G
It is actually ABDDEKKKKKKKG after being decompressed to the original format. The length of the program is at least 1 and at most 5,100,000, whether in the compressed format or after it is decompressed to the original format.
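The compressed format can be expanded in a single left-to-right pass. The following is a minimal sketch, assuming well-formed input as the statement guarantees; the function name `decompress` and the use of `std::string` are illustrative choices, not part of the problem.

```cpp
#include <cctype>
#include <string>

// Expand a compressed program such as "AB[2D]E[7K]G" into "ABDDEKKKKKKKG".
// Assumes the input is well formed (every '[' is followed by digits, one
// capital letter and ']'), as the problem statement guarantees.
std::string decompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '[') {           // ordinary capital letter: copy it
            out += in[i];
            continue;
        }
        ++i;                          // skip '['
        std::size_t q = 0;            // repeat count
        while (std::isdigit(static_cast<unsigned char>(in[i])))
            q = q * 10 + static_cast<std::size_t>(in[i++] - '0');
        out.append(q, in[i]);         // in[i] is the repeated letter x
        ++i;                          // now at ']'; the loop's ++i steps past it
    }
    return out;
}
```

For example, `decompress("AB[2D]E[7K]G")` yields `ABDDEKKKKKKKG`, matching the statement.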
Output
For each test case, print an integer K in a line meaning that the program is infected by K viruses.
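Problem summary: given n pattern strings and one long string, count how many of the patterns occur in the long string (a pattern whose reverse occurs in the long string also counts).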
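To make the counting rule concrete, here is a brute-force reference, a sketch that is far too slow for the real limits but handy for checking a faster solution on small inputs. The name `countInfectionsNaive` is hypothetical. It uses the fact that a reversed pattern occurring in the program is the same as the pattern occurring in the reversed program.

```cpp
#include <string>
#include <vector>

// Count the patterns that occur in the program or in its reverse.
// Brute force with std::string::find; only meant for small test inputs.
int countInfectionsNaive(const std::vector<std::string>& patterns,
                         const std::string& program) {
    const std::string reversed(program.rbegin(), program.rend());
    int count = 0;
    for (const std::string& p : patterns) {
        if (program.find(p) != std::string::npos ||
            reversed.find(p) != std::string::npos)
            ++count;
    }
    return count;
}
```

With patterns `AB`, `DCBA`, `XYZ` and program `ABCD`, the first pattern occurs directly and the second occurs in the reverse, so the count is 2.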
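Approach: multi-pattern matching, a plain AC automaton (Aho-Corasick) problem. To handle the case where one pattern is a substring of another, walk the chain of fail pointers after every step through the text. Reportedly, walking the whole chain every time exceeds the time limit, so mark each node once it has been visited: everything further along its fail chain has then been handled too, so a marked node never needs to be walked again.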
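The marking idea can be sketched in isolation as follows. This is an illustrative array-based variant, not the pointer-based code from this post: the names `Automaton`, `seen` and `countInfections` are assumptions, and this sketch handles reversal by scanning the program a second time in reverse rather than by inserting reversed patterns.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

struct Automaton {
    std::vector<std::array<int, 26>> next;  // transitions, completed by build()
    std::vector<int> fail, id;              // id[v]: pattern ending at v, or -1
    std::vector<char> seen;                 // fail chain from v already walked

    Automaton() { newNode(); }              // node 0 is the root

    int newNode() {
        std::array<int, 26> row;
        row.fill(0);
        next.push_back(row);
        fail.push_back(0); id.push_back(-1); seen.push_back(0);
        return static_cast<int>(next.size()) - 1;
    }
    void insert(const std::string& p, int index) {
        int v = 0;
        for (char c : p) {
            int k = c - 'A';
            if (!next[v][k]) { int u = newNode(); next[v][k] = u; }
            v = next[v][k];
        }
        id[v] = index;
    }
    void build() {                           // BFS over the trie
        std::queue<int> q;
        for (int k = 0; k < 26; ++k) if (next[0][k]) q.push(next[0][k]);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int k = 0; k < 26; ++k) {
                int u = next[v][k];
                if (u) { fail[u] = next[fail[v]][k]; q.push(u); }
                else next[v][k] = next[fail[v]][k];
            }
        }
    }
    void scan(const std::string& text, std::vector<char>& found) {
        int v = 0;
        for (char c : text) {
            v = next[v][c - 'A'];
            // Stop at the first seen node: the rest of its chain is done.
            for (int u = v; u != 0 && !seen[u]; u = fail[u]) {
                seen[u] = 1;
                if (id[u] >= 0) found[id[u]] = 1;
            }
        }
    }
};

int countInfections(const std::vector<std::string>& patterns, const std::string& program) {
    Automaton ac;
    for (std::size_t i = 0; i < patterns.size(); ++i) ac.insert(patterns[i], static_cast<int>(i));
    ac.build();
    std::vector<char> found(patterns.size(), 0);
    ac.scan(program, found);
    ac.scan(std::string(program.rbegin(), program.rend()), found);
    int count = 0;
    for (char f : found) count += f;
    return count;
}
```

Because every node is marked at most once, the inner loop does work proportional to the number of trie nodes in total across both scans, so the whole scan stays linear in the program length plus the trie size.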
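PS: If you use pointers and never free the memory, you will get MLE. I tested this myself...
PS2: The problem does not seem to say that one pattern cannot be a substring of another, yet my earlier version, which ignored substrings (it only handled one pattern being a suffix of another), still got AC...
Code (HDU 1593MS/POJ 1671MS):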
#include <cstdio>
#include <cstring>
#include <cctype>
#include <algorithm>
#include <queue>
using namespace std;

// The program can be up to 5,100,000 characters, compressed or decompressed.
const int MAXN = 5100010;
const int MAX = 1010;

struct Node {
    Node *go[26], *fail;
    int src;    // 1-based id of the pattern ending at this node, or -1
    bool mark;  // true once the fail chain from this node has been walked
    Node(int _src) {
        src = _src;
        fail = 0;
        mark = 0;
        memset(go, 0, sizeof(go));
    }
    // Free the whole subtree; without this the pointer version gets MLE.
    ~Node() {
        for(int i = 0; i < 26; ++i)
            if(go[i]) delete go[i];
    }
};

// Insert pattern str with the given id into the trie.
void build(Node *root, char *str, int id) {
    Node *p = root;
    for(int i = 0; str[i]; ++i) {
        int index = str[i] - 'A';
        if(!p->go[index]) p->go[index] = new Node(-1);
        p = p->go[index];
    }
    p->src = id;
}

// Compute fail pointers with a BFS over the trie.
void makeFail(Node *root) {
    queue<Node*> que; que.push(root);
    while(!que.empty()) {
        Node *tmp = que.front(); que.pop();
        for(int i = 0; i < 26; ++i) {
            if(!tmp->go[i]) continue;
            if(tmp == root) tmp->go[i]->fail = root;
            else {
                Node *p = tmp->fail;
                while(p) {
                    if(p->go[i]) {
                        tmp->go[i]->fail = p->go[i];
                        break;
                    }
                    p = p->fail;
                }
                if(!p) tmp->go[i]->fail = root;
            }
            que.push(tmp->go[i]);
        }
    }
    root->fail = root;
}

bool vis[MAX];

// Scan str; after each step walk the fail chain until a marked node.
void solve(Node *root, char *str) {
    Node *tmp = root;
    for(char *now = str; *now; ++now) {
        int index = *now - 'A';
        while(tmp != root && !tmp->go[index]) tmp = tmp->fail;
        if(tmp->go[index]) tmp = tmp->go[index];
        Node *q = tmp;
        while(q != root && !q->mark) {
            q->mark = true;
            if(q->src > 0) vis[q->src] = true;
            q = q->fail;
        }
    }
}

int make_ans(int n) {
    int ret = 0;
    for(int i = 1; i <= n; ++i)
        ret += vis[i];
    return ret;
}

// Decompress ss into tt, expanding every [qx] into q copies of x.
void trans(char *ss, char *tt) {
    for(int i = 0; ss[i]; ++i) {
        if(isalpha(ss[i])) *tt++ = ss[i];
        else {
            ++i;  // skip '['
            int t = 0;
            while(isdigit(ss[i])) t = t * 10 + ss[i] - '0', ++i;
            for(int j = 0; j < t; ++j) *tt++ = ss[i];
            ++i;  // move to ']'; the loop increment steps past it
        }
    }
    *tt = 0;
}

char ss[MAXN], s[MAXN];
char tmp[MAX];

int main() {
    int T, n;
    scanf("%d", &T);
    while(T--) {
        scanf("%d", &n);
        Node *root = new Node(-1);
        for(int i = 1; i <= n; ++i) {
            scanf("%s", tmp);
            build(root, tmp, i);
        }
        makeFail(root);
        scanf("%s", ss);
        trans(ss, s);
        memset(vis, 0, sizeof(vis));
        // Scan the program forward, then reversed. Inserting reversed
        // patterns into the trie instead would overwrite src whenever one
        // pattern is the reverse of another.
        solve(root, s);
        reverse(s, s + strlen(s));
        solve(root, s);
        printf("%d\n", make_ans(n));
        delete root;
    }
    return 0;
}