HDU-1686-Oulipo KMP
Oulipo
Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 1245 Accepted Submission(s): 481
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
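As a reference point, the definition above can be transcribed almost literally into a brute-force counter. This is only a sketch: `count_naive` is a hypothetical helper name, not part of the problem or of the solution further down. It tries every start position in T, so its worst case is roughly |W|·|T| character comparisons, which is likely too slow for the stated limits and is the reason a linear-time matcher such as KMP is used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count occurrences of w in t, overlaps allowed, by testing every start
   position. Hypothetical reference helper; worst case O(|W| * |T|). */
size_t count_naive(const char *w, const char *t)
{
    assert(w != NULL && t != NULL);
    size_t m = strlen(w), n = strlen(t), count = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; ++i) {
        /* An occurrence starts at i if the next m characters equal w. */
        if (memcmp(t + i, w, m) == 0)
            ++count;
    }
    return count;
}
```

Because the start position only ever advances by one, overlapping occurrences are counted without any special handling.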
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
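This problem differs from HDU-2087 (剪花布条) in that it asks for the largest number of times the word occurs in the text, and characters may be reused, i.e. occurrences may overlap. The second sample (AZA in AZAZAZA, answer 3) makes that clear. Here is a summary of how to handle this family of problems.
Plain KMP: the problem usually asks for the first (smallest) matching position, so we stop as soon as one match completes. The loop condition is k<= lo&& j<= ls.
HDU-2087 (cutting the cloth strip): after each successful match we do not stop; we set j= 0 and keep matching. The loop condition is k<= lo.
This problem: after each successful match we do not stop; we set j= next[j], i.e. we pretend the current comparison failed, so that as much of the already-matched part as possible is reused.
The code is as follows: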
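#include <stdbool.h>
#include <stdio.h>
#include <string.h>

char o[1000005], s[10005];   /* text and word, both stored from index 1 */
int next[10005], ls, lo;     /* failure table, |W|, |T| */

/* next[k] is the position in s to retry after a mismatch at s[k];
   next[1]= 0 means "move on to the next text character". */
void getnext( char *s, int *next )
{
    int k= 1, j= 0;
    next[1]= 0;
    while( k< ls ) {
        if( j== 0|| s[k]== s[j] ) {
            ++j, ++k;
            next[k]= j;
        } else {
            j= next[j];
        }
    }
}

/* Read one line into str, keeping only 'A'..'Z'. The string is always
   terminated; returns false if any other character (e.g. '\r') was seen. */
bool getstr( char *str )
{
    int c, i= 0;
    bool ok= true;
    while( ( c= getchar( ) )!= EOF&& c!= '\n' ) {
        if( c>= 'A'&& c<= 'Z' ) {
            str[i++]= (char)c;
        } else {
            ok= false;
        }
    }
    str[i]= '\0';
    return ok;
}

/* Count occurrences of s in o, overlaps allowed. */
int kmp( char *o, char *s, int *next )
{
    int k= 0, j= 0, ans= 0;
    while( k<= lo ) {
        if( j== 0|| o[k]== s[j] ) {
            if( j== ls ) {
                /* Full match ending at o[k]: count it, then treat it as a
                   mismatch so the matched prefix is reused. */
                ans++;
                j= next[j];
                continue;
            }
            ++j, ++k;
        } else {
            j= next[j];
        }
    }
    return ans;
}

int main( )
{
    int T, c;
    scanf( "%d", &T );
    /* Skip the rest of the first line. */
    while( ( c= getchar( ) )!= EOF&& c!= '\n' ) { }
    while( T-- ) {
        getstr( s+ 1 );
        getstr( o+ 1 );
        ls= (int)strlen( s+ 1 );
        lo= (int)strlen( o+ 1 );
        getnext( s, next );
        printf( "%d\n", kmp( o, s, next ) );
    }
    return 0;
}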
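To make the difference between the variants summarized above easier to see side by side, here is a minimal sketch using the common 0-indexed failure function rather than the 1-indexed `next` table of the solution. The names `build_fail`, `count_kmp` and the `allow_overlap` flag are assumptions introduced for illustration; they do not appear in the original code. The only line that changes between the overlapping count (this problem) and the non-overlapping count (HDU-2087 style) is what `j` is reset to after a full match.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* fail[i] = length of the longest proper border of w[0..i]. */
static void build_fail(const char *w, size_t m, size_t *fail)
{
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; ++i) {
        while (k > 0 && w[i] != w[k])
            k = fail[k - 1];
        if (w[i] == w[k])
            ++k;
        fail[i] = k;
    }
}

/* Count occurrences of w in t (hypothetical helper).
   allow_overlap != 0: after a match fall back to the longest border,
                       so matched characters are reused (this problem).
   allow_overlap == 0: after a match restart from scratch (HDU-2087 style).
   Returns -1 if memory allocation fails. */
long count_kmp(const char *w, const char *t, int allow_overlap)
{
    size_t m = strlen(w), n = strlen(t);
    assert(m > 0);
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;
    build_fail(w, m, fail);

    long count = 0;
    size_t j = 0;                 /* number of characters of w matched so far */
    for (size_t i = 0; i < n; ++i) {
        while (j > 0 && t[i] != w[j])
            j = fail[j - 1];
        if (t[i] == w[j])
            ++j;
        if (j == m) {             /* full match ending at t[i] */
            ++count;
            j = allow_overlap ? fail[m - 1] : 0;
        }
    }
    free(fail);
    return count;
}
```

For the second sample, `count_kmp("AZA", "AZAZAZA", 1)` counts the three overlapping occurrences, while `count_kmp("AZA", "AZAZAZA", 0)` counts only the two that do not share characters.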