[poj1200]Crazy Search(hash)
Crazy Search
Time Limit: 1000MS | Memory Limit: 65536K | |
Total Submissions: 26713 | Accepted: 7449 |
Description
Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.
As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.
As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.
Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.
Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.
Sample Input
3 4 daababac
Sample Output
5
Hint
Huge input,scanf is recommended.
(养成翻译的好习惯)给定字符串,其中字符集大小不超过nc,求其中长度为n的不同的子串个数
第一不要dp做多了把子串看成不连续的
子串就是源字符串连续的子序列!这点看题解才发现……想了半天也想不出来
接下来就好办多了,枚举每一位即可
但问题又来了,如何去重?kmp不行,ac自动机没试过不会,但目测仍然超时
接下来由rk-hash实力打脸kmp!
o(len)的速度没的说,而且已知hash值的话只用o(1)就能办到
rk-hash是什么?把字符串看成一个整数的高精度即可(请自行百度)
但算出哈希还不够,hash值应该会很大,所以要再用一次哈希,模一个素数,模拟链表处理冲突
这样大概空间时间就差不多了
但!but!
“字符集大小”并不意味着按照abcde的顺序给出!
所以单个字符对应的hash值还得自己做出来(具体看代码)
1 //子串还必须是连续的(不然无解了)
2 #include<stdio.h>
3 #include<stdlib.h>
4 #include<string.h>
5 int base,len;
6 const int mod=100007;
7 char read[16000000];
8 int ex[128]={0};//单个字符值
9 int hash[mod+1][100]={{0}};
10 int get(int pos){
11 int ans=0;
12 for(int i=pos;i<pos+len;i++){
13 ans*=base;
14 ans+=ex[read[i]];
15 }
16 int tmp=ans%mod;
17 if(hash[tmp][0])for(int i=1;i<=hash[tmp][0];i++)if(hash[tmp][i]==ans)return 0;
18 hash[tmp][0]++;
19 hash[tmp][hash[tmp][0]]=ans;
20 return 1;
21 }
22 int main(){
23 scanf("%d %d\n%s",&len,&base,read);
24 int le=strlen(read);
25 for(int i=0,j=0;i<le;i++){
26 if(!ex[read[i]])ex[read[i]]=++j;
27 if(j==base)break;//很简洁地处理字符对应关系
28 }
29 int ans=0;
30 for(int i=0;i<=le-len;i++)ans+=get(i);
31 printf("%d\n",ans);
32 return 0;
2 #include<stdio.h>
3 #include<stdlib.h>
4 #include<string.h>
5 int base,len;
6 const int mod=100007;
7 char read[16000000];
8 int ex[128]={0};//单个字符值
9 int hash[mod+1][100]={{0}};
10 int get(int pos){
11 int ans=0;
12 for(int i=pos;i<pos+len;i++){
13 ans*=base;
14 ans+=ex[read[i]];
15 }
16 int tmp=ans%mod;
17 if(hash[tmp][0])for(int i=1;i<=hash[tmp][0];i++)if(hash[tmp][i]==ans)return 0;
18 hash[tmp][0]++;
19 hash[tmp][hash[tmp][0]]=ans;
20 return 1;
21 }
22 int main(){
23 scanf("%d %d\n%s",&len,&base,read);
24 int le=strlen(read);
25 for(int i=0,j=0;i<le;i++){
26 if(!ex[read[i]])ex[read[i]]=++j;
27 if(j==base)break;//很简洁地处理字符对应关系
28 }
29 int ans=0;
30 for(int i=0;i<=le-len;i++)ans+=get(i);
31 printf("%d\n",ans);
32 return 0;
33 }