Crazy Search POJ - 1200 (string hashing)
Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.
As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.
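As a sanity check on this example, a brute-force count that stores every window in a set of strings gives the same answer. This is only a minimal sketch: the helper name count_distinct_naive is hypothetical, and the approach is far too slow and memory-hungry for the full input, so it is useful only for verifying small cases.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper for small inputs only: collects every substring of
// length n in a std::set and returns how many distinct ones there are.
std::size_t count_distinct_naive(const std::string& text, std::size_t n) {
    assert(n >= 1);                      // a substring length of 0 makes no sense here
    std::set<std::string> seen;
    if (text.size() < n) return 0;       // no window of length n fits
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        seen.insert(text.substr(i, n));  // duplicates are ignored by the set
    }
    return seen.size();
}
```

Calling count_distinct_naive("daababac", 3) returns 5, matching the five substrings listed above.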
Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.
Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.
Sample Input
3 4
daababac
Sample Output
5
Hint
Huge input, scanf is recommended.
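Problem summary:
You are given a string that is known to contain nc distinct characters. Count how many distinct contiguous substrings of length n it has.
Approach:
This is a string hashing problem.
Because the number of distinct characters is given, each character can be mapped to a digit in 0..nc-1 and every substring of length n can be read as an n-digit number in base nc. Different substrings give different numbers, and the statement bounds the number of possible values by 16 million, so a boolean array indexed by that number is enough to record which substrings have been seen.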
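The base-nc encoding can be sketched in isolation as follows. This is an illustration under stated assumptions, not the solution itself: the helper name window_values is hypothetical, the text is assumed to contain at most nc distinct characters, and nc^n is assumed to fit in a long long. It returns the value of every window so that the rolling update (drop the leading digit, shift, append the new digit) can be inspected directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the base-nc value of every length-n window
// of text, from left to right.
std::vector<long long> window_values(const std::string& text, int n, int nc) {
    assert(n >= 1 && nc >= 1);
    std::vector<long long> values;
    const std::size_t len = text.size();
    const std::size_t width = static_cast<std::size_t>(n);
    if (len < width) return values;      // no window of length n fits

    // Map each character that occurs to a digit, in ascending character order.
    bool present[256] = {false};
    int digit[256] = {0};
    for (std::size_t i = 0; i < len; ++i)
        present[static_cast<unsigned char>(text[i])] = true;
    int next = 0;
    for (int c = 0; c < 256; ++c)
        if (present[c]) digit[c] = next++;

    // Weight of the leading digit of a window: nc^(n-1).
    long long high = 1;
    for (int i = 1; i < n; ++i) high *= nc;

    // Value of the first window.
    long long x = 0;
    for (std::size_t i = 0; i < width; ++i)
        x = x * nc + digit[static_cast<unsigned char>(text[i])];
    values.push_back(x);

    // Slide the window one character at a time.
    for (std::size_t i = width; i < len; ++i) {
        x -= digit[static_cast<unsigned char>(text[i - width])] * high; // drop leading digit
        x = x * nc + digit[static_cast<unsigned char>(text[i])];        // shift and append
        values.push_back(x);
    }
    return values;
}
```

For the sample text "daababac" with n = 3 and nc = 4, the digits are a=0, b=1, c=2, d=3 and the windows evaluate to 48, 1, 4, 17, 4, 18. The value 4 ("aba") appears twice, leaving five distinct values, which is the expected answer.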
The code is commented; see it for the details:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#include <cmath>
#include <queue>
#include <stack>
#include <map>
#include <set>
#include <vector>
#include <iomanip>
#define ALL(x) (x).begin(), (x).end()
#define rt return
#define dll(x) scanf("%I64d",&x)
#define xll(x) printf("%I64d\n",x)
#define sz(a) int(a.size())
#define all(a) a.begin(), a.end()
#define rep(i,x,n) for(int i=x;i<n;i++)
#define repd(i,x,n) for(int i=x;i<=n;i++)
#define pii pair<int,int>
#define pll pair<long long ,long long>
#define gbtb ios::sync_with_stdio(false),cin.tie(0),cout.tie(0)
#define MS0(X) memset((X), 0, sizeof((X)))
#define MSC0(X) memset((X), '\0', sizeof((X)))
#define pb push_back
#define mp make_pair
#define fi first
#define se second
#define eps 1e-6
#define gg(x) getInt(&x)
#define chu(x) cout<<"["<<#x<<" "<<(x)<<"]"<<endl
using namespace std;
typedef long long ll;
ll gcd(ll a, ll b) {return b ? gcd(b, a % b) : a;}
ll lcm(ll a, ll b) {return a / gcd(a, b) * b;}
ll powmod(ll a, ll b, ll MOD) {ll ans = 1; while (b) {if (b % 2)ans = ans * a % MOD; a = a * a % MOD; b /= 2;} return ans;}
inline void getInt(int* p);
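const int maxn = 16000010; // hash values are below nc^n <= 16 million per the statement; assumes the text is no longer than that either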
const int inf = 0x3f3f3f3f;
/*** TEMPLATE CODE * * STARTS HERE ***/
bool vis[500];
bool b[maxn];
int n, nc;
char str[maxn];
int id[500];
ll p = 1ll;
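int main() {
    //freopen("D:\\code\\text\\input.txt","r",stdin);
    //freopen("D:\\code\\text\\output.txt","w",stdout);
    scanf("%d %d", &n, &nc);
    scanf("%s", str + 1);
    int len = strlen(str + 1);
    if (len < n) { // the text is too short to contain any substring of length n
        printf("0\n");
        return 0;
    }
    repd(i, 1, len) {
        vis[(unsigned char)str[i]] = 1; // mark which characters occur
    }
    int num = 0;
    rep(i, 0, 256) {
        if (vis[i]) {
            id[i] = num++; // map each character that occurs to a digit in 0..nc-1
        }
    }
    repd(i, 1, n - 1) {
        p = p * nc; // the length is fixed at n, so a single constant p = nc^(n-1) suffices; no table of powers of nc is needed
    }
    ll x = 0ll;
    repd(i, 1, n) {
        x = x * nc + id[(unsigned char)str[i]]; // hash of the first substring of length n
    }
    b[x] = 1;
    ll ans = 1ll;
    // chu(p);
    repd(i, n + 1, len) {
        x = x - id[(unsigned char)str[i - n]] * p; // remove the leading character's contribution to the hash
        x = x * nc + id[(unsigned char)str[i]]; // shift one digit left, then append the id of the new trailing character
        if (b[x] == 0) {
            b[x] = 1; // first occurrence: mark it as seen and count it
            ans++;
        }
    }
    printf("%lld\n", ans);
    return 0;
}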
inline void getInt(int* p) {
    char ch;
    do {
        ch = getchar();
    } while (ch == ' ' || ch == '\n');
    if (ch == '-') {
        *p = -(getchar() - '0');
        while ((ch = getchar()) >= '0' && ch <= '9') {
            *p = *p * 10 - ch + '0';
        }
    } else {
        *p = ch - '0';
        while ((ch = getchar()) >= '0' && ch <= '9') {
            *p = *p * 10 + ch - '0';
        }
    }
}