Codeforces Round 578 (Div. 2) E. Compress Words
E. Compress Words
Problem
Amugae has a sentence consisting of \(n\) words. He wants to compress this sentence into one word. Amugae doesn't like repetitions, so when he merges two words into one word, he removes the longest prefix of the second word that coincides with a suffix of the first word. For example, he merges "sample" and "please" into "samplease".
Amugae will merge his sentence left to right (i.e. first merge the first two words, then merge the result with the third word and so on). Write a program that prints the compressed word after the merging process ends.
Input
The first line contains an integer \(n (1 \leq n \leq 10^5)\), the number of the words in Amugae's sentence.
The second line contains \(n\) words separated by a single space. Each word is non-empty and consists of uppercase and lowercase English letters and digits \(('A', 'B', ..., 'Z', 'a', 'b', ..., 'z', '0', '1', ..., '9')\). The total length of the words does not exceed \(10^6\).
Output
In the only line output the compressed word after the merging process ends as described in the problem.
Examples
input
5
I want to order pizza
output
Iwantorderpizza
input
5
sample please ease in out
output
sampleaseinout
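Idea
Start with joining two words. For example, qwer and asdf share no overlapping part, so they simply become qwerasdf; qwer and erty overlap on er, so they become qwerty.
With two words handled, consider several words: asdf, qwer, erty give asdfqwerty.
Each new word only interacts with the end of the result built so far, so the big problem (joining n words) reduces to a small one (joining two strings) repeated n-1 times.
For short words, brute-force (BF) matching would be enough, but given the constraints of this problem we clearly need KMP (string hashing works too, but I don't know how to do it).
The KMP step can be optimized: only as many characters as the length of the shorter of the two strings need to be compared.
For example, with qwertyuiopasdfghjklzxcv and zxcvbnm, of lengths 23 and 7, we compare only the last 7 characters of the long string with the short one;
and with 90qwer and qwertyuiopasdfghjklzxcv, we compare only the first 6 characters of the long string with 90qwer.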
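The two-word case described above can be written as a brute-force sketch. This is only a reference version for short words or for checking a faster solution, not the approach used in the final program; the names `overlapNaive` and `mergeNaive` are made up for this illustration. It tries every overlap length from the shorter string's length downwards, so one merge can take quadratic time in the worst case.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest suffix of a that is also a
// prefix of b. Only min(|a|, |b|) characters can overlap, so larger
// lengths are never tried.
std::size_t overlapNaive(const std::string& a, const std::string& b) {
    const std::size_t m = std::min(a.size(), b.size());
    for (std::size_t k = m; k > 0; --k) {
        // Compare the last k characters of a with the first k characters of b.
        if (a.compare(a.size() - k, k, b, 0, k) == 0) return k;
    }
    return 0;
}

// Hypothetical helper: merge two words as the problem describes, by
// appending b without its overlapping prefix.
std::string mergeNaive(const std::string& a, const std::string& b) {
    return a + b.substr(overlapNaive(a, b));
}
```

With these helpers, merging qwer and erty gives qwerty, and merging qwer and asdf gives qwerasdf.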
Official editorial
Denote the words from left to right as \(W_1,W_2,W_3,⋯,W_n\).
If we define the string \(F(k)\) as the result of merging the first \(k\) words as described in the problem (so \(F(1)=W_1\)), we can get \(F(k+1)\) from \(F(k)\) by the following process:
1. If the length of \(F(k)\) is greater than the length of \(W_{k+1}\):
Let \(x\) be the length of \(F(k)\) and \(y\) the length of \(W_{k+1}\). Construct the string \(c=W_{k+1}+F(k)[x-y+1...x]\), where \(s[i...j]\) denotes the substring of \(s\) from index \(i\) to index \(j\) (1-indexed, inclusive).
Compute the KMP failure function of \(c\).
This function gives the maximum overlap between a prefix of \(W_{k+1}\) and a suffix of \(F(k)\). Start from the last element of the failure function and follow failure links while the value is greater than \(y\); call the resulting value \(z\). Then the longest overlap between a suffix of \(F(k)\) and a prefix of \(W_{k+1}\) is \(L=\min(z,y)=z\).
Then \(F(k+1)=F(k)+W_{k+1}[L+1...y]\).
2. Otherwise:
Construct \(c\) as \(W_{k+1}[1...x]+F(k)\). We can get \(F(k+1)\) by the same process as in case 1, bounding the overlap by \(x\) instead of \(y\).
In this process we get \(F(k+1)\) from \(F(k)\) in \(O(len(W_{k+1}))\) time, so we get \(F(n)\) (the answer to this problem) in \(O(len(W_1)+len(W_2)+⋯+len(W_n))\).
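The editorial's process can be sketched as follows. This is one possible reading of the steps above, not the official reference solution: the names `prefixFunction` and `mergeInto` are assumptions, strings are 0-indexed `std::string`s, and both cases are handled together by taking \(m=\min(x,y)\) characters from each side.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// KMP failure (prefix) function: pi[i] is the length of the longest proper
// prefix of c[0..i] that is also a suffix of c[0..i].
std::vector<int> prefixFunction(const std::string& c) {
    std::vector<int> pi(c.size(), 0);
    for (std::size_t i = 1; i < c.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && c[i] != c[k]) k = pi[k - 1];
        if (c[i] == c[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Hypothetical helper: appends w to f in place, dropping the longest prefix
// of w that is already a suffix of f.
void mergeInto(std::string& f, const std::string& w) {
    const std::size_t m = std::min(f.size(), w.size());
    if (m == 0) {
        f += w;
        return;
    }
    // c = first m characters of w, followed by the last m characters of f.
    const std::string c = w.substr(0, m) + f.substr(f.size() - m);
    const std::vector<int> pi = prefixFunction(c);
    // A border of c longer than m would reuse characters across the two
    // halves, so follow failure links until the length fits.
    std::size_t z = static_cast<std::size_t>(pi.back());
    while (z > m) z = static_cast<std::size_t>(pi[z - 1]);
    // Append w without its first z characters.
    f.append(w, z, std::string::npos);
}
```

Starting from `f = W_1` and calling `mergeInto(f, W_k)` for each following word yields the compressed word. Each call builds a string of length \(2m\), so the work per word stays linear in the length of that word.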
The Code Of My Program
#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;
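const int N = 1000002; // total input length is at most 10^6, plus room for the terminator
int nxt[N];            // KMP failure table for T
char S[N], T[N];       // S: merged result so far, T: next word to append
int slen, tlen;        // current lengths of S and T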
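// Build the KMP failure table for T: for j >= 1, nxt[j] is the length of the
// longest proper prefix of T[0..j-1] that is also its suffix; nxt[0] = -1.
void getNext()
{
    int j = 0, k = -1;
    nxt[0] = -1;
    while (j < tlen)
    {
        if (k == -1 || T[j] == T[k])
            nxt[++j] = ++k;
        else
            k = nxt[k];
    }
}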
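/*
Returns the length of the longest prefix of T that is also a suffix of S.
Only the last min(slen, tlen) characters of S are scanned.
*/
int KMP_Index()
{
    int i = max(slen - tlen, 0), j = 0;
    getNext();
    while (i < slen && j < tlen)
    {
        if (j == -1 || S[i] == T[j])
        {
            i++;
            j++;
        }
        else
            j = nxt[j];
    }
    return j; // number of leading characters of T already present at the end of S
}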
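int main()
{
    int n;
    scanf("%d", &n);
    scanf("%s", S);
    slen = strlen(S);
    for (int i = 1; i < n; i++)
    {
        scanf("%s", T);
        tlen = strlen(T);
        int pos = KMP_Index(); // overlap length between S and T
        // append the part of T that is not already at the end of S
        for (int j = pos; j < tlen; j++)
        {
            S[slen] = T[j];
            slen++;
        }
    }
    S[slen] = '\0'; // terminate explicitly instead of relying on zero-initialized globals
    printf("%s", S);
    return 0;
}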