G - Compress Strings
Problem Statement
You are given $N$ strings $S_1, S_2, \ldots, S_N$.
Find the minimum length of a string that contains all these strings as substrings.
Here, a string $S$ contains a string $T$ as a substring if $T$ can be obtained by deleting zero or more characters from the beginning and zero or more characters from the end of $S$.
Constraints
- $N$ is an integer.
- $1 \leq N \leq 20$
- $S_i$ is a string consisting of lowercase English letters whose length is at least $1$.
- The total length of $S_1, S_2, \dots, S_N$ is at most $2\times 10^5$.
Input
The input is given from Standard Input in the following format:
$N$
$S_1$
$S_2$
$\vdots$
$S_N$
Output
Print the answer as an integer.
Sample Input 1
3
snuke
kensho
uk
Sample Output 1
9
The string snukensho of length $9$ contains all of $S_1$, $S_2$, and $S_3$ as substrings.
Specifically, the first to fifth characters of snukensho correspond to $S_1$, the fourth to ninth correspond to $S_2$, and the third to fourth correspond to $S_3$.
No shorter string contains all of $S_1$, $S_2$, and $S_3$ as substrings. Thus, the answer is $9$.
Sample Input 2
3
abc
abc
arc
Sample Output 2
6
Sample Input 3
6
cmcmrcc
rmrrrmr
mrccm
mmcr
rmmrmrcc
ccmcrcmcm
Sample Output 3
27
Solution Approach
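First, if some string $s_i$ is a substring of another string $s_j$, we can ignore $s_i$: any string that contains $s_j$ necessarily contains $s_i$ as well. So we start by deleting every such $s_i$, which can be done with the Z-function. Concretely, for each $s_i$ we enumerate every $s_j$ that has not been deleted yet, build $t = s_i + s_j$, and compute its Z-function. We then scan $k \in [|s_i|, |t| - 1]$ (indices start at $0$); if $z_k \geq |s_i|$ for some $k$, then $s_j[k - |s_i| \sim k - 1]$ matches $s_i$, i.e. $s_i$ is a substring of $s_j$. Skipping already-deleted $s_j$ matters when several strings are equal: it guarantees that exactly one copy survives.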
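The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the article's exact code: the helper names `is_substring` and `remove_contained` are hypothetical, and the Z-function is written in the common half-open `[l, r)` form rather than the closed form used in the AC code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and s[i..]; z[0] = |s|.
std::vector<int> z_function(const std::string &s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i < r) z[i] = std::min(z[i - l], r - i);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    if (n > 0) z[0] = n;
    return z;
}

// Hypothetical helper: true if `a` occurs somewhere inside `b`.
bool is_substring(const std::string &a, const std::string &b) {
    std::string t = a + b;
    std::vector<int> z = z_function(t);
    int m = static_cast<int>(a.size());
    // Only positions inside the `b` part count; a match of length >= |a|
    // starting there lies entirely within `b`.
    for (int k = m; k < static_cast<int>(t.size()); k++) {
        if (z[k] >= m) return true;
    }
    return false;
}

// Hypothetical helper: drops every string contained in a string that is
// still kept; among equal strings only the last copy survives.
std::vector<std::string> remove_contained(const std::vector<std::string> &strs) {
    int n = static_cast<int>(strs.size());
    std::vector<bool> removed(n, false);
    std::vector<std::string> kept;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n && !removed[i]; j++) {
            if (i == j || removed[j]) continue;
            if (is_substring(strs[i], strs[j])) removed[i] = true;
        }
        if (!removed[i]) kept.push_back(strs[i]);
    }
    return kept;
}
```

On Sample Input 1 this would drop uk (it occurs inside snuke), and on Sample Input 2 it would keep a single copy of abc.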
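Next we build the answer by concatenating the remaining strings in some order, overlapping neighbours as much as possible; this is clearly always valid. Consider two adjacent strings $s_i$ and $s_j$ in the concatenation, and let $g(i,j)$ denote the length of the longest suffix of $s_i$ that equals a prefix of $s_j$, i.e. the largest $g(i,j)$ such that $s_i[|s_i| - g(i,j), |s_i| - 1] = s_j[0, g(i,j) - 1]$. Then placing $s_j$ right after $s_i$ adds only $|s_j| - g(i,j)$ characters, so the total length drops by $g(i,j)$. We only need to look at adjacent strings (never at strings further back), because no remaining string contains or is contained in another.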
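As a small worked example, take Sample Input 1 and assume uk has already been discarded because it occurs inside snuke. Writing $s_1 = \text{snuke}$ and $s_2 = \text{kensho}$ (labels chosen here only for illustration), the two possible orders give:

$$
\begin{aligned}
g(1,2) &= 2 \quad (\text{the shared part is ke}), & |s_1| + |s_2| - g(1,2) &= 5 + 6 - 2 = 9,\\
g(2,1) &= 0, & |s_2| + |s_1| - g(2,1) &= 6 + 5 - 0 = 11.
\end{aligned}
$$

The smaller value, $9$, matches Sample Output 1.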
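This leads to a bitmask dp. Define $f(i,j)$ as the minimum length over all ways of concatenating exactly the strings in the set encoded by the bitmask $i$, with $s_j$ as the last string. Transitioning on which string $s_k$ is second to last gives $f(i,j) = \min\limits_{k \in i \setminus \{j\}} \left\{ f(i \setminus \{j\}, k) + |s_j| - g(k,j) \right\}$, with base case $f(\{j\}, j) = |s_j|$. The code realizes the base case by setting $f(\varnothing, j) = 0$ and letting $k$ range over all of $i$, which has the same effect because $g(j,j) = 0$. The answer is $\min\limits_j f(U, j)$, where $U$ is the set of all remaining strings.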
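The recurrence can be sketched in isolation as below. This is only an illustrative version under stated assumptions: the function name `shortest_length` and its parameters are hypothetical, `len[j]` stands for $|s_j|$, `g[k][j]` for $g(k,j)$, and the inputs are assumed to describe strings none of which contains another.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical helper: minimum total length given the string lengths and the
// pairwise overlaps g[k][j] (suffix of string k against prefix of string j).
int shortest_length(const std::vector<int> &len,
                    const std::vector<std::vector<int>> &g) {
    int n = static_cast<int>(len.size());
    assert(n >= 1);
    const int INF = INT_MAX / 2;
    // f[mask][j]: best length using exactly the strings in mask, ending with j.
    std::vector<std::vector<int>> f(1 << n, std::vector<int>(n, INF));
    for (int j = 0; j < n; j++) f[1 << j][j] = len[j];  // base case
    for (int mask = 1; mask < (1 << n); mask++) {
        for (int j = 0; j < n; j++) {
            if (!((mask >> j) & 1)) continue;
            int prev = mask ^ (1 << j);  // the set without the last string
            for (int k = 0; k < n; k++) {
                if (!((prev >> k) & 1)) continue;
                f[mask][j] = std::min(f[mask][j], f[prev][k] + len[j] - g[k][j]);
            }
        }
    }
    return *std::min_element(f.back().begin(), f.back().end());
}
```

For example, with lengths 5 and 6 (snuke, kensho) and a single non-zero overlap $g(0,1) = 2$, the function would return $9$.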
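$g(i,j)$ can also be computed with the Z-function. Enumerate each $s_i$ as the earlier string and each other $s_j$ as the later string, build $t = s_j + s_i$, and compute its Z-function. Scan $k \in [0, |s_i|-1]$ in increasing order; the first $k$ with $z_{k + |s_j|} \geq |s_i| - k$ shows that the suffix $s_i[k, |s_i| - 1]$ matches the prefix $s_j[0, |s_i| - k - 1]$, so $g(i,j) = |s_i| - k$. The match can never run past the $s_j$ part of $t$, since that would make $s_j$ a substring of $s_i$, and such strings have already been removed.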
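A possible stand-alone version of this overlap computation is sketched below. The name `overlap` is hypothetical, and unlike the article's code the scan starts at $k = \max(0, |a| - |b|)$, a small deviation that keeps the matched prefix inside `b` even when one string contains the other.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and s[i..]; z[0] = |s|.
std::vector<int> z_function(const std::string &s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i < r) z[i] = std::min(z[i - l], r - i);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    if (n > 0) z[0] = n;
    return z;
}

// Hypothetical helper: length of the longest suffix of `a` that is also a
// prefix of `b`, computed from the Z-function of b + a.
int overlap(const std::string &a, const std::string &b) {
    std::string t = b + a;
    std::vector<int> z = z_function(t);
    int la = static_cast<int>(a.size());
    int lb = static_cast<int>(b.size());
    // k is the start of the candidate suffix inside `a`; smaller k means a
    // longer suffix, so the first hit is the maximum overlap.
    for (int k = std::max(0, la - lb); k < la; k++) {
        if (z[lb + k] >= la - k) return la - k;
    }
    return 0;
}
```

With the strings from Sample Input 1, `overlap("snuke", "kensho")` would find the shared part ke, while the reverse order has no overlap.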
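Since everything here is plain string matching, the Z-function could be replaced with simple brute-force string hashing, but this is not recommended because hashing is easy to break with anti-hash test cases.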
The AC code is below; its time complexity is $O\left(n \sum{|s_i|} + 2^n n^2 \right)$:
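#include <bits/stdc++.h>
using namespace std;

const int N = 25, M = 2e5 + 10;

string s[N];
int z[M];           // Z-array of the most recently processed string (z[0] is never read)
bool vis[N];        // vis[i]: s[i] is contained in another string and has been dropped
int g[N][N];        // g[i][j]: longest suffix of kept string i that is a prefix of kept string j
int f[1 << 20][N];  // f[mask][j]: min length using the strings in mask, ending with string j

// Fills z[1..n-1] with the Z-function of s.
void z_function(const string &s) {
    int n = s.size();
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        z[i] = 0;
        if (i <= r) z[i] = min(z[i - l], r - i + 1);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
            l = i, r = i + z[i] - 1;
        }
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int n;
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> s[i];
    }

    // Step 1: drop every string that is contained in a string still kept.
    vector<int> p;  // original indices of the strings that survive
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j || vis[j]) continue;
            string t = s[i] + s[j];
            z_function(t);
            // A match of length >= |s[i]| starting inside the s[j] part
            // means s[i] occurs in s[j].
            for (int k = s[i].size(); k < (int)t.size(); k++) {
                if (z[k] >= (int)s[i].size()) vis[i] = true;
            }
        }
        if (!vis[i]) p.push_back(i);
    }
    n = p.size();

    // Step 2: compute the overlaps g[i][j] from the Z-function of s[p[j]] + s[p[i]].
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j) continue;
            string t = s[p[j]] + s[p[i]];
            z_function(t);
            int li = s[p[i]].size(), lj = s[p[j]].size();
            for (int k = 0; k < li; k++) {
                // The suffix of s[p[i]] starting at k matches a prefix of s[p[j]].
                if (z[k + lj] >= li - k) {
                    g[i][j] = li - k;
                    break;
                }
            }
        }
    }

    // Step 3: bitmask dp over the kept strings.
    memset(f, 0x3f, sizeof(f));
    for (int i = 0; i < n; i++) {
        f[0][i] = 0;  // empty set: lets the k == j transition produce f[{j}][j] = |s_j|
    }
    for (int i = 1; i < 1 << n; i++) {
        for (int j = 0; j < n; j++) {
            if (!(i >> j & 1)) continue;  // j must be in the set
            for (int k = 0; k < n; k++) {
                if (i >> k & 1) f[i][j] = min(f[i][j], f[i ^ (1 << j)][k] + int(s[p[j]].size()) - g[k][j]);
            }
        }
    }

    int ret = 0x3f3f3f3f;
    for (int i = 0; i < n; i++) {
        ret = min(ret, f[(1 << n) - 1][i]);
    }
    cout << ret << '\n';
    return 0;
}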
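For a rough sense of scale, plugging the constraint limits ($n \leq 20$, $\sum |s_i| \leq 2 \times 10^5$) into the complexity bound and into the size of the dp table `f` gives the estimates below. The memory figure assumes a 4-byte `int`.

$$
\begin{aligned}
n \sum |s_i| &\leq 20 \times 2 \times 10^5 = 4 \times 10^6,\\
2^n n^2 &\leq 2^{20} \times 20^2 = 419{,}430{,}400 \approx 4.2 \times 10^8,\\
\text{size of } f &= 2^{20} \times 25 \times 4 \text{ bytes} = 104{,}857{,}600 \text{ bytes} = 100 \text{ MiB}.
\end{aligned}
$$

The dp term dominates the running time.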
References
Editorial - AtCoder Beginner Contest 343: https://atcoder.jp/contests/abc343/editorial/9447
This article comes from cnblogs (博客园), author: onlyblues. When reposting, please cite the original link: https://www.cnblogs.com/onlyblues/p/18053429