G - Compress Strings

Problem Statement

You are given $N$ strings $S_1, S_2, \ldots, S_N$.

Find the minimum length of a string that contains all these strings as substrings.

Here, a string $S$ contains a string $T$ as a substring if $T$ can be obtained by deleting zero or more characters from the beginning and zero or more characters from the end of $S$.

Constraints

  • $N$ is an integer.
  • $1 \leq N \leq 20$
  • $S_i$ is a string consisting of lowercase English letters whose length is at least $1$.
  • The total length of $S_1, S_2, \dots, S_N$ is at most $2\times 10^5$.

Input

The input is given from Standard Input in the following format:

$N$
$S_1$
$S_2$
$\vdots$
$S_N$

Output

Print the answer as an integer.


Sample Input 1

3
snuke
kensho
uk

Sample Output 1

9

The string snukensho of length $9$ contains all of $S_1$, $S_2$, and $S_3$ as substrings.

Specifically, the first to fifth characters of snukensho correspond to $S_1$, the fourth to ninth correspond to $S_2$, and the third to fourth correspond to $S_3$.

No shorter string contains all of $S_1$, $S_2$, and $S_3$ as substrings. Thus, the answer is $9$.


Sample Input 2

3
abc
abc
arc

Sample Output 2

6

Sample Input 3

6
cmcmrcc
rmrrrmr
mrccm
mmcr
rmmrmrcc
ccmcrcmcm

Sample Output 3

27

Solution approach

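  First, if some string $s_i$ is a substring of another string $s_j$, we can ignore $s_i$: any string that contains $s_j$ necessarily contains $s_i$ as well. So we start by deleting every such $s_i$, which can be done with the Z function. Concretely, for each $s_i$, enumerate every $s_j$ that has not been deleted yet, build $t = s_i + s_j$, and compute its Z function. Then check each $k \in [|s_i|, |t| - 1]$ (indices start at $0$): if $z_k \geq |s_i|$, then $s_j[k - |s_i| \sim k - 1]$ matches $s_i$, i.e. $s_i$ is a substring of $s_j$. When two strings are equal, this procedure deletes one copy and keeps the other.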
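
  The containment test above can be sketched as a standalone helper. This is a minimal illustration, not the code used in the solution further down: the names `z_array` and `is_substring` are hypothetical, and the Z-array is returned as a `std::vector` instead of being written to a global buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Z-array of s: for i >= 1, z[i] is the length of the longest common prefix
// of s and s[i..]. z[0] is left as 0 by convention in this sketch.
std::vector<int> z_array(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    // [l, r) is the rightmost window known to match a prefix of s.
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i < r) z[i] = std::min(z[i - l], r - i);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Returns true if a occurs in b as a contiguous substring.
// Builds t = a + b and looks for a position inside b whose Z-value covers all of a.
bool is_substring(const std::string& a, const std::string& b) {
    assert(!a.empty());
    const std::string t = a + b;
    const std::vector<int> z = z_array(t);
    const int m = static_cast<int>(a.size());
    for (int k = m; k < static_cast<int>(t.size()); k++) {
        if (z[k] >= m) return true;
    }
    return false;
}
```

  With the strings of Sample 1, `is_substring("uk", "snuke")` holds, so `uk` can be dropped before the DP.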

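  Next, we build the answer by concatenating the remaining strings in some order; any order clearly yields a valid string. Consider two strings $s_i$ and $s_j$ that are adjacent in the concatenation, and let $g(i,j)$ be the length of the longest suffix of $s_i$ that matches a prefix of $s_j$, i.e. the largest $g(i,j)$ such that $s_i[|s_i| - g(i,j), |s_i| - 1] = s_j[0, g(i,j) - 1]$. Then appending $s_j$ right after $s_i$ adds only $|s_j| - g(i,j)$ characters, which shortens the answer by $g(i,j)$ compared with plain concatenation. We only need to look at adjacent strings (not at strings further back), because no remaining string contains or is contained in another.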
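
  To make the definition of $g(i,j)$ concrete, here is a brute-force sketch that simply tries every overlap length from longest to shortest. The name `overlap_naive` is hypothetical, and this quadratic version is only meant to illustrate the definition; it is too slow for the full constraints.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Length of the longest suffix of a that equals a prefix of b.
// Tries lengths from min(|a|, |b|) down to 1; returns 0 if nothing matches.
int overlap_naive(const std::string& a, const std::string& b) {
    assert(!a.empty() && !b.empty());
    const int limit = static_cast<int>(std::min(a.size(), b.size()));
    for (int len = limit; len >= 1; len--) {
        // Compare the last len characters of a with the first len characters of b.
        if (a.compare(a.size() - len, len, b, 0, len) == 0) return len;
    }
    return 0;
}
```

  For Sample 1, `overlap_naive("snuke", "kensho")` is $2$ (the shared `ke`), so placing `kensho` after `snuke` gives a string of length $5 + 6 - 2 = 9$, namely `snukensho`.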

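  This leads to a bitmask DP. Define $f(i,j)$ as the minimum length over all ways of concatenating exactly the strings in the set encoded by the bitmask $i$, with $s_j$ as the last string. Transitioning on which string is second to last gives $f(i,j) = \min\limits_{k \in i} \left\{ f(i \setminus \{j\}, k) + |s_j| - g(k,j) \right\}$. The base case is $f(\varnothing, \cdot) = 0$ together with $g(j,j) = 0$, so that $f(\{j\}, j) = |s_j|$; for larger sets the term $k = j$ is infinite and never wins. The answer is $\min_j f(U, j)$, where $U$ is the set of all remaining strings.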
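
  The recurrence can be sketched in isolation, taking the lengths and the overlap table as inputs. This is an illustrative variant under stated assumptions: the function name `shortest_superstring_length` is hypothetical, the table is stored in a flat vector, and the base case is written directly as $f(\{j\}, j) = |s_j|$, which is equivalent to starting from the empty set.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// len[j]  : length of string j (containment already removed).
// g[k][j] : overlap gained when string j is placed directly after string k.
// Returns the minimum total length over all orderings of the strings.
int shortest_superstring_length(const std::vector<int>& len,
                                const std::vector<std::vector<int>>& g) {
    const int n = static_cast<int>(len.size());
    assert(n >= 1 && n <= 20);
    const int INF = INT_MAX / 2;
    const int full = (1 << n) - 1;
    // f is indexed by (mask, last) and stored row by row.
    std::vector<int> f(static_cast<std::size_t>(full + 1) * n, INF);
    auto at = [&](int mask, int j) -> int& {
        return f[static_cast<std::size_t>(mask) * n + j];
    };
    for (int j = 0; j < n; j++) at(1 << j, j) = len[j];  // single-string sets
    for (int mask = 1; mask <= full; mask++) {
        for (int j = 0; j < n; j++) {
            if (!((mask >> j) & 1)) continue;      // j must be in the set
            const int prev = mask ^ (1 << j);      // the set without j
            for (int k = 0; k < n; k++) {
                if (!((prev >> k) & 1)) continue;  // k is the second-to-last string
                at(mask, j) = std::min(at(mask, j), at(prev, k) + len[j] - g[k][j]);
            }
        }
    }
    int best = INF;
    for (int j = 0; j < n; j++) best = std::min(best, at(full, j));
    return best;
}
```

  For Sample 1 after dropping `uk`, the inputs are lengths $\{5, 6\}$ with $g(0,1) = 2$ and $g(1,0) = 0$, and the function returns $9$.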

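  $g(i,j)$ can also be computed with the Z function. Enumerate each $s_i$ as the earlier string and each other $s_j$ as the later string, build $t = s_j + s_i$, and compute its Z function. Scan $k \in [0, |s_i| - 1]$ in increasing order: the first $k$ with $z_{k + |s_j|} \geq |s_i| - k$ shows that the suffix $s_i[k, |s_i| - 1]$ matches the prefix $s_j[0, |s_i| - k - 1]$, so $g(i,j) = |s_i| - k$. If no such $k$ exists, $g(i,j) = 0$. Because no string contains another, a match can never be longer than $|s_j|$, so the matched prefix of $t$ always lies inside $s_j$.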
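
  A minimal sketch of this overlap computation follows. The names `z_array` and `overlap_z` are hypothetical. One deliberate difference from the description above, labeled as an assumption of this sketch: the scan starts at $k = \max(0, |s_i| - |s_j|)$, so the result stays correct even if the inputs were not filtered for containment.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Z-array of s: for i >= 1, z[i] is the length of the longest common prefix
// of s and s[i..]. z[0] is left as 0 by convention in this sketch.
std::vector<int> z_array(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i < r) z[i] = std::min(z[i - l], r - i);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Longest suffix of a that is also a prefix of b, via the Z-array of t = b + a.
int overlap_z(const std::string& a, const std::string& b) {
    assert(!a.empty() && !b.empty());
    const std::string t = b + a;
    const std::vector<int> z = z_array(t);
    const int la = static_cast<int>(a.size());
    const int lb = static_cast<int>(b.size());
    // Start where the suffix a[k..] is no longer than b (assumption of this sketch),
    // and scan upward so the first hit is the longest overlap.
    for (int k = std::max(0, la - lb); k < la; k++) {
        if (z[lb + k] >= la - k) return la - k;  // a[k..] matches a prefix of b
    }
    return 0;
}
```

  On Sample 1, `overlap_z("snuke", "kensho")` is $2$ and `overlap_z("kensho", "snuke")` is $0$.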

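  Since every step here is plain string matching, the Z function could be replaced by straightforward string hashing. That is not recommended, though, because hash-based solutions are easy to break with adversarial test cases.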

  The accepted (AC) code is below; its time complexity is $O\left(n \sum{|s_i|} + 2^n n^2 \right)$:

#include <bits/stdc++.h>
using namespace std;

const int N = 25, M = 2e5 + 10;

string s[N];
int z[M];
bool vis[N];
int g[N][N];
int f[1 << 20][N];

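// Computes the Z-array of s into the global z[]: for i >= 1, z[i] is the length
// of the longest common prefix of s and s[i..]. z[0] is never written and stays 0.
void z_function(string &s) {
    int n = s.size();
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        z[i] = 0;
        // Inside the current match window [l, r], reuse the value computed earlier.
        if (i <= r) z[i] = min(z[i - l], r - i + 1);
        // Extend the match by direct comparison, moving the window as it grows.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
            l = i, r = i + z[i] - 1;
        }
    }
}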

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int n;
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> s[i];
    }
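    // Drop every string that is a substring of another string not yet dropped.
    // p collects the indices of the strings that survive.
    vector<int> p;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j || vis[j]) continue;
            string t = s[i] + s[j];
            z_function(t);
            // z[k] >= |s[i]| at a position inside s[j] means s[i] occurs in s[j].
            for (int k = s[i].size(); k < t.size(); k++) {
                if (z[k] >= s[i].size()) vis[i] = true;
            }
        }
        if (!vis[i]) p.push_back(i);
    }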
    n = p.size();
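    // g[i][j]: longest suffix of s[p[i]] that is also a prefix of s[p[j]].
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j) continue;
            string t = s[p[j]] + s[p[i]];
            z_function(t);
            // Scan suffix start positions from left to right, so the first
            // match found is the longest one.
            for (int k = 0; k < s[p[i]].size(); k++) {
                if (z[k + s[p[j]].size()] >= s[p[i]].size() - k) {
                    g[i][j] = s[p[i]].size() - k;
                    break;
                }
            }
        }
    }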
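    // f[mask][j]: minimum length using exactly the strings in mask, ending with p[j].
    memset(f, 0x3f, sizeof(f));
    // Base case: the empty set has length 0 whatever the "last" index is.
    for (int i = 0; i < n; i++) {
        f[0][i] = 0;
    }
    for (int i = 1; i < 1 << n; i++) {
        for (int j = 0; j < n; j++) {
            if (~i >> j & 1) continue;  // j must belong to the set i
            for (int k = 0; k < n; k++) {
                // k == j only matters when i == {j}: it reads f[0][j] = 0 and g[j][j] = 0.
                if (i >> k & 1) f[i][j] = min(f[i][j], f[i ^ 1 << j][k] + int(s[p[j]].size()) - g[k][j]);
            }
        }
    }
    // The answer is the best value over all choices of the last string.
    int ret = 0x3f3f3f3f;
    for (int i = 0; i < n; i++) {
        ret = min(ret, f[(1 << n) - 1][i]);
    }
    cout << ret;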
    
    return 0;
}
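
  As a rough sanity check of the complexity bound under the stated constraints ($n \leq 20$, $\sum |s_i| \leq 2 \times 10^5$), the worst-case counts work out as follows. These are upper-bound estimates, not measurements. Each of the two pairwise phases runs the Z function on $s_i + s_j$ for ordered pairs $i \neq j$, and the DP table is the array `f` declared as `int f[1 << 20][25]`.

$$
\begin{aligned}
\text{one pairwise phase:}\quad & \sum_{i \neq j} \left( |s_i| + |s_j| \right) = 2(n-1) \sum_i |s_i| \leq 2 \cdot 19 \cdot 2 \times 10^5 = 7.6 \times 10^6 \\
\text{DP transitions:}\quad & 2^{n} \cdot n^2 \leq 2^{20} \cdot 20^2 = 419{,}430{,}400 \approx 4.2 \times 10^8 \\
\text{memory for } f\text{:}\quad & 2^{20} \cdot 25 \cdot 4 \text{ bytes} = 104{,}857{,}600 \text{ bytes} = 100 \text{ MiB}
\end{aligned}
$$

  The DP term dominates the running time, and it only reaches this bound when no string is removed in the containment step.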


References

  Editorial - AtCoder Beginner Contest 343: https://atcoder.jp/contests/abc343/editorial/9447

posted @ 2024-03-05 10:34  onlyblues