4.基础算法--字符串2023-10-05

$K M P$

$K M P$ 算法（Knuth-Morris-Pratt 算法）是一个著名的字符串匹配算法，效率很高，但是确实有点复杂。

基本概念

$1$ 、s[ ]是模式串，即比较长的字符串。
$2$ 、p[ ]是模板串，即比较短的字符串。（这样可能不严谨。。。）
$3$ 、“非平凡前缀”：指除了最后一个字符以外，一个字符串的全部头部组合。
$4$ 、“非平凡后缀”：指除了第一个字符以外，一个字符串的全部尾部组合。（后面会有例子，均简称为前/后缀）
$5$ 、“部分匹配值”：前缀和后缀的最长共有元素的长度。
$6$ 、 $n e x t$ [ ]是“部分匹配值表”，即 $n e x t$ 数组，它存储的是每一个下标对应的“部分匹配值”，是 $K M P$ 算法的核心。

核心思想:在每次失配时，不是把p串往后移一位，而是把p串往后移动至下一次可以和前面部分匹配的位置，这样就可以跳过大多数的失配步骤。而每次 $p$ 串移动的步数就是通过查找 $n e x t$ [ ]数组确定的。

next数组的含义及手动模拟

next数组的含义：对 $n e x t [j]$ ，是 $p [1, j]$ 串中前缀和后缀相同的最大长度（部分匹配值），即 $p [1, n e x t [j]] = p [j - n e x t [j] + 1, j]$ 。

手动模拟求next数组：

对 p = “abcab”

对 $n e x t [1]$ ：前缀 = 空集————后缀 = 空集———— $n e x t [1] = 0$ ;

对 $n e x t [2]$ ：前缀 = { $a$ }————后缀 = { $b$ }———— $n e x t [2] = 0$ ;

对 $n e x t [3]$ ：前缀 = { $a$ , $a b$ }————后缀 = { $c$ , $b c$ }———— $n e x t$ [ $3] = 0$ ;

对 $n e x t$ [ $4$ ] ：前缀 = { $a$ , $a b$ , $a b c$ }————后缀 = { $a$ , $c a$ , $b c a$ }———— $n e x t$ [ $4] = 1$ ;

对 $n e x t$ [ $5$ ] ：前缀 = { $a$ , $a b$ , $a b c$ , $a b c a$ }———后缀 = { $b$ , $a b$ , $c a b$ , $b c a b$ }———— $n e x t$ [ $5] = 2$ ;

匹配思路

KMP主要分两步：求next数组、匹配字符串。

$s 串和 p 串都是从 1 开始的。 i 从 1 开始， j 从 0 开始，每次 s [i] 和 p [j + 1] 比较$

当匹配过程到上图所示时，

$s [a, b] = p [1, j] 且 s [i]! = p [j + 1]$ 此时要移动p串（不是移动1格，而是直接移动到下次能匹配的位置）

其中 $1$ 串为 $[1, n e x t [j]] ， 3$ 串为 $[j - n e x t [j] + 1, j]$ 。由匹配可知 $1$ 串等于 $3$ 串， $3$ 串等于 $2$ 串。所以直接移动p串使1到3的位置即可。这个操作可由 $j = n e x t [j]$ 直接完成。如此往复下去，当 $j == m$ 时匹配成功。

例. KMP字符串

题目描述:

给定一个字符串 $S$ ，以及一个模式串 $P$ ，所有字符串中只包含大小写英文字母以及阿拉伯数字。

模式串 $P$ 在字符串 $S$ 中多次作为子串出现。

求出模式串 $P$ 在字符串 $S$ 中所有出现的位置的起始下标。

输入格式:

第一行输入整数 $N$ ，表示字符串 $P$ 的长度。

第二行输入字符串 $P$ 。

第三行输入整数 $M$ ，表示字符串 $S$ 的长度。

第四行输入字符串 $S$ 。

输出格式:

共一行，输出所有出现位置的起始下标（下标从 $0$ 开始计数），整数之间用空格隔开。

输入

3
aba
5
ababa

输出

0 2

数据范围

$1 \leq N \leq 10^{5}, 1 \leq M \leq 10^{6}$

KMP

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
#define int long long
using namespace std;
 
const int N = 1e6 + 10;
 
char p[N], s[N];
int n, m, ne[N];
 
void solve()
{
    cin >> n >> s >> m >> p;
    ne[0] = -1;
    for(int i = 1, j = -1; i < n; i ++)
    {
        while(j != -1&&s[i] != s[j + 1])
            j = ne[j];
        if(s[i] == s[j + 1])
            j ++;
        ne[i] = j;
    }
    for(int i = 0, j = -1; i < m; i ++)
    {
        while(j != -1&&p[i] != s[j + 1])
            j = ne[j];
        if(p[i] == s[j + 1])
            j ++;
        if(j == n - 1)
        {
            cout << i - j << ' ';
            j = ne[j];
        }
    }
}
 
signed main()
{
    IOS;
    int _ = 1;
    // cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

字符串哈希 Hash

我们定义一个把字符串映射到整数的函数 $f$ ，这个 $f$ 称为是 $H a s h$ 函数。

我们希望这个函数 $f$ 可以方便地帮我们判断两个字符串是否相等。

$H a s h$ 的核心思想在于，将输入映射到一个值域较小、可以方便比较的范围。

$H a s h$ 将一个字符串表示成 $p$ 进制

f (s) = \sum_{i = 1}^{l} s [i] \times p^{l - i} (m o d M)

为了避免冲突， $p$ 一般取 $131$ 或 $13331$ , $f ()$ 通常定义成 $u n s i g n e d l o n g l o n g$

$s [l - > r]$ 这段区间的 $H a s h = f (r) - f (l - 1) \times p^{r - l + 1}$ ，在不考虑冲突的情况下，判断两个字符串相等只需判断两字符串的哈希值是否相等。

例1. 模拟散列表

题目描述：

维护一个集合，支持如下几种操作：

1.I x，插入一个数 $x$ ；
2.Q x，询问数 $x$ 是否在集合中出现过；
现在要进行 $N$ 次操作，对于每个询问操作输出对应的结果。

输入格式：

第一行包含整数 $N$ ，表示操作数量。

接下来 $N$ 行，每行包含一个操作指令，操作指令为 I x，Q x 中的一种。

输出格式：

对于每个询问指令 Q x，输出一个询问结果，如果 $x$ 在集合中出现过，则输出 Yes，否则输出 No。

每个结果占一行。

输入

5
I 1
I 2
I 3
Q 2
Q 5

输出

Yes
No

数据范围

$1 \leq N \leq 10^{5}, - 10^{9} \leq x \leq 10^{9}$

代码

1

.开放寻址法

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
// #define int long long
 
using namespace std;
 
const int N = 3e5 + 10, null = 0x3f3f3f3f;
 
int h[N], n;
 
int find(int x)
{
    int t = (x % N + N) % N;
    while(h[t] != null&&h[t] != x)
    {
        t ++;
        if(t == N)
            t = 0;
    }
    return t;
}
 
void solve()
{
    string op;
    int x;
    cin >> op >> x;
    if(op == "I")
        h[find(x)] = x;
    else
    {
        if(h[find(x)] != null)
            puts("Yes");
        else puts("No");
    }
}
 
signed main()
{
    IOS;
    memset(h, 0x3f, sizeof(h));
    int _ = 1;
    cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

代码

2

.拉链法

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
#define int long long
 
 
using namespace std;
 
const int N = 1e5 + 10;
 
int h[N], ne[N], e[N], idx, n;
 
void insert(int x)
{
    int t = (x % N + N) % N;
    e[idx] = x;
    ne[idx] = h[t];
    h[t] = idx ++;
}
 
bool find(int x)
{
    int t = (x % N + N) % N;
    for(int i = h[t]; i != -1; i = ne[i])
        if(e[i] == x)
            return true;
    return false;
}
 
void solve()
{
    string op;
    int x;
    cin >> op >> x;
    if(op == "I")
        insert(x);
    else 
    {
        if(find(x))
            puts("Yes");
        else puts("No");
    }
}
 
signed main()
{
    IOS;
    memset(h, -1, sizeof(h));
    int _ = 1;
    cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

例2. 字符串哈希

题目描述：

给定一个长度为 $n$ 的字符串，再给定 $m$ 个询问，每个询问包含四个整数 $l_{1}, r_{1}, l_{2}, r_{2}$ ，请你判断 $[l_{1}, r_{1}]$ 和 $[l_{2}, r_{2}]$ 这两个区间所包含的字符串子串是否完全相同。

字符串中只包含大小写英文字母和数字。

输入格式：

第一行包含整数 $n$ 和 $m$ ，表示字符串长度和询问次数。

第二行包含一个长度为 $n$ 的字符串，字符串中只包含大小写英文字母和数字。

接下来 $m$ 行，每行包含四个整数 $l_{1}, r_{1}, l_{2}, r_{2}$ ，表示一次询问所涉及的两个区间。

注意，字符串的位置从 $1$ 开始编号。

输出格式：

对于每个询问输出一个结果，如果两个字符串子串完全相同则输出 Yes，否则输出 No。

每个结果占一行。

输入

8 3
aabbaabb
1 3 5 7
1 3 6 8
1 2 1 2

输出

Yes
No
Yes

数据范围

$1 \leq n, m \leq 10^{5}$

字符串哈希模板

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
// #define int long long
 
using namespace std;
typedef unsigned long long ull;
 
const int N = 1e6 + 10;
 
ull p[N], h[N];
char s[N];
int n, m, b = 131;
 
ull check(int l, int r)
{
    return h[r] - h[l - 1] * p[r - l + 1];
}
 
void solve()
{
    cin >> n >> m >> s + 1;
    p[0] = 1;
    for(int i = 1; i <= n; i ++)
        p[i] = p[i - 1] * b;
    for(int i = 1; i <= n; i ++)
        h[i] = h[i - 1] * b + s[i] - 'a' + 1;
    while(m --)
    {
        int a, b, c, d;
        cin >> a >> b >> c >> d;
        if(check(a, b) == check(c, d))
            puts("Yes");
        else puts("No");
    }
}
 
signed main()
{
    IOS;
    int _ = 1;
    // cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

例3. 电源串

题目描述：

给定若干个长度 $\leq 10^{6}$ 的字符串，询问每个字符串最多是由多少个相同的子字符串重复连接而成的。如：ababab 则最多有 $3$ 个 ab 连接而成。

输入格式：

输入若干行，每行有一个字符串。特别的，字符串可能为 . 即一个半角句号，此时输入结束。

输出格式：

对于每个输入,输出一个数字,表示最多的重复子字符串数量

输入

abcd
aaaa
ababab
.

输出

1
4
3

数据范围

字符串长度 $\leq 10^{6}$ 。

点击查看代码

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
// #define int long long
 
using namespace std;
typedef unsigned long long ull;
const int N = 1e6 + 10;
ull p[N], h[N];
 
int ne[N], n, b = 131;
char s[N];
 
void solve()
{
    while(~scanf("%s", s + 1))
    {
        if(s[1] == '.') break;
        n = strlen(s + 1);
        int ans = 1, f = 1;
        for(int i = 1; i <= n; i ++)
            h[i] = h[i - 1] * b + s[i] - 'a' + 1;
        for(int i = 1; i <= n; i++)
        {
            f = 1;
            if(n % i) continue;
            for(int j = 0; j < n; j += i)
                if(h[j + i] - h[j] * p[i] != h[i])
                {
                    f = 0;
                    break;
                }
            if(f)
            {
                cout << n / i << endl;
                break;
            }
        }
    }
}
 
signed main()
{
    p[0] = 1;
    for(int i = 1; i <= 1000001; i ++)
        p[i] = p[i - 1] * b;
    IOS;
    int _ = 1;
    // cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

字典树(trie树)

字典树，英文名 trie。顾名思义，就是一个像字典一样的树。

可以发现，这棵字典树用边来代表字母，而从根结点到树上某一结点的路径就代表了一个字符串。

$t r i e$ 的结构非常好懂，我们用 $δ (u, c)$ 表示结点 $u$ 的 $c$ 字符指向的下一个结点，或着说是结点 $u$ 代表的字符串后面添加一个字符 $c$ 形成的字符串的结点。（ $c$ 的取值范围和字符集大小有关，不一定是 $0 \sim 26$ 。）

有时需要标记插入进 $t r i e$ 的是哪些字符串，每次插入完成时在这个字符串所代表的节点处打上标记即可。( $c n t [x] + +$ )

$T r i e$ 树中有个二维数组 $s o n [N] [26]$ ，表示当前结点的儿子，如果没有的话，可以等于 $+ + i d x$ 。 $T r i e$ 树本质上是一颗多叉树，对于字母而言最多有 $26$ 个子结点。所以这个数组包含了两条信息。比如： $s o n [1] [0] = 2$ 表示 $1$ 结点的一个值为a的子结点为结点2;如果 $s o n [1] [0] = 0$ ，则意味着没有值为a子结点。这里的 $s o n [N] [26]$ 相当于链表中的 $n e [N]$

例1. Trie字符串统计

题目描述：

维护一个字符串集合，支持两种操作：

1.I x 向集合中插入一个字符串 $x$ ；
2.Q x 询问一个字符串在集合中出现了多少次。
共有 $N$ 个操作，所有输入的字符串总长度不超过 $10^{5}$ ，字符串仅包含小写英文字母。

输入格式：

第一行包含整数 $N$ ，表示操作数。

接下来 $N$ 行，每行包含一个操作指令，指令为 I x 或 Q x 中的一种。

输出格式：

对于每个询问指令 Q x，都要输出一个整数作为结果，表示 $x$ 在集合中出现的次数。

每个结果占一行。

输入

5
I abc
Q abc
Q ab
I ab
Q ab

输出

1
0
1

数据范围

$1 \leq N \leq 2 * 10^{4}$

点击查看代码

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
#define int long long
 
using namespace std;
 
const int N = 1e5 + 10, M = 26;
 
int son[N][M];
int e[N], ne[N], h[N], cnt[N], idx;
char str[N];
 
void insert()
{
    int p = 0;
    for(int i = 0; str[i]; i ++)
    {
        int u = str[i] - 'a';
        if(!son[p][u]) son[p][u] = ++ idx;
        p = son[p][u];
    }
    cnt[p] ++;
}
 
int query()
{
    int p = 0;
    for(int i = 0; str[i]; i ++)
    {
        int u = str[i] - 'a';
        if(!son[p][u]) return 0;
        p = son[p][u];
    }
    return cnt[p];
}
 
void solve()
{
    int n;
    cin >> n;
    
    char f;
    while(n --)
    {
        cin >> f >> str;
        if(f == 'I') insert();
        else cout << query() << '\n';
    }
}
 
signed main()
{
    IOS;
    int _ = 1;
    // cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

例2. 字典树

题目描述：

给定 $n$ 个模式串 $s_{1}, s_{2}, \dots, s_{n}$ 和 $q$ 次询问，每次询问给定一个文本串 $t_{i}$ ，请回答
$s_{1} \sim s_{n}$ 中有多少个字符串 $s_{j}$ 满足 $t_{i}$ 是 $s_{j}$ 的前缀。
一个字符串 $t$ 是 $s$ 的前缀当且仅当从 $s$ 的末尾删去若干个（可以为 $0$ 个）连续的字符后与 $t$ 相同。

输入的字符串大小敏感。例如，字符串 Fusu 和字符串 fusu 不同。

输入格式：

本题单测试点内有多组测试数据。

输入的第一行是一个整数，表示数据组数 $T$ 。

对于每组数据，格式如下：
第一行是两个整数，分别表示模式串的个数 $n$ 和询问的个数 $q$ 。
接下来 $n$ 行，每行一个字符串，表示一个模式串。
接下来 $q$ 行，每行一个字符串，表示一次询问。

输出格式：

按照输入的顺序依次输出各测试数据的答案。
对于每次询问，输出一行一个整数表示答案。

输入

3
3 3
fusufusu
fusu
anguei
fusu
anguei
kkksc
5 2
fusu
Fusu
AFakeFusu
afakefusu
fusuisnotfake
Fusu
fusu
1 1
998244353
9

输出

2
1
0
1
2
1

解释

对于全部的测试点，保证 $1 \leq T, n, q \leq 10^{5}$ ，且输入字符串的总长度不超过 $3 \times 10^{6}$ 。输入的字符串只含大小写字母和数字，且不含空串。

点击查看代码

 #include<bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
 
using namespace std;
 
const int N = 3e6 + 10, M = 80;
 
int son[N][M], cnt[N], idx;
int n, m;
string s;
 
void Insert()
{
    int p = 0;
    for(int i = 0; s[i]; i ++)
    {
        int u = s[i] - '0';
        if(!son[p][u]) son[p][u] = ++ idx;
        p = son[p][u];
        cnt[p] ++;
    }
}
 
int query()
{
    int p = 0;
    for(int i = 0; s[i]; i ++)
    {
        int u = s[i] - '0';
        if(!son[p][u]) return 0;
        p = son[p][u];
    }
    return cnt[p];
}
 
void solve()
{
    cin >> n >> m;
    while(n --)
    {
        cin >> s;
        Insert();
    }
    while(m --)
    {
        cin >> s;
        cout << query() << '\n';
    }
    for(int i = 0; i <= idx; i ++)
        for(int j = 0; j < M; j ++)
            son[i][j] = 0;
    for(int i = 0; i <= idx; i ++)
        cnt[i] = 0;
    idx = 0;
}
 
signed main()
{
    IOS;
    int _ = 1;
    cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

例3. 阅读理解

题目描述：

英语老师留了 $N$ 篇阅读理解作业，但是每篇英文短文都有很多生词需要查字典，为了节约时间，现在要做个统计，算一算某些生词都在哪几篇短文中出现过。

输入格式：

第一行为整数 $N$ ，表示短文篇数，其中每篇短文只含空格和小写字母。

按下来的 $N$ 行，每行描述一篇短文。每行的开头是一个整数 $L$ ，表示这篇短文由 $L$ 个单词组成。接下来是 $L$ 个单词，单词之间用一个空格分隔。

然后为一个整数 $M$ ，表示要做几次询问。后面有 $M$ 行，每行表示一个要统计的生词。

输出格式：

对于每个生词输出一行，统计其在哪几篇短文中出现过，并按从小到大输出短文的序号，序号不应有重复，序号之间用一个空格隔开（注意第一个序号的前面和最后一个序号的后面不应有空格）。如果该单词一直没出现过，则输出一个空行。

输入

3
9 you are a good boy ha ha o yeah
13 o my god you like bleach naruto one piece and so do i
11 but i do not think you will get all the points
5
you
i
o
all
naruto

输出

1 2 3
2 3
1 2
3
2

解释

对于 $30 %$ 的数据， $1 \leq M \leq 10^{3}$ 。

对于 $100 %$ 的数据， $1 \leq M \leq 10^{4} ， 1 \leq N \leq 10^{3}$ 。

每篇短文长度（含相邻单词之间的空格） $\leq 5 \times 10^{3}$ 字符，每个单词长度 $\leq 20$ 字符。

每个测试点时限 $2$ 秒。

点击查看代码

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
 
using namespace std;
 
const int N = 2e5 + 10;
 
int n, m, idx;
 
void Insert(vector<vector<int> >&son, vector<vector<bool> > &cnt, string s, int x, int y)
{
    int p = 0;
    for(int i = 0; i < y; i ++)
    {
        int u = s[i] - 'a';
        if(!son[p][u]) son[p][u] = ++ idx;
        p = son[p][u];
    }
    cnt[p][x] = true;
}
 
void query(vector<vector<int> >&son, vector<vector<bool> > &cnt, string s, int x)
{
    int p = 0;
    for(int i = 0; i < x; i ++)
    {
        int u = s[i] - 'a';
        if(!son[p][u])return;
        p = son[p][u];
    }
    for(int i = 1, j = 0; i <= n; i ++)
        if(cnt[p][i])
        {
            if(j == 0)
                cout << i;
            else
                cout << ' ' << i;
            j ++;
        }
}
 
void solve()
{
    cin >> n;
    vector<vector<int> > son(N, vector<int> (26, 0));
    vector<vector<bool> > cnt(N, vector<bool> (n + 1, false));
    for(int i = 1; i <= n; i ++)
    {
        cin >> m;
        while(m --)
        {
            string s;
            cin >> s;
            Insert(son, cnt, s, i, s.size());
        }
    }
    cin >> m;
    while(m --)
    {
        string s;
        cin >> s;
        query(son, cnt, s, s.size());
        cout << '\n';
    }
}
signed main()
{
    IOS;
    int _ = 1;
    // cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

AC自动机

AC 自动机是以 Trie 的结构为基础，结合 KMP 的思想建立的自动机，用于解决多模式匹配等任务。Aho–Corasick（Alfred V. Aho, Margaret J. Corasick. 1975）

建立AC自动机

建立一个 AC 自动机有两个步骤：

1.基础的 Trie 结构：将所有的模式串构成一棵 Trie。
2.KMP 的思想：对 Trie 树上所有的结点构造失配指针。
然后就可以利用它进行多模式匹配了。

在AC自动机的构建过程中，用 $B F S$ 遍历该层，当当前节点的下一个字母不存在时，记录下

$n e [] 数组含义：$ 记录树上结点的指针，表示以当前位置为结尾的前缀匹配的所有最长的后缀最后一个字母所在的位置，即在 $T r i e$ 树上从当前位置往回跳，所有能跳到的与当前位置为结尾的前缀相匹配的最长的前缀结尾位置。

例. 搜索关键词

题目描述：

给定 $n$ 个长度不超过 $50$ 的由小写英文字母组成的单词，以及一篇长为 $m$ 的文章。

请问，其中有多少个单词在文章中出现了。

注意：每个单词不论在文章中出现多少次，仅累计 $1$ 次。

输入格式：

第一行包含整数 $T$ ，表示共有 $T$ 组测试数据。

对于每组数据，第一行一个整数 $n$ ，接下去 $n$ 行表示 $n$ 个单词，最后一行输入一个字符串，表示文章。

输出格式：

对于每组数据，输出一个占一行的整数，表示有多少个单词在文章中出现。

输入

1
5
she
he
say
shr
her
yasherhs

输出

3

数据范围

$1 \leq n \leq 10^{4}, 1 \leq m \leq 10^{6}$

AC自动机原版

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
// #define int long long
 
using namespace std;
 
typedef long long LL;
 
const int N = 10010, s = 55, M = 1000010;
 
int tr[N * s][26], cnt[N * s], idx;
int q[N * s], ne[N * s];
char str[M];
 
void insert()
{
	int p = 0;
	for(int i = 0; str[i]; i ++)
	{
		int u = str[i] - 'a';
		if(!tr[p][u]) tr[p][u] = ++idx;
		p = tr[p][u];
	}
	cnt[p] ++;
}
 
void build()
{
	int hh = 0, tt = -1;
	for(int i = 0; i < 26; i ++)
		if(tr[0][i])
			q[++ tt] = tr[0][i];
	while(hh <= tt)
	{
		int t = q[hh ++];
		for(int i = 0; i < 26; i ++)
		{
			int c = tr[t][i];
			if(!c) continue;
			
			int j = ne[t];
			while(j&&!tr[j][i]) j = ne[j];
			if(tr[j][i]) j = tr[j][i];
			ne[c] = j;
			q[++ tt] = c;
		}
	}
}
 
void solve()
{
	int n, m;
	memset(tr, 0, sizeof tr);
	memset(cnt, 0, sizeof cnt);
	memset(ne, 0, sizeof ne);
	idx = 0;
	cin >> n;
	for(int i = 0; i < n; i ++)
	{
		cin >> str;
		insert();
	}
	build();
	cin >> str;
	int ans = 0;
	for(int i = 0, j = 0; str[i]; i ++)
	{
		int t = str[i] - 'a';
		while(j&&!tr[j][t]) j = ne[j];
		if(tr[j][t]) j = tr[j][t];
		
		int p = j;
		
		while(p)
		{
			ans += cnt[p];
			cnt[p] = 0;
			p = ne[p];
		}
	}
	cout << ans << "\n";
}
 
signed main()
{
    IOS;
    int _ = 1;
    cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

AC自动机优化版

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
// #define int long long
 
using namespace std;
 
const int N = 1e4 + 10, M = 1e6 + 10, S = 55;
 
int tr[N * S][26], ne[N * S];
int cnt[N * S], q[N * S], idx;
int n, m;
string s;
 
void Insert()
{
    int p = 0;
    for(int i = 0; s[i]; i ++)
    {
        int u = s[i] - 'a';
        if(!tr[p][u]) tr[p][u] = ++ idx;
        p = tr[p][u];
    }
    cnt[p] ++;
}
 
void build()
{
    int hh = 0, tt = -1;
    for(int i = 0; i < 26; i ++)
        if(tr[0][i])
            q[++ tt] = tr[0][i];
    while(hh <= tt)
    {
        int t = q[hh ++];
        for(int i = 0; i < 26; i ++)
        {
            int p = tr[t][i];
            if(!p) tr[t][i] = tr[ne[t]][i];
            else
            {
                ne[p] = tr[ne[t]][i];
                q[++ tt] = p;
            }
        }
    }
}
 
void solve()
{
    cin >> n;
    while(n --)
    {
        cin >> s;
        Insert();
    }
    build();
    cin >> s;
    int ans = 0;
    for(int i = 0, j = 0; s[i]; i ++)
    {
        int t = s[i] - 'a';
        j = tr[j][t];
        int p = j;
        while(p)
        {
            ans += cnt[p];
            cnt[p] = 0;
            p = ne[p];
        }
    }
    cout << ans << '\n';
    for(int i = 0; i <= idx; i ++)
    {
        ne[i] = cnt[i] = 0;
        for(int j = 0; j < 26; j ++)
            tr[i][j] = 0;
    }
}
 
signed main()
{
    IOS;
    int _ = 1;
    cin >> _;
    while(_ --)
        solve();
    return _ ^ _;
}

manachar(马拉车)

最小表示法 $O (l o g n)$

最小表示法是用于解决字符串最小表示问题的方法。

循环同构:

当字符串 $S$ 中可以选定一个位置 $i$ 满足

S [i \cdot \cdot \cdot n] + S [1 \cdot \cdot \cdot i - 1] = T

则称 $S$ 与 $T$ 循环同构

最小表示:字符串 $S$ 的最小表示为与 $S$ 循环同构的所有字符串中字典序最小的字符串

simple 的暴力

最小表示法

考虑对于一对字符串 $A, B$ , 它们在原字符串 $S$ 中的起始位置分别为 $i, j$ , 且它们的前 $k$ 个字符均相同，即

S [i \dots i + k - 1] = S [j \dots j + k - 1]

不妨先考虑 $S [i + k] > S [j + k]$ 的情况，我们发现起始位置下标 $l$ 满足 $i \leq l \leq i + k$ 的字符串均不能成为答案。因为对于任意一个字符串 $S_{i + p}$ （表示以 $i + p$ 为起始位置的字符串， $p \in [0, k]$ ）一定存在字符串 $S_{j + p}$ 比它更优。

过程：

$1$ .初始化指针 $i$ 为 $0$ ， $j$ 为 $1$ ；初始化匹配长度 $k$ 为 $0$
$2$ .比较第 $k$ 位的大小，根据比较结果跳转相应指针。若跳转后两个指针相同，则随意选一个加一以保证比较的两个字符串不同
$3$ .重复上述过程，直到比较结束
$4$ .答案为 $i, j$ 中较小的一个

例2. 循环同构

题目描述：

如果存在一个整数 $k$ 使得字符串 $S$ 在被循环右移 $k$ 个位置后等于字符串 $T$ ，那么字符串 $S$ 和 $T$ 被称为循环右移。

现在，给定由小写字母组成的 $n$ 个长度为 $m$ 的字符串，总共有 $Q$ 个查询。每个查询都提供两个正整数 $x$ 和 $y$ 。如果字符串 $s_{x}$ 和 $s_{y}$ 是循环右移，输出 “Yes” ；否则，输出 “No” 。

输入格式：

输入由多个测试用例组成。第一行包含一个整数 $T (1 \leq T \leq 5)$ —测试用例的数量。测试用例的描述如下。

每个测试用例的第一行包含两个整数 $n$ 和 $m$ $(1 \leq n \times m \leq 10^{5})$ —字符串的数量和长度。

接下来的 $n$ 行中的每一行都包含一个小写字母 $s_{i}$ 的字符串.

下一行包含一个正整数 $q$ $(1 \leq q \leq 10^{5})$ 。

接下来的 $Q$ 行中的每一行都包含两个整数 $x ， y$ $(1 \leq x ， y \leq n)$ 询问字符串 $s_{x}$ 和字符串 $s_{y}$ 是否循环同构。

输出格式：

对于每个测试用例，输出 $Q$ 行。每行应该包含一个字符串，指示当前查询字符串 $s_{x}$ 和 $s_{y}$ 是否循环同构。如果它们是循环同构的，输出“Yes”；否则，输出“No”。

输入

2
2 2
ab
ba
1
1 2
4 3
aab
baa
bba
bab
6
1 2
1 3
1 4
2 3
2 4
3 4

输出

Yes
Yes
No
No
No
No
Yes

点击查看代码

 #include <bits/stdc++.h>
#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
 
using namespace std;
 
const int N = 1e5 + 10;
 
int ne[N];
int x, y, n, m, q, ans;
string s[N], ss[N];
 
string check(string s)
{
	int n = s.size();
	s = s + s;
	int i = 0, j = 1;
	while(i < n && j < n)
	{
		int k = 0;
		while(k < n && s[i + k] == s[j + k]) k ++;
		if(s[i + k] > s[j + k]) i += k + 1;
		else j += k + 1;
		if(i == j) i ++;
	}
	int k = min(i, j);
	return s.substr(k, n);
}
 
void solve()
{
	cin >> n >> m;
	for(int i = 1; i <= n; i ++)
		cin >> s[i];
	for(int i = 1; i <= n; i ++)
		ss[i] = check(s[i]);
	cin >> q;
	while(q --)
	{
		cin >> x >> y;
		if(ss[x] == ss[y])
			puts("Yes");
		else puts("No");
	}
}
 
signed main()
{
	ios::sync_with_stdio(false);
	cin.tie(0);
	int T = 1;
	cin >> T;
	while(T --)
		solve();
	return T ^ T;
}

上一篇动态规划--DP

下一篇第十四届蓝桥杯大赛软件赛省赛 C/C++ 大学 B 组

本文作者：chfychin

本文链接：https://www.cnblogs.com/chfychin/p/17743525.html

posted @ 2023-10-05 23:40 chfychin 阅读(46) 评论(0) 编辑收藏举报

刷新页面返回顶部

登录后才能查看或发表评论，立即登录或者逛逛博客园首页

chfychin

基础算法--字符串

目录

$K M P$

基本概念

next数组的含义及手动模拟

匹配思路

例. KMP字符串

字符串哈希 Hash

例1. 模拟散列表

例2. 字符串哈希

例3. 电源串

字典树(trie树)

例1. Trie字符串统计

例2. 字典树

例3. 阅读理解

AC自动机

建立AC自动机

例. 搜索关键词

manachar(马拉车)

最小表示法 $O (l o g n)$

例2. 循环同构

公告

常用链接

最新随笔

合集 (1)

随笔分类 (17)

随笔档案 (20)

阅读排行榜

评论排行榜

推荐排行榜

最新评论

	#include <bits/stdc++.h>
	#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
	#define int long long
	using namespace std;

	const int N = 1e6 + 10;

	char p[N], s[N];
	int n, m, ne[N];

	void solve()
	{
	cin >> n >> s >> m >> p;
	ne[0] = -1;
	for(int i = 1, j = -1; i < n; i ++)
	{
	while(j != -1&&s[i] != s[j + 1])
	j = ne[j];
	if(s[i] == s[j + 1])
	j ++;
	ne[i] = j;
	}
	for(int i = 0, j = -1; i < m; i ++)
	{
	while(j != -1&&p[i] != s[j + 1])
	j = ne[j];
	if(p[i] == s[j + 1])
	j ++;
	if(j == n - 1)
	{
	cout << i - j << ' ';
	j = ne[j];
	}
	}
	}

	signed main()
	{
	IOS;
	int _ = 1;
	// cin >> _;
	while(_ --)
	solve();
	return _ ^ _;
	}

	#include <bits/stdc++.h>
	#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)
	// #define int long long

	using namespace std;

	const int N = 3e5 + 10, null = 0x3f3f3f3f;

	int h[N], n;

	int find(int x)
	{
	int t = (x % N + N) % N;
	while(h[t] != null&&h[t] != x)
	{
	t ++;
	if(t == N)
	t = 0;
	}
	return t;
	}

	void solve()
	{
	string op;
	int x;
	cin >> op >> x;
	if(op == "I")
	h[find(x)] = x;
	else
	{
	if(h[find(x)] != null)
	puts("Yes");
	else puts("No");
	}
	}

	signed main()
	{
	IOS;
	memset(h, 0x3f, sizeof(h));
	int _ = 1;
	cin >> _;
	while(_ --)
	solve();
	return _ ^ _;
	}

	#include<bits/stdc++.h>
	#define IOS ios::sync_with_stdio(false);cin.tie(nullptr),cout.tie(nullptr)

	using namespace std;

	const int N = 3e6 + 10, M = 80;

	int son[N][M], cnt[N], idx;
	int n, m;
	string s;

	void Insert()
	{
	int p = 0;
	for(int i = 0; s[i]; i ++)
	{
	int u = s[i] - '0';
	if(!son[p][u]) son[p][u] = ++ idx;
	p = son[p][u];
	cnt[p] ++;
	}
	}

	int query()
	{
	int p = 0;
	for(int i = 0; s[i]; i ++)
	{
	int u = s[i] - '0';
	if(!son[p][u]) return 0;
	p = son[p][u];
	}
	return cnt[p];
	}

	void solve()
	{
	cin >> n >> m;
	while(n --)
	{
	cin >> s;
	Insert();
	}
	while(m --)
	{
	cin >> s;
	cout << query() << '\n';
	}
	for(int i = 0; i <= idx; i ++)
	for(int j = 0; j < M; j ++)
	son[i][j] = 0;
	for(int i = 0; i <= idx; i ++)
	cnt[i] = 0;
	idx = 0;
	}

	signed main()
	{
	IOS;
	int _ = 1;
	cin >> _;
	while(_ --)
	solve();
	return _ ^ _;
	}

chfychin

基础算法--字符串

目录

KMP

基本概念

next数组的含义及手动模拟

匹配思路

例. KMP字符串

字符串哈希 Hash

例1. 模拟散列表

例2. 字符串哈希

例3. 电源串

字典树(trie树)

例1. Trie字符串统计

例2. 字典树

例3. 阅读理解

AC自动机

建立AC自动机

例. 搜索关键词

manachar(马拉车)

最小表示法 O(logn)

例2. 循环同构

公告

常用链接

最新随笔

合集 (1)

随笔分类 (17)

随笔档案 (20)

阅读排行榜

评论排行榜

推荐排行榜

最新评论

$K M P$

最小表示法 $O (l o g n)$