String Algorithm

KMP

 

char a b a b a b c a
index 0 1 2 3 4 5 6 7
value 0 0 1 2 3 4 0 1

 

 

 

 

"slash"

proper prefixes: s, sl, sla, slas

proper suffixes: h, sh, ash, lash

 

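Partial match table (also called the failure function or the next array):

Each index stands for a substring of the pattern: the characters at positions 0 through index, inclusive.

value = the length of the longest string that is both a proper prefix and a proper suffix of that substring

Example:
At index = 3 the substring is "abab". Proper prefixes: a, ab, aba; proper suffixes: b, ab, bab.

value = length of "ab" = 2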
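The definition can be checked directly with a brute-force routine before looking at the linear-time version. This is only a sketch for verification: the name naive_partial_match_table is an illustrative choice, and it assumes table has room for strlen(pat) entries.

```c
#include <stddef.h>
#include <string.h>

/* Brute-force partial match table, written straight from the definition.
 * For each index i it takes the substring pat[0..i] and tries every
 * length k shorter than that substring, longest first, until the first
 * k characters equal the last k characters.
 * Roughly O(m^3) time; meant for checking results, not for real use. */
void naive_partial_match_table(const char *pat, int *table)
{
    int m = (int)strlen(pat);
    for (int i = 0; i < m; i++) {
        int sub_len = i + 1;   /* length of the substring pat[0..i] */
        table[i] = 0;
        for (int k = sub_len - 1; k > 0; k--) {
            /* compare prefix pat[0..k-1] with suffix pat[sub_len-k..i] */
            if (memcmp(pat, pat + sub_len - k, (size_t)k) == 0) {
                table[i] = k;
                break;
            }
        }
    }
}
```

For the pattern "abababca" this fills the table with 0 0 1 2 3 4 0 1, the same values as in the table at the top.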

Implementation:

// Fills lps[0..M-1], where lps[i] is the length of the longest proper
// prefix of pat[0..i] that is also a suffix of pat[0..i].
// Requires M >= 1.
void computeLPSArray(char *pat, int M, int *lps)
{
    // length of the previous longest prefix suffix
    int len = 0;

    lps[0] = 0; // lps[0] is always 0

    // the loop calculates lps[i] for i = 1 to M-1
    int i = 1;
    while (i < M)
    {
        if (pat[i] == pat[len])
        {
            len++;
            lps[i] = len;
            i++;
        }
        else // (pat[i] != pat[len])
        {
            // This is the tricky part; consider the pattern
            // AAACAAAA at i = 7. As in the search step, fall back
            // to the next shorter prefix that is also a suffix.
            if (len != 0)
            {
                len = lps[len-1];

                // Note that i is not incremented here
            }
            else // (len == 0)
            {
                lps[i] = 0;
                i++;
            }
        }
    }
}

 

 

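Shifting the pattern:

When a mismatch follows a partial match of length partial_match_length (partial_match_length >= 1), shift the pattern forward by partial_match_length - table[partial_match_length - 1] positions. The gain over a one-step shift shows up when that value is greater than 1.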
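The shift rule is plain arithmetic on the table. A minimal sketch, where the helper name shift_after_partial_match is hypothetical and the caller is assumed to pass partial_match_length >= 1:

```c
/* Hypothetical helper: how many positions the pattern moves after a
 * mismatch that follows partial_match_length matched characters.
 * Assumes 1 <= partial_match_length <= length of the table. */
int shift_after_partial_match(const int *table, int partial_match_length)
{
    return partial_match_length - table[partial_match_length - 1];
}
```

With the table 0 0 1 2 3 4 0 1 for "abababca", a mismatch after 5 matched characters gives 5 - table[4] = 5 - 3 = 2, so the pattern moves 2 positions. After 1 matched character it moves 1 - table[0] = 1 position.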

 

KMP search implementation:

#include <stdio.h>
#include <string.h>

// Prints the start index of every occurrence of pat in txt.
void KMPSearch(char *pat, char *txt)
{
    int M = strlen(pat);
    int N = strlen(txt);

    // an empty pattern has no table (and int lps[0] is not valid C)
    if (M == 0)
        return;

    // create lps[] that will hold the longest prefix suffix
    // values for pattern
    int lps[M];

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps);

    int i = 0;  // index for txt[]
    int j = 0;  // index for pat[]
    while (i < N)
    {
        if (pat[j] == txt[i])
        {
            j++;
            i++;
        }

        if (j == M)
        {
            printf("Found pattern at index %d \n", i-j);
            // keep going: overlapping matches are reported too
            j = lps[j-1];
        }

        // mismatch after j matches
        else if (i < N && pat[j] != txt[i])
        {
            // The first lps[j-1] characters of pat already match
            // the text, so resume comparing from pat[lps[j-1]]
            if (j != 0)
                j = lps[j-1];
            else
                i = i+1;
        }
    }
}
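Printing inside the search loop makes the routine hard to reuse or test. The sketch below is a variant, not the listing above: it applies the same table and the same fallback rule but stores the match positions in a caller-supplied array. The name kmp_find_all, the out/max_out parameters, and the -1 error return are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Stores up to max_out start indices of pat in txt into out and returns
 * the total number of matches (which may exceed max_out).
 * Returns -1 if pat is empty or memory cannot be allocated. */
int kmp_find_all(const char *txt, const char *pat, int *out, int max_out)
{
    int n = (int)strlen(txt);
    int m = (int)strlen(pat);
    if (m == 0)
        return -1;

    int *lps = malloc((size_t)m * sizeof *lps);
    if (lps == NULL)
        return -1;

    /* Build the partial match table for pat. */
    lps[0] = 0;
    for (int i = 1, len = 0; i < m; ) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        } else if (len != 0) {
            len = lps[len - 1];   /* try the next shorter border */
        } else {
            lps[i] = 0;
            i++;
        }
    }

    /* Scan the text; i never moves backwards. */
    int count = 0;
    for (int i = 0, j = 0; i < n; ) {
        if (pat[j] == txt[i]) {
            i++;
            j++;
            if (j == m) {
                if (count < max_out)
                    out[count] = i - j;
                count++;
                j = lps[j - 1];   /* allow overlapping matches */
            }
        } else if (j != 0) {
            j = lps[j - 1];
        } else {
            i++;
        }
    }

    free(lps);
    return count;
}
```

Searching "abababab" for "abab" with this sketch reports three overlapping matches, at indices 0, 2 and 4. Allocating the table with malloc instead of a variable-length array avoids a large stack allocation for long patterns.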

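A more compact implementation:

#include <string.h>

void answer(int pos); // supplied by the caller; receives each match start

// f is indexed by matched length: f[i + 1] is the table value for P[0..i],
// so f needs strlen(P) + 1 entries.
void preparation(char *P, int *f) {
    int m = strlen(P);
    f[0] = f[1] = 0;
    for (int i = 1; i < m; i++) {
        int j = f[i];
        while (j && P[i] != P[j])
            j = f[j];
        f[i + 1] = (P[i] == P[j]) ? j + 1 : 0;
    }
}

void KMP(char *T, char *P, int *f) {
    int n = strlen(T), m = strlen(P);
    preparation(P, f);
    int j = 0; // number of pattern characters matched so far
    for (int i = 0; i < n; i++) {
        // compare P[j], not P[i], with T[i]; after a full match
        // P[m] is '\0', which forces the fallback j = f[m]
        while (j && P[j] != T[i]) j = f[j];
        if (P[j] == T[i]) j++;
        if (j == m) answer(i - m + 1);
    }
}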

 

 

 

References:

http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/

http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm

 

 

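AC automaton (Aho-Corasick)

Background: it is built on a finite state automaton.

The many names for KMP's partial match table, such as failure pointer and failure function, come from the AC automaton.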
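To make the link to KMP concrete, here is a rough sketch of an AC automaton that counts occurrences of several patterns in one pass over the text. The failure pointer of a state plays the same role as the partial match table entry in KMP: it points to the state for the longest proper suffix that is also a prefix of some pattern. Everything named ac_* is illustrative. The sketch assumes lowercase 'a' to 'z' only, non-empty patterns, and at most AC_MAX_NODES trie nodes; it uses the common variant that fills in missing transitions during the breadth-first pass.

```c
#include <string.h>

#define AC_MAX_NODES 1024   /* assumed limit on the number of trie nodes */
#define AC_ALPHABET  26     /* assumes lowercase 'a'..'z' only */

static int ac_next[AC_MAX_NODES][AC_ALPHABET]; /* trie edges, then full transitions */
static int ac_fail[AC_MAX_NODES];              /* failure pointers */
static int ac_hits[AC_MAX_NODES];              /* patterns ending at this state */
static int ac_nodes;                           /* nodes in use; node 0 is the root */

void ac_init(void)
{
    memset(ac_next, 0, sizeof ac_next);
    memset(ac_fail, 0, sizeof ac_fail);
    memset(ac_hits, 0, sizeof ac_hits);
    ac_nodes = 1;
}

/* Inserts one pattern into the trie. Returns 1 on success, 0 if the
 * pattern is empty, has a character outside a-z, or the trie is full. */
int ac_add(const char *pat)
{
    int u = 0;
    if (*pat == '\0')
        return 0;
    for (; *pat; pat++) {
        int c = *pat - 'a';
        if (c < 0 || c >= AC_ALPHABET)
            return 0;
        if (ac_next[u][c] == 0) {
            if (ac_nodes >= AC_MAX_NODES)
                return 0;
            ac_next[u][c] = ac_nodes++;
        }
        u = ac_next[u][c];
    }
    ac_hits[u]++;
    return 1;
}

/* Breadth-first pass: sets failure pointers and fills missing edges.
 * Call once, after all patterns have been added. */
void ac_build(void)
{
    int queue[AC_MAX_NODES];
    int head = 0, tail = 0;
    for (int c = 0; c < AC_ALPHABET; c++)
        if (ac_next[0][c])
            queue[tail++] = ac_next[0][c];   /* depth-1 states fail to the root */
    while (head < tail) {
        int u = queue[head++];
        /* patterns ending at the failure state also end here */
        ac_hits[u] += ac_hits[ac_fail[u]];
        for (int c = 0; c < AC_ALPHABET; c++) {
            int v = ac_next[u][c];
            if (v) {
                ac_fail[v] = ac_next[ac_fail[u]][c];
                queue[tail++] = v;
            } else {
                ac_next[u][c] = ac_next[ac_fail[u]][c];
            }
        }
    }
}

/* Returns the total number of pattern occurrences in text. */
int ac_count(const char *text)
{
    int u = 0, total = 0;
    for (; *text; text++) {
        int c = *text - 'a';
        if (c < 0 || c >= AC_ALPHABET) {
            u = 0;   /* unsupported character: restart at the root */
            continue;
        }
        u = ac_next[u][c];
        total += ac_hits[u];
    }
    return total;
}
```

With the patterns "he", "she", "his" and "hers", calling ac_init, ac_add for each pattern, ac_build, and then ac_count("ushers") counts three occurrences: "she", "he" and "hers". With a single pattern the trie is a chain, and the failure pointers hold the same values as the KMP table.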

 

 

 

References: http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf

http://www.cnblogs.com/en-heng/p/5247903.html

posted @ 2016-10-30 16:16 by autoria