String Algorithm
KMP
char | a | b | a | b | a | b | c | a |
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
value | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |
"slash"
proper prefixes: s, sl, sla, slas
proper suffixes: h, sh, ash, lash
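partial match table (also called the failure function or the next array):
Each index stands for a prefix of the pattern: the substring covering positions [0, index].
value = length of the longest string that is both a proper prefix and a proper suffix of that substring
Example:
At index = 3 the substring is "abab". proper prefixes: a, ab, aba; proper suffixes: b, ab, bab;
value = length of "ab" = 2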
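The definition above can be checked directly with a brute-force sketch. This is not the KMP preprocessing itself, only the definition turned into code; the function names longest_prefix_suffix and naive_partial_match_table are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest string that is both a proper prefix and a proper
 * suffix of pat[0..index]. Brute force, straight from the definition. */
int longest_prefix_suffix(const char *pat, int index)
{
    assert(pat != NULL && index >= 0);
    int sub_len = index + 1;    /* length of the substring pat[0..index] */

    /* Try candidate lengths from longest to shortest. "Proper" excludes
     * the whole substring, so the first candidate is sub_len - 1. */
    for (int k = sub_len - 1; k > 0; k--) {
        /* compare prefix pat[0..k-1] with suffix pat[sub_len-k..sub_len-1] */
        if (memcmp(pat, pat + sub_len - k, (size_t)k) == 0)
            return k;
    }
    return 0;
}

/* Fill table[0..m-1] for a pattern of length m (hypothetical helper). */
void naive_partial_match_table(const char *pat, int m, int *table)
{
    for (int i = 0; i < m; i++)
        table[i] = longest_prefix_suffix(pat, i);
}
```

For the pattern "abababca" this reproduces the value row of the table at the top (0 0 1 2 3 4 0 1). It compares every candidate length for every index, so it is only useful for checking a faster implementation.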
Implementation:
    void computeLPSArray(char *pat, int M, int *lps)
    {
        // length of the previous longest prefix suffix
        int len = 0;

        lps[0] = 0; // lps[0] is always 0

        // the loop calculates lps[i] for i = 1 to M-1
        int i = 1;
        while (i < M)
        {
            if (pat[i] == pat[len])
            {
                len++;
                lps[i] = len;
                i++;
            }
            else // (pat[i] != pat[len])
            {
                // This is tricky. Consider the example.
                // AAACAAAA and i = 7. The idea is similar
                // to search step.
                if (len != 0)
                {
                    len = lps[len-1];

                    // Also, note that we do not increment
                    // i here
                }
                else // if (len == 0)
                {
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }
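Shifting the pattern:
When a mismatch follows a partial match of length partial_match_length > 0,
shift the pattern forward by partial_match_length - table[partial_match_length - 1] positions. With no partial match, shift by 1.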
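The shift rule can be written as a small helper. This is a minimal sketch; the name shift_after_mismatch is hypothetical, and table is assumed to be the partial match table of the pattern being searched for.

```c
#include <assert.h>
#include <stddef.h>

/* Number of positions to slide the pattern after a mismatch that follows
 * partial_match_length matched characters. */
int shift_after_mismatch(const int *table, int partial_match_length)
{
    assert(table != NULL && partial_match_length >= 0);

    if (partial_match_length == 0)
        return 1;   /* nothing matched: move on by one character */

    /* table[partial_match_length - 1] characters at the end of the matched
     * part are also a prefix of the pattern, so they stay aligned. */
    return partial_match_length - table[partial_match_length - 1];
}
```

With the table for "abababca" (0 0 1 2 3 4 0 1), a partial match of length 5 ("ababa") gives 5 - table[4] = 5 - 3 = 2: the pattern moves two positions and the trailing "aba" does not need to be compared again.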
KMP search implementation:
    #include <stdio.h>
    #include <string.h>

    void KMPSearch(char *pat, char *txt)
    {
        int M = strlen(pat);
        int N = strlen(txt);

        // create lps[] that will hold the longest prefix suffix
        // values for pattern
        int lps[M];

        // Preprocess the pattern (calculate lps[] array)
        computeLPSArray(pat, M, lps);

        int i = 0; // index for txt[]
        int j = 0; // index for pat[]
        while (i < N)
        {
            if (pat[j] == txt[i])
            {
                j++;
                i++;
            }

            if (j == M)
            {
                printf("Found pattern at index %d \n", i-j);
                j = lps[j-1];
            }

            // mismatch after j matches
            else if (i < N && pat[j] != txt[i])
            {
                // Do not match lps[0..lps[j-1]] characters,
                // they will match anyway
                if (j != 0)
                    j = lps[j-1];
                else
                    i = i+1;
            }
        }
    }
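Printing matches makes the search hard to check automatically. The variant below is a sketch of the same search that stores match positions in a caller-supplied array instead; kmp_find_all, out and max_out are names invented here, and the heap allocation is just one way to avoid a variable-length array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stores up to max_out match start indices in out[] and returns the total
 * number of matches found (which may exceed max_out).
 * Returns -1 if the table cannot be allocated. */
int kmp_find_all(const char *pat, const char *txt, int *out, int max_out)
{
    assert(pat != NULL && txt != NULL);
    int m = (int)strlen(pat);
    int n = (int)strlen(txt);
    if (m == 0 || m > n)
        return 0;

    int *lps = malloc((size_t)m * sizeof *lps);
    if (lps == NULL)
        return -1;

    /* Build the partial match table, same logic as computeLPSArray. */
    lps[0] = 0;
    int len = 0;
    int i = 1;
    while (i < m) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        } else if (len != 0) {
            len = lps[len - 1];     /* fall back, i stays where it is */
        } else {
            lps[i] = 0;
            i++;
        }
    }

    /* Scan the text; i never moves backwards. */
    int count = 0;
    int j = 0;
    i = 0;
    while (i < n) {
        if (pat[j] == txt[i]) {
            i++;
            j++;
            if (j == m) {                       /* full match ends at i - 1 */
                if (count < max_out)
                    out[count] = i - j;
                count++;
                j = lps[j - 1];                 /* allow overlapping matches */
            }
        } else if (j != 0) {
            j = lps[j - 1];
        } else {
            i++;
        }
    }

    free(lps);
    return count;
}
```

Searching for "aa" in "aaaa" reports the three overlapping matches at 0, 1 and 2, because j is reset to lps[j-1] rather than 0 after a full match.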
Compact implementation:
    #include <string.h>

    // f must have room for strlen(P) + 1 ints;
    // f[k] is the table value for the prefix of length k
    void preparation(char *P, int *f) {
        int m = strlen(P);
        f[0] = f[1] = 0;
        for (int i = 1; i < m; i++) {
            int j = f[i];
            while (j && P[i] != P[j])
                j = f[j];
            f[i + 1] = (P[i] == P[j]) ? j + 1 : 0;
        }
    }

    // answer(pos) is not defined here; it is called once per match
    void KMP(char *T, char *P, int *f) {
        int n = strlen(T), m = strlen(P);
        preparation(P, f);
        int j = 0;
        for (int i = 0; i < n; i++) {
            // compare the pattern at j (not i) with the text at i
            while (j && P[j] != T[i]) j = f[j];
            if (P[j] == T[i]) j++;
            if (j == m) answer(i - m + 1);
        }
    }
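The compact version leaves answer undefined and uses an array f that is one entry longer than the pattern. The sketch below shows one way it might be driven. The answer callback, the matches buffer and MAX_MATCHES are assumptions made for this example, and the search loop compares P[j] with T[i].

```c
#include <assert.h>
#include <string.h>

#define MAX_MATCHES 64              /* hypothetical cap for this sketch */
static int matches[MAX_MATCHES];
static int match_count = 0;

/* Hypothetical callback: records the start index of each match. */
void answer(int pos)
{
    if (match_count < MAX_MATCHES)
        matches[match_count] = pos;
    match_count++;
}

/* f needs strlen(P) + 1 entries; f[k] belongs to the prefix of length k. */
void preparation(const char *P, int *f)
{
    int m = (int)strlen(P);
    f[0] = 0;
    if (m > 0)
        f[1] = 0;
    for (int i = 1; i < m; i++) {
        int j = f[i];
        while (j && P[i] != P[j])
            j = f[j];
        f[i + 1] = (P[i] == P[j]) ? j + 1 : 0;
    }
}

void KMP(const char *T, const char *P, int *f)
{
    int n = (int)strlen(T), m = (int)strlen(P);
    assert(m > 0);                  /* an empty pattern is not handled here */
    preparation(P, f);
    int j = 0;
    for (int i = 0; i < n; i++) {
        /* After a full match j == m, so P[j] is the terminating '\0',
         * which never equals T[i]; the loop then falls back to f[m]. */
        while (j && P[j] != T[i])
            j = f[j];
        if (P[j] == T[i])
            j++;
        if (j == m)
            answer(i - m + 1);
    }
}
```

For "abababca", f[1..8] equals the value row of the table at the top (0 0 1 2 3 4 0 1), so f is the same table shifted by one position: f[i + 1] here corresponds to lps[i] in computeLPSArray.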
References:
http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
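Aho-Corasick (AC) automaton
Background: based on finite-state automata
The many names for KMP's partial match table, such as failure pointer and failure function, come from the AC automaton.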
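To make the link to KMP concrete, here is a rough sketch of an AC automaton in which fail_link plays the role of the partial match table, generalised from one pattern to a trie of patterns. Everything in it is an assumption made for illustration: the fixed lowercase alphabet, the MAX_NODES cap, the array names, and the choice to precompute missing transitions during the build (one common formulation, not the only one).

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 256   /* assumed cap on trie size */
#define ALPHA 26        /* assumed alphabet: 'a'..'z' */

static int next_state[MAX_NODES][ALPHA]; /* 0 is the root; 0 also means "no edge" before ac_build */
static int fail_link[MAX_NODES];         /* longest proper suffix of the node that is also in the trie */
static int hits[MAX_NODES];              /* patterns ending at the node, incl. via fail links after ac_build */
static int node_count = 1;

void ac_reset(void)
{
    memset(next_state, 0, sizeof next_state);
    memset(fail_link, 0, sizeof fail_link);
    memset(hits, 0, sizeof hits);
    node_count = 1;
}

void ac_insert(const char *pat)
{
    int s = 0;
    for (; *pat; pat++) {
        int c = *pat - 'a';
        assert(c >= 0 && c < ALPHA);
        if (next_state[s][c] == 0) {
            assert(node_count < MAX_NODES);
            next_state[s][c] = node_count++;
        }
        s = next_state[s][c];
    }
    hits[s]++;
}

/* Breadth-first pass: shallower nodes are finished before deeper ones,
 * so fail_link[s] is always complete when node s is processed. */
void ac_build(void)
{
    int queue[MAX_NODES], head = 0, tail = 0;
    for (int c = 0; c < ALPHA; c++) {
        int t = next_state[0][c];
        if (t) {
            fail_link[t] = 0;
            queue[tail++] = t;
        }
    }
    while (head < tail) {
        int s = queue[head++];
        hits[s] += hits[fail_link[s]];  /* inherit patterns that end at a suffix */
        for (int c = 0; c < ALPHA; c++) {
            int t = next_state[s][c];
            if (t) {
                fail_link[t] = next_state[fail_link[s]][c];
                queue[tail++] = t;
            } else {
                /* missing edge: reuse the transition of the fail state */
                next_state[s][c] = next_state[fail_link[s]][c];
            }
        }
    }
}

/* Total number of pattern occurrences in txt (overlaps included). */
int ac_count_matches(const char *txt)
{
    int s = 0, total = 0;
    for (; *txt; txt++) {
        int c = *txt - 'a';
        assert(c >= 0 && c < ALPHA);
        s = next_state[s][c];
        total += hits[s];
    }
    return total;
}
```

With the patterns he, she, his and hers inserted and ac_build called, ac_count_matches("ushers") finds three occurrences: "she" and "he" ending at the same position, then "hers". With a single pattern the trie is a chain and fail_link reduces to the KMP table.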
References: http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf
http://www.cnblogs.com/en-heng/p/5247903.html