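The KMP Algorithm
Note the difference between the next[] array as used in Introduction to Algorithms and the shift offset.
Computing the next[] array (i.e., the partial match values)
References: Ruan Yifeng's blog (阮一峰的网络日志)
Thoroughly Understanding KMP from Start to Finish (从头到尾彻底理解KMP, August 22, 2014 edition)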
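The partial match value at each position is the length of the longest proper prefix of the pattern, up to that position, that is also a suffix of it. A minimal Java sketch of that computation follows; the class name `PartialMatchTable` and the helper `partialMatch` are assumptions made for illustration and are not taken from either cited post.

```java
import java.util.Arrays;

public class PartialMatchTable {
    // Hypothetical helper: table[i] is the length of the longest proper prefix
    // of pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] partialMatch(String pattern) {
        int[] table = new int[pattern.length()];
        int k = 0; // length of the prefix matched so far
        for (int i = 1; i < pattern.length(); i++) {
            // On a mismatch, fall back to the next shorter prefix that is also a suffix
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = table[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            table[i] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partialMatch("ABCDABD")));
        // prints [0, 0, 0, 0, 1, 2, 0]
    }
}
```

Each value is derived from the values already computed, so the table is built in a single left-to-right pass over the pattern.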
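Computing the offset array
An error was found in Ruan Yifeng's post here; it will be corrected in the next update.
Shift = number of matched characters - corresponding partial match value (when the computed shift is 0, still shift by 1)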
For ABCABCABAC, next[] = {0, 0, 0, 1, 2, 3, 4, 5, 1, 0}
offset[] = …
For ABCDABD:
next[] = {0, 0, 0, 0, 1, 2, 0}
offset[] = {1, 1, 2, 3, 3, 3, 6}
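{1, 1, 2, 3, 3, 3, 6} is the shift table for ABCDABD following the blogger's approach (obtained by subtraction, that is, i - next[i] at each index i, with a result of 0 replaced by 1).
However, for ABCDABD it should be {1,1,2,3,6,6,4}.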
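For example:
ABCABDABCABC
ABCABC
Because C and D do not match, the formula in the post says to shift right by 5 - 2 = 3 positions. In fact, the pattern can be shifted 6 positions directly. The last C of the pattern fails against D, and that C is the same as the pattern's third character C, so the third character cannot match this D either and that alignment can be skipped.
I believe the correct formula should be: shift = number of matched characters - partial match value of the last matched character + partial match value of the mismatched character. Here that gives 5 - 2 + 3 = 6.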
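To experiment with the shifts discussed in this example, here is a minimal Java sketch. `basicShift` follows the subtraction as used in the example (5 - 2 = 3). `skippingShift` is a hypothetical helper, not taken from either cited post: it keeps falling back while the pattern character that would land on the mismatched text position is the same character that just failed. It works from the pattern alone, so for ABCABC failing at index 5 it reports 5. The sixth position in the example comes from then comparing the leading A against the D in the text. All class and method names are assumptions.

```java
public class ShiftSketch {
    // Partial match values: table[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of it.
    static int[] partialMatch(String pattern) {
        int[] table = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = table[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            table[i] = k;
        }
        return table;
    }

    // Shift by subtraction: matched characters minus the partial match value
    // of the last matched character (1 when nothing matched).
    static int basicShift(String pattern, int mismatchAt) {
        if (mismatchAt == 0) {
            return 1;
        }
        return mismatchAt - partialMatch(pattern)[mismatchAt - 1];
    }

    // Hypothetical variant: skip alignments that would put an identical
    // pattern character on the text position that just failed.
    static int skippingShift(String pattern, int mismatchAt) {
        int[] table = partialMatch(pattern);
        char failed = pattern.charAt(mismatchAt);
        int k = mismatchAt; // pattern index currently under the failed text position
        do {
            k = (k == 0) ? -1 : table[k - 1];
        } while (k >= 0 && pattern.charAt(k) == failed);
        return mismatchAt - k;
    }

    public static void main(String[] args) {
        // Pattern ABCABC fails at index 5 (its last C) against the D in the text.
        System.out.println(basicShift("ABCABC", 5));    // prints 3
        System.out.println(skippingShift("ABCABC", 5)); // prints 5
    }
}
```

Because `skippingShift` never looks at the text, it only rules out alignments that are certain to fail for every possible text character at the mismatched position.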
Pseudocode
while (!matched && !exhausted)
{
    while (pattern char != text char)
    {
        shift pattern as far right as possible;
        // Amount to shift pattern to the right is obtained from a
        // table which is calculated by pre-processing the pattern
        if pattern has been moved past the current position of the text
            start search one position to the right;
        else
            start search at current position of the text;
    }
    increment indices of pattern and text by one;
}
Java example program
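public static int KMP(String text, String pattern) {
    int tLen = text.length();
    int pLen = pattern.length();
    // An empty pattern matches at index 0, as with String.indexOf
    if (pLen == 0) {
        return 0;
    }
    // Build the next[] table: next[0] = -1, and next[i] is the length of the
    // longest proper prefix of pattern[0..i-1] that is also a suffix of it
    int[] next = new int[pLen + 1];
    int i = 0;
    int j = -1;
    next[i] = j;
    while (i < pLen) {
        if (j == -1 || pattern.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j]; // fall back to the next shorter candidate prefix
        }
    }
    // Debug output: print the next[] table
    for (int k = 0; k < next.length; k++) {
        System.out.println("next[" + k + "] is " + next[k]);
    }
    // Now find the match, if any
    int tPos = 0;
    int pPos = 0;
    while (tPos < tLen && pPos < pLen) {
        if (pPos == -1 || text.charAt(tPos) == pattern.charAt(pPos)) {
            pPos++;
            tPos++;
            if (pPos >= pLen) {
                return tPos - pLen; // start index of the first match
            }
        } else {
            pPos = next[pPos]; // mismatch: realign the pattern using next[]
        }
    }
    return -1; // no match found
}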
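The next[] array built in the listing above is not indexed the same way as the partial match values: it has one extra entry, starts with -1, and holds the partial match values shifted right by one position. The following sketch reproduces only the table construction from the listing so the two can be compared; the class name `NextVersusPartialMatch` and the method `buildNext` are assumptions made for illustration.

```java
import java.util.Arrays;

public class NextVersusPartialMatch {
    // Same construction as in the listing above: next[0] = -1, length pLen + 1.
    static int[] buildNext(String pattern) {
        int pLen = pattern.length();
        int[] next = new int[pLen + 1];
        int i = 0;
        int j = -1;
        next[0] = -1;
        while (i < pLen) {
            if (j == -1 || pattern.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        int[] next = buildNext("ABCDABD");
        System.out.println(Arrays.toString(next));
        // prints [-1, 0, 0, 0, 0, 1, 2, 0]

        // Dropping the leading -1 leaves the partial match values.
        System.out.println(Arrays.toString(Arrays.copyOfRange(next, 1, next.length)));
        // prints [0, 0, 0, 0, 1, 2, 0]
    }
}
```

With this indexing, a mismatch at pattern index pPos moves the pattern by pPos - next[pPos] positions, which is how the search loop in the listing turns next[] into a shift.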
Related courses
http://www.cs.utexas.edu/~moore/best-ideas/string-searching/
https://www.coursera.org/course/algs4partII
http://study.163.com/course/introduction.htm?courseId=468002#/courseDetail
http://blog.csdn.net/yutianzuijin/article/details/11954939/