The KMP Algorithm

Note the difference between the next[] array in Introduction to Algorithms (算法导论) and the shift offset.

Pattern offset:
[Figure: offset example]

Computing the next[] array (i.e. the partial match values)

[Figure: computing next]
References: Ruan Yifeng's blog (阮一峰的网络日志)
 "Understanding KMP Thoroughly from Start to Finish" (从头到尾彻底理解KMP, August 22, 2014 edition)
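
As a rough illustration of how the partial match values can be computed, here is a minimal sketch. The class name PartialMatchTable and the method name build are made up for this example. It assumes the usual definition: the value at index i is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it.

```java
class PartialMatchTable {
    // Hypothetical helper: pmt[i] is the length of the longest proper prefix
    // of pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] build(String pattern) {
        int[] pmt = new int[pattern.length()];
        int k = 0; // length of the prefix matched so far
        for (int i = 1; i < pattern.length(); i++) {
            // On a mismatch, fall back to the next shorter prefix that is
            // also a suffix, until one can be extended or none is left.
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = pmt[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            pmt[i] = k;
        }
        return pmt;
    }

    public static void main(String[] args) {
        // prints [0, 0, 0, 0, 1, 2, 0]
        System.out.println(java.util.Arrays.toString(build("ABCDABD")));
    }
}
```

The table is indexed by the last matched character. The Java program at the end of this post uses the same values shifted one slot to the right, with -1 stored in slot 0.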

Computing the offset array

I found what I believe is an error in Ruan Yifeng's article here; I will correct it in the next update.

shift = number of matched characters - partial match value of the last matched character (when the computed shift is 0, still shift by 1)
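
The formula can be sketched in a few lines. This assumes that the "corresponding" partial match value is the one belonging to the last matched character, as in Ruan Yifeng's worked example. The class name ShiftFormula and the method name shift are hypothetical, and the table is the next[] of ABCDABD quoted in this post.

```java
class ShiftFormula {
    // pmt[i] is the partial match value of pattern[0..i].
    // matched is how many pattern characters matched before the mismatch.
    static int shift(int matched, int[] pmt) {
        if (matched == 0) {
            return 1; // the formula would give 0, so move one position anyway
        }
        return matched - pmt[matched - 1];
    }

    public static void main(String[] args) {
        int[] pmt = {0, 0, 0, 0, 1, 2, 0}; // partial match values of ABCDABD
        int[] shifts = new int[pmt.length];
        for (int matched = 0; matched < pmt.length; matched++) {
            shifts[matched] = shift(matched, pmt);
        }
        // prints [1, 1, 2, 3, 4, 4, 4]
        System.out.println(java.util.Arrays.toString(shifts));
    }
}
```

Slot k of the result is the shift after exactly k characters have matched. For instance, after ABCDAB has matched (6 characters, partial match value 2), the pattern moves 6 - 2 = 4 positions. Indexing by matched count is not the same as subtracting next[i] from i at the mismatched index, so the two ways of building the table give different numbers.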

For ABCABCABAC, next[] = {0, 0, 0, 1, 2, 3, 4, 5, 1, 0}
offset[] = …

ABCDABD
next[] = {0, 0, 0, 0, 1, 2, 0}
offset[] = {1, 1, 2, 3, 3, 3, 6}
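Following the blogger's method, {1, 1, 2, 3, 3, 3, 6} is the shift table for ABCDABD (obtained by subtraction, i.e. i - next[i], with a result of 0 replaced by 1).
However, I believe the shifts for ABCDABD should be {1, 1, 2, 3, 6, 6, 4}.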

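For example, take the text ABCABDABCABC and the pattern ABCABC. Because C and D do not match, the formula in the article says to shift right by 5 - 2 = 3 positions. In fact the pattern can be moved 6 positions directly: the C that failed against D is the same character as the C in the third position, so that third-position C certainly cannot match this D either and can be skipped. I think the correct formula should be: shift = number of matched characters - partial match value of the last matched character + partial match value of the mismatched character.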
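
One way to try this example out is the commonly taught "nextval" refinement of the failure table. It skips a fallback position whenever the character there equals the one that just failed. Note that this is a swapped-in technique, not the formula proposed above, and the names OptimizedShift, buildNextval and indexOf are invented for this sketch.

```java
class OptimizedShift {
    // nextval[j] is the pattern index to retry after a mismatch at index j;
    // -1 means "advance the text by one and restart the pattern".
    static int[] buildNextval(String p) {
        int n = p.length();
        int[] nextval = new int[n];
        if (n == 0) {
            return nextval;
        }
        nextval[0] = -1;
        int i = 0;
        int j = -1;
        while (i < n - 1) {
            if (j == -1 || p.charAt(i) == p.charAt(j)) {
                i++;
                j++;
                // If the fallback character is the same one that just failed,
                // it would fail again, so reuse that position's own fallback.
                nextval[i] = (p.charAt(i) != p.charAt(j)) ? j : nextval[j];
            } else {
                j = nextval[j];
            }
        }
        return nextval;
    }

    // Index of the first occurrence of pattern in text, or -1 if there is none.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] nextval = buildNextval(pattern);
        int t = 0;
        int p = 0;
        while (t < text.length() && p < pattern.length()) {
            if (p == -1 || text.charAt(t) == pattern.charAt(p)) {
                t++;
                p++;
            } else {
                p = nextval[p];
            }
        }
        return p == pattern.length() ? t - p : -1;
    }

    public static void main(String[] args) {
        String pattern = "ABCABC";
        int[] nextval = buildNextval(pattern);
        int[] shifts = new int[nextval.length];
        for (int j = 0; j < nextval.length; j++) {
            shifts[j] = j - nextval[j]; // shift after a mismatch at index j
        }
        System.out.println(java.util.Arrays.toString(shifts));  // prints [1, 1, 2, 4, 4, 5]
        System.out.println(indexOf("ABCABDABCABC", pattern));   // prints 6
    }
}
```

With this table, a mismatch at the last C of ABCABC gives a shift of 5 rather than 3, and the third-position C is never compared with D. The table alone stops at 5 because the text character that failed against C could still be an A: in a text such as ABCABABCABC the match begins exactly at the mismatch position. In the example above that character is D, so the next comparison fails at once and the pattern ends up 6 positions to the right.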

Pseudocode

while (!matched && !exhausted)
{
    while (pattern char != text char)
    {
        shift pattern as far right as possible;
        // Amount to shift pattern to the right is obtained from a
        // table which is calculated by pre-processing the pattern
        if pattern has been moved past the current position of the text
            start search one position to the right;
        else
            start search at current position of the text;
    }
    increment indices of pattern and text by one;
}

Java example program

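public class KMPDemo {

    // Returns the index of the first occurrence of pattern in text, or -1 if none.
    public static int KMP(String text, String pattern) {
        int tLen = text.length();
        int pLen = pattern.length();

        // Build the next[] table (not the shift offsets): next[0] = -1, and
        // for i >= 1, next[i] is the length of the longest proper prefix of
        // pattern[0..i-1] that is also a suffix of it.
        int[] next = new int[pLen + 1];
        int i = 0;
        int j = -1;
        next[i] = j;
        while (i < pLen) {
            if (j == -1 || pattern.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        // Debug output: dump the table.
        for (int k = 0; k < next.length; k++) {
            System.out.println("next[" + k + "] = " + next[k]);
        }

        // Now find the match, if any. On a mismatch, fall back in the
        // pattern via next[]; the text position never moves backwards.
        int tPos = 0;
        int pPos = 0;
        while (tPos < tLen && pPos < pLen) {
            if (pPos == -1 || text.charAt(tPos) == pattern.charAt(pPos)) {
                pPos++;
                tPos++;
                if (pPos >= pLen) {
                    return tPos - pLen; // start index of the match
                }
            } else {
                pPos = next[pPos];
            }
        }
        return -1;
    }
}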

Posted 2016-04-26 12:53 by panty

@see also http://blog.csdn.net/neuldp