The KMP Algorithm

Note that the next[] array described in Introduction to Algorithms is not the same thing as the shift offset.


Pattern offset:
[Figures: two offset examples]


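Computing the next[] array (i.e. the partial match values)

[Figure: computing next]
References: Ruan Yifeng's blog (阮一峰的网络日志)
从头到尾彻底理解KMP ("Understanding KMP Thoroughly from Start to Finish", August 22, 2014 edition)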
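
As a rough illustration of the partial match values, here is a minimal Java sketch. The class name PartialMatchTable and the method name partialMatch are hypothetical, chosen only for this example. It assumes the 0-based definition: table[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. The Java program later in this post uses a shifted variant where next[0] = -1.

```java
import java.util.Arrays;

class PartialMatchTable {
    // table[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i] (hypothetical helper).
    static int[] partialMatch(String pattern) {
        int[] table = new int[pattern.length()];
        int k = 0; // length of the prefix matched so far
        for (int i = 1; i < pattern.length(); i++) {
            // On a mismatch, fall back to the next shorter prefix-suffix.
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = table[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            table[i] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partialMatch("ABCDABD")));
        // prints [0, 0, 0, 0, 1, 2, 0]
    }
}
```

The fallback step reuses entries that are already computed, which is why the table can be built in time linear in the pattern length.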


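Computing the offset array


I found an error in Ruan Yifeng's article at this point; it will be corrected in the next update.

shift = number of matched characters - corresponding partial match value, i.e. that of the last matched character (if the computed shift is 0, still shift by 1)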
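
The rule above can be sketched in Java roughly as follows. The names ShiftByFormula, partialMatch and shift are hypothetical, and the sketch assumes that "matched" counts the pattern characters that matched before the mismatch, so the partial match value used is that of the last matched character.

```java
class ShiftByFormula {
    // 0-based partial match table (hypothetical helper).
    static int[] partialMatch(String pattern) {
        int[] table = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = table[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            table[i] = k;
        }
        return table;
    }

    // shift = matched - partial match value of the last matched character.
    // With nothing matched the subtraction has no table entry to use,
    // so the pattern simply moves one position.
    static int shift(int[] table, int matched) {
        if (matched == 0) {
            return 1;
        }
        return matched - table[matched - 1];
    }

    public static void main(String[] args) {
        int[] table = partialMatch("ABCDABD");
        for (int matched = 0; matched < table.length; matched++) {
            System.out.println("matched=" + matched + " shift=" + shift(table, matched));
        }
    }
}
```

For ABCDABD with 6 characters matched, this gives 6 - 2 = 4. Because a partial match value is always smaller than the number of matched characters, the subtraction itself never yields 0 once at least one character has matched; the "shift by 1" case only arises when nothing matched.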

For ABCABCABAC, next[] = {0, 0, 0, 1, 2, 3, 4, 5, 1, 0}
offset[] = …

ABCDABD
next[] = {0, 0, 0, 0, 1, 2, 0}
offset[] = {1, 1, 2, 3, 3, 3, 6}
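{1, 1, 2, 3, 3, 3, 6} is what the blogger's subtraction gives for ABCDABD when applied index by index, offset[i] = i - next[i], with a result of 0 replaced by 1.
However, that is not the right shift table for ABCDABD. Indexed by the 0-based position j at which the mismatch occurs, and skipping alignments that are already known to fail, the shifts should be {1, 1, 2, 3, 5, 5, 4}. For example, at j = 4 the text character is known not to be A, so the alignment that would compare it with the leading A can be skipped, giving 5 rather than 4.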

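For example, take the text ABCABDABCABC and the pattern ABCABC. The first mismatch is C against D, with 5 characters matched, so the formula in the article gives a shift of 5 - 2 = 3. That shift lines up the pattern's third character, C, with the same D. Since the D has already failed to match a C, this comparison is certain to fail and the alignment can be skipped.

The table alone therefore justifies a shift of 5, which puts the pattern's first character under the D. In this particular text D also differs from A, so the next attempt effectively starts 6 positions to the right.

It is tempting to write the corrected formula as: shift = number of matched characters - partial match value of the last matched character + partial match value of the mismatched character. That gives 5 - 2 + 3 = 6 here, but it overshoots in general: with the text ABCABABCABC, a shift of 6 would skip the match that starts at position 5.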
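
The idea of skipping alignments that are certain to fail is commonly handled with a refined table, often called nextval. The following is a minimal Java sketch under stated assumptions: the names RefinedShift, next, nextval and refinedShifts are hypothetical, next[] uses the shifted convention with next[0] = -1 (as in the Java program later in this post), and the shift is indexed by the 0-based mismatch position j. It only uses what the table can know, namely that the text character differs from pattern[j].

```java
import java.util.Arrays;

class RefinedShift {
    // next[j] = length of the longest proper prefix of pattern[0..j-1]
    // that is also its suffix; next[0] = -1.
    static int[] next(String p) {
        int[] next = new int[p.length()];
        if (p.isEmpty()) {
            return next;
        }
        next[0] = -1;
        int i = 0;
        int k = -1;
        while (i < p.length() - 1) {
            if (k == -1 || p.charAt(i) == p.charAt(k)) {
                i++;
                k++;
                next[i] = k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    // If pattern[j] equals pattern[next[j]], retrying at next[j] would
    // compare the same character again and fail, so reuse that entry's
    // fallback instead.
    static int[] nextval(String p) {
        int[] next = next(p);
        int[] nv = new int[p.length()];
        for (int j = 0; j < p.length(); j++) {
            if (next[j] >= 0 && p.charAt(j) == p.charAt(next[j])) {
                nv[j] = nv[next[j]];
            } else {
                nv[j] = next[j];
            }
        }
        return nv;
    }

    // Shift of the pattern after a mismatch at pattern position j.
    // When nv[j] is -1 the text position advances by one as well.
    static int[] refinedShifts(String p) {
        int[] nv = nextval(p);
        int[] shifts = new int[p.length()];
        for (int j = 0; j < p.length(); j++) {
            shifts[j] = j - nv[j];
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(refinedShifts("ABCDABD")));
        // prints [1, 1, 2, 3, 5, 5, 4]
        System.out.println(Arrays.toString(refinedShifts("ABCABC")));
        // prints [1, 1, 2, 4, 4, 5]
    }
}
```

For ABCABC the entry at j = 5 is 5, not 6: the table cannot rule out that the mismatched text character is an A, so the pattern's first character still has to be compared with it.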


Pseudocode

while (!matched && !exhausted)
{
    while (pattern char != text char)
    {
        shift pattern as far right as possible;
        // Amount to shift pattern to the right is obtained from a
        // table which is calculated by pre-processing the pattern
        if pattern has been moved past the current position of the text
            start search one position to the right;
        else
            start search at current position of the text;
    }
    increment indices of pattern and text by one;
}

Java example program

public static int KMP(String text, String pattern) {
    int tLen = text.length();
    int pLen = pattern.length();

    // Build the failure table. next[i] is the pattern index to resume from
    // after a mismatch at pattern index i; next[0] = -1 means "advance the text".
    // These are resume positions, not shift offsets.
    int[] next = new int[pLen + 1];
    int i = 0;
    int j = -1;
    next[i] = j;
    while (i < pLen) {
        if (j == -1 || pattern.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    // Debug output: dump the table.
    for (int k = 0; k < next.length; k++) {
        System.out.println("next[" + k + "] = " + next[k]);
    }

    // Now find the first match, if any.
    int tPos = 0;
    int pPos = 0;
    while (tPos < tLen && pPos < pLen) {
        if (pPos == -1 || text.charAt(tPos) == pattern.charAt(pPos)) {
            pPos++;
            tPos++;
            if (pPos >= pLen) {
                return tPos - pLen; // start index of the match in text
            }
        } else {
            pPos = next[pPos]; // mismatch: fall back within the pattern
        }
    }
    return -1; // no match (an empty pattern also returns -1)
}

Related courses and further reading
http://www.cs.utexas.edu/~moore/best-ideas/string-searching/
https://www.coursera.org/course/algs4partII
http://study.163.com/course/introduction.htm?courseId=468002#/courseDetail
http://blog.csdn.net/yutianzuijin/article/details/11954939/


posted @ 2016-04-26 12:53  panty