Knuth-Morris-Pratt (KMP) String Matching Algorithm

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

http://www.matrix67.com/blog/archives/115

The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S. It relies on the observation that when a mismatch occurs, the word itself contains enough information to determine where the next match could begin, so previously matched characters never have to be re-examined. The algorithm was devised by Donald Knuth and Vaughan Pratt and, independently, by James H. Morris; the three published it jointly in 1977. In the code below, the word is x (length m) and the text is y (length n).

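#include <stdio.h>

#define XSIZE 256   /* assumed maximum pattern length */

void preKmp(char *x, int m, int kmpNext[]);   /* defined below */

/* Print every occurrence of pattern x[0..m-1] in text y[0..n-1]. */
void KMP(char *x, int m, char *y, int n) {
 int i, j, kmpNext[XSIZE];

 if (m <= 0 || m > XSIZE)
  return;

 /* Preprocessing: kmpNext[k] = length of the longest proper prefix
    of x[0..k] that is also a suffix of x[0..k] */
 preKmp(x, m, kmpNext);

 /* Searching: i = number of pattern characters matched so far,
    j = current position in the text */
 i = j = 0;
 while (j < n) {
  /* on a mismatch, slide the pattern by falling back to a shorter match */
  while (i > 0 && x[i] != y[j])
   i = kmpNext[i - 1];
  if (x[i] == y[j])
   i++;
  if (i == m) {
   /* full match ending at y[j] */
   printf("success: match at %d\n", j - m + 1);
   i = kmpNext[i - 1];   /* continue, allowing overlapping matches */
  }
  j++;
 }
}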
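To try the search end to end, here is a self-contained sketch that folds the preprocessing and the search loop into one function and collects 0-based match positions instead of printing them. The names kmp_search and build_next are hypothetical (not from the original post), and treating an empty pattern as "no matches" is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: next[k] = length of the longest proper prefix of
   x[0..k] that is also a suffix of x[0..k] (same meaning as kmpNext). */
static void build_next(const char *x, int m, int next[]) {
    int j = 0;
    next[0] = 0;
    for (int i = 1; i < m; i++) {
        while (j > 0 && x[i] != x[j])
            j = next[j - 1];
        if (x[i] == x[j])
            j++;
        next[i] = j;
    }
}

/* Hypothetical variant of KMP(): stores the 0-based start of every
   (possibly overlapping) occurrence of x in y into out[] and returns the
   count. out[] needs room for strlen(y) entries; returns -1 on malloc failure. */
int kmp_search(const char *x, const char *y, int out[]) {
    int m = (int)strlen(x), n = (int)strlen(y), count = 0;
    if (m == 0)
        return 0;                        /* assumption: empty pattern, no matches */
    int *next = malloc(sizeof *next * (size_t)m);
    if (next == NULL)
        return -1;
    build_next(x, m, next);
    for (int i = 0, j = 0; j < n; j++) {  /* i: matched chars, j: text position */
        while (i > 0 && x[i] != y[j])
            i = next[i - 1];
        if (x[i] == y[j])
            i++;
        if (i == m) {
            out[count++] = j - m + 1;
            i = next[i - 1];             /* keep overlapping matches */
        }
    }
    free(next);
    return count;
}
```

For example, kmp_search("ababacb", "abababaababacb", out) returns 1 with out[0] == 7, and kmp_search("aa", "aaaa", out) returns 3 (positions 0, 1, 2): because i falls back to next[m - 1] after a full match rather than to 0, overlapping occurrences are reported.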

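First, the matching process. The core idea of KMP is that after a mismatch the pattern is shifted as far to the right as possible without skipping over any possible match.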

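(Positions in these diagrams are 1-based; the C code above uses 0-based indices.)

i = 1 2 3 4 5 6 7 8 9 ……
A = a b a b a b a a b a b …
B = a b a b a c b
j = 1 2 3 4 5 6 7

A[1..5] = B[1..5] = ababa, but A[6] = b does not match B[6] = c. The longest proper prefix of ababa that is also a suffix of it (its longest border) is aba, of length 3, so B slides right by 5 - 3 = 2 and the comparison resumes with A[6] against B[4]; A[3..5] is not examined again.

--------------------------------------------------------------------

i = 1 2 3 4 5 6 7 8 9 ……
A = a b a b a b a a b a b …
B =     a b a b a c b
j =     1 2 3 4 5 6 7

A[6] = B[4] and A[7] = B[5] match, but A[8] = a does not match B[6] = c. The matched part is ababa again, so B slides right by 2.

--------------------------------------------------------------------

i = 1 2 3 4 5 6 7 8 9 ……
A = a b a b a b a a b a b …
B =         a b a b a c b
j =         1 2 3 4 5 6 7

A[8] = a does not match B[4] = b. The matched part aba has the border a (length 1), so B slides right by 2 more.

--------------------------------------------------------------------

i = 1 2 3 4 5 6 7 8 9 ……
A = a b a b a b a a b a b …
B =             a b a b a c b
j =             1 2 3 4 5 6 7

A[8] = a does not match B[2] = b, and the matched part a has no border, so B moves to start at A[8], where B[1] = a matches and the search continues.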
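The shifts in the walkthrough can be reproduced mechanically. This tracing sketch runs the same loop as KMP() but records the 1-based position in the text where the pattern starts, every time that alignment changes. kmp_alignments, record and the MAXPAT limit are hypothetical names chosen for this illustration.

```c
#include <string.h>

#define MAXPAT 64   /* hypothetical limit on pattern length for this sketch */

/* Append s to starts[] unless it equals the last recorded alignment. */
static void record(int starts[], int *count, int cap, int s) {
    if (*count > 0 && starts[*count - 1] == s)
        return;
    if (*count < cap)
        starts[(*count)++] = s;
}

/* Hypothetical tracing helper: runs the KMP search loop and records the
   1-based text position where x[0] is aligned whenever that changes.
   Returns the number of positions recorded (at most cap). */
int kmp_alignments(const char *x, const char *y, int starts[], int cap) {
    int m = (int)strlen(x), n = (int)strlen(y), count = 0;
    int next[MAXPAT];
    if (m == 0 || m > MAXPAT)
        return 0;

    /* failure table, same meaning as kmpNext */
    next[0] = 0;
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && x[i] != x[k])
            k = next[k - 1];
        if (x[i] == x[k])
            k++;
        next[i] = k;
    }

    for (int i = 0, j = 0; j < n; j++) {
        for (;;) {
            /* x[i] is about to be compared with y[j], so x[0] sits under y[j - i] */
            record(starts, &count, cap, j - i + 1);
            if (x[i] == y[j]) { i++; break; }
            if (i == 0) break;           /* nothing shorter to fall back to */
            i = next[i - 1];             /* shift the pattern right */
        }
        if (i == m)                      /* full match: continue past it */
            i = next[i - 1];
    }
    return count;
}
```

With the pattern ababacb and the visible part of the text, abababaabab, it records 1, 3, 5, 7, 8: each jump skips alignments that the already-matched characters rule out, and the text position j never moves backwards.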

Computing these shift distances is itself a matching process, in which the pattern is matched against itself:

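/* Fill kmpNext[i] with the length of the longest proper prefix of
   x[0..i] that is also a suffix of x[0..i]. */
void preKmp(char *x, int m, int kmpNext[]) {
 int i, j;
 kmpNext[0] = 0;   /* a single character has no proper prefix/suffix match */
 i = 1;            /* position being filled in */
 j = 0;            /* length of the currently matched prefix */
 while (i < m) {
  /* on a mismatch, fall back to the next shorter matching prefix */
  while (j > 0 && x[i] != x[j])
   j = kmpNext[j - 1];
  if (x[i] == x[j])
   j++;
  kmpNext[i] = j;
  i++;
 }
}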
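The fast table construction is easier to trust once it can be compared against the definition. This brute-force sketch (next_by_definition is a hypothetical name) computes each entry directly: for every prefix x[0..i] it tries proper prefixes from longest to shortest and keeps the first one that is also a suffix. It takes O(m^3) time in the worst case and is meant only as a cross-check.

```c
#include <string.h>

/* Hypothetical reference implementation: next[i] = length of the longest
   proper prefix of x[0..i] that is also a suffix of x[0..i]. */
void next_by_definition(const char *x, int next[]) {
    int m = (int)strlen(x);
    for (int i = 0; i < m; i++) {
        next[i] = 0;
        /* x[0..i] has length i + 1, so proper prefixes have length <= i */
        for (int len = i; len > 0; len--) {
            /* does x[0..len-1] equal x[i-len+1..i]? */
            if (strncmp(x, x + i - len + 1, (size_t)len) == 0) {
                next[i] = len;
                break;
            }
        }
    }
}
```

For x = "ababacb" it produces 0 0 1 2 3 0 0, the same table preKmp builds.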

The table kmpNext can be computed in O(m) space and time before the searching phase, by applying the same searching algorithm to the pattern itself, as if x = y.

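      i = 1 2 3 4 5 6 7
      A = a b a b a c b
      B =   a b a b a c b
      j =   1 2 3 4 5 6 7
kmpNext = 0

A[2] = b does not match B[1] = a, so kmpNext[2] = 0 and B slides to start at A[3].

--------------------------------------------------------------------

      i = 1 2 3 4 5 6 7
      A = a b a b a c b
      B =     a b a b a c b
      j =     1 2 3 4 5 6 7
kmpNext = 0 0 1 2 3

A[3..5] = B[1..3] = aba, giving kmpNext[3..5] = 1 2 3. Then A[6] = c does not match B[4] = b; falling back through the shorter matches (length 1, then 0) moves B to start at A[6].

--------------------------------------------------------------------

      i = 1 2 3 4 5 6 7
      A = a b a b a c b
      B =           a b a b a c b
      j =           1 2 3 4 5 6 7
kmpNext = 0 0 1 2 3 0

A[6] = c does not match B[1] = a, so kmpNext[6] = 0. In the same way A[7] = b does not match B[1] = a, which completes the table: kmpNext = 0 0 1 2 3 0 0.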
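The O(m) bound for building kmpNext can be checked by counting comparisons. j grows by at most 1 per value of i, never drops below 0, and every failed comparison with j > 0 shrinks it, so there are at most m - 1 fallbacks plus one final comparison per i: at most 2(m - 1) comparisons overall. prekmp_count is a hypothetical instrumented copy of preKmp written for this check.

```c
#include <string.h>

/* Hypothetical instrumented copy of preKmp(): fills next[] the same way
   and returns how many pairs of pattern characters were compared. */
long prekmp_count(const char *x, int next[]) {
    int m = (int)strlen(x), j = 0;
    long comparisons = 0;
    if (m == 0)
        return 0;
    next[0] = 0;
    for (int i = 1; i < m; i++) {
        for (;;) {
            comparisons++;                      /* compare x[i] with x[j] */
            if (x[i] == x[j]) { j++; break; }   /* match grows by one */
            if (j == 0) break;                  /* no shorter match to try */
            j = next[j - 1];                    /* each failure shrinks j */
        }
        next[i] = j;
    }
    return comparisons;
}
```

For "ababacb" it counts 8 comparisons (bound 12), and for "aaaaaaab" it counts 13 (bound 14): the final b fails against every prefix length from 6 down to 0, but those fallbacks are paid for by the six earlier increments of j.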

posted @ 2010-08-18 18:41  Algorithms