The KMP Algorithm in Python
The theory behind the KMP algorithm here mainly draws on:
Ruan Yifeng's blog: http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html
The GeeksforGeeks article: https://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
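Advantages of KMP over naive search:
1 After a mismatch, the search does not backtrack in wholeString and restart Pattern from the next position
2 The partial table records how Pattern overlaps with itself, so characters that have already been matched are not searched again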
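To make these two advantages concrete, here is a minimal sketch that counts character comparisons for a naive search and for a KMP search on the same input. The names `prefix_table`, `naive_count`, and `kmp_count` are hypothetical helpers introduced only for this illustration, and the pattern is assumed to be non-empty.

```python
def prefix_table(pattern):
    """table[k] = length of the longest proper prefix of pattern[:k+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = table[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        table[i] = j
    return table


def naive_count(text, pattern):
    """Naive search: return (match positions, number of comparisons)."""
    hits, comparisons = [], 0
    for start in range(len(text) - len(pattern) + 1):
        k = 0
        while k < len(pattern):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
            k += 1
        if k == len(pattern):
            hits.append(start)
    return hits, comparisons


def kmp_count(text, pattern):
    """KMP search: return (match positions, number of comparisons).
    Assumes a non-empty pattern (an assumption of this sketch)."""
    table = prefix_table(pattern)
    hits, comparisons = [], 0
    i = j = 0
    while i < len(text):
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                hits.append(i - j)
                j = table[j - 1]
        elif j > 0:
            j = table[j - 1]  # realign the pattern; i stays where it is
        else:
            i += 1
    return hits, comparisons


# A repetitive input where the naive search re-reads the same characters
text, pattern = "A" * 30 + "B", "A" * 9 + "B"
print(naive_count(text, pattern))
print(kmp_count(text, pattern))
```

On this input the naive search re-reads the same run of A's from every start position, while the KMP loop never moves `i` backward, so its comparison count stays within 2 * len(text).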
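Steps of the KMP algorithm:
1 Compare the first character of Pattern against wholeString; on a mismatch, move forward in wholeString
2 Once the first character matches, advance through wholeString and Pattern together until a position fails to match
3 On a mismatch, use the partial table to shift Pattern forward; once the shift reaches the start of Pattern, go back to step 1
4 Whenever Pattern is matched completely along the way, add (current position - length of Pattern) to the result
5 Continue searching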
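The steps above can be followed on a concrete input with a small tracing sketch. `kmp_trace` and `prefix_table` are hypothetical helpers written for this illustration: the loop records, for every comparison, the index into the text, the index into the pattern, and which step applied. A non-empty pattern is assumed.

```python
def prefix_table(pattern):
    """table[k] = length of the longest proper prefix of pattern[:k+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = table[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        table[i] = j
    return table


def kmp_trace(text, pattern):
    """Return (matches, events); assumes a non-empty pattern.

    Each event is (i, j, label): i indexes text, j indexes pattern.
    """
    table = prefix_table(pattern)
    matches, events = [], []
    i = j = 0
    while i < len(text):
        if text[i] == pattern[j]:
            events.append((i, j, "advance both"))        # step 2
            i += 1
            j += 1
            if j == len(pattern):                         # step 4
                matches.append(i - len(pattern))
                j = 0                                     # step 5
        elif j > 0:
            events.append((i, j, "shift pattern"))        # step 3
            j = table[j - 1]
        else:
            events.append((i, j, "move text forward"))    # step 1
            i += 1
    return matches, events


matches, events = kmp_trace("ABABDABACDABABCABAB", "ABABCAB")
for i, j, label in events[:8]:
    print(i, j, label)
print(matches)
```

In the printed events the text index never decreases; only the pattern index falls back, which is the shift described in step 3.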
Python implementation
class KMP(object):

    def partial(self, pattern):
        """Calculate the shifted partial match table (the "next" list): String -> [Int]

        Entry j is the length of the longest proper prefix of pattern[:j]
        that is also a suffix of it.
        """
        partialList = []
        for i in range(len(pattern)):
            p = pattern[:i + 1]
            pre, last = len(p) - 1, 1
            # The tricky part: try the longest candidate first, then shorter ones
            while pre > 0 and p[:pre] != p[last:]:
                pre -= 1
                last += 1
            partialList.append(pre)
        # Shift right by one to get the "next" list used on a mismatch
        return [0] + partialList[:-1]

    def search(self, T, P):
        """
        KMP search main algorithm: String -> String -> [Int]
        Return all the matching positions of pattern string P in T
        """
        ansList = []
        if not P:
            return ansList
        partial = self.partial(P)
        i, j = 0, 0  # i: index into T; j: index into P
        while i < len(T):
            if T[i] == P[j]:
                # Both indexes move forward
                i += 1
                j += 1
                # Full match: record where it starts, then restart the pattern
                if j == len(P):
                    ansList.append(i - j)
                    j = 0
            elif j > 0:
                # A partial match failed: shift P forward via the table, keep i
                j = partial[j]
            else:
                # Still looking for the first point where P meets T
                i += 1
        return ansList

Test cases
s1 = 'BBCABCDABABCDABCDABDEABCDABD'
p1 = 'ABCDABD'
s2 = 'ABABDABACDABABCABAB'
p2 = 'ABABCAB'
print(KMP().search(s1, p1))  # [13, 21]
print(KMP().search(s2, p2))  # [10]
Another nice implementation comes from m00nlight's GitHub gist: https://gist.github.com/m00nlight/daa6786cc503fde12a77#file-gistfile1-py. The code is as follows:
class KMP:
    def partial(self, pattern):
        """ Calculate partial match table: String -> [Int]"""
        ret = [0]

        for i in range(1, len(pattern)):
            j = ret[i - 1]
            while j > 0 and pattern[j] != pattern[i]:
                j = ret[j - 1]
            ret.append(j + 1 if pattern[j] == pattern[i] else j)
        return ret

    def search(self, T, P):
        """
        KMP search main algorithm: String -> String -> [Int]
        Return all the matching position of pattern string P in S
        """
        partial, ret, j = self.partial(P), [], 0

        for i in range(len(T)):
            while j > 0 and T[i] != P[j]:
                j = partial[j - 1]
            if T[i] == P[j]:
                j += 1
            if j == len(P):
                ret.append(i - (j - 1))
                j = 0

        return ret
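One thing to keep in mind: this class resets `j` to 0 after a full match, so an occurrence that overlaps an earlier match is not reported. For example, it returns only [0] when searching for 'ABA' in 'ABABA'. If overlapping matches are wanted, a common variation is to fall back to `partial[j - 1]` instead. The sketch below is an adaptation for illustration, not the gist's code; `search_overlapping` and `find_all` are hypothetical names, `find_all` is a brute-force reference built on `str.find` and used only for cross-checking, and the pattern is assumed to be non-empty.

```python
def partial(pattern):
    """Partial match table: String -> [Int]"""
    ret = [0]
    for i in range(1, len(pattern)):
        j = ret[i - 1]
        while j > 0 and pattern[j] != pattern[i]:
            j = ret[j - 1]
        ret.append(j + 1 if pattern[j] == pattern[i] else j)
    return ret


def search_overlapping(T, P):
    """Return the start of every occurrence of P in T, overlaps included.
    Assumes P is non-empty (an assumption of this sketch)."""
    table, ret, j = partial(P), [], 0
    for i in range(len(T)):
        while j > 0 and T[i] != P[j]:
            j = table[j - 1]
        if T[i] == P[j]:
            j += 1
        if j == len(P):
            ret.append(i - (j - 1))
            # Keep the longest prefix that is also a suffix of P,
            # instead of restarting the pattern at 0
            j = table[j - 1]
    return ret


def find_all(T, P):
    """Brute-force reference using str.find, for cross-checking only."""
    ret, start = [], T.find(P)
    while start != -1:
        ret.append(start)
        start = T.find(P, start + 1)
    return ret


print(search_overlapping("ABABA", "ABA"))  # [0, 2]
print(search_overlapping("ABABDABACDABABCABAB", "ABABCAB") ==
      find_all("ABABDABACDABABCABAB", "ABABCAB"))
```

Whether to report overlapping matches depends on the use case; for counting non-overlapping replacements, resetting `j` to 0 as in the class above is the simpler choice.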