The KMP Algorithm in Python

The theory behind KMP here mainly follows:

  Ruan Yifeng's blog: http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html

  The GeeksforGeeks article: https://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/

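Advantages of KMP over naive search:

  1 After a mismatch, the position in wholeString never moves backward; only the position in the pattern changes

  2 The partial table records, for each prefix of the pattern, the length of its longest proper prefix that is also a suffix, so characters that have already been matched are not compared again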
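
For contrast, here is a minimal sketch of the naive search that KMP improves on. The function name naive_search and the comparison counter are my own additions for illustration. Every mismatch throws away the progress made and restarts one position later in the text, which is the backtracking that point 1 avoids.

```python
def naive_search(text, pattern):
    """Return (match positions, number of character comparisons)."""
    positions, comparisons = [], 0
    for start in range(len(text) - len(pattern) + 1):
        j = 0
        while j < len(pattern):
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: give up and restart at start + 1
            j += 1
        if j == len(pattern):
            positions.append(start)
    return positions, comparisons


print(naive_search('AAAAAB', 'AAB'))  # ([3], 12)
```

The text has only 6 characters, yet 12 comparisons are made because the same characters are read again after each restart.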

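KMP algorithm steps:

  1 Compare the current character of wholeString with the first character of Pattern; if they differ, advance wholeString by one

  2 Once the first character matches, advance wholeString and Pattern together until a position fails to match

  3 On a mismatch, use the partial table to slide Pattern forward (that is, move the Pattern index back) without moving the wholeString position; if this brings the comparison back to the first character of Pattern, return to step 1

  4 If Pattern is fully matched along the way, append (current position - length of Pattern) to the results, where the current position is the index just past the last matched character

  5 Continue searching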
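
To make the partial table used in step 3 concrete, here is a small brute-force sketch that applies the definition directly (the longest proper prefix that is also a suffix) to every prefix of the pattern. The name partial_table is only for illustration, and the quadratic approach is chosen for readability, not speed.

```python
def partial_table(pattern):
    """For each prefix of pattern, the length of its longest proper
    prefix that is also a suffix."""
    table = []
    for end in range(1, len(pattern) + 1):
        prefix = pattern[:end]
        # try candidate lengths from longest to shortest
        length = end - 1
        while length > 0 and prefix[:length] != prefix[-length:]:
            length -= 1
        table.append(length)
    return table


print(partial_table('ABCDABD'))  # [0, 0, 0, 0, 1, 2, 0]
print(partial_table('ABABCAB'))  # [0, 0, 1, 2, 0, 1, 2]
```

For example, when 'ABCDAB' (6 characters) has been matched and the next character fails, the table value 2 means the pattern can slide forward by 6 - 2 = 4 positions, keeping the trailing 'AB' aligned.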

Python implementation

class KMP(object):

    # partial match table
    def partial(self, pattern):
        """ Calculate the shifted partial match ("next") table: String -> [Int]"""
        partialList = []
        for i in range(len(pattern)):
            p = pattern[:i+1]
            pre, last = len(p)-1, 1
            # try prefix/suffix pairs from longest to shortest
            while pre > 0 and p[:pre] != p[last:]:
                pre -= 1
                last += 1
            partialList.append(pre)
        # shift right by one: entry j is the partial match value of pattern[:j]
        return [0]+partialList[:-1]  # nextList

    def search(self, T, P):
        """
        KMP search main algorithm: String -> String -> [Int]
        Return all the (non-overlapping) matching positions of pattern string P in T
        """
        ansList = []
        if not P:  # an empty pattern has nothing to match
            return ansList
        partial = self.partial(P)
        print(partial)
        i, j = 0, 0  # index into T; index into P
        while i < len(T):
            if T[i] == P[j]:
                # advance both indices
                i += 1
                j += 1
                # full match: i is now just past the matched text
                if j == len(P):
                    ansList.append(i-j)
                    j = 0
            elif j > 0:
                # partial match failed: fall back in P without moving i
                j = partial[j]
            else:  # no match on the first character of P: advance T
                i += 1
        print(ansList)
        return ansList

Test cases

s1 = 'BBCABCDABABCDABCDABDEABCDABD'
p1 = 'ABCDABD'
s2 = '"ABABDABACDABABCABAB"'
p2 = 'ABABCAB'
KMP().search(s2, p2)
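
One way to check the results is to compare them against Python's built-in str.find. The helper find_all below is a hypothetical name of my own, not part of the KMP class. It assumes a non-empty pattern, reports non-overlapping matches, and takes the text of s2 as the plain string ABABDABACDABABCABAB; any extra quote characters inside the literal would shift the position by one.

```python
def find_all(text, pattern):
    """Non-overlapping match positions, found with the built-in str.find."""
    positions, start = [], 0
    if not pattern:  # assumption: an empty pattern yields no matches
        return positions
    while True:
        pos = text.find(pattern, start)
        if pos == -1:
            return positions
        positions.append(pos)
        start = pos + len(pattern)  # continue after this match


print(find_all('BBCABCDABABCDABCDABDEABCDABD', 'ABCDABD'))  # [13, 21]
print(find_all('ABABDABACDABABCABAB', 'ABABCAB'))           # [10]
```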

There is also a nicer version, from m00nlight's GitHub: https://gist.github.com/m00nlight/daa6786cc503fde12a77#file-gistfile1-py. The code is as follows:

class KMP:
    def partial(self, pattern):
        """ Calculate partial match table: String -> [Int]"""
        ret = [0]
        
        for i in range(1, len(pattern)):
            j = ret[i - 1]
            while j > 0 and pattern[j] != pattern[i]:
                j = ret[j - 1]
            ret.append(j + 1 if pattern[j] == pattern[i] else j)
        return ret
        
    def search(self, T, P):
        """ 
        KMP search main algorithm: String -> String -> [Int] 
        Return all the matching positions of pattern string P in T
        """
        partial, ret, j = self.partial(P), [], 0
        
        for i in range(len(T)):
            while j > 0 and T[i] != P[j]:
                j = partial[j - 1]
            if T[i] == P[j]: j += 1
            if j == len(P): 
                ret.append(i - (j - 1))
                j = 0
            
        return ret
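
Resetting j to 0 after a full match means overlapping occurrences are skipped. As a possible variation (my own sketch, not part of the gist), j can fall back through the table instead, so the tail of one match may serve as the head of the next. The function names prefix_table and search_overlapping are made up for this example.

```python
def prefix_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0]
    for i in range(1, len(pattern)):
        j = table[i - 1]
        while j > 0 and pattern[j] != pattern[i]:
            j = table[j - 1]
        table.append(j + 1 if pattern[j] == pattern[i] else j)
    return table


def search_overlapping(text, pattern):
    """Return every match position, including overlapping ones."""
    if not pattern:  # assumption: an empty pattern yields no matches
        return []
    table, positions, j = prefix_table(pattern), [], 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = table[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            positions.append(i - j + 1)
            j = table[j - 1]  # fall back instead of resetting to 0
    return positions


print(search_overlapping('ABABABA', 'ABA'))  # [0, 2, 4]
```

With the reset to 0, the same input gives [0, 4], because the match starting at index 2 shares its first character with the end of the previous match.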

 

posted @ 2018-06-14 13:12  fuzzier