边工作边刷题:70天一遍leetcode: day 4

Repeated DNA Sequences

要点:可以用rolling hash做,不过只beat了7%的python解法,还是用bit数组快。rolling hash还是就用左边高位吧,别老变来变去的,这样每次取余去掉高位。
错误点:rolling hash:rolling的公式可不是+=

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        n = len(s)
        res = []
        if n<11: return res
        hmap = {}
        hval = 0
        cmap = {'A':0,'C':1,'G':2,'T':3}
        for i in xrange(10):
            hval = (hval<<2 & 0xfffff) + cmap[s[i]]
        
        hmap[hval]=1
        # print bin(hval)
        for i in xrange(10, n):
            hval = (hval<<2 & 0xfffff) + cmap[s[i]]
            if hval in hmap and hmap[hval]==1:
                res.append(s[i-9:i+1])
            if hval not in hmap:
                hmap[hval]=0
            hmap[hval]+=1
        return res
posted @ 2016-05-23 23:47  absolute100  阅读(116)  评论(0编辑  收藏  举报