187. Repeated DNA Sequences - Medium

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

 

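Use two hash sets, set and res. Starting from s[0], slide a 10-letter window over s one position at a time. If a window cannot be added to set, that substring has already appeared, so add it to res.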

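Note: to avoid outputting the same substring more than once in a case like "AAAAAAAAAAAA", check whether the substring is already in res before adding it, or make res a hash set as well.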
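To make the duplicate-output case concrete, here is a minimal sketch comparing a list-based res with no existence check against a set-based res. The class name DuplicateDemo and both helper methods are made up for illustration, and "A".repeat(12) assumes Java 11 or later. In a run of 12 A's the same window starts at indices 0, 1, and 2, so it fails to be added to the seen set twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateDemo {
    // Hypothetical helper: collects repeats in a list with no existence check.
    static List<String> repeatsAsList(String s) {
        Set<String> seen = new HashSet<>();
        List<String> res = new ArrayList<>();
        for (int i = 0; i + 9 < s.length(); i++) {
            String window = s.substring(i, i + 10);
            if (!seen.add(window)) {
                res.add(window); // appended again on every repeated occurrence
            }
        }
        return res;
    }

    // Hypothetical helper: same scan, but res is a set, so repeats collapse.
    static Set<String> repeatsAsSet(String s) {
        Set<String> seen = new HashSet<>();
        Set<String> res = new HashSet<>();
        for (int i = 0; i + 9 < s.length(); i++) {
            String window = s.substring(i, i + 10);
            if (!seen.add(window)) {
                res.add(window);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String s = "A".repeat(12); // three overlapping windows, all identical
        System.out.println(repeatsAsList(s).size()); // 2: the window is reported twice
        System.out.println(repeatsAsSet(s).size());  // 1: the window is reported once
    }
}
```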

time: O(n), space: O(n), treating the window length 10 as a constant

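import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> set = new HashSet<>(); // every 10-letter window seen so far
        Set<String> res = new HashSet<>(); // windows seen more than once, deduplicated

        // Slide a 10-letter window over s one position at a time.
        for (int i = 0; i + 9 < s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window is already in set.
            if (!set.add(window)) {
                res.add(window);
            }
        }
        return new ArrayList<>(res);
    }
}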
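As an alternative to storing every window as a String, the window can be packed into an int using 2 bits per nucleotide. This is a different technique from the solution above, not part of the original notes. The sketch below assumes the mapping A=0, C=1, G=2, T=3 (any four distinct 2-bit codes would work), and the class name EncodedSolution and helper code() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EncodedSolution {
    // Assumed mapping for illustration: A=0, C=1, G=2, T=3.
    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("unexpected nucleotide: " + c);
        }
    }

    public static List<String> findRepeatedDnaSequences(String s) {
        Set<Integer> seen = new HashSet<>(); // 20-bit keys of windows seen so far
        Set<String> res = new HashSet<>();   // repeated windows, deduplicated
        final int mask = (1 << 20) - 1;      // low 20 bits = the last 10 letters
        int key = 0;

        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the one that left the window.
            key = ((key << 2) | code(s.charAt(i))) & mask;
            // From i = 9 onward, key encodes the full window s[i-9..i].
            if (i >= 9 && !seen.add(key)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return new ArrayList<>(res);
    }

    public static void main(String[] args) {
        // Element order is unspecified because res is a HashSet.
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Each window update is a constant-time shift and mask, and a substring is only built when a repeat is found. The asymptotic bounds stay at O(n) time and O(n) space.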

 

posted @ 2019-01-03 08:02  fatttcat