LeetCode Repeated DNA Sequences

Original problem link: https://leetcode.com/problems/repeated-dna-sequences/

Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

Take each substring of length 10 and check whether it has appeared before. A HashSet can hold the substrings seen so far.
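The substring-set idea can be sketched as follows. This is a minimal illustration, not the accepted solution from this post: the class name `SubstringSetSketch`, the method name `findRepeated`, and the use of a `LinkedHashSet` to keep results in first-repeat order are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named only for this sketch.
class SubstringSetSketch {
    private static final int WINDOW = 10;

    static List<String> findRepeated(String s) {
        // Every 10-letter window seen so far.
        Set<String> seen = new HashSet<>();
        // Windows seen at least twice; LinkedHashSet keeps first-repeat order
        // and prevents reporting the same window more than once.
        Set<String> repeated = new LinkedHashSet<>();
        if (s == null) {
            return new ArrayList<>();
        }
        for (int i = 0; i + WINDOW <= s.length(); i++) {
            String window = s.substring(i, i + WINDOW);
            // add() returns false when the window was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

Storing whole substrings is the simplest variant, at the cost of keeping up to n - 9 ten-character strings in memory.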

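Alternatively, encode each length-10 substring as an integer hash code and check those codes for duplicates. When computing the hash code, A is 00, C is 01, G is 10, and T is 11. For each character, shift the hash left by two bits, which leaves the low two bits as 0, then bitwise-OR in the number for the new character. Ten characters take 20 bits, so the code fits in an int and two different substrings never share a code.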
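The 2-bit encoding can be sketched in isolation as below. The class `RollingEncodingSketch` and its methods `code`, `encode`, and `slide` are hypothetical names for this example. The `slide` step, which reuses the previous window's code instead of re-encoding all ten characters, is an optional variation and is not what the accepted solution in this post does.

```java
// Hypothetical helper class, named only for this sketch.
class RollingEncodingSketch {
    // Ten nucleotides at 2 bits each occupy the low 20 bits of an int.
    private static final int MASK = (1 << 20) - 1;

    // Maps a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encodes a window from scratch: shift left by two bits, then OR in the new code.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Assumed variation: advance a 10-letter window by one character.
    // The mask drops the two bits of the character that left the window.
    static int slide(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }
}
```

For instance, `encode("ACGT")` builds the bit pattern 00 01 10 11, which is 27.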

Time Complexity: O(n), where n = s.length(). Each of the n - 9 windows takes a constant 10 steps to encode.

Space: O(n).

AC (accepted) Java:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return res;
        }

        // Hashes already reported, so each repeated sequence is added once.
        HashSet<Integer> resSet = new HashSet<>();
        // Hashes of every window seen so far.
        HashSet<Integer> hs = new HashSet<>();
        // 2-bit code per nucleotide, indexed by letter offset from 'A'.
        int[] map = new int[26];
        map['A' - 'A'] = 0;
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;
        for (int i = 0; i <= s.length() - 10; i++) {
            // Encode the 10-letter window starting at i into 20 bits.
            int hash = 0;
            for (int j = i; j < i + 10; j++) {
                hash = (hash << 2) | map[s.charAt(j) - 'A'];
            }

            // Seen before (hs.add fails) and not yet reported (resSet.add succeeds).
            if (!hs.add(hash) && resSet.add(hash)) {
                res.add(s.substring(i, i + 10));
            }
        }

        return res;
    }
}

 

posted @ 2015-11-02 12:10  Dylan_Java_NYC