187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].


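A Java int occupies 4 bytes, 32 bits in total. The idea in this problem is to encode each of the four letters A, T, C, G in two bits, so a 10-letter sequence takes 20 bits and fits in a single int; each 20-bit value is then stored in a HashSet. The solution has one particularly neat trick: it creates two HashSets, one recording sequences that have appeared once and the other recording sequences that have appeared a second time. There is one detail to note: the two HashSet calls in the if condition are joined by &&, so when the first operand is false (the sequence is new), the second call is never executed. The code is as follows: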
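import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        // Encoded sequences seen at least once.
        Set<Integer> words = new HashSet<Integer>();
        // Encoded sequences already added to the result.
        Set<Integer> doublewords = new HashSet<Integer>();
        // Two-bit code per nucleotide: A=0, C=1, T=2, G=3.
        int[] map = new int[26];
        map['A'-'A'] = 0;
        map['C'-'A'] = 1;
        map['T'-'A'] = 2;
        map['G'-'A'] = 3;
        // The loop body is skipped entirely when s has fewer than 10 characters.
        for(int i=0;i<s.length()-9;i++){
            // Pack the 10 letters starting at i into the low 20 bits of v.
            int v = 0;
            for(int j=i;j<i+10;j++){
                v<<=2;
                v|=map[s.charAt(j)-'A'];
            }
            // words.add(v) returns false if v was seen before; only then is
            // doublewords.add(v) evaluated, and it returns true only the
            // first time, so each repeated sequence is reported once.
            if(!words.add(v)&&doublewords.add(v)){
                res.add(s.substring(i,i+10));
            }
        }
        return res;
    }
}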
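
As a possible refinement, here is a minimal sketch (not the original answer) that keeps the same two-bit encoding and the same two-HashSet trick, but updates the 20-bit value incrementally instead of re-encoding all 10 letters for every start position. The class name RollingDna, the helper code, and the constants LEN and MASK are names made up for this illustration. It also assumes the input contains only the letters A, C, G, and T and throws otherwise.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RollingDna {
    // Hypothetical helper: same two-bit codes as the solution above (A=0, C=1, T=2, G=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'T': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("unexpected nucleotide: " + c);
        }
    }

    static List<String> findRepeated(String s) {
        final int LEN = 10;                     // window length in letters
        final int MASK = (1 << (2 * LEN)) - 1;  // keeps only the low 20 bits
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();      // windows seen at least once
        Set<Integer> reported = new HashSet<>();  // windows already in res
        int v = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter; the mask drops the letter that left the window.
            v = ((v << 2) | code(s.charAt(i))) & MASK;
            // && short-circuits left to right: nothing is added to either set
            // until a full window exists, and reported.add(v) runs only when
            // seen.add(v) returned false (v was already present).
            if (i >= LEN - 1 && !seen.add(v) && reported.add(v)) {
                res.add(s.substring(i - LEN + 1, i + 1));
            }
        }
        return res;
    }
}
```

Calling RollingDna.findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT") should give the same two sequences as the example in the problem statement. The inner loop disappears, so each character is encoded once rather than up to ten times.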

 

posted @ 2017-03-21 01:05  CodesKiller