[LeetCode 187] Repeated DNA Sequences

1 Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

2 Approach

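My first idea was to store all s.length() - 10 + 1 ten-letter substrings and compare them for repeats: a HashMap holding every 10-letter sequence seen so far, plus a list holding the ones that repeat. The judge reported Memory Limit Exceeded.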

I then replaced the HashMap with a second list, and that version got Time Limit Exceeded.
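For comparison, the string-based first attempt might look roughly like the sketch below. It is a reconstruction rather than the code that was actually submitted: the class name `NaiveRepeatedDna` and the method name `findRepeated` are hypothetical, and sets are used in place of the map and list for brevity. Every window is kept as a full `String`, which is where the memory goes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reconstruction of the string-based approach (not the original submission).
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        final int L = 10;
        Set<String> seen = new HashSet<>();           // every 10-letter window seen so far
        Set<String> repeated = new LinkedHashSet<>(); // windows seen at least twice, in first-repeat order
        if (s == null || s.length() < L) {
            return new ArrayList<>(repeated);
        }
        for (int i = 0; i <= s.length() - L; i++) {
            String window = s.substring(i, i + L);
            // Set.add returns false when the window was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

Calling `NaiveRepeatedDna.findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")` yields the two sequences from the problem statement, but each stored key is a separate string object.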

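After searching online, I found that because the string contains only A, C, G, and T, each letter can be mapped to 0, 1, 2, or 3, which needs only two bits. A 10-letter sequence then takes 10 * 2 = 20 bits instead of 10 * 16 bits of character data (a Java char is 2 bytes), which saves memory.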
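To make the two-bit mapping concrete, here is a small sketch that packs a sequence into an int, two bits per letter. The class name `DnaEncodingSketch` and its methods are hypothetical names chosen for illustration.

```java
// Hypothetical helper for illustration; the names are not from the original post.
class DnaEncodingSketch {
    // Map one nucleotide to its 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Pack a sequence of at most 16 letters into one int, 2 bits per letter.
    static int encode(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            n = (n << 2) | code(s.charAt(i)); // make room for two bits, then append the code
        }
        return n;
    }
}
```

For example, "ACGT" becomes the bit pattern 00 01 10 11, which is 27, and "AAAAACCCCC" becomes 0101010101 in binary, which is 341. Two different 10-letter sequences always produce different integers, so the integer can stand in for the string as a map key.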

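I mainly followed http://blog.csdn.net/dddongdong/article/details/43758603 , which also explains the idea in more detail. Personally, I don't think the savings are as large as that post claims; the exact amount should depend on the language, the compiler, and the hardware architecture, but the encoding definitely saves space.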

3 Code

    import java.util.HashMap;
    import java.util.LinkedList;

    public class Solution {

        // A Java int has 4*8 = 32 bits, more than the 10*2 = 20 bits needed here, so nothing overflows.
        // With two bits per letter an int holds at most 16 letters; for L > 16 this code would break.
        private int myHash(String s){
            int n = 0;

            for(int i = 0; i < s.length(); i++){
                char c = s.charAt(i);
                n <<= 2; // shift left two bits first, so each letter occupies exactly two bits
                if(c == 'C'){
                    n += 1;
                }else if(c == 'G'){
                    n += 2;
                }else if(c == 'T'){
                    n += 3;
                } // 'A' adds 0
            }
            return n;
        }

        public LinkedList<String> findRepeatedDnaSequences(String s){
            int L = 10;
            LinkedList<String> repeatedDnaSequences = new LinkedList<String>();
            if(s == null || s.length() < L) return repeatedDnaSequences;

            HashMap<Integer, Boolean> tenLettlesHashMap = new HashMap<Integer, Boolean>();
            for (int i = 0; i <= s.length() - L; i++) {
                String string = s.substring(i, i + L);
                int dnaSequences = myHash(string);
                if (tenLettlesHashMap.containsKey(dnaSequences)) {
                    if (!repeatedDnaSequences.contains(string)) { // avoid adding the same string twice
                        repeatedDnaSequences.add(string);
                    }
                }else {
                    // the stored value (true or false) does not matter; the map is only used to check presence
                    tenLettlesHashMap.put(dnaSequences, true);
                }
            }
            System.out.println(repeatedDnaSequences); // debug output
            return repeatedDnaSequences;
        }
    }
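
The listing above re-encodes all ten letters for every window and scans the result list to avoid duplicates. One possible refinement, sketched below, is a rolling window: shift the previous code left by two bits, append the new letter, and mask off everything above 20 bits. This is a different technique from the code in this post, and the names `RollingDnaSketch`, `findRepeated`, and `code` are hypothetical. The input is assumed to contain only A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rolling-window variant; not the code from the original post.
class RollingDnaSketch {
    // 2-bit code per letter; anything other than C, G, T is treated as 'A' (0).
    static int code(char c) {
        switch (c) {
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return 0;
        }
    }

    static List<String> findRepeated(String s) {
        final int L = 10;
        final int mask = (1 << (2 * L)) - 1; // keeps only the lowest 20 bits
        List<String> result = new ArrayList<>();
        if (s == null || s.length() < L) {
            return result;
        }
        Map<Integer, Integer> counts = new HashMap<>();
        int window = 0;
        for (int i = 0; i < s.length(); i++) {
            // Drop the oldest letter (via the mask) and append the newest one.
            window = ((window << 2) | code(s.charAt(i))) & mask;
            if (i >= L - 1) {
                int count = counts.merge(window, 1, Integer::sum);
                // Report a sequence exactly once, the second time it is seen.
                if (count == 2) {
                    result.add(s.substring(i - L + 1, i + 1));
                }
            }
        }
        return result;
    }
}
```

Each window is updated in constant time, and a substring is created only for sequences that actually repeat.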

posted on 2015-03-01 16:20 聆听V风声
