Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
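Before optimizing, it can help to pin down the expected behavior with the most direct approach: collect every 10-letter window in a set and report the windows that show up a second time. The sketch below is only an illustrative baseline, not the solution from this post; the class name `NaiveRepeatedDna` and the method name `findRepeated` are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline for illustration; names are not from the original post.
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();            // every window met so far
        Set<String> repeated = new LinkedHashSet<>();  // windows met at least twice, in discovery order
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the window was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

This version stores a full 10-character string per window, so it presumably uses more memory than an approach that packs each window into a single integer.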
Reference: http://blog.csdn.net/coderhuhy/article/details/43647731
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s.length() < 10)
            return new ArrayList<String>();
        List<String> result = new ArrayList<String>();                    // result list
        Map<Character, Integer> dict = new HashMap<Character, Integer>(); // integer codes for A, C, G, T
        Map<Integer, Integer> check = new HashMap<Integer, Integer>();    // hashes already added to the result, used for de-duplication
        Map<Integer, Integer> subValue = new HashMap<Integer, Integer>(); // hashes of substrings seen so far, used to detect repeats
        int erase = 0x0003ffff; // mask that keeps the low 18 bits, i.e. the last 9 letters

        dict.put('A', 0);
        dict.put('C', 1);
        dict.put('G', 2);
        dict.put('T', 3);

        int hint = 0;

        // Encode the first 10-letter window, 2 bits per letter.
        for (int i = 0; i < 10; i++) {
            hint <<= 2;
            hint += dict.get(s.charAt(i));
        }

        subValue.put(hint, 1);
        // Slide the window one letter at a time: drop the oldest letter, append the new one.
        for (int i = 10; i < s.length(); i++) {
            hint = ((hint & erase) << 2) + dict.get(s.charAt(i));
            if (subValue.get(hint) != null) {   // repeated substring found
                if (check.get(hint) == null) {  // not yet in the result: add it to the result and to check
                    result.add(s.substring(i - 9, i + 1));
                    check.put(hint, 1);
                }
            }
            subValue.put(hint, 1);
        }

        return result;
    }
}
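The key step in the solution above is the window update `((hint & erase) << 2) + code`. Each letter takes 2 bits, so a 10-letter window fits in 20 bits. The mask `0x3ffff` keeps the low 18 bits (the last 9 letters), and the shift makes room for the incoming letter. The following sketch isolates that arithmetic so it can be checked on its own; the class `DnaWindowEncoding` and its method names are hypothetical helpers written for this illustration.

```java
// Hypothetical helper for illustration; it mirrors the encoding used in the solution above.
class DnaWindowEncoding {
    static final int KEEP_LAST_NINE = 0x3ffff; // 18 one-bits: 9 letters x 2 bits each

    // Map a nucleotide to its 2-bit code (same mapping as the dict in the solution).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode a whole window from scratch, 2 bits per letter.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Slide a 10-letter window by one: drop the oldest letter, append the next one.
    static int slide(int hash, char next) {
        return ((hash & KEEP_LAST_NINE) << 2) | code(next);
    }

    public static void main(String[] args) {
        String s = "ACGAATTCCGT";
        int first = encode(s.substring(0, 10));
        int second = slide(first, s.charAt(10));
        // Sliding gives the same value as re-encoding the shifted window: prints true
        System.out.println(second == encode(s.substring(1, 11)));
    }
}
```

Because the encoding is a one-to-one mapping from 10-letter windows to 20-bit integers, two windows are equal exactly when their codes are equal, which is why the solution can compare integers instead of strings.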
Please call me JiangYouDang!