[LeetCode 187] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
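As a baseline for checking the expected output above, here is a minimal brute-force sketch that stores every 10-letter window directly in a set. The class name `NaiveRepeatedDna` and its `find` method are hypothetical and are not part of the solution on this page. It assumes the result should list each repeated sequence once, in the order its first repeat is found.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline helper, used only for illustration.
final class NaiveRepeatedDna {
    static List<String> find(String s) {
        Set<String> seen = new HashSet<>();            // every window seen so far
        Set<String> repeated = new LinkedHashSet<>();  // windows seen at least twice, in discovery order
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the window was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

This version keeps a 10-character string per window, which is the memory cost that a compact integer key avoids.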
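import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        // There are only four letters, so we can build our own hash key:
        // each incoming character takes two bits. Once the key holds more
        // than 20 bits (10 characters), keep only the low 20 bits.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        List<String> res = new ArrayList<String>();
        Set<Integer> set = new HashSet<Integer>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the 2-bit code of the current character.
            hash = (hash << 2) + map.get(s.charAt(i));
            if (i < 9) {
                continue; // the first full 10-letter window ends at index 9
            }
            hash &= (1 << 20) - 1; // drop the character that left the window
            if (set.contains(hash)) {
                // This window was seen before: report it only once.
                String seq = s.substring(i - 9, i + 1);
                if (!res.contains(seq)) {
                    res.add(seq);
                }
            } else {
                set.add(hash);
            }
        }
        return res;
    }
}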
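To make the 2-bit key concrete, the sketch below encodes a 10-letter sequence into a 20-bit integer, rolls the window forward by one character, and decodes a key back into letters. The class `KmerCodec` and its methods are hypothetical names introduced here for illustration. It assumes the input contains only A, C, G, and T, and it uses the same code assignment as the solution (A=0, C=1, G=2, T=3).

```java
// Hypothetical helper, not part of the original solution.
final class KmerCodec {
    private static final String LETTERS = "ACGT";   // index in this string = 2-bit code
    private static final int MASK = (1 << 20) - 1;  // keeps the low 20 bits (10 letters)

    // Encodes a sequence; for inputs longer than 10 letters only the last 10 remain in the key.
    static int encode(String kmer) {
        int key = 0;
        for (int i = 0; i < kmer.length(); i++) {
            key = roll(key, kmer.charAt(i));
        }
        return key;
    }

    // Shifts one new letter into the key and drops anything above 20 bits.
    // Assumes next is one of A, C, G, T.
    static int roll(int key, char next) {
        return ((key << 2) | LETTERS.indexOf(next)) & MASK;
    }

    // Turns a 20-bit key back into its 10-letter sequence.
    static String decode(int key) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = LETTERS.charAt(key & 3); // the low two bits hold the last letter
            key >>>= 2;
        }
        return new String(out);
    }
}
```

For example, `KmerCodec.encode("AAAAACCCCC")` yields 341 (binary 0101010101, since each A contributes 00 and each C contributes 01). Because every 10-letter sequence maps to a distinct 20-bit value, two windows with the same key are the same sequence, so comparing keys is enough to detect a repeat.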