187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
题目含义：找出给出字符串里面，连续10个字母出现多次的串。

方法一:

import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Hashtable<String,Integer> temp = new Hashtable<>();
 
        for(int i=0;i<s.length()-9;i++){      //将每一个长度为10的子字符串进行遍历，没有就将其放进hashtable里面，有且现在之出现了一次就添加进结果里面。
            String subString = s.substring(i,i+10);
            if(temp.containsKey(subString)){
                int count=temp.get(subString);  //如果为1，则添加进结果，否则继续遍历
                if(count==1){
                    temp.remove(subString);
                    temp.put(subString,2);
                    res.add(subString);
                }
            }else{
                temp.put(subString,1);
            }
        }
        return res;
    }
}




方法二:

 1     public List<String> findRepeatedDnaSequences(String s) {
 2 //        http://www.cnblogs.com/grandyang/p/4284205.html
 3 //        A: 0100 0001　　C: 0100 0011　　G: 0100 0111　　T: 0101 0100,目的是利用位来区分字符，当然是越少位越好，通过观察发现，每个字符的后三位都不相同，
 4 //        故而我们可以用末尾三位来区分这四个字符。而题目要求是10个字符长度的串，每个字符用三位来区分，10个字符需要30位，在32位机上也OK。
 5 //        为了提取出后30位，我们还需要用个mask，取值为0x7ffffff，用此mask可取出后27位，再向左平移三位即可。
 6 //        算法的思想是，当取出第十个字符时，将其存在Set里，之后每向左移三位替换一个字符，查找新cur在set中是否出现，
 7 //        如果之前刚好出现过一次，则将当前字符串（从i-10到i）存入返回值的数组，如果从未出现过，则将cur添加到set中。
 8         Set<String> res = new HashSet<>();
 9         if (s.length()<=10) return new ArrayList<>();
10         int mask = 0x7ffffff;
11         Set<Integer> set = new HashSet<>();
12         int cur=0,i=0;
13         while (i<9)
14         {
15             cur = (cur<<3)|(s.charAt(i++)&7);
16         }
17         while (i<s.length())
18         {
19             cur = ((cur & mask) << 3) | (s.charAt(i++) & 7);
20             if (set.contains(cur))
21             {
22                 res.add(s.substring(i-10,i));
23             }
24             else set.add(cur);
25         }
26         return new ArrayList<>(res) ;     
27     }