187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].


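Use a sliding window of length 10 that advances one character at a time. Storing every window directly as a String would exceed the memory limit, so each window is converted to an int, using 2 bits per nucleotide:
A -> 0
C -> 1
G -> 2
T -> 3
Time complexity: O(n), since each window is encoded in a constant 10 steps. Space complexity: O(4^10), since there are at most 4^10 distinct 10-letter sequences.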
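To make the encoding concrete, here is a minimal sketch of packing a sequence into an int at 2 bits per letter (A=0, C=1, G=2, T=3, matching convertToInt in the solution) and unpacking it again. The class DnaEncodingSketch and its methods encode, valueOf, and decode are hypothetical names used only for illustration; they are not part of the original solution.

```java
class DnaEncodingSketch {
  // Hypothetical helper: packs a DNA string into an int, 2 bits per letter.
  // An int has 32 bits, so a 10-letter window needs only 20 of them.
  static int encode(String seq) {
    int code = 0;
    for (int i = 0; i < seq.length(); i++) {
      code = (code << 2) | valueOf(seq.charAt(i));
    }
    return code;
  }

  // Maps one nucleotide to its 2-bit value.
  static int valueOf(char c) {
    switch (c) {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      case 'T': return 3;
      default: throw new IllegalArgumentException("not a nucleotide: " + c);
    }
  }

  // Hypothetical inverse: rebuilds the letters from the low 2 * length bits.
  static String decode(int code, int length) {
    char[] out = new char[length];
    for (int i = length - 1; i >= 0; i--) {
      out[i] = "ACGT".charAt(code & 3);
      code >>>= 2;
    }
    return new String(out);
  }
}
```

With this mapping, "AAAAACCCCC" encodes to binary 0101010101 (decimal 341), and the largest possible code, for "TTTTTTTTTT", is 4^10 - 1 = 1048575, which is where the O(4^10) space bound comes from.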

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
  public List<String> findRepeatedDnaSequences(String s) {
    // Maps each encoded 10-letter window to a count that stops at 2.
    Map<Integer, Integer> map = new HashMap<>();
    List<String> result = new ArrayList<>();
    for (int i = 0; i < s.length() - 9; i++) {
      int subStr = convertToInt(s, i, i + 10);
      if (map.containsKey(subStr)) {
        // Report a sequence only on its second occurrence, so it is added once.
        if (map.get(subStr) == 1) {
          result.add(s.substring(i, i + 10));
          map.put(subStr, map.get(subStr) + 1);
        }
      } else {
        map.put(subStr, 1);
      }
    }
    return result;
  }

  // Encodes s[start, end) as an int, 2 bits per nucleotide.
  private int convertToInt(String s, int start, int end) {
    int res = 0;
    while (start < end) {
      char c = s.charAt(start);
      int v = 0;
      switch (c) {
        case 'A': v = 0; break;
        case 'C': v = 1; break;
        case 'G': v = 2; break;
        case 'T': v = 3; break;
      }
      res = (res << 2) | v;
      start++;
    }
    return res;
  }
}
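The solution above re-encodes all 10 characters for every window. One possible variation, sketched here and not taken from the original post, updates the code incrementally: shift in the new letter and mask off the bits of the letter that left the window. The class name RollingDnaFinder, the method findRepeated, and the constants WINDOW and MASK are hypothetical, and the sketch assumes the input contains only the letters A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RollingDnaFinder {
  private static final int WINDOW = 10;
  // Keeps only the low 20 bits, i.e. the 10 most recent letters.
  private static final int MASK = (1 << (2 * WINDOW)) - 1;

  // Assumption: s contains only 'A', 'C', 'G', 'T'; other characters are not handled.
  static List<String> findRepeated(String s) {
    Set<Integer> seen = new HashSet<>();      // codes seen at least once
    Set<Integer> reported = new HashSet<>();  // codes already added to the result
    List<String> result = new ArrayList<>();
    int code = 0;
    for (int i = 0; i < s.length(); i++) {
      // Shift in the new letter (A=0, C=1, G=2, T=3) and drop the oldest one.
      code = ((code << 2) | "ACGT".indexOf(s.charAt(i))) & MASK;
      if (i < WINDOW - 1) {
        continue; // the window does not hold 10 letters yet
      }
      // seen.add returns false when the code was already present.
      if (!seen.add(code) && reported.add(code)) {
        result.add(s.substring(i - WINDOW + 1, i + 1));
      }
    }
    return result;
  }
}
```

Calling RollingDnaFinder.findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT") yields ["AAAAACCCCC", "CCCCCAAAAA"], the same result as the example in the problem statement. The asymptotic cost is still O(n); the gain is only the constant factor of 10 per window.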

posted on 2015-04-09 11:29  shini