LeetCode Repeated DNA Sequences
Original problem link: https://leetcode.com/problems/repeated-dna-sequences/
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution:
Take every substring of length 10 and check whether it has appeared before. A HashSet can hold the substrings seen so far.
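The substring-based idea above can be sketched as follows. This is only an illustration under assumptions: the class name SubstringSetSketch and the method name findRepeated are made up for this example, and a second set is used so that each repeated sequence is reported once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original solution.
class SubstringSetSketch {
    static List<String> findRepeated(String s) {
        List<String> res = new ArrayList<>();
        if (s == null) {
            return res;
        }
        Set<String> seen = new HashSet<>();      // every 10-letter window seen so far
        Set<String> reported = new HashSet<>();  // windows already added to res
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false if the element was already in the set.
            if (!seen.add(window) && reported.add(window)) {
                res.add(window);
            }
        }
        return res;
    }
}
```

This version stores the substrings themselves, so it uses more memory per entry than the integer encoding, but it is the most direct way to express the idea.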
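Alternatively, encode each length-10 substring as an integer hash code and check that for duplicates. When computing the hash code, A is 00, C is 01, G is 10, and T is 11. At each step, shift the hash left by two bits, which leaves the low two bits as 0, then bitwise OR it with the number for the new character. Ten characters take only 20 bits, so the code fits in an int and distinct substrings never collide.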
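As a variation on the encoding described above, the hash can be carried from one window to the next instead of being rebuilt from scratch. The sketch below is not the solution in this post: it swaps in a rolling update with a 20-bit mask, and the names RollingHashSketch, code, and findRepeated are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class illustrating a rolling 2-bit encoding.
class RollingHashSketch {
    // Map a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeated(String s) {
        List<String> res = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return res;
        }
        Set<Integer> seen = new HashSet<>();      // codes of all windows seen so far
        Set<Integer> reported = new HashSet<>();  // codes already added to res
        final int mask = (1 << 20) - 1;           // keep the low 20 bits (10 letters)
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter; the mask drops the letter that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            // From i = 9 onward, hash encodes s[i-9..i].
            if (i >= 9 && !seen.add(hash) && reported.add(hash)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }
}
```

Each character is processed once, so the work per window is constant without relying on the fixed inner loop of 10 steps.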
Time Complexity: O(n). n = s.length().
Space: O(n).
AC Java:
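import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s == null || s.length() < 10) {
            return res;
        }

        // Hashes already added to res, so each repeated sequence is reported once.
        HashSet<Integer> resSet = new HashSet<Integer>();
        // Hashes of every window seen so far.
        HashSet<Integer> hs = new HashSet<Integer>();
        // 2-bit code for each nucleotide, indexed by the letter's offset from 'A'.
        int[] map = new int[26];
        map['A' - 'A'] = 0;
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;
        for (int i = 0; i <= s.length() - 10; i++) {
            // Pack the 10 characters starting at i into a 20-bit integer.
            int hash = 0;
            for (int j = i; j < i + 10; j++) {
                hash = (hash << 2) | map[s.charAt(j) - 'A'];
            }

            // add() returns false when the value is already in the set.
            if (!hs.add(hash) && resSet.add(hash)) {
                res.add(s.substring(i, i + 10));
            }
        }

        return res;
    }
}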