程序媛詹妮弗
终身学习

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

 

题意:

DNA序列里有ATCG四种,求所有长度为10,出现次数超过一次的序列。

 

Solution1: HashMap

1.  scan the given string,  put each 10-letter-long substring and its corresponding frequency into a map

2. looping each entrySet in the map, find if entry.getValue() > 1 

 

 

 

code

 1 /*
 2 Time Complexity: O(n)
 3 Space Complexity: O(n)
 4 */
 5 class Solution {
 6     public List<String> findRepeatedDnaSequences(String s) {
 7         List<String> result = new ArrayList<>();
 8         // corner case
 9         if (s.length() < 10) return result;
10         
11         Map<String, Integer> map = new HashMap<>();
12         for (int i = 0; i < s.length() - 9; ++i) {
13             String key = s.substring(i, i + 10);
14             if(map.containsKey(key)){
15                 map.put(key, map.get(key) + 1);
16             }else{
17                 map.put(key, 1);
18             } 
19         }
20 
21         for (Map.Entry<String, Integer> entry : map.entrySet()) {
22             if (entry.getValue() > 1) {
23                 result.add(entry.getKey());
24             }
25         }
26         return result;
27     }
28 }

 

posted on 2019-05-04 06:50  程序媛詹妮弗  阅读(142)  评论(0编辑  收藏  举报