[LeetCode] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
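Using a Map exceeds the memory limit, so use a bitmap-style table instead. Since there are only 4 letters, 2 bits are enough to encode one letter, and 10 letters take 20 bits, so an array with 2^20 entries, indexed by the encoded sequence, solves the problem.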
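As a minimal sketch of that encoding, the snippet below packs a 10-letter sequence into a 20-bit integer. The names code2bit and encode10 are hypothetical helpers introduced here for illustration, and the mapping A=0, C=1, G=2, T=3 is assumed to match the solution's getVal.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code for one nucleotide (A=0, C=1, G=2, T=3).
// Assumes the input contains only A, C, G, T; anything else is treated as 'A'.
std::uint32_t code2bit(char ch) {
    switch (ch) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Hypothetical helper: packs up to the first 10 letters of seq into an
// integer, 2 bits per letter, with the first letter in the highest bits.
std::uint32_t encode10(const std::string& seq) {
    std::uint32_t val = 0;
    for (std::size_t i = 0; i < 10 && i < seq.size(); ++i) {
        val = (val << 2) | code2bit(seq[i]);
    }
    return val;
}
```

With this mapping, encode10("AAAAACCCCC") is 0b0101010101 = 341, and the largest possible code, for "TTTTTTTTTT", is 2^20 - 1, so every 10-letter sequence gets its own index in a table of 2^20 entries.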
#include <set>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Map a nucleotide to its 2-bit code.
    int getVal(char ch) {
        if (ch == 'C') return 1;
        if (ch == 'G') return 2;
        if (ch == 'T') return 3;
        return 0;  // 'A'; the input is assumed to contain only A, C, G, T
    }

    vector<string> findRepeatedDnaSequences(string s) {
        set<string> st;
        vector<string> res;
        if (s.length() < 10) return res;
        // One counter per 20-bit code. Allocated on the heap, because a
        // 4 MB local array can overflow the stack.
        vector<int> mp(1 << 20, 0);
        unsigned int val = 0;
        // Encode the first 9 letters (18 bits).
        for (int i = 0; i < 9; ++i) {
            val = (val << 2) | getVal(s[i]);
        }
        // Slide the window: append the new letter and keep the low 20 bits.
        for (size_t i = 9; i < s.length(); ++i) {
            val = ((val << 2) | getVal(s[i])) & 0xFFFFF;
            ++mp[val];
            if (mp[val] > 1) {
                // Seen before: record the sequence (the set removes duplicates).
                st.insert(s.substr(i - 9, 10));
            }
        }
        res.assign(st.begin(), st.end());
        return res;
    }
};
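The counting array above only needs to tell "seen once" from "seen more than once", so a possible variant keeps two bits per code instead of an int. The following is a sketch of that idea, not the author's solution: findRepeatedWithBitmaps and nucleotideCode are hypothetical names, and std::vector<bool> is used here as the bitmap.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: 2-bit code for one nucleotide (A=0, C=1, G=2, T=3).
// Assumes the input contains only A, C, G, T; anything else is treated as 'A'.
static std::uint32_t nucleotideCode(char ch) {
    switch (ch) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Sketch of a variant that uses two bitmaps (seen / reported) instead of a
// counter per code. Results are returned in order of second occurrence.
std::vector<std::string> findRepeatedWithBitmaps(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << 20) - 1;  // keep the low 20 bits
    std::vector<std::string> res;
    if (s.size() < kLen) return res;

    std::vector<bool> seen(1u << 20, false);      // code occurred at least once
    std::vector<bool> reported(1u << 20, false);  // code already added to res
    std::uint32_t val = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        val = ((val << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window does not hold 10 letters yet
        if (!seen[val]) {
            seen[val] = true;
        } else if (!reported[val]) {
            reported[val] = true;
            res.push_back(s.substr(i + 1 - kLen, kLen));
        }
    }
    return res;
}
```

For the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns ["AAAAACCCCC", "CCCCCAAAAA"]. The two bitmaps take about 256 KB in total, compared with 4 MB for a table of 2^20 ints, and the reported bitmap replaces the set used for deduplication.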