[LeetCode] Repeated DNA Sequences
This link has a great discussion about this problem. You may refer to it if you like. In fact, the idea and code in this passage are taken from that link.
Well, there is a very intuitive solution to this problem. Starting from the first letter of the string, extract a substring of length 10 and check whether it has occurred before and has not yet been added to the result. If so, add it to the result; either way, move on to the next letter and repeat the process. However, a naive implementation of this idea gets a Memory Limit Exceeded (MLE) error, and this is the real obstacle of the problem.
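To make the intuitive idea concrete, here is a minimal sketch of it. The function name findRepeatedNaive is made up for illustration, and this is only one plausible way to write the naive version: it keys a hash map by the 10-letter substrings themselves, which is exactly what costs so much memory.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Naive sketch (hypothetical helper): count every 10-letter window,
// using the window string itself as the hash map key.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    std::unordered_map<std::string, int> seen;  // window -> times seen so far
    std::vector<std::string> res;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window only when we meet it for the second time.
        if (seen[window]++ == 1) res.push_back(window);
    }
    return res;
}
```

Every distinct window is stored as a separate 10-character string, so the map grows much faster than it would with integer keys.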
So we need to save space. Instead of keeping the whole substring, can we convert it to another format? Well, you may have noticed that there are only 4 letters, A, T, C and G, in the string. If we assign each letter 2 bits, then a 10-letter substring costs only 20 bits and thus fits in a 32-bit integer, greatly lowering the memory usage.
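As a small illustration of the 2-bit idea, the sketch below packs a short DNA string into one integer. The helper names nucleotideBits and encode are hypothetical, and the assignment A=0, C=1, G=2, T=3 is the one used by mapping() in the solution; any other one-to-one assignment would work as well.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit value of one letter, or -1 for anything else.
int nucleotideBits(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Hypothetical helper: pack up to 16 letters into 32 bits, 2 bits per letter,
// with the first letter ending up in the highest used bits.
std::uint32_t encode(const std::string& s) {
    assert(s.size() <= 16);  // 16 letters * 2 bits = 32 bits
    std::uint32_t code = 0;
    for (char c : s) {
        int bits = nucleotideBits(c);
        assert(bits >= 0);  // only A, C, G, T are expected
        code = (code << 2) | static_cast<std::uint32_t>(bits);
    }
    return code;
}
```

For instance, "ACGT" becomes the bit pattern 00 01 10 11, and ten T's fill exactly the low 20 bits (0xFFFFF).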
Then you may put this idea into code and get a simple Accepted solution as follows. Congratulations!
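#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<int, int> mp;  // 20-bit window code -> times seen
        vector<string> res;
        if (s.length() < 10) return res;  // no 10-letter window exists
        int i = 0, code = 0;
        // Pack the first 9 letters, 2 bits each.
        while (i < 9)
            code = ((code << 2) | mapping(s[i++]));
        for (; i < (int)s.length(); i++) {
            // Append the new letter and keep only the low 20 bits.
            code = (((code << 2) & 0xfffff) | mapping(s[i]));
            // Report a window on its second occurrence only.
            if (mp[code]++ == 1)
                res.push_back(s.substr(i - 9, 10));
        }
        return res;
    }
private:
    int mapping(char s) {
        if (s == 'A') return 0;
        if (s == 'C') return 1;
        if (s == 'G') return 2;
        if (s == 'T') return 3;
        return 0;  // not reached for valid input; avoids a missing return
    }
};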
Do you see the logic in the above code? Well, we first merge the first 9 letters into code. Then, each time we meet a new letter, we shift code left by 2 bits, drop the leftmost letter with & 0xfffff (20 bits take 5 hexadecimal digits), and merge in the new letter with | mapping(s[i]). Thus we have a code for the current 10-letter substring. We push the substring to the result only if its code has occurred exactly once before, so each repeated sequence is reported once.
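If the shift-and-mask step still feels abstract, it can be isolated into a tiny function. This is just a sketch; the name slide is hypothetical and the letter is passed in as its 2-bit value.

```cpp
#include <cstdint>

// Hypothetical helper: advance a 20-bit window code by one letter.
// Shifting left by 2 makes room for the new letter, and the mask
// keeps only the low 20 bits so the oldest letter falls off.
std::uint32_t slide(std::uint32_t code, std::uint32_t letterBits) {
    return ((code << 2) & 0xFFFFFu) | (letterBits & 0x3u);
}
```

Take the window "TAAAAAAAAA", whose code is 3 << 18. Sliding in a C (value 1) pushes the T past bit 19, the mask clears it, and what remains is 1, which is the code of "AAAAAAAAAC".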
The above code can still be shortened using tricks from the above link. In fact, if we code A, T, C, G using 3 bits, the code will be as short as 10 lines! Refer to the above link to learn more!
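For the curious, here is a sketch of how the 3-bit idea might look. It is not necessarily the code from the linked discussion, and the name findRepeated3Bit is made up. It assumes ASCII input, where the low 3 bits of 'A', 'C', 'G' and 'T' are 1, 3, 7 and 4, all different, so s[i] & 7 can replace the mapping function. Ten letters then take 30 bits, which still fit in a 32-bit integer.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the 3-bit variant (hypothetical name, assumes ASCII letters).
std::vector<std::string> findRepeated3Bit(const std::string& s) {
    std::unordered_map<std::uint32_t, int> count;  // 30-bit window code -> times seen
    std::vector<std::string> res;
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in 3 bits per letter and keep only the low 30 bits.
        code = ((code << 3) & 0x3FFFFFFFu) | (static_cast<std::uint32_t>(s[i]) & 7u);
        // The code covers a full 10-letter window only from index 9 onward.
        if (i >= 9 && count[code]++ == 1)
            res.push_back(s.substr(i - 9, 10));
    }
    return res;
}
```

The unsigned type is a deliberate choice here: shifting a signed int this close to its top bit could overflow, while unsigned shifts simply discard the high bits.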