linux 系统中shell实现将fasta文件的碱基转换为一行及还原
1、测试数据
root@PC1:/home/test# cat a.fna ## 实现将碱基转换为1行, 其他信息不变 >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAG GGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAG TCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTT TGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTT TATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATAT GAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGA AGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAA CAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCC TCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC
2、 awk + sed实现
root@PC1:/home/test# awk '{if($0 ~ /^[a-zA-Z]/) {printf("%s", $0)} else {print $0}}' a.fna | sed '$ s/$/\n/' | sed '2,$ s/>/\n>/' >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC
3、利用正则表达式及sed预存储还原
root@PC1:/home/test# ls a.fna b.fna root@PC1:/home/test# cp b.fna b.fna_bak ## 要在源文件进行修改,先进行备份 root@PC1:/home/test# ls a.fna b.fna b.fna_bak root@PC1:/home/test# cat b.fna >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC root@PC1:/home/test# for i in `seq 10`; do sed 's/\(^[a-zA-Z]\{50\}\)[^\n]/\1\n/' b.fna -i; done ## 50可修改为任意的碱基数 root@PC1:/home/test# cat b.fna >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTC GGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCT GAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAG ATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTG TGCAACGAATTTAGAAATGAATGCACTTGCATTTGA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGT ATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACC TCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAG AGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTC AATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGT TGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCA CAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGT CAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAG agacaattctttttaaaaactgcacaaATTAGCACA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence] CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCC GAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGC GGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCG TCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTT ACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC
分类:
linux shell
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律
2020-11-22 linux 系统中semanage命令
2020-11-22 linux系统中 SElinux安全子系统
2020-11-22 linux 系统中httpd服务的配置参数