python中将fasta文件按照每行指定碱基数输出
1、测试数据test.fa (一共两条染色体)
>OR4F5_ENSG00000186092_ENST00000641515_61_1038_2618 CCCAGATCTCTTCAGTTTTTATGCCTCATTCTGTGAAAATTGCTGTAGTCTCTTCCAGTTATGAAGAAGGTAACTGCAGAGGCTATTTCCTGGAATGAATCAACGAGTGAAACGAATAACTCTATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTATTTATGTTGTTTTTTGTATTCTATGGAGGAATCGTGTTTGGAAACCTTCTTATTGTCATAACAGTGGTATCTGACTCCCACCTTCACTCTCCCATGTACTTCCTGCTAGCCAACCTCTCACTCATTGATCTGTCTCTGTCTTCAGTCACAGCCCCCAAGATGATTACTGACTTTTTCAGCCAGCGCAAAGTCATCTCTTTCAAGGGCTGCCTTGTTCAGATATTTCTCCTTCACTTCTTTGGTGGGAGTGAGATGGTGATCCTCATAGCCATGGGCTTTGACAGATATATAGCAATATGCAAGCCCCTACACTACACTACAATTATGTGTGGCAACGCATGTGTCGGCATTATGGCTGTCACATGGGGAATTGGCTTTCTCCATTCGGTGAGCCAGTTGGCGTTTGCCGTGCACTTACTCTTCTGTGGTCCCAATGAGGTCGATAGTTTTTATTGTGACCTTCCTAGGGTAATCAAACTTGCCTGTACAGATACCTACAGGCTAGATATTATGGTCATTGCTAACAGTGGTGTGCTCACTGTGTGTTCTTTTGTTCTTCTAATCATCTCATACACTATCATCCTAATGACCATCCAGCATCGCCCTTTAGATAAGTCGTCCAAAGCTCTGTCCACTTTGACTGCTCACATTACAGTAGTTCTTTTGTTCTTTGGACCATGTGTCTTTATTTATGCCTGGCCATTCCCCATCAAGTCATTAGATAAATTCCTTGCTGTATTTTATTCTGTGATCACCCCTCTCTTGAACCCAATTATATACACACTGAGGAACAAAGACATGAAGACGGCAATAAGACAGCTGAGAAAATGGGATGCACATTCTAGTGTAAAGTTTTAGATCTTATATAACTGTGAGATTAATCTCAGATAATGACACAAAATATAGTGAAGTTGGTAAGTTATTTAGTAAAGCTCATGAAAATTGTGCCCTCCATTCCCATATAATTTAGTAATTGTCTAGGAACTTCCACATACATTGCCTCAATTTATCTTTCAACAACTTGTGTGTTATATTTTGGAATACAGATACAAAGTTATTATGCTTTCAAAATATTCTTTTGCTAATTCTTAGAACAAAGAAAGGCATAAATATATTAGTATTTGTGTACACCTGTTCCTTCCTGTGTGACCCTAAGTTTAGTAGAAGAAAGGAGAGAAAATATAGCCTAGCTTATAAATTTAAAAAAAAATTTATTTGGTCCATTTTGTGAAAAACATAAAAAAAGAACTGTCACATCTTAATTTAAAAAATATATGCTTAGTGGTAAGGAGATATATGTCAACTTTTAAGAGGTTGAAAAACAAACGCCTCCCATTATAAGTTTATACTTCACCTCCCACCACTATAACAACCCAGAATCCATGAGGGCATTATCAGGAGTGAGTGGAAGAGTAAGTTTGCCAATGTGAAATGTGCCTTCTAGGTCCTAGACGTCTGTGGTATAACTGCTCATAAGCAGTAGAAAGAATTTAGAGGGATCCAGGCTCTCATCACGTTGGCACAAAGTATATTACTTGGATCCATCTATGTCATTTTCCATGGTTAATGTTTAAAAGCACAGGCTTTAAAGTAAAAAACAAAGAGCTGGATTCAACTCTACTGACTCTTATTAATCATGATTTTGGGCACATTACGTAGCTTTCATGAGCTTTAGTTTCTACATTTATAAACAGGAGATTATACCTATTATGCATGGTTATTATGAAGGAAAATGACAAAATAGATATAAATCAAATAGCCCACTTCGAGACATATTAAGCATGAATAAACATTAGATACTATTAAAATCCTATATATTAACAAAGCCAAAAGTTTCAAACTTTACTTTTTCCCAACATTCTTGTGAAATATGACACATCCCAATCTTAACAGATGCTCATTTGGGATACTGTACTTGTGAGTGGAAGTGTGTATATTTGTGTGCAAGTGTGTACTCATATACTTCCACCTTACCACCCTAGAAAGGCATGATGAAAATTTAAGATAGAAGGAAAATATAAATTGAAAAAAAAAAACCTTAACAAATGATTCTGACAAATATCTTCTCTTTCCAGGGAGAATCACTGAGCCAGAATAAAATTGAACACTAAATATTCTAAGAAAAAAGGAATCTAGTTTGTCAAAATGTGACTTGAATTAATAGATAAGGAGAGTCAGATGATAAGAGGGTCAAAATTATGTTTATCTTAGGAAAAGTAGAATAGAAAATTTATAAGCAGATTAAAAACACATAATAAAAGTAGTAAATAATAATGACAGTATCTCAAATCAGTGCAGGGGGGAAAGGCCTACTAATGTGATGGTGGGATAATTGGATAGCAATATGGGAAAAGATATATTTAATTTATTTGCTACACCAAATGCCAGGACAATCTCTAAGTGAATTCAAGACATAACTCTTTTTTCAAAAAAAC >OR4F29_ENSG00000284733_ENST00000426406_20_955_995 AGCCCAGTTGGCTGGACCAATGGATGGAGAGAATCACTCAGTGGTATCTGAGTTTTTGTTTCTGGGACTCACTCATTCATGGGAGATCCAGCTCCTCCTCCTAGTGTTTTCCTCTGTGCTCTATGTGGCAAGCATTACTGGAAACATCCTCATTGTGTTTTCTGTGACCACTGACCCTCACTTACACTCCCCCATGTACTTTCTACTGGCCAGTCTCTCCTTCATTGACTTAGGAGCCTGCTCTGTCACTTCTCCCAAGATGATTTATGACCTGTTCAGAAAGCGCAAAGTCATCTCCTTTGGAGGCTGCATCGCTCAAATCTTCTTCATCCACGTCGTTGGTGGTGTGGAGATGGTGCTGCTCATAGCCATGGCCTTTGACAGATATGTGGCCCTATGTAAGCCCCTCCACTATCTGACCATTATGAGCCCAAGAATGTGCCTTTCATTTCTGGCTGTTGCCTGGACCCTTGGTGTCAGTCACTCCCTGTTCCAACTGGCATTTCTTGTTAATTTAGCCTTCTGTGGCCCTAATGTGTTGGACAGCTTCTACTGTGACCTTCCTCGGCTTCTCAGACTAGCCTGTACCGACACCTACAGATTGCAGTTCATGGTCACTGTTAACAGTGGGTTTATCTGTGTGGGTACTTTCTTCATACTTCTAATCTCCTACGTCTTCATCCTGTTTACTGTTTGGAAACATTCCTCAGGTGGTTCATCCAAGGCCCTTTCCACTCTTTCAGCTCACAGCACAGTGGTCCTTTTGTTCTTTGGTCCACCCATGTTTGTGTATACACGGCCACACCCTAATTCACAGATGGACAAGTTTCTGGCTATTTTTGATGCAGTTCTCACTCCTTTTCTGAATCCAGTTGTCTATACATTCAGGAATAAGGAGATGAAGGCAGCAATAAAGAGAGTATGCAAACAGCTAGTGATTTACAAGAGGATCTCATAAATGATATAATAAGCCCTTCTCATTAAACATGATATGG
2、python脚本
root@PC1:/home/test# cat test.py # read fasta and save in dict input_fa = {} ## 此处定义一个空的字典 # loop for read with open('test.fa','r') as rawfa: ## 读入测试文件,并命名为rawfa for line in rawfa: ## 对rawfa逐行进行循环 line = line.strip('\n') ## 删除末尾的换行符 if line.startswith('>'): ## 如果匹配开始是> 的行 key = line ## 将该行定义为字典的键 input_fa[key] = '' ## 同时将该键对应的值定义为空 else: input_fa[key] += line.replace('\n','') ## 如果没有匹配>, 将这一行赋值给对应键的值, 同时删除末尾的换行符 # output save in another file output_fa = open('new_fasta.fa','w') ## 以写的形式打开一个文件,用于储存结果 # choose length you want to separate my_length = 20 ## 此处定义每行的字符数 # separate sequences for key,val in input_fa.items(): ## 对字典的键和值进行循环 output_fa.write(key + '\n') ## 将键写入文件,同时增加换行符 while len(val) > my_length: ## 利用while循环对值的长度进行判断,当长于设定的长度时,一直循环 output_fa.write(val[0:my_length] + '\n') ## 每次写入指定长度个碱基 val = val[my_length:len(val)] ## 此处是while循环的关键,每执行一次循环,val向右移动my_length个长度的碱基,也就是val长度每次减少my_length output_fa.write(val + '\n') ## 写入最后长度不足my_length的碱基 # file close output_fa.close() ## 关闭文件
3、执行脚本
root@PC1:/home/test# ls ## 测试数据和脚本 test.fa test.py root@PC1:/home/test# python3 test.py ## 执行脚本 root@PC1:/home/test# ls new_fasta.fa test.fa test.py ## 结果文件 root@PC1:/home/test# head new_fasta.fa ## 查看前10行 >OR4F29_ENSG00000284733_ENST00000426406_20_955_995 AGCCCAGTTGGCTGGACCAA TGGATGGAGAGAATCACTCA GTGGTATCTGAGTTTTTGTT TCTGGGACTCACTCATTCAT GGGAGATCCAGCTCCTCCTC CTAGTGTTTTCCTCTGTGCT CTATGTGGCAAGCATTACTG GAAACATCCTCATTGTGTTT TCTGTGACCACTGACCCTCA root@PC1:/home/test# tail new_fasta.fa ## 查看后10行, CACACCCTAATTCACAGATG GACAAGTTTCTGGCTATTTT TGATGCAGTTCTCACTCCTT TTCTGAATCCAGTTGTCTAT ACATTCAGGAATAAGGAGAT GAAGGCAGCAATAAAGAGAG TATGCAAACAGCTAGTGATT TACAAGAGGATCTCATAAAT GATATAATAAGCCCTTCTCA TTAAACATGATATGG
来源:https://mp.weixin.qq.com/s?__biz=MzkyMTI1MTYxNA==&mid=2247494994&idx=1&sn=e1db20f09eac3ce6296ce3236970dbfc&chksm=c184d723f6f35e35406d92953e3ba9c72faf40192a8f8ea8507ed85cd52eb76744aae597cc16&mpshare=1&scene=23&srcid=0202lOxgPkVBELbemkiqnRTf&sharer_sharetime=1643763712632&sharer_shareid=4ed060cc4cd1efce40e3ab6dd8d8c7d4#rd
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律
2021-02-02 linux系统 centos8.3 中安装 Rsudio
2021-02-02 Error in .External2(C_X11, paste0("png::", filename), g$width, g$height, : 解决linux R绘图问题
2021-02-02 linux系统中使用R的Cairo绘制png格式图片
2021-02-02 configure: error: Cannot find cairo.h! Please install cairo (http://www.cairographics.org/) and/or set CAIRO_CFLAGS/LIBS correspondingly
2021-02-02 linux系统中安装R包