Python: print all bases of each scaffold in a FASTA file on a single line

 

001. Using a dictionary

(base) root@PC1:/home/test2# ls
a.fasta  test.py
(base) root@PC1:/home/test2# cat a.fasta                        ## test data
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
>scaffold_2
CACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTT
AAGTTCACCGCATCCTGCGGCGACACCTGTGTGGCCTGCGTCGTGCAGGCCCTAGTTTGA
CTGACTACGCACATCGCTGTGCGATTTATAAAAATGAATTAACAGGTACGTTTTGTCTTG
>scaffold_3
TTGATCCAGTGGCTCCGGTTACTCCAGTTGATCCTGTTGCGCCTGTTGCTCCAGTTTCTC
CGGTTGGTCCGGTTGATCCGGTTGCA
>scaffold_4
CCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGC
GGCGACACCTGTGAGGTACGTTTTGTCTTGT
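(base) root@PC1:/home/test2# cat test.py                        ## test script
#!/usr/bin/python
# Read each scaffold into a dict: header line -> concatenated sequence
dict1 = dict()
with open("a.fasta", "r") as in_file:
    for i in in_file:
        i = i.strip()
        if not i:                  # skip blank lines (i[0] would raise IndexError)
            continue
        if i[0] == ">":            # header line starts a new scaffold
            key = i
            dict1[key] = ""
        else:                      # sequence line: append to the current scaffold
            dict1[key] = dict1[key] + i

# dicts keep insertion order (Python 3.7+), so scaffolds are written in input order
with open("result.txt", "w") as out_file:
    for i in dict1:
        print(i, file=out_file)
        print(dict1[i], file=out_file)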
(base) root@PC1:/home/test2# python test.py                     ## run the script
(base) root@PC1:/home/test2# ls
a.fasta  result.txt  test.py
(base) root@PC1:/home/test2# cat result.txt                     ## view the result
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAACGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
>scaffold_2
CACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGCGGCGACACCTGTGTGGCCTGCGTCGTGCAGGCCCTAGTTTGACTGACTACGCACATCGCTGTGCGATTTATAAAAATGAATTAACAGGTACGTTTTGTCTTG
>scaffold_3
TTGATCCAGTGGCTCCGGTTACTCCAGTTGATCCTGTTGCGCCTGTTGCTCCAGTTTCTCCGGTTGGTCCGGTTGATCCGGTTGCA
>scaffold_4
CCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGCGGCGACACCTGTGAGGTACGTTTTGTCTTGT
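The dictionary approach can also be wrapped in a reusable function. What follows is a minimal sketch that assumes the whole file fits in memory. The function name `fasta_to_single_line` is hypothetical and not part of the original post. Each scaffold's pieces are collected in a list and joined once at the end, so the growing string is not rebuilt with `+` on every line.

```python
from typing import Dict, Iterable, List, Optional


def fasta_to_single_line(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header line to its full sequence (hypothetical helper)."""
    chunks: Dict[str, List[str]] = {}
    key: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:                 # ignore blank lines
            continue
        if line.startswith(">"):     # header: start collecting a new scaffold
            key = line
            chunks[key] = []
        elif key is not None:        # sequence line of the current scaffold
            chunks[key].append(line) # (lines before the first header are ignored)
    # join each scaffold's pieces once instead of concatenating line by line
    return {k: "".join(v) for k, v in chunks.items()}


# small in-memory demo
records = fasta_to_single_line([">s1\n", "ACGT\n", "GG\n", ">s2\n", "TT\n"])
print(records)  # {'>s1': 'ACGTGG', '>s2': 'TT'}
```

To produce result.txt for a.fasta, pass the open file object, for example `fasta_to_single_line(open("a.fasta"))`, and print each header and sequence pair the same way the script does.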

 

002. Method 2: write each line as it is read

(base) root@PC1:/home/test2# ls
a.fasta  test.py
(base) root@PC1:/home/test2# cat a.fasta                 ## test data
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
>scaffold_2
CACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTT
AAGTTCACCGCATCCTGCGGCGACACCTGTGTGGCCTGCGTCGTGCAGGCCCTAGTTTGA
CTGACTACGCACATCGCTGTGCGATTTATAAAAATGAATTAACAGGTACGTTTTGTCTTG
>scaffold_3
TTGATCCAGTGGCTCCGGTTACTCCAGTTGATCCTGTTGCGCCTGTTGCTCCAGTTTCTC
CGGTTGGTCCGGTTGATCCGGTTGCA
>scaffold_4
CCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGC
GGCGACACCTGTGAGGTACGTTTTGTCTTGT
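(base) root@PC1:/home/test2# cat test.py                 ## test script
#!/usr/bin/python
# Stream the file: write each header on its own line and join the
# sequence lines that follow it without newlines
idx = 0                                   # 0 until the first header has been written
with open("a.fasta", "r") as in_file, open("result.txt", "w") as out_file:
    for i in in_file:
        i = i.strip()
        if not i:                         # skip blank lines (i[0] would raise IndexError)
            continue
        if i[0] == ">" and idx == 0:      # first header: no preceding newline
            idx += 1
            out_file.write(i + "\n")
        elif i[0] == ">":                 # later headers: end the previous sequence line first
            out_file.write("\n" + i + "\n")
        else:                             # sequence line: append without a newline
            out_file.write(i)
    out_file.write("\n")                  # end the last sequence line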
(base) root@PC1:/home/test2# python test.py               ## run the script
(base) root@PC1:/home/test2# ls
a.fasta  result.txt  test.py
(base) root@PC1:/home/test2# cat result.txt               ## view the result
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAACGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
>scaffold_2
CACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGCGGCGACACCTGTGTGGCCTGCGTCGTGCAGGCCCTAGTTTGACTGACTACGCACATCGCTGTGCGATTTATAAAAATGAATTAACAGGTACGTTTTGTCTTG
>scaffold_3
TTGATCCAGTGGCTCCGGTTACTCCAGTTGATCCTGTTGCGCCTGTTGCTCCAGTTTCTCCGGTTGGTCCGGTTGATCCGGTTGCA
>scaffold_4
CCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGCGGCGACACCTGTGAGGTACGTTTTGTCTTGT
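Method 2 never holds the whole file in memory. The same streaming behaviour can be packaged as a generator that yields one (header, sequence) pair at a time. This is a minimal sketch; the names `iter_fasta` and `write_single_line` are hypothetical. Only the current scaffold's lines are buffered, so memory use grows with the longest scaffold rather than with the whole file.

```python
import io
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one scaffold at a time (hypothetical helper)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:                        # skip blank lines
            continue
        if line.startswith(">"):
            if header is not None:          # previous scaffold is complete
                yield header, "".join(chunks)
            header, chunks = line, []       # lines before the first header are dropped
        else:
            chunks.append(line)
    if header is not None:                  # flush the last scaffold
        yield header, "".join(chunks)


def write_single_line(lines: Iterable[str], out: TextIO) -> None:
    """Write each scaffold as a header line followed by one sequence line."""
    for header, seq in iter_fasta(lines):
        out.write(header + "\n" + seq + "\n")


# in-memory demo: StringIO stands in for a.fasta and result.txt
buf = io.StringIO()
write_single_line(io.StringIO(">s1\nACGT\nGG\n>s2\nTT\n"), buf)
print(buf.getvalue(), end="")
# >s1
# ACGTGG
# >s2
# TT
```

For the files in this post, the call would be `with open("a.fasta") as f, open("result.txt", "w") as out: write_single_line(f, out)`.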

 

Reference: https://mp.weixin.qq.com/s?__biz=MzIxNzc1Mzk3NQ==&mid=2247491492&idx=1&sn=d9bfd396369b802700ef764d8174669d&chksm=97f5afbca08226aa255d2ee747a179a1e13e9160baf87b4186f871e396e8b86c826a5ca51ef4&scene=178&cur_album_id=2403674812188688386#rd

 

posted @ 2022-08-09 11:41  小鲨鱼2018