Using Python to count the number and total length of scaffolds in a FASTA file that are longer than 1,000,000 bp, 100,000 bp, 10,000 bp, and 1,000 bp

 

001.

(base) root@PC1:/home/test# ls
a.fasta  test.py
(base) root@PC1:/home/test# head a.fasta        ## test FASTA file
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CCTCTGGCAACACCCGCTCCGGCAATGTATAGTTCACCGATACATCCAACAGGCAGCATC
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
CGTTCAAGTTTTTCTTGCGGCGGACAATCAAAGAATGCAGCTTCTACGGTTGCTTCCGTT
GGCCCATAGGAATTGGTTATTGAAACATTTGGAAGCAACACGTGAAATCGGGAGACAAGA
TGGGTCCCCAGCTGTTCTCCCCCAGAAAACACTCGCTTGAGTCTGTTTGTCTTAATCGGT
ACAGAGCGATATTTTATATGTTCTAAAAATGCATGGAGCATTGAAGGCACAAAATGCATA
GCTGTGATCTTTTGTTCTTCTATGGCCTTCGCGATCACTTCAGGTTCTTTTTCGCCTCCC
TGAGGAAGCAGATAAACAGAAGCTCCGGCATAAGGCCACCAAAACAGCTCCCATATTGAA
(base) root@PC1:/home/test# cat test.py         ## test script
#!/usr/bin/python

# Length thresholds in bp; each maps to [count, total length] of the scaffolds above it.
thresholds = [1000000, 100000, 10000, 1000]

# Pass 1: record the length of every scaffold, keyed by its header line.
dict1 = {}
with open("a.fasta", "r") as in_file:
    for i in in_file:
        i = i.strip()
        if not i:                  # skip blank lines; i[0] would raise IndexError on them
            continue
        if i[0] == ">":            # header line starts a new scaffold
            key = i
            dict1[key] = 0
        else:                      # sequence line: add its length to the current scaffold
            dict1[key] += len(i)

# Pass 2: a scaffold is counted under every threshold it strictly exceeds,
# so the rows of the result are cumulative.
dict2 = {t: [0, 0] for t in thresholds}
for i in dict1:
    for t in thresholds:
        if dict1[i] > t:
            dict2[t][0] += 1
            dict2[t][1] += dict1[i]

# Write a tab-separated table: threshold, scaffold count, summed length.
with open("result.txt", "w") as out_file:
    print("item", "count", "sum", file = out_file, sep = "\t")
    for i in dict2:
        print(i, dict2[i][0], dict2[i][1], file = out_file, sep = "\t")
(base) root@PC1:/home/test# python test.py            ## run the script
(base) root@PC1:/home/test# ls
a.fasta  result.txt  test.py
(base) root@PC1:/home/test# cat result.txt            ## view the results
item    count   sum
1000000 2       2305059
100000  6       3997314
10000   10      4220017
1000    15      4236731
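Note that the rows are cumulative: a scaffold longer than 1,000,000 bp is also counted in every row below it. As a minimal sketch of the same two-step logic (collect per-scaffold lengths, then summarize by threshold), the version below wraps it in functions so it can be checked on a tiny in-memory example. The function names `scaffold_lengths` and `summarize`, the toy sequences, and the small thresholds are assumptions made for illustration; they are not part of the script above.

```python
import io


def scaffold_lengths(handle):
    """Return {scaffold name: sequence length} for a FASTA file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:                      # ignore blank lines
            continue
        if line.startswith(">"):          # header line starts a new scaffold
            name = line[1:]
            lengths[name] = 0
        elif name is not None:            # sequence line for the current scaffold
            lengths[name] += len(line)
    return lengths


def summarize(lengths, thresholds=(1000000, 100000, 10000, 1000)):
    """Return {threshold: (count, total length)} for scaffolds strictly longer than each threshold."""
    summary = {}
    for t in thresholds:
        selected = [n for n in lengths.values() if n > t]
        summary[t] = (len(selected), sum(selected))
    return summary


# Hypothetical toy data with small thresholds, only to make the behaviour easy to verify.
toy = io.StringIO(">s1\nACGTACGT\nACGT\n>s2\nACG\n\n>s3\nACGTAC\n")
lengths = scaffold_lengths(toy)
print(lengths)                                   # {'s1': 12, 's2': 3, 's3': 6}
print(summarize(lengths, thresholds=(10, 5, 2))) # {10: (1, 12), 5: (2, 18), 2: (3, 21)}
```

For a real file, the same functions could be called as `summarize(scaffold_lengths(open("a.fasta")))`, which uses the default thresholds from the script.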

 

Reference: https://mp.weixin.qq.com/s?__biz=MzIxNzc1Mzk3NQ==&mid=2247491482&idx=1&sn=596fd0f0e7d41757e1e539f3223a8c8c&chksm=97f5af82a08226943da69bca8228480d4b708ca2c89f8008281f140682e8814b43cf49d60762&scene=178&cur_album_id=2403674812188688386#rd

 

posted @ 2022-08-08 16:04  小鲨鱼2018