Python: computing the G and C content of each sequence in a FASTA file

 

001.

(base) root@PC1:/home/test2# ls
a.fasta  test.py
(base) root@PC1:/home/test2# cat a.fasta            ## test FASTA file
>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
(base) root@PC1:/home/test2# cat test.py            ## test script
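#!/usr/bin/python
# Report the G and C fraction of each record in a FASTA file (sequences may span several lines)
seqs = dict()         # header without ">"  ->  concatenated sequence
key = None            # header of the record currently being read
with open("a.fasta", "r") as in_file:        # the with-block closes the file automatically
    for line in in_file:
        line = line.strip()
        if not line:                          # skip blank lines
            continue
        if line.startswith(">"):              # a header line starts a new record
            key = line[1:]
            seqs[key] = ""
        elif key is not None:                 # sequence line: append to the current record
            seqs[key] += line.upper()         # upper() so lowercase bases are counted too
for name, seq in seqs.items():
    print(f"\n{name}:")
    n = len(seq)
    print("G:", seq.count("G") / n if n else 0.0)   # guard against empty records
    print("C:", seq.count("C") / n if n else 0.0)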
(base) root@PC1:/home/test2# python test.py       ## run the script; output below

gene1 myc:
G: 0.2692307692307692
C: 0.2692307692307692

gene2 jun:
G: 0.2857142857142857
C: 0.2

gene3 malat1:
G: 0.48148148148148145
C: 0.18518518518518517
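The script stores every sequence in a dict keyed by its header, which works well for small files. Below is a minimal sketch of a streaming variant. It assumes the same layout as a.fasta: header lines start with `>`, and a sequence can wrap across several lines. `parse_fasta` and `gc_stats` are hypothetical helper names and do not come from the original post. `gc_stats` also reports the combined G+C fraction, which is what "GC content" usually refers to. The demo reads the sample records from `io.StringIO` instead of opening the file.

```python
import io
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    `lines` may be an open file or any iterable of strings. The header is the
    text after '>'; wrapped sequence lines are joined. Blank lines and any
    lines before the first header are skipped.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:  # the last record is not followed by another header
        yield header, "".join(chunks)


def gc_stats(seq: str) -> Tuple[float, float, float]:
    """Return (G fraction, C fraction, G+C fraction), ignoring case."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0, 0.0
    g, c = seq.count("G"), seq.count("C")
    return g / n, c / n, (g + c) / n


# Demo on the a.fasta records shown above, held in memory via io.StringIO.
SAMPLE = """>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
"""

for name, seq in parse_fasta(io.StringIO(SAMPLE)):
    g, c, gc = gc_stats(seq)
    print(f"{name}\tG={g:.4f}\tC={c:.4f}\tGC={gc:.4f}")
```

To process the real file, pass the open handle instead: `with open("a.fasta") as fh:` and then loop over `parse_fasta(fh)`. Records are yielded one at a time, so peak memory is bounded by the longest record rather than the whole file. For gene1 myc the combined value is 14/26 ≈ 0.5385 (7 G plus 7 C out of 26 bases).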

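The script calls `str.count` once per base, so it scans each sequence once for G and again for C. To get the frequency of every base in a single pass, including ambiguity codes such as N, `collections.Counter` can tally all characters at once. The sketch below assumes case should be ignored. `base_composition` is a hypothetical name.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Fraction of each character in seq (case-insensitive), counted in one pass."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by base so the output order is alphabetical and stable.
    return {base: n / total for base, n in sorted(counts.items())}


if __name__ == "__main__":
    # gene3 malat1 from a.fasta, with its three sequence lines joined
    comp = base_composition("AGGCTAGCGAG" + "GCGCGAG" + "GATTAGGCG")
    for base, frac in comp.items():
        print(f"{base}: {frac:.4f}")
```

For gene3 malat1 this gives G: 0.4815 and C: 0.1852, which match the script's output, along with A: 0.2222 and T: 0.1111.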
 

Reference: https://mp.weixin.qq.com/s?__biz=MzkyMTI1MTYxNA==&mid=2247493739&idx=1&sn=f690c93761307e6ec9bb77cca2eb4619&chksm=c184d21af6f35b0cda1d964ed896adee1091e1f615b7f6be0caf2508105275ca3ae66889c58e&mpshare=1&scene=23&srcid=0811LY0ghlyV0yNXki8WcW6m&sharer_sharetime=1660215059305&sharer_shareid=50b75c6a886e09824b582fb782a7678b#rd

 

posted @ 2022-08-11 19:24  小鲨鱼2018