Counting bases in a sequence with Python

 

001. Test sequence. The sequence is saved in the file a.fa; the goal is to count the A, C, G, and T bases in it.

[root@PC1 test01]# ls
a.fa
[root@PC1 test01]# cat a.fa      ## test fasta file
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

 

002. Counting with a basic loop

[root@PC1 test01]# ls
a.fa  count.py
[root@PC1 test01]# cat count.py     ## counting program
#!/usr/bin/env python
# -*- coding: utf-8 -*-

a = c = g = t = 0
with open("a.fa", "r") as in_file:      # the file is closed automatically
    for line in in_file:
        for base in line.strip():
            if base == "A":
                a += 1
            elif base == "C":
                c += 1
            elif base == "G":
                g += 1
            elif base == "T":
                t += 1
            else:
                # Report the unexpected character and skip the rest of this line
                print("anomalous letter! " + base)
                break
print(a, c, g, t)

 

Run the program:

[root@PC1 test01]# ls
a.fa  count.py
[root@PC1 test01]# cat a.fa            ## test file
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
[root@PC1 test01]# python3 count.py    ## run the program
20 12 17 21

 

003. Counting with the string count() method

[root@PC1 test01]# ls
a.fa  count.py
[root@PC1 test01]# cat a.fa       ## test data
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
[root@PC1 test01]# cat count.py     ## counting program
#!/usr/bin/env python
# -*- coding: utf-8 -*-

with open("a.fa", "r") as in_file:      # the file is closed automatically
    seq = in_file.read()                # whole file as one string

print(seq.count("A"), seq.count("C"), seq.count("G"), seq.count("T"))
[root@PC1 test01]# python3 count.py    ## run the program
20 12 17 21
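
The test file here holds only the bare sequence. FASTA files normally start each record with a header line beginning with ">", and calling count() on the whole file would then also count any A, C, G, or T letters in that header. The following is a minimal sketch under that assumption; the function name count_bases and the sample header "seq1 CAT" are made up for illustration and are not part of the original script.

```python
def count_bases(fasta_text):
    """Count A, C, G, T in FASTA-formatted text, ignoring header lines."""
    # Keep only sequence lines; header lines start with ">".
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    ).upper()  # count lowercase bases together with uppercase ones
    return tuple(seq.count(base) for base in "ACGT")


# Hypothetical record: the header "seq1 CAT" is invented for this example.
sample = (
    ">seq1 CAT\n"
    "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC\n"
)
print(*count_bases(sample))
```

Without the header filter, the letters C, A, and T in the invented header would each be counted once more.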

 

004. Counting with a function

[root@PC1 test01]# ls
a.fa  count.py
[root@PC1 test01]# cat a.fa        ## test sequence
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
[root@PC1 test01]# cat count.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from collections import defaultdict

def fun_test(dna):                  ## counting function
    d = defaultdict(int)            # missing keys start at 0
    for i in dna:
        d[i] += 1
    return "%d %d %d %d" % (d["A"], d["C"], d["G"], d["T"])

with open("a.fa", "r") as in_file:  # the file is closed automatically
    dna = in_file.read()
print(fun_test(dna))
[root@PC1 test01]# python3 count.py    ## run the program
20 12 17 21
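
The same tally can probably be written more compactly with collections.Counter from the standard library, which builds the character counts in one call and returns 0 for keys it has not seen. This is a sketch of an alternative, not the method used in the script above; the function name count_with_counter is an assumption, and the sequence is passed in as a string so the example runs without a.fa.

```python
from collections import Counter


def count_with_counter(dna):
    """Return the counts of A, C, G, T in dna as a tuple."""
    # Counter tallies every character; bases that never occur report 0.
    counts = Counter(dna.strip().upper())
    return tuple(counts[base] for base in "ACGT")


dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
print(*count_with_counter(dna))
```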


 

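005. Counting a string directly with defaultdict (case-insensitive); the counts print in the order each base first appears, here A, G, C, T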

[root@PC1 test01]# ls
test.py
[root@PC1 test01]# cat test.py         ## test program
#!/usr/bin/env python
# -*- coding:utf-8 -*-

from collections import defaultdict
str1 = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
dict1 = defaultdict(int)

for i in str1:
        dict1[i.upper()] += 1

for i in dict1:
        print(dict1[i], end = " ")
print("")
[root@PC1 test01]# python3 test.py    ## run the program
20 17 12 21
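
The output here is "20 17 12 21" rather than "20 12 17 21" because a dict is iterated in insertion order, and the bases first appear in this sequence as A, G, C, T. A minimal sketch of one way to get a stable A, C, G, T order is to iterate over a fixed string of bases instead of over the dict; the variable name line is introduced only for this example.

```python
from collections import defaultdict

str1 = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
dict1 = defaultdict(int)

for ch in str1:
    dict1[ch.upper()] += 1

# Look the counts up in a fixed base order, not the dict's insertion order.
line = " ".join(str(dict1[base]) for base in "ACGT")
print(line)
```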

 

Reference: https://mp.weixin.qq.com/s?__biz=MzIxMjQxMDYxNA%3D%3D&chksm=9747cb76a03042607fa7b573195dd3412908e2f55bf00dd6f97e26844c0771369c52b20924ca&idx=1&lang=zh_CN&mid=2247484143&scene=21&sn=403fdbe47688e8c1bf1768fb45940a1c&token=823599124#wechat_redirect

 

posted @ 2023-08-25 20:09  小鲨鱼2018