Python: extracting gene sequences from a genome FASTA file by gene position

 

001、

(base) root@PC1:/home/test2# ls
a.fasta  list.txt  test.py
(base) root@PC1:/home/test2# head a.fasta                  ## genome FASTA file
>NC_000964.3 Bacillus subtilis subsp. subtilis str. 168 chromosome, complete genome
ATCTTTTTCGGCTTTTTTTAGTATCCACAGAGGTTATCGACAACATTTTCACATTACCAACCCCTGTGGACAAGGTTTTT
TCAACAGGTTGTCCGCTTTGTGGATAAGATTGTGACAACCATTGCAAGCTCTCGTTTATTTTGGTATTATATTTGTGTTT
TAACTCTTGATTACTAATCCTACCTTTCCTCTTTATCCACAAAGTGTGGATAAGTTGTGGATTGATTTCACACAGCTTGT
GTAGAAGGTTGTCCACAAGTTGTGAAATTTGTCGAAAAGCTATTTATCTACTATATTATATGTTTTCAACATTTAATGTG
TACGAATGGTAAGCGCCATTTGCTCTTTTTTTGTGTTCTATAACAGAGAAAGACGCCATTTTCTAAGAAAAGGAGGGACG
TGCCGGAAGATGGAAAATATATTAGACCTGTGGAACCAAGCCCTTGCTCAAATCGAAAAAAAGTTGAGCAAACCGAGTTT
TGAGACTTGGATGAAGTCAACCAAAGCCCACTCACTGCAAGGCGATACATTAACAATCACGGCTCCCAATGAATTTGCCA
GAGACTGGCTGGAGTCCAGATACTTGCATCTGATTGCAGATACTATATATGAATTAACCGGGGAAGAATTGAGCATTAAG
TTTGTCATTCCTCAAAATCAAGATGTTGAGGACTTTATGCCGAAACCGCAAGTCAAAAAAGCGGTCAAAGAAGATACATC
(base) root@PC1:/home/test2# cat list.txt                 ## gene positions: name, sequence ID, start, end, strand
gene46  NC_000964.3     42917   43660   +
NP_387934.1     NC_000964.3     59504   60070   +
yfmC    NC_000964.3     825787  826834  -
cds821  NC_000964.3     885844  886173  -
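(base) root@PC1:/home/test2# cat test.py                  ## test script
#!/usr/bin/env python3
# Extract gene sequences from a genome FASTA file using the positions in list.txt.

in_file1 = open("list.txt", "r")      # gene positions
in_file2 = open("a.fasta", "r")       # genome FASTA
out_file = open("result.txt", "w")

dict1 = dict()   # gene name -> [sequence ID, 0-based start, end, strand]
dict2 = dict()   # FASTA header ID (with leading ">") -> sequence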

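for i in in_file1:
    i = i.strip().split()
    if len(i) < 5:      # skip blank or incomplete lines
        continue
    # Coordinates are 1-based and inclusive; subtract 1 from the start for Python slicing.
    dict1[i[0]] = [i[1], int(i[2]) - 1, int(i[3]), i[4]]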

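for i in in_file2:
    i = i.strip()
    if not i:                # skip blank lines; i[0] would raise IndexError on them
        continue
    if i.startswith(">"):
        key = i.split()[0]   # e.g. ">NC_000964.3"
        dict2[key] = ""
    else:
        dict2[key] += i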


def com_pro(seq):
    # Return the reverse complement of seq (supports A, C, G, T, N in either case).
    dict3 = {"a":"t", "t":"a", "c":"g", "g":"c", "n":"n", "A":"T", "T":"A", "C":"G", "G":"C", "N":"N"}
    result_list = [dict3[k] for k in reversed(seq)]
    return "".join(result_list)

for i, j in dict1.items():
    # Header line: gene name plus the original 1-based coordinates.
    print(i, "[" + j[0], j[1] + 1, j[2], j[3] + "]", file = out_file)
    seq = dict2[">" + j[0]][j[1]:j[2]]
    if j[3] == "+":
        print(seq, file = out_file)
    elif j[3] == "-":        # minus strand: write the reverse complement
        seq = com_pro(seq)
        print(seq, file = out_file)

in_file1.close()
in_file2.close()
out_file.close()
(base) root@PC1:/home/test2# python test.py   ## run the script
(base) root@PC1:/home/test2# ls
a.fasta  list.txt  result.txt  test.py
(base) root@PC1:/home/test2# cat result.txt      ## program output
gene46 [NC_000964.3 42917 43660 +]
ATGGTTTCATTACATGATGATGAAAGATTAGATTATTTGCTGGCAGAGGACATGAAAATCATACAAAGCCCAACAGTGTTTGCTTTTTCGTTGGACGCTGTGCTTCTGTCCAAATTTGCGTACGTTCCGATTCAAAAAGGGAAAATTGTTGATTTATGCACCGGCAATGGTATTGTGCCGCTGCTGCTCAGTACAAGATCAAAAGCAGACATTCTGGGAGTCGAAATTCAAGAAAGACTGCATGATATGGCTGTTCGCAGCGTGGAGTATAATAAGTTGGACGATCAGATCCAGATCATACATGATGACCTGAAAAACATGCCGGAGAAACTTGGACATAATCGATATGATGTTGTCACCTGCAATCCGCCGTATTTTAAAACGCCGAAACAAACTGAACAAAACATGAACGAGCATCTCCGAATCGCAAGACATGAAATCCACTGCACGCTGGAGGATGTCATTTCAGTCAGCAGCAAGCTGCTCAAGCAAGGGGGAAAAGCAGCTCTTGTTCACCGGCCGGGAAGGCTTCTGGAGATTTTTGAACTGATGAAGGCTTATCAAATCGAGCCGAAACGTGTACAATTTGTCTATCCGAAGCAAGGGAAAGAAGCCAATACCATTTTGGTTGAAGGTATCAAAGGCGGGCGCCCGGATTTGAAAATTCTTCCTCCCTTATTCGTATATGATGAACAAAATGAATATACAAAAGAAATCAGGACCATTTTATATGGAGACAAATAA
NP_387934.1 [NC_000964.3 59504 60070 +]
ATGCTTGTGATTGCCGGTCTCGGAAACCCGGGGAAGAACTATGAAAATACACGGCATAATGTCGGATTTATGGTGATAGATCAGCTTGCAAAGGAATGGAATATAGAGCTGAATCAAAATAAATTTAACGGATTATACGGAACCGGATTTGTTTCCGGCAAAAAGGTTCTACTTGTTAAACCGCTTACATATATGAATTTATCAGGAGAATGTTTGCGGCCTTTAATGGACTACTATGATGTCGATAACGAAGATTTGACAGTCATTTACGACGACCTTGACCTTCCGACTGGCAAGATCCGTTTAAGAACGAAAGGAAGCGCCGGAGGGCACAATGGCATCAAATCACTGATCCAGCATCTTGGAACGTCCGAGTTTGACCGTATCCGCATCGGAATCGGCCGGCCTGTAAACGGCATGAAGGTCGTTGATTATGTGTTAGGCTCCTTTACCAAGGAGGAGGCACCTGAGATCGAAGAAGCGGTTGATAAATCTGTGAAGGCTTGTGAGGCTTCTTTGAGTAAACCGTTTTTAGAAGTCATGAACGAATTTAACGCAAAGGTATAA
yfmC [NC_000964.3 825787 826834 -]
CTTTCTTTACTAAAAAAATATTGACATGATAAGCCATGCTATTATAGTGTTACATGTGATAATGATTCTCATTACTAAATCTGAAAAAAGGAAGAATGACATGCGCACCTATTCTAACAAGTTGATTGCCATCATGAGTGTTTTATTGCTCGCCTGCCTCATTGTATCCGGCTGTTCATCAAGCCAGAATAACAACGGAAGCGGCAAAAGCGAGTCTAAGGATTCCAGAGTGATCCATGACGAAGAAGGAAAAACGACAGTAAGCGGCACACCTAAGCGGGTGGTTGTGCTTGAGCTTTCATTCTTGGATGCCGTTCACAATCTCGGCATTACGCCGGTGGGCATCGCAGATGACAACAAAAAAGATATGATTAAAAAGCTTGTCGGCAGCTCCATTGATTACACATCTGTAGGCACACGCAGCGAACCCAATCTTGAGGTCATCAGTTCCTTGAAGCCTGATTTAATCATCGCTGACGCTGAGCGCCATAAAAACATTTATAAACAGCTGAAAAAAATCGCCCCGACGATTGAATTAAAAAGCCGTGAAGCGACATATGACGAAACGATCGACAGCTTTACGACCATTGCTAAAGCATTAAATAAAGAAGATGAAGGAAAAGAAAAGCTTGCCGAGCACAAAAAAGTCATCAACGATCTAAAAGCCGAACTTCCGAAAGATGAAAACCGCAACATCGTTCTCGGCGTTGCAAGAGCGGATTCCTTCCAGCTTCATACATCATCATCCTATGACGGAGAAATCTTTAAAATGCTAGGCTTTACACACGCTGTGAAGTCAGATAACGCCTATCAAGAGGTCAGCCTTGAGCAATTGAGCAAAATCGATCCTGATATTTTGTTCATCTCAGCCAACGAAGGCAAAACCATTGTAGATGAGTGGAAAACGAACCCGCTCTGGAAAAATCTCAAAGCGGTGAAAAATGGACAAGTCTATGATGCGGACCGTGACACTTGGACAAGATTCAGAGGCATCAAGTCTAGTGAAACAAGCGCCAAAGATGTGCTTAAAAAAGTGTATAATAAATAG
cds821 [NC_000964.3 885844 886173 -]
ATGATGCTGATTACCATTCTTTTATTTCTCGCGGCAGGGCTTGCTGAAATTGGCGGCGGATATCTGGTTTGGCTATGGCTGAGAGAGGCAAAGCCAGCTGGCTACGGAATCGCCGGGGCGCTGATCCTCATTGTATACGGCATTCTTCCGACGTTTCAGTCCTTCCCATCTTTCGGCCGTGTATACGCCGCTTATGGCGGAGTATTCATCGTGCTTGCGGTCCTGTGGGGATGGCTTGTTGACCGGAAAACACCTGATCTGTATGACTGGATCGGCGCATTCATTTGTCTCATCGGTGTCTGTGTTATTTTATTTGCGCCGCGCGGATAA
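
The coordinates in list.txt are 1-based and inclusive, while Python slices are 0-based and end-exclusive, which is why test.py subtracts 1 from the start only. The following is a minimal sketch of that step in isolation. It swaps the dictionary lookup in `com_pro` for `str.translate`, and the names `COMPLEMENT`, `reverse_complement` and `extract_region` are hypothetical helpers that do not appear in the original script.

```python
# Hypothetical helpers illustrating the slicing and strand logic of test.py.
# The table covers upper- and lower-case A, C, G, T and N only (an assumption
# that matches the bases handled by com_pro).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(chrom_seq, start, end, strand):
    """Extract the 1-based, inclusive region [start, end] from chrom_seq."""
    region = chrom_seq[start - 1:end]  # 1-based inclusive -> 0-based half-open
    if strand == "-":
        return reverse_complement(region)
    return region


toy = "AACCGGTT"  # made-up sequence for illustration, not taken from a.fasta
print(extract_region(toy, 2, 4, "+"))  # ACC
print(extract_region(toy, 2, 4, "-"))  # GGT
```

Unlike the dictionary lookup, `str.translate` leaves characters outside the table unchanged instead of raising `KeyError`, so IUPAC ambiguity codes such as R or Y would pass through uncomplemented unless the table is extended.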

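
Reading the genome with repeated `+=` on a string works, but for large chromosomes it may be slower than collecting the lines and joining them once. Below is a sketch of an alternative reader, assuming a plain multi-line FASTA input. The function name `read_fasta` is hypothetical, and unlike `dict2` in test.py its keys drop the leading ">".

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into {record_id: sequence}.

    record_id is the first whitespace-delimited token of the header,
    without the leading ">".
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:                      # ignore blank lines
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:            # ignore sequence text before any header
            chunks[name].append(line)
    # Join each record once instead of growing a string line by line.
    return {key: "".join(parts) for key, parts in chunks.items()}


# Made-up records for illustration, not taken from a.fasta.
records = read_fasta([">chr1 toy record", "ACGT", "", "TTGA", ">chr2", "GG"])
print(records["chr1"])  # ACGTTTGA
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open("a.fasta") as handle: records = read_fasta(handle)`.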
 

Reference: https://mp.weixin.qq.com/s?__biz=MzIxNzc1Mzk3NQ==&mid=2247491504&idx=1&sn=4ac56dfb5cae9cf101b95c64b2585915&chksm=97f5afa8a08226be7ff80e8f85093295d6370dd4f014d2bc67f0302d9c794110709de7a12818&scene=178&cur_album_id=2403674812188688386#rd

 

posted @ 2022-08-10 00:13  小鲨鱼2018