How to find E-box elements in a sequence and convert them to lowercase

Given a sequence, find every E-box motif (CNNTTG) and convert it to lowercase. A Python implementation follows:

 

#!/usr/bin/env python
# Date: 2015.8.01
# Author: 
"""Search for the E-box motif (CNNTTG) in a DNA sequence and replace each match with lowercase."""

import re                                   # regular expressions

dna_seq = open("/home/genome_seq.txt")
dna_test = open("/home/test_new.txt", "w")

my_seq = ""                                 # start with an empty string, not a list
for line in dna_seq:
    my_seq = my_seq + line.rstrip("\n")     # concatenate the lines into one string,
                                            # removing the trailing "\n"
Ebox_motif = re.findall(r"C[TACG][TACG]TTG", my_seq)  # findall returns a list of
                                                      # every matched motif
seq = my_seq

for motif in set(Ebox_motif):               # set(): handle each distinct motif only once
    seq = seq.replace(motif, motif.lower()) # str.replace() is the key step

dna_test.write(seq)                         # write the string to a file
dna_test.close()                            # good habit
dna_seq.close()                             # good habit
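
The same result can probably be reached in a single pass with re.sub, which avoids building the list of motifs first. The following is only a minimal sketch, assuming the sequence is uppercase and contains only A, C, G and T; the names EBOX_PATTERN and lowercase_ebox are made up for this illustration and are not part of the original script.

```python
import re

# Pattern taken from the post: C, any two bases, then TTG.
# EBOX_PATTERN and lowercase_ebox are hypothetical names for this sketch.
EBOX_PATTERN = re.compile(r"C[ACGT][ACGT]TTG")


def lowercase_ebox(seq):
    """Return seq with every CNNTTG match converted to lowercase."""
    # re.sub calls the function once per match and substitutes its return value.
    return EBOX_PATTERN.sub(lambda m: m.group(0).lower(), seq)


print(lowercase_ebox("AACAGTTGAA"))  # prints AAcagttgAA
```

Passing a function as the replacement lets each match be lowercased where it is found, so the sequence is scanned only once.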

 

posted on 2015-09-17 18:42  OA_maque
