08 Translating RNA into Protein
Problem
The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.
The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
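As a rough illustration of how the codon table can be represented in code, the sketch below builds the standard 64-codon table from a compact string and uses it to translate a short RNA string. The names `BASES`, `AMINO_ACIDS`, `CODON_TABLE`, and `translate` are assumptions made for this example and do not come from the problem statement; stop codons are marked with `*` here.

```python
from itertools import product

# Assumed layout: codons enumerated with bases in the order U, C, A, G,
# so the 64-letter string below lines up with UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon (e.g. "AUG") to its one-letter amino acid, '*' for stop.
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string codon by codon, ending at the first stop codon."""
    protein = []
    # len(rna) - 2 ensures a trailing partial codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(translate("AUGGCCUGA"))  # MA
```

Encoding the table as one string keeps the example short; a hand-written dictionary is easier to read and audit, at the cost of 64 entries.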
Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).
Return: The protein string encoded by s.
Sample Dataset
AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
Sample Output
MAMAPRTEINSTRING
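To see how the sample dataset relates to the sample output, the sketch below splits the RNA string into codons. It only assumes that the string is read in frame from its first character; the variable names are illustrative.

```python
rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
expected = "MAMAPRTEINSTRING"

# Cut the string into consecutive, non-overlapping triplets.
codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]

print(len(rna), len(codons))             # 51 17
print(codons[0], codons[-1])             # AUG UGA
print(len(codons) - 1 == len(expected))  # True
```

The 51 nucleotides form 17 codons: 16 that each encode one amino acid of the output, followed by the stop codon UGA, which contributes no letter.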
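Method 1:
# -*- coding: utf-8 -*-
### 8. Translating RNA into Protein ###
import re
from collections import OrderedDict

# Build the codon table from a file with four "codon amino-acid" pairs per line.
codonTable = OrderedDict()
with open('rna_codon_table.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        lst = re.split(r'\s+', line)  # \s+ matches one or more whitespace characters
        for i in [0, 2, 4, 6]:
            codonTable[lst[i]] = lst[i + 1]

# Read the RNA sequence, joining lines in case it is wrapped.
rnaSeq = ''
with open('rosalind_prot.txt', 'rt') as f:
    for line in f:
        rnaSeq += line.strip().upper()

# Translate codon by codon; a trailing partial codon is ignored.
aminoAcids = []
for i in range(0, len(rnaSeq) - 2, 3):
    aminoAcid = codonTable[rnaSeq[i:i + 3]]
    if aminoAcid == 'Stop':
        break  # translation ends at the first stop codon
    aminoAcids.append(aminoAcid)

peptide = ''.join(aminoAcids)
print(peptide)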
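The first method reads the codon table from `rna_codon_table.txt`, and its indexing (`0, 2, 4, 6`) implies four codon/amino-acid pairs per line. The sketch below shows that assumed layout with a three-line excerpt held in memory instead of a file; `SAMPLE_TABLE` and `parse_codon_table` are hypothetical names used only for this example.

```python
import io

# Assumed file layout: four whitespace-separated "codon amino-acid" pairs per line.
SAMPLE_TABLE = """\
UUU F      CUU L      AUU I      GUU V
UAA Stop   CAA Q      AAA K      GAA E
UGA Stop   CGA R      AGA R      GGA G
"""


def parse_codon_table(handle):
    """Return a dict mapping each codon to its amino acid (or 'Stop')."""
    table = {}
    for line in handle:
        fields = line.split()  # split on any run of whitespace
        # Even-indexed fields are codons, odd-indexed fields are amino acids.
        for codon, amino in zip(fields[0::2], fields[1::2]):
            table[codon] = amino
    return table


table = parse_codon_table(io.StringIO(SAMPLE_TABLE))
print(table["AUU"], table["UAA"])  # I Stop
```

Pairing fields with `zip` avoids hard-coding the number of columns, so the same parser would also accept a table with one pair per line.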
Method 2:
def translate_rna(sequence):
    # Standard RNA codon table; the three stop codons map to an empty string.
    codonTable = {
        'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
        'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
        'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
        'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
        'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
        'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
        'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
        'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
        'UAC':'Y', 'UAU':'Y', 'UAA':'', 'UAG':'',
        'UGC':'C', 'UGU':'C', 'UGA':'', 'UGG':'W',
    }
    proteinsequence = ''
    for n in range(0, len(sequence), 3):
        codon = sequence[n:n+3]
        if codon not in codonTable:
            continue  # skip a trailing partial codon or unrecognized symbols
        if codonTable[codon] == '':
            break  # stop codon: translation ends here
        proteinsequence += codonTable[codon]
    return proteinsequence

se = open('rosalind_prot.txt').read().strip()  # sequence
print(translate_rna(se))
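Method 3:
from Bio.Seq import Seq
# Bio.Alphabet was removed in Biopython 1.78, so Seq no longer takes an alphabet argument.

# translation
messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
print(messenger_rna.translate())              # MAIVMGR*KGAR* (stop codons shown as '*')
print(messenger_rna.translate(to_stop=True))  # MAIVMGR (ends at the first stop codon)

# reverse complement
my_dna = Seq("AGTACACTGGT")
print(my_dna.reverse_complement())  # ACCAGTGTACT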