Notes: Rosalind Problems & Solutions, 0002

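Fibonacci sequence

The Fibonacci sequence starts with 0 and 1; every later Fibonacci number is the sum of the two numbers before it.

The first few Fibonacci numbers are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...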

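The sequence originated with Leonardo Fibonacci, who used it to describe the growth of a rabbit population under these assumptions:

At the start of the first month there is one newborn pair of rabbits
After the second month (from the start of the third month) they can reproduce
Each month, every fertile pair produces one new pair
Rabbits never die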

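Suppose there are a pairs in total in month n and b pairs in total in month n+1. Then month n+2 must have a+b pairs in total: all b pairs from month n+1 survive into month n+2 (the pairs born in month n+1 cannot breed yet), and the number of newly born pairs equals the a pairs that already existed in month n.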
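The argument above can be checked with a small simulation. The following is only a sketch: the function name `simulate_rabbits` and the `litter` parameter (new pairs per fertile pair, 1 in the classic setting) are illustrative names, not part of the original notes. It tracks newborn pairs separately from pairs that are at least one month old, and the totals follow the recurrence total(n+2) = total(n+1) + litter * total(n).

```python
def simulate_rabbits(months, litter=1):
    """Return the total number of rabbit pairs for each month 1..months."""
    # month 1: one newborn pair, no older pairs
    young, older = 1, 0
    totals = [young + older]
    for _ in range(months - 1):
        # pairs that were already at least one month old breed this month;
        # last month's newborns join the older group but do not breed yet
        young, older = older * litter, older + young
        totals.append(young + older)
    return totals

totals = simulate_rabbits(10)
print(totals)

# every total from month 3 on is the sum of the two previous totals
assert all(totals[i] == totals[i - 1] + totals[i - 2] for i in range(2, len(totals)))
```

With `litter=3` and 5 months, the last total is 19, matching the sample case of the Rosalind problem.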

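### Method 1
def fib(n, k):
    # previous1 holds F(i-1) and previous2 holds F(i-2); F(1) = F(2) = 1
    previous1, previous2 = 1, 1
    for i in range(2, n):
        current = previous1 + k * previous2
        previous2 = previous1
        previous1 = current
    # previous1 is always the latest term, so n = 1 and n = 2 work too
    return previous1

with open('rosalind_fib.txt', 'r') as f:
    # the input is a single line: "n k"
    n, k = map(int, f.readline().split())
    # n: number of months; k: new pairs produced each month by every fertile pair
    print(n, k)
    print(fib(n, k))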

### Method 2
def fib(n, factor):
    # base cases: F(0) = 0 and F(1) = 1
    if n < 2:
        return n
    return factor * fib(n - 2, factor) + fib(n - 1, factor)

print(fib(28, 2))
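The plain recursion recomputes the same subproblems many times, so its running time grows exponentially with n. One possible variation, shown here only as a sketch, caches results with `functools.lru_cache` from the standard library; the name `fib_memo` is made up for this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n, factor):
    """Same recurrence as the recursive version, with each (n, factor) computed once."""
    if n < 2:
        return n
    return factor * fib_memo(n - 2, factor) + fib_memo(n - 1, factor)

print(fib_memo(5, 3))   # Rosalind sample input "5 3"
print(fib_memo(28, 2))
```

The recursion depth still grows linearly with n, so for very large n the loop in Method 1 remains the safer choice.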

Computing the GC content of DNA sequences
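The GC content of a DNA string is the percentage of its symbols that are G or C. Before reading a FASTA file, the calculation for a single string can be sketched as follows; `gc_content` is an illustrative helper name, not something from the original solutions.

```python
def gc_content(dna):
    """Percentage of characters in `dna` that are G or C."""
    if not dna:
        raise ValueError("GC content is undefined for an empty string")
    return 100 * (dna.count("G") + dna.count("C")) / len(dna)

# 3 of the 8 bases (G, C, G) count towards the GC content
print(gc_content("AGCTATAG"))
```

The methods below apply the same count to every record in a FASTA file and keep the record with the highest value.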

### Method 1
result = {}
with open('rosalind_gc.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # FASTA header: start counting for a new record
            key = line[1:]
            dnalen, gccount = 0, 0
        else:
            gccount += line.count("G") + line.count("C")
            dnalen += len(line)
            # running GC percentage of the current record
            result[key] = (gccount / dnalen) * 100
print(result)
# pick the record with the highest GC content
maxgc = max(zip(result.values(), result.keys()))
print('%s\n%.6f%%' % (maxgc[1], maxgc[0]))
        
### Method 2
max_gc_name, max_gc_content = '', 0
with open('rosalind_gc.txt', 'r') as f:
    buf = f.readline().rstrip()
    while buf:
        # buf holds a FASTA header line; drop the leading '>'
        seq_name, seq = buf[1:], ''
        buf = f.readline().rstrip()
        # collect sequence lines until the next header or the end of the file
        while not buf.startswith('>') and buf:
            seq = seq + buf
            buf = f.readline().rstrip()
        seq_gc_content = (seq.count('C') + seq.count('G')) / len(seq)
        if seq_gc_content > max_gc_content:
            max_gc_name, max_gc_content = seq_name, seq_gc_content

print('%s\n%.6f%%' % (max_gc_name, max_gc_content * 100))

Counting Point Mutations

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ.
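To make "positions at which the characters differ" concrete, the sketch below lists the mismatching indices themselves; the Hamming distance is the length of that list. The helper name `mismatch_positions` is hypothetical, and the indices are 0-based.

```python
def mismatch_positions(s, t):
    """Return the 0-based indices where equal-length strings s and t differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is undefined for unequal lengths")
    return [i for i, (a, b) in enumerate(zip(s, t)) if a != b]

for s, t in [("1011101", "1001001"), ("2143896", "2233796"), ("toned", "roses")]:
    positions = mismatch_positions(s, t)
    print(s, t, positions, len(positions))
```

For DNA strings the same list gives the sites of the point mutations, not just how many there are.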

For example:

The Hamming distance between 1011101 and 1001001 is 2.
The Hamming distance between 2143896 and 2233796 is 3.
The Hamming distance between "toned" and "roses" is 3.

### Method 1
a = "GAGCCTACTAACGGGAT"
b = "CATCGTAATGACGGCCT"
cnt = 0
# compare the two strings position by position
for i in range(len(a)):
    if a[i] != b[i]:
        cnt += 1
print(cnt)

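### Method 2
def hamming_distance(s, t):
    value = 0
    # zip pairs up the characters found at the same position
    for x, y in zip(s, t):
        if x != y:
            value += 1
    return value

# a and b are the two strings defined in Method 1
print(hamming_distance(a, b))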

### Method 3
from operator import ne
with open("rosalind_hamm.txt", 'r') as infile:
    # split() yields the two strings; map(ne, s, t) is True at every mismatch
    print(sum(map(ne, *infile.read().split())))

### Method 4
def hammingDistance(s1, s2):
    """Return the Hamming distance between equal-length sequences"""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(a != b for a, b in zip(s1, s2))

with open('rosalind_hamm.txt', 'r') as f:
    line = f.readlines()
    # strip() drops the newline without cutting a base off a last line that lacks one
    s1 = line[0].strip()
    s2 = line[1].strip()
    dH = hammingDistance(s1, s2)
    print(len(s1))
    print(dH)

Posted 2018-04-13 21:51 by AdaWongCorner
