Splitting a file's contents by chromosome and writing them out

The test data is as follows (test.txt):

chr2    43995310    43995986
chr17   49788603    49789067
chr17   59565573    59566163
chr19   8390308 8390745
chr12   49188033    49189033
chr7    974903  975570
chr7    98878532    98879500
chr7    44044672    44045322
chr1    153634052   153634772
chr11   60905850    60906575
 
Here is the code:

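# encoding: utf-8

import os
from collections import OrderedDict


def readfasta(filename):
    # Despite the name, the input is a BED-like interval file, not FASTA.
    # Map each chromosome ID to a list of its features, in first-seen order.
    tmp_dict = OrderedDict()
    with open(filename) as f:
        for line in f:
            # split() with no argument handles tabs and runs of spaces
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            chr_id = fields[0]
            # keep one entry per input line so features are not run together
            tmp_dict.setdefault(chr_id, []).append('\t'.join(fields[1:]))
    return tmp_dict


def seperatefile(filename, outfile):
    # Write one file per chromosome, e.g. output_chr7.txt for output.txt.
    data = readfasta(filename)
    name, ext = os.path.splitext(outfile)
    for chr_id, features in data.items():
        with open('%s_%s%s' % (name, chr_id, ext), 'w') as f_out:
            # first line is the chromosome ID, then one feature per line
            f_out.write('%s\n' % chr_id)
            for feature in features:
                f_out.write('%s\n' % feature)


seperatefile('test.txt', 'output.txt')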
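
To check what the split produces, here is a minimal, self-contained sketch that runs the same grouping idea on the ten sample lines above inside a temporary directory. The names `split_by_chromosome` and `run_demo` are hypothetical helpers introduced only for this illustration, and unlike the script above this variant writes each full line (chromosome included) rather than a header line followed by the remaining columns.

```python
import os
import shutil
import tempfile
from collections import OrderedDict

# The ten sample lines from the test data above.
SAMPLE = """\
chr2    43995310    43995986
chr17   49788603    49789067
chr17   59565573    59566163
chr19   8390308 8390745
chr12   49188033    49189033
chr7    974903  975570
chr7    98878532    98879500
chr7    44044672    44045322
chr1    153634052   153634772
chr11   60905850    60906575
"""


def split_by_chromosome(in_path, out_dir, prefix="output", ext=".txt"):
    """Group lines by first column and write one file per chromosome.

    Hypothetical helper: returns an OrderedDict mapping each chromosome ID
    to the path of the file written for it.
    """
    groups = OrderedDict()
    with open(in_path) as f:
        for line in f:
            fields = line.split()  # handles tabs and runs of spaces
            if not fields:
                continue  # skip blank lines
            groups.setdefault(fields[0], []).append("\t".join(fields))

    paths = OrderedDict()
    for chr_id, rows in groups.items():
        path = os.path.join(out_dir, "%s_%s%s" % (prefix, chr_id, ext))
        with open(path, "w") as f_out:
            f_out.write("\n".join(rows) + "\n")
        paths[chr_id] = path
    return paths


def run_demo():
    """Split SAMPLE in a temporary directory; return {chr_id: file text}."""
    tmp_dir = tempfile.mkdtemp()
    try:
        in_path = os.path.join(tmp_dir, "test.txt")
        with open(in_path, "w") as f:
            f.write(SAMPLE)
        result = OrderedDict()
        for chr_id, path in split_by_chromosome(in_path, tmp_dir).items():
            with open(path) as f:
                result[chr_id] = f.read()
        return result
    finally:
        shutil.rmtree(tmp_dir)  # clean up the temporary files


if __name__ == "__main__":
    for chr_id, text in run_demo().items():
        print("%s: %d line(s)" % (chr_id, len(text.splitlines())))
```

With the sample data, seven files are written, one for each distinct chromosome, and the chr7 file holds the three chr7 intervals in their original order.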
Recommended forum: 生信技能树 (Biotrainee), http://biotrainee.com/forum.php/

posted @ 2017-05-22 17:44 Bio-Liu
