Extracting gap (N) and repeat regions from an assembled genome and saving them in BED format

See also:

Question: How to extract all non-sequenced positions from a genome (Fasta file)?

test.fa

>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNtaaattgttt
taaattgtttctgtttgcagttgacatgatctNNNNNatagaaaacacca
ataactctgccaaaaaatttagaattcataaatgaatttagtaaagttgc
>chr2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNtaaattgttt
taaattgtttctgtttgcagttgacatgatcttatatatagaaaacacca
ataactctgccaaaaaatttagaattcataaatgaatttagtaaagttgc

Perl one-liner

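`perl -ne 'chomp; if (/^>(\S+)/) { print "\t$i\n" if $s; $head = $1; $i = 0; $s = 0; next } foreach (split //) { $i++; if ($_ eq "N" && !$s) { print "$head\t", $i - 1; $s = 1 } elsif ($s && $_ ne "N") { print "\t", $i - 1, "\n"; $s = 0 } } END { print "\t$i\n" if $s }' test.fa > gap.bed`

This writes one line per run of uppercase N to gap.bed, as sequence name, start, and end in 0-based, half-open BED coordinates. A run that reaches the end of a sequence is closed at the sequence length instead of spilling into the next record.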

Convert to a normalized BED file

`awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, "Gap" NR }' gap.bed > gap.2.bed`
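
The two steps above can also be combined into a single pass. The following is a minimal sketch, not the method from the linked question: `fasta_gaps_to_bed` is a hypothetical helper name, and it assumes that a gap is any run of `N` or `n`, that the sequence name is the first whitespace-delimited token after `>`, and that gaps should be numbered across the whole file. It emits 0-based, half-open intervals, as BED expects, and closes a gap that runs to the end of a sequence.

```shell
# Hypothetical helper (not from the original post): print every run of N/n
# in a FASTA file as a tab-separated BED line: name, start, end, GapK.
fasta_gaps_to_bed() {
    awk '
        BEGIN { OFS = "\t" }
        # Close the currently open gap, if any; stop is the exclusive 0-based end.
        function flush_run(stop) {
            if (in_gap) { n++; print name, start, stop, "Gap" n; in_gap = 0 }
        }
        /^>/ {
            flush_run(pos)          # a gap may reach the end of the previous record
            name = substr($1, 2)    # first token of the header, without ">"
            pos = 0                 # number of bases read so far in this record
            next
        }
        {
            len = length($0)
            for (k = 1; k <= len; k++) {
                c = substr($0, k, 1)
                if (c == "N" || c == "n") {
                    if (!in_gap) { start = pos; in_gap = 1 }
                } else {
                    flush_run(pos)
                }
                pos++
            }
        }
        END { flush_run(pos) }
    ' "$1"
}
```

With the function defined in the current shell, `fasta_gaps_to_bed test.fa > gap.2.bed` would produce the named BED file directly. Scanning character by character keeps the logic close to the Perl one-liner; for very large assemblies a dedicated tool is likely to be faster.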

Posted 2018-03-25 23:14 by Life·Intelligence
