Extracting or filtering out multiple sequences from a FASTA file

A quick Google search turned up few ready-made tools for this.

Writing your own code works too, but it is unlikely to be fast, and rewriting it every time is tedious.

I happened to notice that QIIME's filter_fasta.py can do this: given a list of sequence names, it extracts the matching sequences from a FASTA file.

`filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list`
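The command above keeps the listed sequences. To filter them out instead, the `-n` flag from the option list should reverse the selection. Here is a minimal sketch of assembling both variants from Python. It assumes QIIME 1 is installed with `filter_fasta.py` on the PATH. The helpers `build_filter_fasta_cmd` and `run_filter_fasta` are hypothetical names made up for this example, not part of QIIME.

```python
import subprocess


def build_filter_fasta_cmd(input_fasta, output_fasta, seq_id_list, negate=False):
    """Build the argument list for filter_fasta.py (hypothetical helper).

    Only flags that appear in the script's own help text are used:
    -f (input), -o (output), -s (ID list), -n (negate).
    """
    cmd = ["filter_fasta.py", "-f", input_fasta, "-o", output_fasta, "-s", seq_id_list]
    if negate:
        # Discard the listed IDs instead of keeping them.
        cmd.append("-n")
    return cmd


def run_filter_fasta(input_fasta, output_fasta, seq_id_list, negate=False):
    """Run filter_fasta.py; raises CalledProcessError on a non-zero exit."""
    cmd = build_filter_fasta_cmd(input_fasta, output_fasta, seq_id_list, negate)
    subprocess.run(cmd, check=True)
```

For example, `run_filter_fasta("extract_no_N_200.fasta", "remain.fasta", "out.list", negate=True)` would write every sequence that is not named in `out.list`.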

[REQUIRED]
 
-f, --input_fasta_fp
Path to the input fasta file
-o, --output_fasta_fp
The output fasta filepath
[OPTIONAL]
 
-m, --otu_map
An OTU map where sequences ids are those which should be retained.
-s, --seq_id_fp
A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
-b, --biom_fp
A biom file where otu identifiers should be retained.
-a, --subject_fasta_fp
A fasta file where the seq ids should be retained.
-p, --seq_id_prefix
Keep seqs where seq_id starts with this prefix.
--sample_id_fp
Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
-n, --negate
Discard passed seq ids rather than keep passed seq ids. [default: False]
--mapping_fp
Mapping file path (for use with --valid_states). [default: None]
--valid_states
Description of sample ids to retain (for use with --mapping_fp). [default: None]
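If QIIME is not available, the behavior of `-s` and `-n` can be approximated in a few lines of plain Python. This is only a rough sketch, not QIIME's implementation. It assumes the sequence ID is the first whitespace-delimited token of each header line and, as the `-s` description says, the first tab-delimited field of each line in the list. The function names are made up for this example.

```python
def read_ids(lines):
    """Collect IDs from the first tab-delimited field of each non-empty line."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split("\t")[0])
    return ids


def filter_fasta(lines, ids, negate=False):
    """Yield the FASTA lines of records to keep.

    A record is kept when its ID is in `ids`; with negate=True the
    selection is reversed, mirroring the -n option.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            keep = (seq_id in ids) != negate
        if keep:
            yield line


def filter_fasta_file(fasta_path, ids_path, out_path, negate=False):
    """Stream a FASTA file through filter_fasta without loading it fully."""
    with open(ids_path) as id_file:
        ids = read_ids(id_file)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, ids, negate))
```

Calling `filter_fasta_file("extract_no_N_200.fasta", "out.list", "remain.fasta")` would correspond to the command shown earlier. Keeping the IDs in a set makes each lookup constant time, so the run time is dominated by reading the file once.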

It processed 600,000 sequences almost instantly.

posted @ 2018-03-21 19:19 Life·Intelligence
