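Extracting or filtering out multiple sequences from a FASTA file
I Googled this and found few ready-made tools.
Writing your own script is an option, but it will probably not be fast, and rewriting it every time is a hassle.
Then I happened to notice that QIIME's filter_fasta.py can do this: it extracts multiple sequences listed in a name list.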
filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list
[REQUIRED]
  -f, --input_fasta_fp    Path to the input fasta file
  -o, --output_fasta_fp   The output fasta filepath
[OPTIONAL]
  -m, --otu_map           An OTU map where sequences ids are those which should be retained.
  -s, --seq_id_fp         A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
  -b, --biom_fp           A biom file where otu identifiers should be retained.
  -a, --subject_fasta_fp  A fasta file where the seq ids should be retained.
  -p, --seq_id_prefix     Keep seqs where seq_id starts with this prefix.
  --sample_id_fp          Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
  -n, --negate            Discard passed seq ids rather than keep passed seq ids. [default: False]
  --mapping_fp            Mapping file path (for use with --valid_states). [default: None]
  --valid_states          Description of sample ids to retain (for use with --mapping_fp). [default: None]
It processed 600,000 sequences almost instantly.
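For anyone curious what keeping (-s) versus discarding (-n) by a name list amounts to, here is a minimal pure-Python sketch. It is not QIIME's implementation; the function names read_ids and filter_fasta are made up for illustration, and it assumes that a sequence ID is the first whitespace-delimited token after ">" and that each line of the ID list starts with the ID.

```python
from typing import Iterable, Iterator, Set


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs, using only the first field of each non-empty line."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split()[0])
    return ids


def filter_fasta(lines: Iterable[str], ids: Set[str],
                 negate: bool = False) -> Iterator[str]:
    """Yield FASTA lines of records whose ID is in `ids`.

    With negate=True the listed records are dropped and the rest are kept.
    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # Keep the record when membership and negate disagree.
            keep = (seq_id in ids) != negate
        if keep:
            yield line


# Small in-memory example; the records and IDs are invented for illustration.
records = [">seq1 sample A\n", "ACGT\n",
           ">seq2\n", "GGCC\n", "TTAA\n",
           ">seq3\n", "NNNN\n"]
wanted = read_ids(["seq1\n", "seq3\textra column\n"])

kept = list(filter_fasta(records, wanted))                  # like -s
dropped = list(filter_fasta(records, wanted, negate=True))  # like -s with -n
```

Because both functions take any iterable of lines, open file handles can be passed in directly, so the FASTA file is streamed rather than loaded into memory. The set lookup keeps the per-record cost constant no matter how long the name list is.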