Alignment Software: Topic Notes
An illustration of relationships between alignment methods.
The applications / corresponding computational restrictions shown are (green) short pairwise alignment / detailed edit model;
(yellow) database search / divergent homology detection;
(red) whole genome alignment / alignment of long sequences with structural rearrangements;
and (blue) short read mapping / rapid alignment of massive numbers of short sequences. Although solely illustrative, methods with more similar data structures or algorithmic approaches are on closer branches.
The BLASR method combines data structures from short read alignment with optimization methods from whole genome alignment.
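I have not used many aligners. I only know the basic global and local alignment algorithms, and I know very little about how the alignment tools themselves work.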
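As a reminder of what those two basic algorithms compute, here is a minimal sketch of global (Needleman-Wunsch) and local (Smith-Waterman) alignment scoring. The function name `align_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; real aligners use affine gaps and many heuristics on top of this.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the best alignment score between strings a and b.

    local=False: global alignment (Needleman-Wunsch), both sequences end to end.
    local=True:  local alignment (Smith-Waterman), best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for leading gaps; local alignment starts at 0.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


print(align_score("TTACGTAA", "GGACGTCC"))              # global score
print(align_score("TTACGTAA", "GGACGTCC", local=True))  # local score
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from.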
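Aligners I have used so far: bwa, bowtie, blasr, SHRiMP, DALIGNER, MHAP, blast, blat, SOAP, Subread, NovoAlign, Maq
Others: MEGABLAST, Mummer, GMAP, STAR, DIAMOND, ELAND, RMAP, ZOOM, SeqMap, CloudBurst
I plan to build up knowledge gradually and compare how these tools differ, because alignment is the most fundamental step in bioinformatics: once sequencing hands you a pile of reads, the first thing to do is align them.
A good article to start with: Aligner tutorial: GMAP, STAR, BLAT, and BLASR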
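What are the common kinds of nucleotide sequence alignment?
- Second-generation short reads to a genome
- Third-generation long reads to a genome
- Spliced alignment
- Second-generation reads against third-generation reads
- Genome-to-genome alignment
- Multiple sequence alignment
- Database search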
BWA
Burrows-Wheeler Aligner
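Use case: fast alignment of second-generation sequencing data to a genome
bwa is the model program of the sequence alignment world: small, capable, and useful in many settings. It is well worth understanding its internal alignment algorithm, and ideally how that algorithm is implemented.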
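To make the "Burrows-Wheeler" part of the name concrete, here is a toy sketch of the idea underlying the index: build the BWT of a reference, then find exact matches of a pattern by backward search. This is not BWA's implementation. The function names are hypothetical, the suffix array is built naively, and the occurrence counts are recomputed on every step, whereas a real FM-index samples and precomputes them. BWA additionally handles mismatches and gaps, which this sketch does not.

```python
def bwt_index(text):
    """Build a naive suffix array and the BWT of text + '$' (demo only, not efficient)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    return sa, bwt


def backward_search(pattern, sa, bwt):
    """Return sorted start positions of exact occurrences of pattern in the text."""
    # c_table[ch] = number of characters in the text that sort strictly before ch
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt[:i]; recomputed here for clarity.
        return bwt.count(ch, 0, i)

    lo, hi = 0, len(bwt)  # half-open interval of suffix array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ(ch, lo)
        hi = c_table[ch] + occ(ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt = bwt_index("ACGTACGT")
print(bwt)
print(backward_search("ACG", sa, bwt))
```

Each step of the loop narrows a range of suffix array rows, so the search cost depends on the pattern length rather than the reference length, which is what makes this kind of index attractive for mapping many short reads.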
Fast and accurate short read alignment with Burrows–Wheeler transform - 2009 (online PDF of the original paper)
lh3/bwa – Github Burrows-Wheeler Aligner for pairwise alignment between DNA sequences
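- BWA-backtrack: for Illumina reads up to 100 bp (aln/samse/sampe)
- BWA-SW: for longer reads, 70 bp to 1 Mbp; supports split alignment (bwasw)
- BWA-MEM: the newest and most widely used; covers the same range as BWA-SW but is faster and more accurate, and also outperforms BWA-backtrack on 70-100 bp reads (mem)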
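The length ranges in this list can be turned into a simple rule of thumb. The helper below is a hypothetical sketch, not part of BWA: it only encodes the thresholds stated above (below 70 bp use BWA-backtrack, from 70 bp up to 1 Mbp prefer BWA-MEM) and ignores other factors such as error profile or sequencing platform.

```python
def suggest_bwa_algorithm(read_length):
    """Suggest a BWA sub-command from read length in bp (rule of thumb only)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if read_length < 70:
        # Short Illumina reads: BWA-backtrack (aln, then samse or sampe)
        return "aln"
    if read_length <= 1_000_000:
        # 70 bp to 1 Mbp: BWA-MEM, preferred over bwasw and over aln at 70-100 bp
        return "mem"
    raise ValueError("read_length is beyond the 1 Mbp range stated for BWA-SW/BWA-MEM")


for length in (36, 100, 15_000):
    print(length, suggest_bwa_algorithm(length))
```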
There are three main papers on BWA:
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)
- Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)
- Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)
Short-read alignment and assembly algorithms for next-generation sequencing - master's thesis
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
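Following the note in the usage text (index first, then try `bwa mem`), a typical run can be sketched as two commands. The helper below only builds the argument lists and prints them; it does not execute anything. The function name, the file names `ref.fa`, `reads_1.fq`, `reads_2.fq`, and the thread count are placeholders chosen for illustration. `bwa mem` writes SAM to standard output, so in practice the second command is redirected to a file.

```python
import shlex


def bwa_mem_commands(reference, reads1, reads2=None, threads=4):
    """Build the `bwa index` and `bwa mem` command lines for one sample."""
    index_cmd = ["bwa", "index", reference]
    mem_cmd = ["bwa", "mem", "-t", str(threads), reference, reads1]
    if reads2 is not None:
        # Paired-end input: the second FASTQ follows the first
        mem_cmd.append(reads2)
    return index_cmd, mem_cmd


for cmd in bwa_mem_commands("ref.fa", "reads_1.fq", "reads_2.fq"):
    print(shlex.join(cmd))
```

Building the commands as lists keeps them safe to pass to `subprocess.run` later without shell quoting problems.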
bwa mem
These days almost everyone uses only the mem algorithm of bwa.
This deserves a separate note of its own.
SOAPaligner/soap2
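soap2 - official site
The SOAP tools are distributed as prebuilt binaries without source code, so nothing needs to be compiled. As with bwa, you build an index first and then align.
SOAP is not very memory-hungry: loading the roughly 3 Gb human genome takes about 7 GB of memory, and the alignment itself needs little additional memory.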
./2bwt-builder ~/human_genome.fa
./soap -a <reads_a> -D <index.files> -o <output>
./soap -a <reads_a> -b <reads_b> -D <index.files> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>
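I previously knew nothing about SOAP, but quite a few colleagues use the SOAP family of tools.
What caught my attention was a slide deck (Mapping.ppt) showing that SOAP has its own strengths as an aligner.
According to the slides, SOAP tolerates a relatively high error rate and handles indels well, which is exactly what I need right now, so I may try using SOAP to align second-generation reads to third-generation reads.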
BLASR
Basic Local Alignment with Successive Refinement
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory - BMC Bioinformatics
To be continued.