Alignment software: topic notes

image

An illustration of relationships between alignment methods. The applications / corresponding computational restrictions shown are (green) short pairwise alignment / detailed edit model; (yellow) database search / divergent homology detection; (red) whole genome alignment / alignment of long sequences with structural rearrangements; and (blue) short read mapping / rapid alignment of massive numbers of short sequences. Although solely illustrative, methods with more similar data structures or algorithmic approaches are on closer branches. The BLASR method combines data structures from short read alignment with optimization methods from whole genome alignment.

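I have not worked with many aligners in depth. I only know the basic global and local alignment algorithms, and I know almost nothing about how the tools work internally.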
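To make the global versus local distinction concrete, here is a minimal sketch of the two classic dynamic-programming scores (Needleman-Wunsch for global, Smith-Waterman for local). The scoring values (match +1, mismatch -1, linear gap -1) and the function names are my own assumptions for illustration; real aligners use affine gaps and many heuristics on top of this.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score when ALL of a is aligned to ALL of b."""
    # prev[j] = best score of aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] against an empty prefix of b: i gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # (mis)match
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over any substring of a vs any substring of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    a, b = "AAAACGTAAAA", "CCCACGTCCC"
    print("global:", global_score(a, b))
    print("local: ", local_score(a, b))
```

The only difference between the two recurrences is the floor at 0 and taking the maximum over the whole table, which is why a local alignment can pick out the shared `ACGT` core while the global score is dragged down by the unrelated flanks.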

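Aligners I have used so far: bwa, bowtie, blasr, SHRiMP, DALIGNER, MHAP, blast, blat, SOAP, Subread, NovoAlign, Maq

Others: MEGABLAST, Mummer, GMAP, STAR, DIAMOND, ELAND, RMAP, ZOOM, SeqMap, CloudBurst

I will build up these notes gradually and compare how the tools differ, because alignment is the most basic layer of bioinformatics: once sequencing hands you a pile of sequences, the first thing to do is align them.

Start with a good article: Aligner tutorial: GMAP, STAR, BLAT, and BLASR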

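What are the common kinds of nucleotide sequence alignment?

  1. Second-generation short reads to a genome
  2. Third-generation long reads to a genome
  3. Spliced (transcript) alignment
  4. Second-generation reads against third-generation reads
  5. Genome against genome
  6. Multiple sequence alignment
  7. Database search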

BWA


Burrows-Wheeler Aligner

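Scope: fast alignment of second-generation sequencing data to a genome

bwa is the model tool of the sequence alignment world: compact, capable, and suitable for many scenarios. It is well worth understanding its internal alignment algorithm, and ideally how that algorithm is implemented.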
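As a first step toward understanding the index behind bwa, the following is a minimal sketch of the Burrows-Wheeler transform and exact-match backward search on an FM-index. It is a toy under stated assumptions: the suffix array is built by naive sorting, the `Occ` table is stored uncompressed, and the function names `bwt_index` and `backward_search` are mine. bwa itself uses a far more compact and faster implementation and also handles mismatches and gaps.

```python
def bwt_index(text):
    """Build suffix array, BWT string, C table and Occ table for text + '$'."""
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    # Naive suffix array: fine for toy inputs, not for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (wraps around for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text strictly smaller than c
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, bwt, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, bwt, c_table, occ = bwt_index("GATTACAGATTACA")
    print(bwt)
    print(backward_search("TTA", sa, c_table, occ))
```

Each pattern character costs two table lookups regardless of the reference length, which is the property that makes this kind of index attractive for mapping huge numbers of short reads.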

Fast and accurate short read alignment with Burrows–Wheeler transform - 2009 (online PDF, original article)

lh3/bwa – Github    Burrow-Wheeler Aligner for pairwise alignment between DNA sequences

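  1. BWA-backtrack: for Illumina reads up to 100 bp (aln/samse/sampe)
  2. BWA-SW: for longer sequences, 70 bp to 1 Mbp; supports long reads and split alignment (bwasw)
  3. BWA-MEM: the newest and most widely used; shares the features of BWA-SW but is faster and more accurate, and also outperforms BWA-backtrack on 70-100 bp Illumina reads (mem)

There are three main papers on BWA: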

  1. Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)
  2. Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)
  3. Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)

Design ideas behind BWA

Short-read alignment and assembly algorithms in next-generation sequencing technology - master's thesis

image

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

Practical algorithm implementations, part 8: suffix trees and suffix arrays [1. Introduction]

bwa mem

These days almost everyone uses only the mem algorithm of bwa.

It deserves a separate note of its own.

 

SOAPaligner/soap2

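soap2 - official site

The SOAP family does not publish source code; the tools are distributed as binary executables, so there is nothing to install. As with bwa, you build an index first and then align.

SOAP is not very memory-hungry: loading the roughly 3 Gb human genome takes about 7 GB of RAM, and the alignment step itself needs little additional memory.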

./2bwt-builder ~/human_genome.fa
./soap -a <reads_a> -D <index.files> -o <output>
./soap -a <reads_a> -b <reads_b> -D <index.files> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>

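I previously had no impression of SOAP at all, but quite a few colleagues use tools from the SOAP family.

This section is mainly based on a slide deck I read; SOAP does have its own strengths as an aligner.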

imageimage

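As the slides show, SOAP tolerates a relatively high error rate and also handles indels well, which is exactly what I need right now. I could try using SOAP to align second-generation reads to third-generation reads. (Source: Mapping.ppt)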

 

 

BLASR


Basic Local Alignment with Successive Refinement

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory - BMC Bioinformatics
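The general idea of finding short exact matches first and then refining can be sketched as "seed and chain". The code below is only an illustration under my own assumptions: it uses a plain k-mer hash table instead of the suffix array or BWT-FM index that BLASR uses, a quadratic chaining step, and hypothetical function names `find_anchors` and `best_chain`. It is not BLASR's implementation, and it stops before the final detailed alignment step.

```python
from collections import defaultdict


def find_anchors(ref, read, k=4):
    """Return exact k-mer matches as (read_pos, ref_pos) pairs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return [(q, r)
            for q in range(len(read) - k + 1)
            for r in index.get(read[q:q + k], [])]


def best_chain(anchors):
    """Longest chain of anchors that increases in both read and reference.

    Simple O(n^2) dynamic programming; real tools use sparse DP.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    score = [1] * n   # length of the best chain ending at anchor i
    back = [-1] * n   # predecessor of anchor i in that chain
    for i in range(n):
        for j in range(i):
            colinear = (anchors[j][0] < anchors[i][0]
                        and anchors[j][1] < anchors[i][1])
            if colinear and score[j] + 1 > score[i]:
                score[i] = score[j] + 1
                back[i] = j
    i = max(range(n), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(anchors[i])
        i = back[i]
    return chain[::-1]


if __name__ == "__main__":
    ref = "GGGGACGTTGCAGGGG"
    read = "ACGTTTGCA"  # the reference segment with one extra T inserted
    print(best_chain(find_anchors(ref, read)))
```

In the example the chained anchors shift from one diagonal to the next across the inserted base, which is the kind of region a later, more detailed alignment pass would resolve.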

 

To be continued.

posted @ 2016-12-20 15:29  Life·Intelligence