A Collection of Common Bioinformatics Tools

Filtering

SOAPnuke: a filtering tool for FASTQ files, developed in-house by BGI.

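HTSeq-count: a lightweight tool for counting reads. According to its authors it works with the output of many mapping programs; I use it to count reads in TopHat2 output. It appears that any alignment output that can be converted to SAM format can be counted with htseq-count.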
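
To make the idea of "counting reads from SAM output" concrete, here is a minimal pure-Python sketch. It is not htseq-count itself: the gene table `GENES` and the function `count_reads` are hypothetical, reads are treated as ungapped (no indels or splicing), and the overlap rule is only a rough imitation of a union-style mode.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
}

def count_reads(sam_lines, genes):
    """Count reads per gene from SAM text lines (simplified, ungapped reads)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):              # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, seq = int(fields[1]), fields[2], int(fields[3]), fields[9]
        if flag & 0x4:                        # FLAG bit 0x4: read is unmapped
            counts["__not_aligned"] += 1
            continue
        # Assumption: the read covers len(SEQ) reference bases starting at POS.
        start, end = pos, pos + len(seq) - 1
        hits = [name for name, (c, s, e) in genes.items()
                if c == chrom and start <= e and end >= s]
        if len(hits) == 1:
            counts[hits[0]] += 1              # overlaps exactly one gene
        elif hits:
            counts["__ambiguous"] += 1        # overlaps more than one gene
        else:
            counts["__no_feature"] += 1       # overlaps no gene
    return counts

sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t110\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t185\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t0\tchr1\t500\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(dict(count_reads(sam, GENES)))
```

For real data the actual tool should be used, since it handles CIGAR strings, strandedness, and mapping-quality filtering, which this sketch ignores.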

RSeQC: An RNA-seq Quality Control Package

Alignment

BWA: the most widely used aligner; it can align both second-generation and third-generation sequencing reads.

SOAP: an aligner developed by BGI; its full name is SOAPaligner/soap2.

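bowtie2: commonly used for RNA-seq alignment, typically as the underlying aligner in TopHat2, since bowtie2 by itself is not splice-aware.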

BLASR: designed specifically for aligning third-generation sequencing reads.

PyNAST: a multiple sequence alignment tool, mainly used for 16S sequences.

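FastTree: extremely fast tree-building software that can handle alignments on the order of 1M sequences; mainly used for building 16S trees.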

Data processing

SAMtools: dedicated to processing the SAM and BAM formats. SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
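
As a small illustration of what a SAM record contains, the sketch below splits one alignment line into its 11 mandatory tab-separated fields. The names `SamRecord` and `parse_sam_line` are made up for this example; optional tags after the 11th field are simply ignored.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields, in file order."""
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)

def parse_sam_line(line: str) -> SamRecord:
    """Parse one non-header SAM line; optional tags are dropped."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])

rec = parse_sam_line("read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII")
is_reverse = bool(rec.flag & 0x10)   # FLAG bit 0x10: read maps to the reverse strand
print(rec.rname, rec.pos, rec.cigar, is_reverse)
```

In practice SAMtools (or a binding to it) is the better choice, because BAM is a compressed binary format that cannot be read as text like this.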

Picard: a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

VCFtools: a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project.

bcftools: utilities for variant calling and manipulating VCFs and BCFs.

bedtools: a powerful toolset for genome arithmetic that allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.
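
The "genome arithmetic" mentioned here boils down to operations on intervals. Below is a minimal sketch of two of them, merge and intersect, on a single chromosome. The functions `merge_intervals` and `intersect_intervals` are illustrative names, not part of bedtools, and intervals are assumed to be BED-style half-open `(start, end)` pairs.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def intersect_intervals(a, b):
    """Return the overlapping segments between two lists of half-open intervals."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:                 # non-empty overlap
                out.append((s, e))
    return sorted(out)

print(merge_intervals([(1, 5), (4, 8), (10, 12)]))
print(intersect_intervals([(1, 8)], [(5, 10), (20, 30)]))
```

The pairwise loop in `intersect_intervals` is quadratic and only suitable for small inputs; it is meant to show the operation, not how a production tool implements it.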

MAKER: an easy-to-use genome annotation pipeline designed for small research groups with little bioinformatics experience.

Resequencing

Reseqtools: a toolkit for analyzing next-generation DNA resequencing data. A collection of tools compiled internally at BGI.

Assembly

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

SOAPdenovo

Platanus

DBG2OLC

CANU

Falcon

HGAP

Variant detection

GATK: the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. The most commonly used tool for calling SNPs and indels.

BreakDancer: genome-wide detection of structural variants from next-generation paired-end sequencing reads. A structural variant (SV) detection tool.

CREST: (Clipping Reveals Structure), a new algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data.

CNVnator: a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads. Used for CNV detection in human resequencing data.

PennCNV: a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays.

MIDAS: (Metagenomic Intra-species Diversity Analysis System), an integrated pipeline for estimating strain-level genomic variation from metagenomic data (it can call variants on metagenomes).

GWAS

PLINK: whole genome association analysis toolset

SSR analysis

MISA - MIcroSAtellite identification tool

SSRHunter - Simple Sequence Repeat Search tool

 

Statistical methods

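DMM: (Dirichlet multinomial mixtures), probabilistic modelling of microbial metagenomics data. Input: frequency_matrix.csv, in which each row is a taxon and each column gives its frequency in one sample. Output: the community clustering results. The mixture components cluster communities into distinct 'metacommunities', and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. In practice the method plays a role similar to grouping samples on a PCA plot, but it is a probabilistic clustering rather than PCA: communities with similar composition are assigned to the same group.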
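
The building block of such a mixture is the Dirichlet-multinomial likelihood of one sample's taxon counts under one component. The sketch below computes that log-likelihood and assigns a sample to the more likely of two components. Everything here is illustrative: the function names, the component parameters `COMPONENTS`, and the sample counts are made up, the mixture weights are assumed equal, and fitting the parameters (which the real method does) is not shown.

```python
from math import lgamma, exp

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    logp = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        logp += lgamma(x + al) - lgamma(x + 1) - lgamma(al)
    return logp

def assign_component(counts, components):
    """Index of the component with the highest likelihood (equal weights assumed)."""
    scores = [dirichlet_multinomial_logpmf(counts, alpha) for alpha in components]
    return scores.index(max(scores))

# Hypothetical components: one dominated by taxon 1, one by taxon 3.
COMPONENTS = [(8.0, 1.0, 1.0), (1.0, 1.0, 8.0)]
sample = (9, 1, 0)   # made-up counts for three taxa in one sample
print(assign_component(sample, COMPONENTS))
print(exp(dirichlet_multinomial_logpmf((2, 1), (1.0, 1.0))))
```

Note that the likelihood is defined on counts, so a matrix of relative frequencies would need to be converted back to counts (or the original count table used) before applying it.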

 

RNA

cd-hit: a very widely used program for clustering and comparing protein or nucleotide sequences. Used to remove redundancy.

CPAT: uses a logistic regression model based on 4 pure sequence-based, linguistic features. Predicts the coding potential of RNA.

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences. Dedicated to RNA alignment.

 

More to be added~

posted @ 2017-02-13 11:36  Life·Intelligence