GMcloser安装及使用
GMcloser, a tool that accurately closes genomic gaps with a preassembled contig set or a long read set (i.e., error-corrected PacBio reads).
PS:本文采用conda安装,官方手册Manual_GMcloser.pdf在官网安装包里,详细信息参见手册。官网地址:GMcloser安装包
一、gmcloser安装及使用
#conda创建一个新环境
conda create -n gmcloser
#依赖包安装
conda install -c bioconda blast+
conda install -c bioconda bowtie2
conda install -c bioconda yass
#安装gmcloser
conda install -c bioconda gmcloser
#测试,显示文档安装成功
gmcloser -h
二、测试demo
#Gap closing in rice scaffolds (scaf.fa) with a contig set (contig.fa)
gmcloser -t <scaf.fa> -q <contig.fa> -r <pe_read_1.fq pe_read_2.fq> -p <out1> -l 100 -i 500 -d 50 -c -n 20
#Gap closing with pre-existing alignment files(sam文件)
gmcloser -t <scaf.fa> -q <contig.fa> -p out2 -a align1.coords -st <align.t1.sam align.t2.sam> -sq <align.q1.sam align.q2.sam> -l 100 -i 500 -d 50 -c -n 12
#Gap closing with error-corrected PacBio reads (e.g., 6x coverage)
gmcloser -t <scaf.fa> -q <read.fa> -lr -it 3 -mq 2 -p <out3> -l 100 -i 500 -d 50 -c -n 12
#主要参数说明
-t input target scaffold fasta file
-q input query fasta file of a contig or long read set (if long reads are used, -lr option must be specified)
-r space-separated list of fastq or fasta files of paired-end reads (e.g., -r read_pe1.fq read_pe2.fq) [gzip compressed files (*.gz) are acceptable]
-l read length of the paired-end reads specified with the ‒r, -st, -sq, or ‒sd option (mean read length if read lengths are variable)
-i average insert size of paired-end reads. INT must be > read_len and < 20001. [default: 400]
-d standard deviation of insert sizes of paired-end reads [default: 40]
-c connect between gap-encompassing subcontig pairs with their original (not merged with query contigs) termini [default: false]
-n number of threads (for machines with multiple processors), allows all the alignment processes to be performed in parallel [default: 5]
-LR query sequence file is a fasta file of long reads [default: false] (this option must be specified when a long-read set is used. If PacBio reads are used, they must be error-corrected)
-it number of iterations [default: 3]
-p prefix name of output files to be output in the working directory (must not contain directory names)
三、实战
策略I: Gap filling use hifi-reads
#准备数据到工作目录下/home/liuxin/DATA/cbp_data
组装草图位置:</home/liuxin/DATA/cbp_data/Cbp.LG.fasta>
hifi-reads位置:</home/liuxin/DATA/cbp_data/cbp.css.fasta>
二代reads位置:</home/liuxin/DATA/cbp_data/D2124163A_122_1.fq.gz /home/liuxin/DATA/cbp_data/D2124163A_122_2.fq.gz>
#进入工作目录
cd /home/liuxin/DATA/cbp_data
#开始运行
nohup gmcloser -LR -t Cbp.LG.fasta -q cbp.ccs.fasta -r D2124163A_122_1.fq.gz D2124163A_122_2.fq.gz -l 250 -i 400 -d 40 -it 3 -c -n 20 -p hifi_fill_outfile &
#结果将生成在hifi_fill_outfiile文件夹中,以.gapclosed.fa形式出现。此外,还包括一些统计输出文件,具体查看手册。
策略II: Gap filling use ONT reads
待策略I完成再考虑补充
四、结果评估
#使用gmvalue中的subcon进行评估结果。
mkdir eval
cd eval
gmvalue subcon -r Cbp.LG.fasta -q hifi_fill_outfile.gapclosed.fa -p Cbp.gapclosed
#几分钟后可以产生统计信息,Cbp.gapclosed.stat.txt中可以查看补洞数量,错误补洞情况等。
参考文献
Kosugi S, Hirakawa H, Tabata S. GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics. 2015 Dec 1;31(23):3733-41. doi: 10.1093/bioinformatics/btv465. Epub 2015 Aug 10. PMID: 26261222.