GMcloser安装及使用

GMcloser, a tool that accurately closes genomic gaps with a preassembled contig set or a long read set (i.e., error-corrected PacBio reads).

PS:本文采用conda安装,官方手册Manual_GMcloser.pdf在官网安装包里,详细信息参见手册。官网地址:GMcloser安装包

一、gmcloser安装及使用

#conda创建一个新环境

conda create -n gmcloser 

#依赖包安装

conda install -c bioconda blast+
conda install -c bioconda bowtie2
conda install -c bioconda yass

#安装gmcloser

conda install -c bioconda gmcloser 

#测试,显示文档安装成功

gmcloser -h


二、测试demo

#Gap closing in rice scaffolds (scaf.fa) with a contig set (contig.fa)

gmcloser -t <scaf.fa> -q <contig.fa> -r <pe_read_1.fq pe_read_2.fq> -p <out1> -l 100 -i 500 -d 50 -c -n 20

#Gap closing with pre-existing alignment files(sam文件)

gmcloser -t <scaf.fa> -q <contig.fa> -p out2 -a align1.coords -st <align.t1.sam align.t2.sam> -sq <align.q1.sam align.q2.sam> -l 100 -i 500 -d 50 -c -n 12

#Gap closing with error-corrected PacBio reads (e.g., 6x coverage)

gmcloser -t <scaf.fa> -q <read.fa> -lr -it 3 -mq 2 -p <out3> -l 100 -i 500 -d 50 -c -n 12

#主要参数说明

-t input target scaffold fasta file 
-q input query fasta file of a contig or long read set (if long reads are used, -lr option must be specified)
-r space-separated list of fastq or fasta files of paired-end reads (e.g., -r read_pe1.fq read_pe2.fq) [gzip compressed files (*.gz) are acceptable]
-l read length of the paired-end reads specified with the ‒r, -st, -sq, or ‒sd option (mean read length if read lengths are variable)
-i average insert size of paired-end reads. INT must be > read_len and < 20001. [default: 400]
-d standard deviation of insert sizes of paired-end reads [default: 40]
-c connect between gap-encompassing subcontig pairs with their original (not merged with query contigs) termini [default: false]
-n number of threads (for machines with multiple processors), allows all the alignment processes to be performed in parallel [default: 5]
-LR query sequence file is a fasta file of long reads [default: false] (this option must be specified when a long-read set is used. If PacBio reads are used, they must be error-corrected)
-it number of iterations [default: 3]
-p prefix name of output files to be output in the working directory (must not contain directory names)

三、实战

策略I: Gap filling use hifi-reads

#准备数据到工作目录下/home/liuxin/DATA/cbp_data

组装草图位置:</home/liuxin/DATA/cbp_data/Cbp.LG.fasta>
hifi-reads位置:</home/liuxin/DATA/cbp_data/cbp.css.fasta>
二代reads位置:</home/liuxin/DATA/cbp_data/D2124163A_122_1.fq.gz /home/liuxin/DATA/cbp_data/D2124163A_122_2.fq.gz>

#进入工作目录

cd /home/liuxin/DATA/cbp_data

#开始运行

nohup gmcloser -LR -t Cbp.LG.fasta -q cbp.ccs.fasta -r D2124163A_122_1.fq.gz D2124163A_122_2.fq.gz -l 250 -i 400 -d 40 -it 3 -c -n 20 -p hifi_fill_outfile &

#结果将生成在hifi_fill_outfiile文件夹中,以.gapclosed.fa形式出现。此外,还包括一些统计输出文件,具体查看手册。

策略II: Gap filling use ONT reads

待策略I完成再考虑补充

四、结果评估

#使用gmvalue中的subcon进行评估结果。

mkdir eval
cd eval
gmvalue subcon -r Cbp.LG.fasta -q hifi_fill_outfile.gapclosed.fa -p Cbp.gapclosed

#几分钟后可以产生统计信息,Cbp.gapclosed.stat.txt中可以查看补洞数量,错误补洞情况等。

参考文献


Kosugi S, Hirakawa H, Tabata S. GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics. 2015 Dec 1;31(23):3733-41. doi: 10.1093/bioinformatics/btv465. Epub 2015 Aug 10. PMID: 26261222.

posted @ 2022-08-31 16:43  pd_liu  阅读(299)  评论(0编辑  收藏  举报