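01. GATK Best Practices for Human Germline Variants as a SnakeMake Workflow: Workflow Overview
<For bioinformatics discussion and collaboration, follow the WeChat official account @生信探索>
This is the first GATK variant-calling workflow I have studied: calling short variants (SNPs and INDELs) in human germline data. I wrote a SnakeMake analysis workflow that runs from fastq files to a final VEP-annotated VCF file. For an introduction to VCF, see my previous post, "Gene Sequence Variant Information: VCF (Variant Call Format)".
The workflow code is available at https://jihulab.com/BioQuest/smkhgs or https://github.com/BioQuestX/smkhgs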
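Because the workflow handles SNPs and INDELs separately, it may help to see how the two are told apart in a VCF record. The sketch below is a minimal illustration and is not taken from smkhgs: it reads the eight fixed VCF columns from one tab-separated data line and labels each ALT allele by comparing its length with REF. It assumes simple records and does not handle symbolic alleles such as <DEL> or the spanning-deletion allele *.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> dict:
    """Return the eight fixed VCF columns of one data line as a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # multi-allelic sites list several ALTs
    return record


def variant_type(ref: str, alt: str) -> str:
    """Simplified classification; symbolic alleles are NOT handled."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # equal-length multi-base change, e.g. an MNP


# Made-up record for illustration only
rec = parse_vcf_line("chr1\t100\t.\tA\tAC\t50\tPASS\tDP=30")
print([variant_type(rec["REF"], alt) for alt in rec["ALT"]])  # ['INDEL']
```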
README
GATK Best Practices workflow: pipeline summary
SnakeMake workflow for Human Germline short variants (SNP+INDEL)
Reference
- Reference genome files and GATK resource bundle files (GATK)
- VEP variant annotation files (VEP)
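Before the aligner and the GATK tools can use the reference genome, the FASTA typically needs a samtools index (.fai), a BWA-MEM2 index, and a GATK sequence dictionary (.dict). A minimal sketch that only builds these three command lines, assuming an uncompressed FASTA at a hypothetical path such as ref/genome.fa; the actual smkhgs rules may name and order these steps differently. The GATK bundle files (dbSNP, known INDEL sites, VQSR training sets) are typically downloaded rather than built, so they are not shown.

```python
import shlex
from pathlib import PurePosixPath


def reference_index_commands(fasta: str) -> list[list[str]]:
    """Build (but do not run) indexing commands for an uncompressed reference FASTA."""
    ref = PurePosixPath(fasta)
    # GATK looks for <name>.dict next to <name>.fa, so the suffix is replaced
    dict_path = ref.with_suffix(".dict")
    return [
        ["samtools", "faidx", str(ref)],  # writes <fasta>.fai
        ["bwa-mem2", "index", str(ref)],  # writes the BWA-MEM2 index files
        ["gatk", "CreateSequenceDictionary", "-R", str(ref), "-O", str(dict_path)],
    ]


for cmd in reference_index_commands("ref/genome.fa"):
    print(shlex.join(cmd))
```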
Prepare
- Adapter trimming (Fastp)
- Alignment (BWA-MEM2)
- Mark duplicates (samblaster)
- Generate a recalibration table for Base Quality Score Recalibration (BaseRecalibrator)
- Apply base quality score recalibration (ApplyBQSR)
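The Prepare steps can be sketched as command lines for one sample. This is a hedged illustration, not the smkhgs rules: it assumes single-end reads (the raw/ folder holds one fastq.gz per run), and every file name under raw/ and results/ is a hypothetical placeholder. Alignment, duplicate marking and sorting are written as one shell pipeline because samblaster reads the aligner's SAM stream from stdin.

```python
import shlex


def prepare_commands(sample: str, ref: str, known_sites: list[str],
                     threads: int = 8) -> dict:
    """Build the Prepare-stage commands for one single-end sample (hypothetical paths)."""
    raw = f"raw/{sample}.fastq.gz"
    trimmed = f"results/trimmed/{sample}.fastq.gz"
    bam = f"results/prepared/{sample}.dedup.bam"
    table = f"results/prepared/{sample}.recal.table"
    bqsr_bam = f"results/prepared/{sample}.bqsr.bam"
    # Read group header line; "\\t" stays a literal backslash-t, which bwa-mem2 turns into a tab
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    trim = ["fastp", "-i", raw, "-o", trimmed, "-w", str(threads),
            "-j", f"results/trimmed/{sample}.fastp.json",
            "-h", f"results/trimmed/{sample}.fastp.html"]
    # Align -> mark duplicates on the SAM stream -> coordinate-sort into BAM.
    # Check the samblaster README for options it may need with single-end reads.
    align = (f"bwa-mem2 mem -t {threads} -R {shlex.quote(read_group)} {ref} {trimmed}"
             f" | samblaster | samtools sort -@ {threads} -o {bam} -")
    index = ["samtools", "index", bam]
    recal = ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam, "-O", table]
    for vcf in known_sites:  # e.g. dbSNP and known INDEL sites from the GATK bundle
        recal += ["--known-sites", vcf]
    apply = ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
             "--bqsr-recal-file", table, "-O", bqsr_bam]
    return {"trim": trim, "align": align, "index": index,
            "recal": recal, "apply": apply}


print(prepare_commands("SRR24443168", "ref/genome.fa", ["ref/dbsnp.vcf.gz"])["align"])
```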
Quality control report
- Fastp report (MultiQC)
- Alignment report (MultiQC)
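The two QC reports in the output tree (fastp_multiqc.html and prepare_multiqc.html, each with a matching _data folder) match how MultiQC names its output when given a custom report name. A minimal sketch, assuming the fastp JSON files sit under results/trimmed and the alignment statistics under results/prepared; those input folders are assumptions, not confirmed smkhgs paths.

```python
from pathlib import PurePosixPath


def multiqc_command(input_dir: str, report_dir: str, name: str) -> list[str]:
    """-o: output folder, -n: report file name, -f: overwrite an existing report."""
    return ["multiqc", input_dir, "-o", report_dir, "-n", name, "-f"]


def expected_outputs(report_dir: str, name: str) -> list[str]:
    """MultiQC writes <name>.html plus a <name>_data folder next to it."""
    base = PurePosixPath(report_dir)
    return [str(base / f"{name}.html"), str(base / f"{name}_data")]


for input_dir, name in [("results/trimmed", "fastp_multiqc"),
                        ("results/prepared", "prepare_multiqc")]:
    print(multiqc_command(input_dir, "report", name), expected_outputs("report", name))
```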
Call
- Call germline SNPs and indels via local re-assembly of haplotypes (HaplotypeCaller)
- Import VCFs to GenomicsDB (GenomicsDBImport)
- Perform joint genotyping on one or more samples pre-called with HaplotypeCaller (GenotypeGVCFs)
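The Call stage follows the GVCF joint-genotyping pattern: one GVCF per sample, all GVCFs consolidated into a GenomicsDB workspace, then a single joint call. A hedged sketch of the command lines, assuming the capture BED from config/ is used as the interval list; BAM, GVCF and workspace names are hypothetical. GenomicsDBImport needs explicit intervals and expects the workspace path not to exist yet.

```python
def call_commands(samples: list[str], ref: str, intervals: str,
                  workspace: str = "results/called/genomicsdb") -> dict:
    """HaplotypeCaller per sample, then GenomicsDBImport and GenotypeGVCFs (hypothetical paths)."""
    gvcfs = [f"results/called/{s}.g.vcf.gz" for s in samples]
    haplotype_caller = [
        ["gatk", "HaplotypeCaller", "-R", ref,
         "-I", f"results/prepared/{s}.bqsr.bam",
         "-L", intervals,   # e.g. config/captured_regions.bed
         "-ERC", "GVCF",    # per-sample GVCF with reference confidence blocks
         "-O", g]
        for s, g in zip(samples, gvcfs)
    ]
    genomicsdb_import = ["gatk", "GenomicsDBImport",
                         "--genomicsdb-workspace-path", workspace,
                         "-L", intervals]
    for g in gvcfs:
        genomicsdb_import += ["-V", g]
    genotype_gvcfs = ["gatk", "GenotypeGVCFs", "-R", ref,
                      "-V", f"gendb://{workspace}",  # read directly from the workspace
                      "-O", "results/called/joint.vcf.gz"]
    return {"haplotype_caller": haplotype_caller,
            "genomicsdb_import": genomicsdb_import,
            "genotype_gvcfs": genotype_gvcfs}


cmds = call_commands(["SRR24443168", "SRR24443169"], "ref/genome.fa",
                     "config/captured_regions.bed")
print(cmds["genotype_gvcfs"])
```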
Filter
- Select SNPs or INDELs from a VCF file (SelectVariants)
- Build a recalibration model to score variant quality for filtering purposes (VariantRecalibrator)
- Apply a score cutoff to filter variants based on a recalibration table (ApplyVQSR)
- Merge all the VCF files (Picard)
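The Filter stage runs the same three GATK tools once for SNPs and once for INDELs, then merges the two filtered files. A hedged sketch of one mode plus the merge, assuming GATK's MergeVcfs (which bundles Picard's tool) rather than a standalone Picard call; the resource tags follow the usual VQSR training/truth syntax, while the bundle paths, output names and the 99.7 sensitivity level are illustrative assumptions.

```python
def vqsr_commands(ref: str, vcf: str, mode: str, resources: dict[str, str],
                  annotations: list[str], level: str = "99.7") -> list[list[str]]:
    """SelectVariants -> VariantRecalibrator -> ApplyVQSR for mode SNP or INDEL."""
    prefix = f"results/filtered/{mode.lower()}"  # hypothetical output prefix
    subset = f"{prefix}.vcf.gz"
    select = ["gatk", "SelectVariants", "-R", ref, "-V", vcf,
              "--select-type-to-include", mode, "-O", subset]
    recal = ["gatk", "VariantRecalibrator", "-R", ref, "-V", subset,
             "-mode", mode, "-O", f"{prefix}.recal",
             "--tranches-file", f"{prefix}.tranches"]
    for tags, path in resources.items():
        recal += [f"--resource:{tags}", path]  # training / truth / known sets
    for annotation in annotations:
        recal += ["-an", annotation]           # annotations the model is trained on
    apply = ["gatk", "ApplyVQSR", "-R", ref, "-V", subset, "-mode", mode,
             "--recal-file", f"{prefix}.recal",
             "--tranches-file", f"{prefix}.tranches",
             "--truth-sensitivity-filter-level", level,
             "-O", f"{prefix}.vqsr.vcf.gz"]
    return [select, recal, apply]


def merge_command(vcfs: list[str], out: str) -> list[str]:
    """Merge VCFs with the Picard MergeVcfs tool shipped inside GATK."""
    cmd = ["gatk", "MergeVcfs"]
    for vcf in vcfs:
        cmd += ["-I", vcf]
    return cmd + ["-O", out]


# Hypothetical bundle paths
SNP_RESOURCES = {
    "hapmap,known=false,training=true,truth=true,prior=15.0": "bundle/hapmap.vcf.gz",
    "omni,known=false,training=true,truth=false,prior=12.0": "bundle/omni.vcf.gz",
    "1000G,known=false,training=true,truth=false,prior=10.0": "bundle/1000G.snps.vcf.gz",
    "dbsnp,known=true,training=false,truth=false,prior=2.0": "bundle/dbsnp.vcf.gz",
}
SNP_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]

snp = vqsr_commands("ref/genome.fa", "results/called/joint.vcf.gz", "SNP",
                    SNP_RESOURCES, SNP_ANNOTATIONS)
print(merge_command(["results/filtered/snp.vqsr.vcf.gz",
                     "results/filtered/indel.vqsr.vcf.gz"],
                    "results/filtered/merged.vcf.gz"))
```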
Annotation
- Annotate variant calls with VEP (VEP)
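The annotation step can be sketched as a single offline VEP call that reads the merged VCF, writes bgzipped VCF output with consequences in the CSQ INFO field, and saves the HTML summary. The output and stats names match results/vep_annotated.vcf.gz and report/vep_report.html from the output tree; the GRCh38 assembly, cache folder and fork count are assumptions for illustration.

```python
import shlex


def vep_command(vcf: str, out: str, cache_dir: str,
                assembly: str = "GRCh38", forks: int = 4) -> list[str]:
    """Build an offline, cache-based VEP command (assembly is an assumption)."""
    return [
        "vep",
        "--input_file", vcf,
        "--output_file", out,
        "--format", "vcf",
        "--vcf",                       # VCF output instead of VEP's default format
        "--compress_output", "bgzip",
        "--species", "homo_sapiens",
        "--assembly", assembly,
        "--cache", "--offline",        # use the local cache, no database connection
        "--dir_cache", cache_dir,
        "--stats_file", "report/vep_report.html",
        "--fork", str(forks),
    ]


print(shlex.join(vep_command("results/filtered/merged.vcf.gz",
                             "results/vep_annotated.vcf.gz", "ref/vep_cache")))
```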
SnakeMake Report

Outputs
.
├── config
│ ├── captured_regions.bed
│ ├── config.yaml
│ └── samples.tsv
├── dag.svg
├── logs
│ ├── annotate
│ ├── call
│ ├── filter
│ ├── prepare
│ ├── qc
│ ├── ref
│ └── trim
├── raw
│ ├── SRR24443168.fastq.gz
│ └── SRR24443169.fastq.gz
├── README.md
├── report
│ ├── fastp_multiqc_data
│ ├── fastp_multiqc.html
│ ├── prepare_multiqc_data
│ ├── prepare_multiqc.html
│ └── vep_report.html
├── results
│ ├── called
│ ├── filtered
│ ├── prepared
│ ├── trimmed
│ └── vep_annotated.vcf.gz
├── workflow
│ ├── envs
│ ├── report
│ ├── rules
│ ├── schemas
│ ├── scripts
│ └── Snakefile
Directed Acyclic Graph
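Snakemake builds this graph from the input and output files of each rule and only starts a job once every job it depends on has finished. The same ordering idea can be sketched with Python's standard-library graphlib, using a simplified, hypothetical one-node-per-tool map inferred from the step list in the README; the real rule names in the Snakefile will differ.

```python
from graphlib import TopologicalSorter

# Hypothetical map: step -> steps whose outputs it consumes
STEPS = {
    "fastp": set(),
    "bwa_mem2_samblaster": {"fastp"},
    "base_recalibrator": {"bwa_mem2_samblaster"},
    "apply_bqsr": {"bwa_mem2_samblaster", "base_recalibrator"},
    "multiqc": {"fastp", "bwa_mem2_samblaster"},
    "haplotype_caller": {"apply_bqsr"},
    "genomicsdb_import": {"haplotype_caller"},
    "genotype_gvcfs": {"genomicsdb_import"},
    "select_snp": {"genotype_gvcfs"},
    "select_indel": {"genotype_gvcfs"},
    "vqsr_snp": {"select_snp"},
    "vqsr_indel": {"select_indel"},
    "merge_vcfs": {"vqsr_snp", "vqsr_indel"},
    "vep": {"merge_vcfs"},
}

# static_order() yields every step after all of its dependencies
# (and raises graphlib.CycleError if the graph is not acyclic)
order = list(TopologicalSorter(STEPS).static_order())
print(" -> ".join(order))
```

In Snakemake itself, the graph image is usually rendered with `snakemake --dag | dot -Tsvg > dag.svg`.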
References
GATK best practices workflow: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows
GATK: https://software.broadinstitute.org/gatk/
VEP: https://www.ensembl.org/info/docs/tools/vep/index.html
fastp: https://github.com/OpenGene/fastp
BWA-MEM2: https://github.com/bwa-mem2/bwa-mem2
samblaster: https://github.com/GregoryFaust/samblaster
BaseRecalibrator: https://gatk.broadinstitute.org/hc/en-us/articles/13832708374939-BaseRecalibrator
ApplyBQSR: https://gatk.broadinstitute.org/hc/en-us (GATK tool documentation)
HaplotypeCaller: https://gatk.broadinstitute.org/hc/en-us/articles/13832687299739-HaplotypeCaller
GenomicsDBImport: https://gatk.broadinstitute.org/hc/en-us/articles/13832686645787-GenomicsDBImport
GenotypeGVCFs: https://gatk.broadinstitute.org/hc/en-us/articles/13832766863259-GenotypeGVCFs
SelectVariants: https://gatk.broadinstitute.org/hc/en-us/articles/13832694334235-SelectVariants
VariantRecalibrator: https://gatk.broadinstitute.org/hc/en-us (GATK tool documentation)
ApplyVQSR: https://gatk.broadinstitute.org/hc/en-us (GATK tool documentation)
Picard: https://broadinstitute.github.io/picard
MultiQC: https://multiqc.info