Using GATK to merge g.vcf files from multiple samples and call variants

 

001. Merge the per-sample g.vcf files

gatk CombineGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna --variant SRR21814498.g.vcf --variant SRR21814509.g.vcf --variant SRR21814514.g.vcf -O cohort.g.vcf.gz
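With more than a handful of samples, typing every `--variant` by hand becomes error-prone. The following is a minimal sketch, assuming all per-sample files in one directory end in `.g.vcf`; the helper name `build_variant_args` is made up for illustration and is not part of GATK.

```shell
# Hypothetical helper: print "--variant" and a file path on alternating lines
# for every *.g.vcf file in the given directory (default: current directory).
build_variant_args() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*.g.vcf; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        printf '%s\n%s\n' --variant "$f"
    done
}

# Usage (bash): collect the arguments in an array, then pass them to gatk.
# mapfile -t variant_args < <(build_variant_args .)
# gatk CombineGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna "${variant_args[@]}" -O cohort.g.vcf.gz
```

Printing one argument per line keeps file names with spaces intact when they are read back into an array.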

 

 

 

002. The g.vcf files can also be listed in a single list file

gatk CombineGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna --variant gvcf.list -O cohort.g.vcf.gz    ## gvcf.list holds bare file names, so run this in the directory that contains the g.vcf files

 

Format of gvcf.list (one file per line):

SRR21814498.g.vcf
SRR21814509.g.vcf
SRR21814514.g.vcf
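The list file can be generated rather than typed. The sketch below writes absolute paths, which should remove the need to run the command from the g.vcf directory (this assumes GATK opens each listed path as written). The helper name `make_gvcf_list` and the example directory are hypothetical.

```shell
# Hypothetical helper: write the absolute path of every *.g.vcf file in a
# directory to a list file, one path per line.
make_gvcf_list() {
    local dir="$1" out="$2"
    local abs f
    abs=$(cd "$dir" && pwd) || return 1   # resolve the directory to an absolute path
    : > "$out"                            # create or truncate the list file
    for f in "$abs"/*.g.vcf; do
        [ -e "$f" ] || continue           # skip the unexpanded pattern when nothing matches
        printf '%s\n' "$f" >> "$out"
    done
}

# Usage (the directory name is a placeholder):
# make_gvcf_list /path/to/gvcfs gvcf.list
# gatk CombineGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna --variant gvcf.list -O cohort.g.vcf.gz
```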

 

 

 

003. Call variants and generate the VCF file

gatk --java-options "-Xmx400g -Xms400g -XX:+UseSerialGC" GenotypeGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna -V cohort.g.vcf.gz -O combine.call.vcf.gz
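The `-Xmx400g -Xms400g` setting presumes a machine with more than 400 GB of RAM. On other hardware the heap size needs adjusting. A rough sketch, assuming a Linux host where `/proc/meminfo` reports `MemTotal` in kB; the helper name `suggest_heap_gb` and the 80% default are arbitrary choices for illustration.

```shell
# Hypothetical helper: print a heap size in whole GiB, computed as a
# percentage (default 80) of the MemTotal value found in a meminfo file.
suggest_heap_gb() {
    local meminfo="${1:-/proc/meminfo}" percent="${2:-80}"
    # MemTotal is in kB: scale by the percentage, convert to GiB, truncate.
    awk -v p="$percent" '/^MemTotal:/ { printf "%d\n", $2 * p / 100 / 1024 / 1024 }' "$meminfo"
}

# Usage:
# heap="$(suggest_heap_gb)g"
# gatk --java-options "-Xmx${heap} -Xms${heap} -XX:+UseSerialGC" GenotypeGVCFs -R GCF_000001735.4_TAIR10.1_genomic.fna -V cohort.g.vcf.gz -O combine.call.vcf.gz
```

On hosts with very little memory the result can truncate to 0, which is not a valid heap size, so check the value before using it.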

 

 

 

004. Extract SNPs

gatk --java-options "-Xmx400g -Xms400g -XX:+UseSerialGC" SelectVariants -R GCF_000001735.4_TAIR10.1_genomic.fna -V combine.call.vcf.gz -select-type SNP -O combine.SNP.vcf.gz

 

 

 

005. Hard-filter SNPs (failing records are marked "Filter" in the FILTER column, not removed)

gatk --java-options "-Xmx400g -Xms400g -XX:+UseSerialGC" VariantFiltration -R GCF_000001735.4_TAIR10.1_genomic.fna -V combine.SNP.vcf.gz --filter-expression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filter-name "Filter" -O combine.SNP.filter.vcf.gz
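The command above joins all thresholds with `||` under one filter name, so the FILTER column cannot show which test a record failed. A common alternative is one `--filter-expression`/`--filter-name` pair per threshold. The sketch below builds those pairs. The helper `build_filter_args` and the filter names (`QD2`, `MQ40`, and so on) are made up for illustration; the thresholds are the ones used above.

```shell
# Hypothetical helper: turn "name=expression" pairs into one
# --filter-expression/--filter-name pair each, printed one argument per line.
build_filter_args() {
    local pair
    for pair in "$@"; do
        # ${pair#*=} is the text after the first "=", ${pair%%=*} the text before it.
        printf '%s\n%s\n%s\n%s\n' --filter-expression "${pair#*=}" --filter-name "${pair%%=*}"
    done
}

# Usage (bash):
# mapfile -t filter_args < <(build_filter_args \
#     "QD2=QD < 2.0" "MQ40=MQ < 40.0" "FS60=FS > 60.0" "SOR3=SOR > 3.0" \
#     "MQRankSum-12.5=MQRankSum < -12.5" "ReadPosRankSum-8=ReadPosRankSum < -8.0")
# gatk VariantFiltration -R GCF_000001735.4_TAIR10.1_genomic.fna -V combine.SNP.vcf.gz "${filter_args[@]}" -O combine.SNP.filter.vcf.gz
```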

 

 

 

006. Keep only the SNPs that passed the filters

gatk --java-options "-Xmx400g -Xms400g -XX:+UseSerialGC" SelectVariants -R GCF_000001735.4_TAIR10.1_genomic.fna -V combine.SNP.filter.vcf.gz --exclude-filtered -O combine.SNP.filtered.vcf.gz
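To see how many SNPs `--exclude-filtered` drops, the FILTER column (column 7) of the step 005 output can be tallied. A minimal sketch, assuming the VCF is gzip- or bgzip-compressed; the helper name `filter_summary` is hypothetical.

```shell
# Hypothetical helper: count records per FILTER value (column 7) in a
# compressed VCF and print "value<TAB>count" lines.
filter_summary() {
    gzip -dc "$1" |
        awk -F '\t' '!/^#/ { n[$7]++ } END { for (k in n) print k "\t" n[k] }' |
        sort
}

# Usage:
# filter_summary combine.SNP.filter.vcf.gz
```

The `PASS` count reported for `combine.SNP.filter.vcf.gz` should match the number of records in `combine.SNP.filtered.vcf.gz`.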

 

 

Reference: https://www.jianshu.com/p/7c124d5bbd4d

 

posted @ 2022-10-29 01:23  小鲨鱼2018