Extracting info from VCF files

R, Bioconductor

filterVcf: Extract Variants of Interest from a Large VCF File (Paul Shannon)

We demonstrate three methods:  filtering by genomic region,  filtering on attributes of
each specific variant call, and intersecting with known regions of interest (exons, splice
sites, regulatory regions, etc.).

http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/filterVcf.pdf

 

Java

SelectVariants -- Select a subset of variants from a larger callset ( GATK SelectVariants )

Often, a VCF containing many samples and/or variants will need to be subset in order to facilitate certain analyses (e.g. comparing and contrasting cases vs. controls; extracting variant or non-variant loci that meet certain requirements, displaying just a few samples in a browser like IGV, etc.). SelectVariants can be used for this purpose.

https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

 

Biostars

Question: How To Split Multiple Samples In Vcf File Generated By Gatk?
I did variant calling using BWA + PiCard + GATK and have just got the filtered VCF files from GATK. In the process of running GATK, I used list of inputs (11 samples) and for most steps, I had only one output file for each step. Now, I got two VCF files (one for SNPs and the other is for indels), each of which contains 11 samples. I can see the names of the 11 samples in the header of vcf files, and each sample seems to have one column of data. So I am wondering how to split each VCF files into individual sample vcf files?

https://www.biostars.org/p/78929/

 

bcftools

for file in *.vcf*; do
  for sample in `bcftools view -h $file | grep "^#CHROM" | cut -f10-`; do
    bcftools view -c1 -Oz -s $sample -o ${file/.vcf*/.$sample.vcf.gz} $file
  done
done

https://www.biostars.org/p/12535/#115691

 

vcf-subset

vcf-subset -c S1 bigfile.vcf > S1.vcf

https://www.biostars.org/p/78929/

http://campagnelab.org/software/goby/reference-documentation/modes/vcf-subset/

 

REF:

http://samtools.github.io/hts-specs/VCFv4.2.pdf

posted @   emanlee  阅读(970)  评论(0编辑  收藏  举报
编辑推荐:
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
阅读排行:
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
历史上的今天:
2010-06-08 勘误《新概念》IV
2010-06-08 勘误《新概念》III
点击右上角即可分享
微信分享提示