Annotating a Manta-generated structural variant (SV) VCF file with gene symbols by chromosomal position


# First, pin down the workflow:
# *.vcf (chromosome and start position) ----> *.annotated.vcf (gene names added)

# The workflow tells us what we need:
# a BED file, because a BED file contains
# the chromosome, the start and end positions, and the gene symbol
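
# To make the four-column layout above concrete, here is a minimal sketch that writes a tiny BED file and checks it with awk.
# The temporary file, the coordinates and the names GENE_A / GENE_B are made up for illustration; they are not taken from the real refGene BED file.

```shell
# Hypothetical demo file: chromosome, start, end, gene symbol (tab-separated)
demo_bed=$(mktemp)
printf 'chr1\t100\t200\tGENE_A\n'    > "$demo_bed"
printf 'chr1\t3000\t4000\tGENE_B\n' >> "$demo_bed"

# Count lines that do not have exactly 4 fields, or whose start is not below the end
bad_lines=$(awk -F'\t' 'NF != 4 || $2 + 0 >= $3 + 0 { n++ } END { print n + 0 }' "$demo_bed")
echo "malformed lines: ${bad_lines}"
```

# Running the same awk check on the real BED file before indexing is a cheap way to catch extra columns or swapped coordinates.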

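# With the workflow settled, start gathering the resources you need.
# One piece of advice: search on Google. Baidu often fails to turn up anything useful, and many of the results it does return contain errors.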

# Create a conda virtual environment
conda create -n bcftools
conda activate bcftools

# Install tabix and bcftools:
conda install -c bioconda bcftools
conda install -c bioconda tabix
# Running bcftools at this point fails with an error, so it is not usable yet:
# error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory
# A .so file is a shared (dynamic) library holding functions a program needs at run time. libbz2.so belongs to bzip2, so installing bzip2 should provide it.
conda install -c conda-forge bzip2
# After the install, running bcftools again prints its usage message.
# Solved!
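
# The missing-library diagnosis above can also be checked directly. This is a rough sketch, assuming a Linux system where ldd is available.
# The helper name missing_libs is hypothetical and is not part of bcftools or conda.

```shell
# List shared libraries that a program needs but the dynamic loader cannot find.
# Prints nothing when every library resolves.
missing_libs() {
  ldd "$(command -v "$1")" 2>/dev/null | grep 'not found'
}

# Check bcftools only if it is on PATH
if command -v bcftools >/dev/null 2>&1; then
  missing_libs bcftools || echo "all shared libraries resolved"
fi
```

# If a line such as "libbz2.so.1.0 => not found" shows up, that is the library to install.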

# Data preparation:
bgzip /biodata/pipeline/TUMOR/yln-test/hg19.refGene.edited.bed    # required before tabix; replaces the file with a .bed.gz copy
bed=/biodata/pipeline/TUMOR/yln-test/hg19.refGene.edited.bed.gz        # store the path in a variable
tabix -p bed ${bed}    # index the BED file for fast lookups; the file must be sorted by chromosome and start position
bgzip /biodata/pipeline/TUMOR/yln-test/manta/results/variants/candidateSV.vcf    # compress the VCF to .vcf.gz for bcftools
vcf=/biodata/pipeline/TUMOR/yln-test/manta/results/variants/candidateSV.vcf.gz        # store the path in a variable
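
# tabix can only index position-sorted input, so an unsorted BED file has to be sorted before the bgzip step.
# A minimal sketch of that sort, assuming a made-up demo file (the regions and the names GENE_A to GENE_C are hypothetical):

```shell
# Hypothetical unsorted demo BED
unsorted=$(mktemp)
printf 'chr2\t500\t900\tGENE_C\nchr1\t3000\t4000\tGENE_B\nchr1\t100\t200\tGENE_A\n' > "$unsorted"

# Sort by chromosome name, then numerically by start position
sorted="${unsorted}.sorted.bed"
sort -k1,1 -k2,2n "$unsorted" > "$sorted"

# Compress and index only if the htslib tools are installed
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c "$sorted" > "${sorted}.gz"   # -c writes to stdout and keeps the uncompressed file
  tabix -f -p bed "${sorted}.gz"        # writes ${sorted}.gz.tbi
fi
```

# Using bgzip -c here keeps the uncompressed file, which is handy if you still want to inspect it afterwards.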

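# Annotate (the result is written to standard output; redirect it to a file to keep it):
# -c: the BED file has no header line, so its columns have to be named manually
# -h: header line describing the new INFO/GENE tag
# Comments must not follow a trailing backslash, or the line continuation breaks.
bcftools annotate \
  -a ${bed} \
  -c CHROM,FROM,TO,GENE \
  -h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">') \
  ${vcf}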

# Now check whether the INFO column of the annotated output contains the gene symbol :)
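
# One way to do that check without reading the INFO column by eye is to pull the GENE tag out with awk.
# This is a sketch on a made-up two-record VCF (the records and GENE_A are hypothetical). With bcftools installed, bcftools query -f '%CHROM\t%POS\t%INFO/GENE\n' on the annotated file gives a similar table.

```shell
# Hypothetical demo VCF mimicking annotated output
demo_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t150\tMantaDEL:0\tN\t<DEL>\t.\t.\tEND=180;SVTYPE=DEL;GENE=GENE_A\n'
  printf 'chr2\t9000\tMantaINS:1\tN\t<INS>\t.\t.\tEND=9000;SVTYPE=INS\n'
} > "$demo_vcf"

# Print chromosome, position and the GENE value ("." when the tag is absent)
gene_table=$(awk -F'\t' '!/^#/ {
  gene = "."
  n = split($8, kv, ";")
  for (i = 1; i <= n; i++) if (kv[i] ~ /^GENE=/) gene = substr(kv[i], 6)
  print $1 "\t" $2 "\t" gene
}' "$demo_vcf")
echo "$gene_table"
```

# Records that fall outside every BED region keep "." in the last column, which makes unannotated variants easy to spot.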

posted @ 2020-04-16 14:12 YlnChen
