Building a local annotation database with SnpEff

 

001. Download the software

Official site: https://pcingola.github.io/SnpEff/

 

Download the archive, then unzip it:

(base) [root@pc1 software]# wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip          ## download
--2023-10-06 05:03:27--  https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
Resolving snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)... 52.239.234.228
Connecting to snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)|52.239.234.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 66465201 (63M) [application/zip]
Saving to: ‘snpEff_latest_core.zip’

100%[===========================================================================================================>] 66,465,201   315KB/s   in 1m 43s

2023-10-06 05:05:12 (629 KB/s) - ‘snpEff_latest_core.zip’ saved [66465201/66465201]

(base) [root@pc1 software]# ls
snpEff_latest_core.zip
(base) [root@pc1 software]# unzip snpEff_latest_core.zip &> /dev/null                                        ## unzip, discarding the output
(base) [root@pc1 software]# ls
snpEff  snpEff_latest_core.zip
(base) [root@pc1 software]# cd snpEff/                                                                       ## inspect the package contents
(base) [root@pc1 snpEff]# ls 
examples  exec  galaxy  LICENSE.md  scripts  snpEff.config  snpEff.jar  SnpSift.jar

 

002. Java environment

SnpEff needs a newer Java than the Java 1.8 that CentOS 7 ships by default, so Java 11 is installed here from the system's yum repositories.

 

a. Check the default installation

(base) [root@pc1 ~]# java -version                                  ## the Java 1.8 runtime is installed by default
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...
Similar command is: 'java'

 

b. List the installed Java 1.8 packages

(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch

 

c. Uninstall Java 1.8

(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# java -version
-bash: /usr/bin/java: No such file or directory
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...

 

d. List all available Java packages

(base) [root@pc1 ~]# yum list 'java*'        ## Java 11 is among the options

 

e. Install Java 11

(base) [root@pc1 ~]# yum -y install java-11-openjdk.x86_64 java-11-openjdk-devel.x86_64

 

f. Verify the installation

(base) [root@pc1 ~]# java -version                                 ## installation succeeded
openjdk version "11.0.20" 2023-07-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS, mixed mode, sharing)
(base) [root@pc1 ~]# javac -version
javac 11.0.20
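
The manual version check in steps a and f can be scripted. The following is a minimal sketch, assuming Java 11 as the minimum because that is the version that worked in this walkthrough; the exact requirement depends on the SnpEff release. The helper names java_major and java_ok are made up for illustration.

```shell
# java_major VERSION: print the major Java version for strings such as
# "1.8.0_181" (legacy numbering) or "11.0.20" (modern numbering).
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # 1.8.0_181 -> 8
        *)   echo "${v%%[.+-]*}" ;;            # 11.0.20   -> 11
    esac
}

# java_ok VERSION [MIN]: succeed if the major version is at least MIN.
# The default of 11 is an assumption taken from this walkthrough.
java_ok() {
    local min="${2:-11}"
    [ "$(java_major "$1")" -ge "$min" ]
}

# Possible usage on a machine with Java installed (java prints its version to stderr):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   java_ok "$ver" || echo "Java $ver is too old for SnpEff" >&2
```

With the two version strings shown above, java_ok fails for 1.8.0_181 and succeeds for 11.0.20.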

 

  

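003. Build the annotation database. This requires a reference genome (xxx.fa) and a genome annotation file (xxx.gff); chicken is used as the example.

a. Create a working directory for the annotation (any name and location will do)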

(base) [root@pc1 chicken_snpeff]# cd ~                            ## return to the home directory
(base) [root@pc1 ~]# mkdir chicken_snpeff                         ## create the annotation directory
(base) [root@pc1 ~]# ls
anaconda3  anaconda-ks.cfg  chicken_snpeff  Desktop  Documents  Downloads  initial-setup-ks.cfg  Music  Pictures  Public  Templates  Videos
(base) [root@pc1 ~]# cd chicken_snpeff/                           ## enter the directory
(base) [root@pc1 chicken_snpeff]# ls

 

b. Prepare the annotation files and directory layout

(base) [root@pc1 chicken_snpeff]# cp /home/software/snpEff/snpEff.config .                           ## copy the config file from the software directory
(base) [root@pc1 chicken_snpeff]# echo "chicken.genome:chicken" >> snpEff.config                     ## register the new genome in the config file
(base) [root@pc1 chicken_snpeff]# tail -n 1 snpEff.config                                            ## show the last line
chicken.genome:chicken
(base) [root@pc1 chicken_snpeff]# mkdir data                                                         ## create the data directory
(base) [root@pc1 chicken_snpeff]# cd data/                                                           ## enter the data directory
(base) [root@pc1 data]# ls
(base) [root@pc1 data]# mkdir chicken genomes                                                        ## create the chicken and genomes directories
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna genomes/chicken.fa          ## copy the chicken reference genome into genomes/ and rename it chicken.fa; the name must match the genome ID in the config file, i.e. the key before ".genome"
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff chicken/genes.gff           ## copy the annotation file into chicken/ and rename it genes.gff
(base) [root@pc1 data]# tree                                                                         ## check the layout
.
├── chicken
│   └── genes.gff
└── genomes
    └── chicken.fa

2 directories, 2 files
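
The preparation steps above use the genome ID three times: in the config entry, in the directory name under data/, and in the FASTA file name under data/genomes/. A minimal sketch that wraps those steps in one function is shown below. The function name setup_snpeff_db and its argument order are hypothetical; the file layout follows the tree output above.

```shell
# setup_snpeff_db WORKDIR GENOME_ID CONFIG FASTA GFF
# Recreates the layout used in this walkthrough:
#   WORKDIR/snpEff.config
#   WORKDIR/data/genomes/GENOME_ID.fa
#   WORKDIR/data/GENOME_ID/genes.gff
setup_snpeff_db() {
    local workdir="$1" id="$2" config="$3" fasta="$4" gff="$5"

    mkdir -p "$workdir/data/$id" "$workdir/data/genomes"

    # Copy the stock config only if the working directory has none yet
    [ -f "$workdir/snpEff.config" ] || cp "$config" "$workdir/snpEff.config"

    # Register the genome once, in the same form as the echo command above
    grep -q "^$id\.genome" "$workdir/snpEff.config" \
        || echo "$id.genome:$id" >> "$workdir/snpEff.config"

    cp "$fasta" "$workdir/data/genomes/$id.fa"
    cp "$gff" "$workdir/data/$id/genes.gff"
}

# Possible usage with the paths from this walkthrough:
#   setup_snpeff_db ~/chicken_snpeff chicken /home/software/snpEff/snpEff.config \
#       /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna \
#       /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff
```

Running the function a second time leaves a single config entry for the genome, because the grep guard skips the append when the entry already exists.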

 

c. Return to the parent directory of data and run the build

(base) [root@pc1 data]# cd ..
(base) [root@pc1 chicken_snpeff]# pwd
/root/chicken_snpeff
(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken
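
A build like this reads the sequence names from both the FASTA file and the GFF file, so the names in column 1 of the GFF need to match the FASTA headers. The sketch below is an optional pre-build check that is not part of the original walkthrough; the function name missing_seqids is hypothetical. It prints every GFF sequence ID that has no matching FASTA header.

```shell
# missing_seqids FASTA GFF
# Print sequence IDs that appear in column 1 of the GFF but not as a FASTA header.
missing_seqids() {
    local fasta="$1" gff="$2" tmpdir
    tmpdir=$(mktemp -d)

    # FASTA IDs: header lines without '>' and without the description after the first space
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmpdir/fa.ids"

    # GFF IDs: column 1 of feature lines (9 tab-separated fields), skipping comment lines
    grep -v '^#' "$gff" | awk -F'\t' 'NF >= 9 {print $1}' | sort -u > "$tmpdir/gff.ids"

    # comm -13 keeps only the lines unique to the second file
    comm -13 "$tmpdir/fa.ids" "$tmpdir/gff.ids"
    rm -rf "$tmpdir"
}

# Possible usage from the chicken_snpeff directory:
#   missing_seqids data/genomes/chicken.fa data/chicken/genes.gff
```

Empty output means every sequence named in the GFF is present in the FASTA file.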

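a. The first build attempt failed with an error

 

b. Fix: rerun the build with extra options

java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken -d -noCheckCds -noCheckProtein     ## -d turns on debug output; -noCheckCds and -noCheckProtein skip the checks against CDS and protein sequence files, which this setup does not provide

 

c. Does this count as an error?

 

 d. The build ran to completion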

 

 

004. Build results

 

(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# cd data/
(base) [root@pc1 data]# ls
chicken  genomes
(base) [root@pc1 data]# cd chicken/
(base) [root@pc1 chicken]# ls
genes.gff                 sequence.NC_052537.1.bin  sequence.NC_052544.1.bin  sequence.NC_052551.1.bin  sequence.NC_052558.1.bin
sequence.bin              sequence.NC_052538.1.bin  sequence.NC_052545.1.bin  sequence.NC_052552.1.bin  sequence.NC_052559.1.bin
sequence.NC_052532.1.bin  sequence.NC_052539.1.bin  sequence.NC_052546.1.bin  sequence.NC_052553.1.bin  sequence.NC_052562.1.bin
sequence.NC_052533.1.bin  sequence.NC_052540.1.bin  sequence.NC_052547.1.bin  sequence.NC_052554.1.bin  sequence.NC_052565.1.bin
sequence.NC_052534.1.bin  sequence.NC_052541.1.bin  sequence.NC_052548.1.bin  sequence.NC_052555.1.bin  sequence.NC_052571.1.bin
sequence.NC_052535.1.bin  sequence.NC_052542.1.bin  sequence.NC_052549.1.bin  sequence.NC_052556.1.bin  sequence.NC_052572.1.bin
sequence.NC_052536.1.bin  sequence.NC_052543.1.bin  sequence.NC_052550.1.bin  sequence.NC_052557.1.bin  snpEffectPredictor.bin
(base) [root@pc1 chicken]# ll -h
total 1015M
-rw-r--r--. 1 root root 660M Oct  5 23:03 genes.gff
-rw-r--r--. 1 root root 1.5M Oct  5 23:16 sequence.bin
-rw-r--r--. 1 root root  32M Oct  5 23:14 sequence.NC_052532.1.bin
-rw-r--r--. 1 root root  24M Oct  5 23:14 sequence.NC_052533.1.bin
-rw-r--r--. 1 root root  19M Oct  5 23:14 sequence.NC_052534.1.bin
-rw-r--r--. 1 root root  16M Oct  5 23:15 sequence.NC_052535.1.bin
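
The listing above shows snpEffectPredictor.bin next to the per-chromosome sequence files. A minimal sketch of a post-build check follows, assuming that a non-empty snpEffectPredictor.bin is a reasonable sign that the build produced a database; the function name check_snpeff_db is hypothetical.

```shell
# check_snpeff_db DATA_DIR GENOME_ID
# Succeed only if DATA_DIR/GENOME_ID/snpEffectPredictor.bin exists and is non-empty.
check_snpeff_db() {
    local db="$1/$2/snpEffectPredictor.bin"
    if [ -s "$db" ]; then
        echo "OK: $db"
    else
        echo "MISSING: $db" >&2
        return 1
    fi
}

# Possible usage from the chicken_snpeff directory; input.vcf is a placeholder
# for a VCF file, which this walkthrough does not provide:
#   check_snpeff_db data chicken \
#       && java -jar /home/software/snpEff/snpEff.jar -c ./snpEff.config chicken input.vcf > annotated.vcf
```

The check only tests for the file's presence and size; it does not validate the database contents.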

 

References:

01. https://mp.weixin.qq.com/s?__biz=MzI2ODI2NDc2Mw==&mid=2247491093&idx=1&sn=e7d8db56d039729fb7ec8234bfcef188&chksm=eaf36a41dd84e357a80398f98afcde62fa5c92ab1026da804bedaa929cda927f62f47a73bf66&mpshare=1&scene=23&srcid=10029X3Q96AAEgvEJC2ohYmb&sharer_shareinfo=0c0d1601bde45a770b77a2ad48958dd7&sharer_shareinfo_first=0c0d1601bde45a770b77a2ad48958dd7#rd

02. https://zhuanlan.zhihu.com/p/625865035

 

posted @ 2023-10-05 22:34  小鲨鱼2018