Building a local annotation database with SnpEff

 

001. Download the software

Official site: https://pcingola.github.io/SnpEff/

 

Download the archive, then unzip it:

(base) [root@pc1 software]# wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip          ## download
--2023-10-06 05:03:27--  https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
Resolving snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)... 52.239.234.228
Connecting to snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)|52.239.234.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 66465201 (63M) [application/zip]
Saving to: ‘snpEff_latest_core.zip’

100%[===========================================================================================================>] 66,465,201   315KB/s   in 1m 43s

2023-10-06 05:05:12 (629 KB/s) - ‘snpEff_latest_core.zip’ saved [66465201/66465201]

(base) [root@pc1 software]# ls
snpEff_latest_core.zip
(base) [root@pc1 software]# unzip snpEff_latest_core.zip &> /dev/null                                        ## unzip, discarding the output
(base) [root@pc1 software]# ls
snpEff  snpEff_latest_core.zip
(base) [root@pc1 software]# cd snpEff/                                                                       ## enter the directory and inspect its contents
(base) [root@pc1 snpEff]# ls 
examples  exec  galaxy  LICENSE.md  scripts  snpEff.config  snpEff.jar  SnpSift.jar
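The listing above can be turned into a small scripted check. This is only a sketch: the function name `check_snpeff_dir` is made up for illustration, and the three file names are the ones visible in the `ls` output.

```shell
#!/usr/bin/env bash
# Check that an unpacked SnpEff directory contains the files seen in the listing.
# check_snpeff_dir is an illustrative helper, not part of SnpEff.
# Usage: check_snpeff_dir /home/software/snpEff
check_snpeff_dir() {
    local dir="$1" missing=0 f
    for f in snpEff.jar SnpSift.jar snpEff.config; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when all three files exist, 1 otherwise.
    return "$missing"
}
```

A non-zero exit status suggests the download or the unzip step did not complete.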

 

002. Java environment

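SnpEff has a minimum Java version requirement, and the Java 1.8 that CentOS 7 installs by default does not meet it, so replace it with the Java 11 packages available from the system's yum repositories.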

 

a. Check the default installation

(base) [root@pc1 ~]# java -version                                  ## the Java 1.8 runtime is installed by default
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...
Similar command is: 'java'

 

b. List the installed Java 1.8 packages

(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch

 

c. Uninstall Java 1.8

(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# java -version
-bash: /usr/bin/java: No such file or directory
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...

 

d. List all available Java packages

(base) [root@pc1 ~]# yum list 'java*'        ## Java 11 is among the available packages

 

e. Install Java 11

(base) [root@pc1 ~]# yum -y install java-11-openjdk.x86_64 java-11-openjdk-devel.x86_64

 

f. Verify the installation

(base) [root@pc1 ~]# java -version                                 ## installed successfully
openjdk version "11.0.20" 2023-07-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS, mixed mode, sharing)
(base) [root@pc1 ~]# javac -version
javac 11.0.20
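
The version check can also be scripted. The sketch below parses the two version formats seen in this section (`1.8.0_181` and `11.0.20`). The function names are illustrative, and the default minimum of 11 is an assumption taken from this walkthrough rather than an official requirement.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "1.8.0_181" (legacy scheme, major = 8) or "11.0.20" (major = 11).
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;       # legacy "1.x" scheme: drop the leading "1."
    esac
    echo "${v%%[._+-]*}"          # keep everything before the first separator
}

# Read the version string from `java -version`, which prints to stderr, e.g.
#   openjdk version "11.0.20" 2023-07-18 LTS
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Succeeds when the installed Java is at least the given major version.
# The default of 11 is an assumption based on this walkthrough.
java_is_recent_enough() {
    local min="${1:-11}" major
    major="$(java_major "$(installed_java_version)")"
    [ -n "$major" ] && [ "$major" -ge "$min" ]
}
```

For example, `java_is_recent_enough 11 || echo "Java too old"` could be placed at the top of a build script.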

 

  

003. Build the annotation database. This requires a reference genome (xxx.fa) and a genome annotation file (xxx.gff); chicken is used as the example.

a. Create a working directory for the annotation (any name and location will do)

(base) [root@pc1 chicken_snpeff]# cd ~                            ## return to the home directory
(base) [root@pc1 ~]# mkdir chicken_snpeff                         ## create the annotation directory
(base) [root@pc1 ~]# ls
anaconda3  anaconda-ks.cfg  chicken_snpeff  Desktop  Documents  Downloads  initial-setup-ks.cfg  Music  Pictures  Public  Templates  Videos
(base) [root@pc1 ~]# cd chicken_snpeff/                           ## enter that directory
(base) [root@pc1 chicken_snpeff]# ls

 

b. Prepare the annotation files and the directory layout

(base) [root@pc1 chicken_snpeff]# cp /home/software/snpEff/snpEff.config .                           ## copy the config file from the SnpEff install directory
(base) [root@pc1 chicken_snpeff]# echo "chicken.genome:chicken" >> snpEff.config                     ## append the genome entry to the config file
(base) [root@pc1 chicken_snpeff]# tail -n 1 snpEff.config                                            ## show the last line
chicken.genome:chicken
(base) [root@pc1 chicken_snpeff]# mkdir data                                                         ## create the data directory
(base) [root@pc1 chicken_snpeff]# cd data/                                                           ## enter the data directory
(base) [root@pc1 data]# ls
(base) [root@pc1 data]# mkdir chicken genomes                                                        ## create the chicken and genomes directories
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna genomes/chicken.fa          ## copy the chicken reference genome into genomes/ and rename it chicken.fa; the name must match the genome ID in the config file (the part before ".genome", which here equals the text after the colon)
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff chicken/genes.gff           ## copy the annotation file into chicken/ and rename it genes.gff
(base) [root@pc1 data]# tree                                                                         ## check the resulting layout
.
├── chicken
│   └── genes.gff
└── genomes
    └── chicken.fa

2 directories, 2 files
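The manual steps in this part can be collected into one function. This is a sketch rather than a tested pipeline: `prepare_snpeff_db` and its argument order are invented for illustration, and unlike the plain `echo >>` above it skips the append when the config already has an entry for the genome ID, so it can be re-run safely.

```shell
#!/usr/bin/env bash
# Lay out a SnpEff working directory for one genome.
# prepare_snpeff_db is an illustrative helper; arguments:
#   $1 working directory   e.g. ~/chicken_snpeff
#   $2 genome ID           e.g. chicken
#   $3 FASTA file          reference genome
#   $4 GFF file            genome annotation
#   $5 config file         snpEff.config from the SnpEff install directory
prepare_snpeff_db() {
    local workdir="$1" genome="$2" fasta="$3" gff="$4" config="$5"

    mkdir -p "$workdir/data/$genome" "$workdir/data/genomes" || return 1

    # Copy the config only once so repeated runs keep earlier edits.
    [ -f "$workdir/snpEff.config" ] || cp "$config" "$workdir/snpEff.config" || return 1

    # Append the genome entry unless a line for this ID already exists.
    grep -q "^${genome}\.genome" "$workdir/snpEff.config" ||
        echo "${genome}.genome : ${genome}" >> "$workdir/snpEff.config"

    # File names follow the genome ID: data/genomes/<ID>.fa and data/<ID>/genes.gff
    cp "$fasta" "$workdir/data/genomes/${genome}.fa" || return 1
    cp "$gff"   "$workdir/data/${genome}/genes.gff"
}
```

With the paths used in this post, the call would look like `prepare_snpeff_db ~/chicken_snpeff chicken /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff /home/software/snpEff/snpEff.config`.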

 

c. Go back to the parent directory of data and run the build

(base) [root@pc1 data]# cd ..
(base) [root@pc1 chicken_snpeff]# pwd
/root/chicken_snpeff
(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken

a. The build fails with an error

 

b. Fix

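java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken -d -noCheckCds -noCheckProtein     ## append the extra options: -d prints debug output, -noCheckCds and -noCheckProtein skip the CDS and protein sequence checks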

 

c. Does this count as an error?

 

d. The program finishes
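
The error text itself is not reproduced in this post, so the following is not a diagnosis of it. As a general pre-flight check before building, it can help to confirm that every sequence name in column 1 of the GFF also appears as a FASTA header, since the two files have to agree on sequence names. A minimal sketch, with an invented function name:

```shell
#!/usr/bin/env bash
# Print sequence names used in the GFF (column 1) that have no matching
# FASTA header. No output means every annotated sequence is in the genome.
# gff_seqids_missing_from_fasta is an illustrative helper.
# Usage: gff_seqids_missing_from_fasta data/genomes/chicken.fa data/chicken/genes.gff
gff_seqids_missing_from_fasta() {
    local fasta="$1" gff="$2"
    awk '
        # First file (FASTA): remember the first word of each header line.
        NR == FNR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 } ; next }
        # Second file (GFF): stop at an embedded ##FASTA section.
        /^##FASTA/ { exit }
        # Skip comment and blank lines.
        /^#/ || NF == 0 { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gff"
}
```

Any names printed by this function are sequences that the annotation refers to but the genome file does not contain.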

 

 

004. Build results

 

(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# cd data/
(base) [root@pc1 data]# ls
chicken  genomes
(base) [root@pc1 data]# cd chicken/
(base) [root@pc1 chicken]# ls
genes.gff                 sequence.NC_052537.1.bin  sequence.NC_052544.1.bin  sequence.NC_052551.1.bin  sequence.NC_052558.1.bin
sequence.bin              sequence.NC_052538.1.bin  sequence.NC_052545.1.bin  sequence.NC_052552.1.bin  sequence.NC_052559.1.bin
sequence.NC_052532.1.bin  sequence.NC_052539.1.bin  sequence.NC_052546.1.bin  sequence.NC_052553.1.bin  sequence.NC_052562.1.bin
sequence.NC_052533.1.bin  sequence.NC_052540.1.bin  sequence.NC_052547.1.bin  sequence.NC_052554.1.bin  sequence.NC_052565.1.bin
sequence.NC_052534.1.bin  sequence.NC_052541.1.bin  sequence.NC_052548.1.bin  sequence.NC_052555.1.bin  sequence.NC_052571.1.bin
sequence.NC_052535.1.bin  sequence.NC_052542.1.bin  sequence.NC_052549.1.bin  sequence.NC_052556.1.bin  sequence.NC_052572.1.bin
sequence.NC_052536.1.bin  sequence.NC_052543.1.bin  sequence.NC_052550.1.bin  sequence.NC_052557.1.bin  snpEffectPredictor.bin
(base) [root@pc1 chicken]# ll -h
total 1015M
-rw-r--r--. 1 root root 660M Oct  5 23:03 genes.gff
-rw-r--r--. 1 root root 1.5M Oct  5 23:16 sequence.bin
-rw-r--r--. 1 root root  32M Oct  5 23:14 sequence.NC_052532.1.bin
-rw-r--r--. 1 root root  24M Oct  5 23:14 sequence.NC_052533.1.bin
-rw-r--r--. 1 root root  19M Oct  5 23:14 sequence.NC_052534.1.bin
-rw-r--r--. 1 root root  16M Oct  5 23:15 sequence.NC_052535.1.bin
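The listing shows the two kinds of files the build produced: `snpEffectPredictor.bin` and the `sequence*.bin` files. A scripted version of that check might look like the sketch below; the function name is invented, and it only tests that the files exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Report whether a SnpEff build left database files in data/<genome>/.
# check_snpeff_build is an illustrative helper.
# Usage: check_snpeff_build ~/chicken_snpeff chicken
check_snpeff_build() {
    local dbdir="$1/data/$2" n
    # -s: the file must exist and be non-empty.
    if [ ! -s "$dbdir/snpEffectPredictor.bin" ]; then
        echo "no snpEffectPredictor.bin in $dbdir" >&2
        return 1
    fi
    # Count sequence.bin plus the per-sequence sequence.<name>.bin files.
    n=$(find "$dbdir" -maxdepth 1 -type f -name 'sequence*.bin' | wc -l | tr -d ' ')
    echo "snpEffectPredictor.bin present, $n sequence file(s)"
}
```

Once the check passes, annotation would use the same config file, for example `java -jar /home/software/snpEff/snpEff.jar ann -c ./snpEff.config chicken input.vcf > annotated.vcf`, where `input.vcf` and `annotated.vcf` are hypothetical file names.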

 

References:

01. https://mp.weixin.qq.com/s?__biz=MzI2ODI2NDc2Mw==&mid=2247491093&idx=1&sn=e7d8db56d039729fb7ec8234bfcef188&chksm=eaf36a41dd84e357a80398f98afcde62fa5c92ab1026da804bedaa929cda927f62f47a73bf66&mpshare=1&scene=23&srcid=10029X3Q96AAEgvEJC2ohYmb&sharer_shareinfo=0c0d1601bde45a770b77a2ad48958dd7&sharer_shareinfo_first=0c0d1601bde45a770b77a2ad48958dd7#rd

02. https://zhuanlan.zhihu.com/p/625865035

 

posted @ 小鲨鱼2018