Building a local annotation database with SnpEff
001. Downloading the software
Official site: https://pcingola.github.io/SnpEff/
Download the archive, then unzip it:
(base) [root@pc1 software]# wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip    ## download
--2023-10-06 05:03:27--  https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
Resolving snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)... 52.239.234.228
Connecting to snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)|52.239.234.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 66465201 (63M) [application/zip]
Saving to: ‘snpEff_latest_core.zip’

100%[===========================================================================================================>] 66,465,201   315KB/s   in 1m 43s

2023-10-06 05:05:12 (629 KB/s) - ‘snpEff_latest_core.zip’ saved [66465201/66465201]

(base) [root@pc1 software]# ls
snpEff_latest_core.zip
(base) [root@pc1 software]# unzip snpEff_latest_core.zip &> /dev/zero    ## unzip, discarding the output (/dev/null is the more conventional target)
(base) [root@pc1 software]# ls
snpEff  snpEff_latest_core.zip
(base) [root@pc1 software]# cd snpEff/    ## look at what the package contains
(base) [root@pc1 snpEff]# ls
examples  exec  galaxy  LICENSE.md  scripts  snpEff.config  snpEff.jar  SnpSift.jar
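Since the unzip output is discarded, it can be worth confirming that the archive unpacked completely. The following is a minimal sketch; the function name `check_snpeff_install` is hypothetical, and the three file names are simply the ones visible in the listing above.

```shell
# Hypothetical helper: confirm that an unpacked snpEff directory contains
# the files the rest of this walkthrough relies on.
check_snpeff_install() {
    local dir="$1" f missing=0
    for f in snpEff.jar SnpSift.jar snpEff.config; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}

# Usage (path as used in this post):
#   check_snpeff_install /home/software/snpEff && echo "snpEff looks complete"
```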
002. Java environment
SnpEff has a minimum Java version requirement, and the Java 1.8 that CentOS 7 ships by default does not meet it, so Java 11 is installed from the system's own yum repositories instead.
a. Check the default installation
(base) [root@pc1 ~]# java -version    ## a Java 1.8 runtime is installed by default
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...
Similar command is: 'java'
b. List the installed Java 1.8 packages
(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch
c. Uninstall Java 1.8
(base) [root@pc1 ~]# rpm -qa | grep -i java
java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2018e-3.el7.noarch
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
(base) [root@pc1 ~]# java -version
-bash: /usr/bin/java: No such file or directory
(base) [root@pc1 ~]# javac -version
bash: javac: command not found...
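The two package names above were typed by hand. As a rough sketch, the same selection can be made by filtering the `rpm -qa` output; `select_openjdk8` is a hypothetical helper, and the loop only prints the removal commands (a dry run) rather than running them.

```shell
# Hypothetical helper: from a package list on stdin (one name per line, as
# printed by `rpm -qa`), keep only the OpenJDK 1.8 runtime packages.
select_openjdk8() {
    grep -E '^java-1\.8\.0-openjdk' || true   # no match is not an error
}

# Dry run: print the removal commands instead of executing them.
# On a machine without rpm this block does nothing.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | select_openjdk8 | while read -r pkg; do
        echo "rpm -e --nodeps $pkg"
    done
fi
```

Reviewing the printed commands before running them is deliberate: `--nodeps` skips dependency checks, so removing the wrong package can break other software.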
d. List all available Java packages
(base) [root@pc1 ~]# yum list 'java*'    ## Java 11 is among the options
e. Install Java 11
(base) [root@pc1 ~]# yum -y install java-11-openjdk.x86_64 java-11-openjdk-devel.x86_64
f. Verify the installation
(base) [root@pc1 ~]# java -version    ## installed successfully
openjdk version "11.0.20" 2023-07-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS, mixed mode, sharing)
(base) [root@pc1 ~]# javac -version
javac 11.0.20
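The version check can also be scripted. This is a minimal sketch: `java_major` and `installed_java_version` are hypothetical helpers, and the threshold `MIN_JAVA=11` is an assumption taken from this post's experience (Java 1.8 failed, Java 11 worked), not from the SnpEff documentation. Check the requirement of the release you actually downloaded.

```shell
# Hypothetical helper: turn a Java version string into its major number.
# "1.8.0_181" -> 8 (legacy 1.x scheme), "11.0.20" -> 11.
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # strip the legacy "1." prefix
    esac
    echo "${v%%[!0-9]*}"      # keep only the leading digits
}

# Version of whichever java is on PATH (empty output if none is installed).
# `java -version` writes to stderr, hence the 2>&1.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Assumption: 11 is the minimum, based only on what worked in this post.
MIN_JAVA="${MIN_JAVA:-11}"
ver="$(installed_java_version || true)"
if [ -n "$ver" ] && [ "$(java_major "$ver")" -ge "$MIN_JAVA" ]; then
    echo "Java $ver is new enough (>= $MIN_JAVA)"
else
    echo "Java ${ver:-not found}; need >= $MIN_JAVA"
fi
```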
003. Building the annotation database. This needs a reference genome (xxx.fa) and a genome annotation file (xxx.gff); chicken is used as the example.
a. Create a working directory for the annotation (any location will do)
(base) [root@pc1 chicken_snpeff]# cd ~    ## go back to the home directory
(base) [root@pc1 ~]# mkdir chicken_snpeff    ## create the annotation directory
(base) [root@pc1 ~]# ls
anaconda3  anaconda-ks.cfg  chicken_snpeff  Desktop  Documents  Downloads  initial-setup-ks.cfg  Music  Pictures  Public  Templates  Videos
(base) [root@pc1 ~]# cd chicken_snpeff/    ## enter it
(base) [root@pc1 chicken_snpeff]# ls
b. Prepare the annotation files and the directory layout
(base) [root@pc1 chicken_snpeff]# cp /home/software/snpEff/snpEff.config .    ## copy the config file from the software directory
(base) [root@pc1 chicken_snpeff]# echo "chicken.genome:chicken" >> snpEff.config    ## register the new genome in the config file
(base) [root@pc1 chicken_snpeff]# tail -n 1 snpEff.config    ## check the last line
chicken.genome:chicken
(base) [root@pc1 chicken_snpeff]# mkdir data    ## create the data directory
(base) [root@pc1 chicken_snpeff]# cd data/    ## enter the data directory
(base) [root@pc1 data]# ls
(base) [root@pc1 data]# mkdir chicken genomes    ## create the chicken and genomes directories
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna genomes/chicken.fa    ## copy the chicken reference genome into genomes/ and rename it chicken.fa; the name must match the genome ID in the config file (the part before ".genome"; here the text after the colon happens to be the same)
(base) [root@pc1 data]# cp /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff chicken/genes.gff    ## copy the annotation file into chicken/ and rename it genes.gff
(base) [root@pc1 data]# tree    ## check the layout
.
├── chicken
│   └── genes.gff
└── genomes
    └── chicken.fa

2 directories, 2 files
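The preparation steps above can be collected into one reusable function. This is only a sketch under the layout shown in the `tree` output; the function name `prepare_snpeff_db` and its argument order are hypothetical, and the config entry is written in the same `id.genome:id` form used in this post.

```shell
# Hypothetical helper that reproduces the layout above for any genome ID.
# Arguments: <workdir> <genome_id> <snpEff.config> <genome.fa> <genes.gff>
prepare_snpeff_db() {
    local work="$1" id="$2" config="$3" fasta="$4" gff="$5"
    mkdir -p "$work/data/$id" "$work/data/genomes" || return 1
    # Copy the stock config only if the working directory has none yet.
    if [ ! -f "$work/snpEff.config" ]; then
        cp "$config" "$work/snpEff.config" || return 1
    fi
    # Register the genome once, so re-running does not duplicate the entry.
    grep -q "^${id}\.genome" "$work/snpEff.config" ||
        echo "$id.genome:$id" >> "$work/snpEff.config"
    # File names follow the genome ID: genomes/<id>.fa and <id>/genes.gff.
    cp "$fasta" "$work/data/genomes/$id.fa" || return 1
    cp "$gff" "$work/data/$id/genes.gff"
}

# Usage with the paths from this post:
#   prepare_snpeff_db ~/chicken_snpeff chicken /home/software/snpEff/snpEff.config \
#       /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.fna \
#       /home/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gff
```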
c. Go back to the parent directory of data and run the build
(base) [root@pc1 data]# cd ..
(base) [root@pc1 chicken_snpeff]# pwd
/root/chicken_snpeff
(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken
a. The build failed with an error
b. Solution
java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config -gff3 -v chicken -d -noCheckCds -noCheckProtein    ## the three trailing options were added
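The error output itself is not reproduced here, so the following is a guess at why the extra options help rather than a confirmed diagnosis. According to the SnpEff documentation, the build step can validate the new database against `cds.fa` and `protein.fa` files placed next to `genes.gff`, and `-noCheckCds` / `-noCheckProtein` skip those checks (`-d` only turns on debug output). Under that assumption, a hypothetical helper `build_check_flags` can pick the flags according to which files are present:

```shell
# Hypothetical helper: print the check-skipping flags that apply to a
# database directory such as data/chicken.
# Assumption (from the SnpEff documentation, not from this post): the build
# validates against cds.fa and protein.fa in that directory when they exist.
build_check_flags() {
    local dbdir="$1" flags=""
    [ -f "$dbdir/cds.fa" ]     || flags="$flags -noCheckCds"
    [ -f "$dbdir/protein.fa" ] || flags="$flags -noCheckProtein"
    echo "${flags# }"          # drop the leading space
}

# Usage, run from the directory that holds snpEff.config:
#   java -jar /home/software/snpEff/snpEff.jar build -c ./snpEff.config \
#       -gff3 -v chicken $(build_check_flags data/chicken)
```

Skipping the checks gets the build through, but it also means the gene models are not verified against known sequences; supplying the CDS and protein FASTA files is the more thorough route when they are available.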
c. Does this count as an error?
d. The run finished
004. Build results
(base) [root@pc1 chicken_snpeff]# ls
data  snpEff.config
(base) [root@pc1 chicken_snpeff]# cd data/
(base) [root@pc1 data]# ls
chicken  genomes
(base) [root@pc1 data]# cd chicken/
(base) [root@pc1 chicken]# ls
genes.gff                 sequence.NC_052537.1.bin  sequence.NC_052544.1.bin  sequence.NC_052551.1.bin  sequence.NC_052558.1.bin
sequence.bin              sequence.NC_052538.1.bin  sequence.NC_052545.1.bin  sequence.NC_052552.1.bin  sequence.NC_052559.1.bin
sequence.NC_052532.1.bin  sequence.NC_052539.1.bin  sequence.NC_052546.1.bin  sequence.NC_052553.1.bin  sequence.NC_052562.1.bin
sequence.NC_052533.1.bin  sequence.NC_052540.1.bin  sequence.NC_052547.1.bin  sequence.NC_052554.1.bin  sequence.NC_052565.1.bin
sequence.NC_052534.1.bin  sequence.NC_052541.1.bin  sequence.NC_052548.1.bin  sequence.NC_052555.1.bin  sequence.NC_052571.1.bin
sequence.NC_052535.1.bin  sequence.NC_052542.1.bin  sequence.NC_052549.1.bin  sequence.NC_052556.1.bin  sequence.NC_052572.1.bin
sequence.NC_052536.1.bin  sequence.NC_052543.1.bin  sequence.NC_052550.1.bin  sequence.NC_052557.1.bin  snpEffectPredictor.bin
(base) [root@pc1 chicken]# ll -h
total 1015M
-rw-r--r--. 1 root root 660M Oct  5 23:03 genes.gff
-rw-r--r--. 1 root root 1.5M Oct  5 23:16 sequence.bin
-rw-r--r--. 1 root root  32M Oct  5 23:14 sequence.NC_052532.1.bin
-rw-r--r--. 1 root root  24M Oct  5 23:14 sequence.NC_052533.1.bin
-rw-r--r--. 1 root root  19M Oct  5 23:14 sequence.NC_052534.1.bin
-rw-r--r--. 1 root root  16M Oct  5 23:15 sequence.NC_052535.1.bin
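Rather than reading the listing by eye, the outcome can be checked with a small script. This sketch assumes only what the listing above shows: a finished build leaves a `snpEffectPredictor.bin` plus per-sequence `sequence.<name>.bin` files in the database directory. The function name `check_snpeff_build` is hypothetical.

```shell
# Hypothetical helper: summarize what the build left in a database directory.
check_snpeff_build() {
    local dbdir="$1" n
    # -s: the file must exist and be non-empty.
    if [ ! -s "$dbdir/snpEffectPredictor.bin" ]; then
        echo "no snpEffectPredictor.bin in $dbdir; the build did not finish" >&2
        return 1
    fi
    # The pattern needs two dots, so the plain sequence.bin is not counted.
    n=$(find "$dbdir" -maxdepth 1 -name 'sequence.*.bin' | wc -l | tr -d ' ')
    echo "snpEffectPredictor.bin present, $n per-sequence file(s)"
}

# Usage:
#   check_snpeff_build ~/chicken_snpeff/data/chicken
```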
References:
01. https://mp.weixin.qq.com/s?__biz=MzI2ODI2NDc2Mw==&mid=2247491093&idx=1&sn=e7d8db56d039729fb7ec8234bfcef188&chksm=eaf36a41dd84e357a80398f98afcde62fa5c92ab1026da804bedaa929cda927f62f47a73bf66&mpshare=1&scene=23&srcid=10029X3Q96AAEgvEJC2ohYmb&sharer_shareinfo=0c0d1601bde45a770b77a2ad48958dd7&sharer_shareinfo_first=0c0d1601bde45a770b77a2ad48958dd7#rd
02. https://zhuanlan.zhihu.com/p/625865035