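A simple Hi-C processing workflow
1. What is Hi-C data?
Hi-C is a method for studying the three-dimensional structure of chromatin. It derives from Chromosome Conformation Capture (3C) and combines high-throughput sequencing with bioinformatic analysis to study the spatial relationships of chromatin DNA across the whole genome, yielding high-resolution information on chromatin 3D structure.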
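2. Advantages of Hi-C data
- Interaction frequencies between scaffolds can be used to correct errors in an already assembled genome sequence.
- Sequences are no longer just contig fragments; they are assigned to chromosomes, giving a chromosome-level assembly.
- No laborious population construction is needed; a single individual is enough for chromosome anchoring.
- Compared with genetic maps, marker density is higher and sequence placement is more complete.
- Structural variation such as chromosomal rearrangements can be studied.
- QTL and GWAS intervals can be located on a specific chromosome.
- The 3D genome structure, chromosomal interactions, and their dynamic changes can be resolved for the species.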
3. Current processing workflow
4. Main analysis tools
The main tools currently used for Hi-C data processing are HiC-Pro and Juicer.
##### Hi-C contact maps, TAD structure, loop structure, 3D modeling

#### HiC-Pro installation ####
wget -c https://github.com/nservant/HiC-Pro/archive/refs/tags/v3.1.0.tar.gz -O HiC-Pro-3.1.0.tar.gz
tar -zxvf HiC-Pro-3.1.0.tar.gz
conda env create -f /data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/environment.yml -p /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro
conda activate HiC-Pro

# config-install.txt:
PREFIX = /data5/tan/zengchuanj/Software/HiC-Pro-3.1.0
BOWTIE2_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/bowtie2
SAMTOOLS_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/samtools
R_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/R
PYTHON_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/python
CLUSTER_SYS = TORQUE

make configure
make install

# Reference genome
ref_dir = /data5/tan/zengchuanj/pipeline/Annotation/HIC/GRCm39.genome.fa.gz
gunzip GRCm39.genome.fa.gz

# Build the bowtie2 index
# pwd: /data5/tan/zengchuanj/pipeline/Annotation/HIC
bowtie2-build GRCm39.genome.fa mouse
samtools faidx GRCm39.genome.fa

# Chromosome sizes file for the genome
awk '{print $1 "\t" $2}' GRCm39.genome.fa.fai > mouse.genome.sizes

# Restriction fragment file
# MboI cuts at ^GATC; GATCGATC is the ligation junction and belongs in LIGATION_SITE, not here
bin=/data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/bin/utils/digest_genome.py
#python $bin GRCm39.genome.fa -r mboi -o mouse_mobi.bed
python $bin GRCm39.genome.fa -r '^GATC' -o mouse_mobi.bed

# config-hicpro.txt parameters:
#   N_CPU             number of CPUs
#   BOWTIE2_IDX_PATH  directory containing the bowtie2 index
#   REFERENCE_GENOME  prefix of the reference genome index used for alignment
#   GENOME_SIZE       path to the chrom.sizes file
#   GENOME_FRAGMENT   path to the BED file of restriction fragments
#   LIGATION_SITE     junction sequence formed when the digested ends are filled in and re-ligated,
#                     e.g. AAGCTAGCTT for HindIII, GATCGATC for MboI

## SYSTEM AND SCHEDULER - Start Editing Here !!
# number of CPU threads
N_CPU = 50
# log file name
LOGFILE = hicpro.log
# job name
JOB_NAME = hicpro
# memory requested
JOB_MEM = 100gb
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

PAIR1_EXT = _R1
PAIR2_EXT = _R2

# directory containing the bowtie2 index
BOWTIE2_IDX_PATH = /data5/tan/lishix/jys/test/results/reads
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder
# absolute path to genome.sizes
GENOME_SIZE = /data5/tan/zengchuanj/pipeline/Annotation/HIC/mouse.genome.sizes

## Digestion Hi-C
# absolute path
GENOME_FRAGMENT = /data5/tan/zengchuanj/pipeline/HIC/mouse_mobi.bed
# ligation junction of the restriction enzyme; ask the sequencing provider which enzyme was used (MboI here)
LIGATION_SITE = GATCGATC
MIN_FRAG_SIZE = 100
MAX_FRAG_SIZE = 100000
MIN_INSERT_SIZE = 100
MAX_INSERT_SIZE = 1000

## Contact Maps
# set the bin sizes to suit your needs
BIN_SIZE = 20000 40000 150000 500000 1000000
MATRIX_FORMAT = upper

# Run HiC-Pro
/data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/bin/HiC-Pro \
    -c /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/config-hicpro.txt \
    -i /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/fastq \
    -o /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/results

# Input directory layout:
#   fastq/sample/sample_R1.fastq.gz
#   fastq/sample/sample_R2.fastq.gz

##### Juicer installation ####
conda create -n juicer -c bioconda bwa -y
conda activate juicer

mkdir work && mkdir references && mkdir restriction_sites
# Juicer/juicer/references          reference genome files
# Juicer/juicer/work                sample sequence files and analysis results
# Juicer/juicer/restriction_sites   restriction site maps of the reference genome

wget https://github.com/aidenlab/juicer/archive/refs/tags/1.6.tar.gz -O juicer-1.6.tar.gz
tar -xzvf juicer-1.6.tar.gz
ln -s juicer/CPU scripts    # scripts should sit inside the juicer directory
cd juicer/scripts/common
wget -c https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar
ln -s juicer_tools.1.9.9_jcuda.0.8.jar juicer_tools.jar

# Build the genome index
# pwd: /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references
bwa index GRCm39.genome.fa

# Generate the restriction site file
python /data5/tan/zengchuanj/Software/juicer/misc/generate_site_positions.py MboI genome \
    /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references/GRCm39.genome.fa

# Generate the chromosome sizes file
# genome_MboI.txt is produced by the previous step
awk 'BEGIN{OFS="\t"}{print $1, $NF}' genome_MboI.txt > genome.chrom.sizes

cd ./references
python /data5/tan/zengchuanj/pipeline/HIC/Juicer/misc/generate_site_positions.py MboI mm9 mm9.fasta
# The three arguments are the enzyme name, the genome name, and the path to the genome FASTA file

nohup bash scripts/juicer.sh \
    -d /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/test \
    -D /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer \
    -y /data5/tan/lishix/HIC/opt/juicer/restriction_sites/mm39_MboI.txt \
    -z /data5/tan/lishix/HIC/opt/juicer/references/Mus_musculus.GRCm39.dna.toplevel.fa \
    -p restriction_sites/genome.chrom.sizes \
    -s MboI -t 10 2> test.txt &

Usage:
# nohup keeps the job running after the terminal closes; the trailing & runs it in the background
nohup bash /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/juicer.sh \
    -z /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references/GRCm39.genome.fa \
    -p /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/restriction_sites/genome.chrom.sizes \
    -y /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/restriction_sites/GRCm39.genome_MboI.txt \
    -s MboI \
    -d /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/ \
    -D /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer \
    -t 40 > log.txt &
# -z  path to the reference genome FASTA; the matching bwa index must be in the same location
# -p  chromosome sizes file
# -y  restriction site file for the genome
# -d  directory holding the raw sample files
# -D  Juicer installation directory
# -t  number of threads for bwa alignment; by default all threads are used

# Hi-C contact map plotting
data_dir = /data5/tan/lishix/jys/test/results/
species = mouse
enzyme: MboI

# Visualize HiC-Pro results with HiCPlotter.py
python2.7 HiCPlotter.py -o genome \
    -f genome_500000_iced.matrix \
    -r 500000 -tri 1 \
    -bed genome_500000_abs.bed \
    -n genome \
    -wg 1 -chr chromosome7
# -o    output file name
# -f    the *_500000_iced.matrix matrix file
# -r    resolution of the matrix
# -bed  the *_500000_abs.bed file
# -n    title shown at the top of the output figure
# -chr  name of the last chromosome; check it with "tail -n 1 *.bed"

# Call TADs with Juicer
# ref: https://github.com/aidenlab/juicer/wiki/Arrowhead
/data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools arrowhead --ignore_sparsity \
    /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic ./contact_domains_list/

## Call loops with Juicer
# CPU version
nohup java -jar /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools.jar hiccups \
    --cpu --threads 19 -r 5000,10000 --ignore_sparsity \
    /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic inter.hic.hiccups > loop.txt &
# GPU version
nohup java -jar /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools.jar hiccups \
    --gpu --threads 19 -r 2500,5000,7500,10000,12500,15000,17500,20000,22500 --ignore_sparsity \
    /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic inter.hic.hiccups > loop.txt &
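
The LIGATION_SITE values quoted in the HiC-Pro configuration notes above (GATCGATC for MboI, AAGCTAGCTT for HindIII) can be derived from the cut-site notation that digest_genome.py takes. The sketch below shows one way to do that. It assumes a palindromic recognition site that leaves a 5' overhang and whose ends are filled in before blunt-end ligation; `ligation_site` is a hypothetical helper name, not part of HiC-Pro or Juicer.

```shell
#!/bin/sh
# Sketch: derive a LIGATION_SITE value from a cut-site spec such as '^GATC'.
# Assumptions: palindromic site, 5' overhang, ends filled in before ligation.
# ligation_site is a hypothetical helper, not a HiC-Pro command.
ligation_site() {
    spec=$1
    case $spec in
        *'^'*) ;;
        *) echo "cut position (^) missing in: $spec" >&2; return 1 ;;
    esac
    left=${spec%%'^'*}      # bases before the cut position
    right=${spec#*'^'}      # bases after the cut position
    site=$left$right        # full recognition site
    # After fill-in, the upstream end keeps the site minus its last ${#left} bases,
    # and the downstream end starts at the cut position.
    keep=$(( ${#site} - ${#left} ))
    head=$(printf '%s\n' "$site" | cut -c1-"$keep")
    printf '%s%s\n' "$head" "$right"
}

ligation_site '^GATC'      # MboI    -> GATCGATC
ligation_site 'A^AGCTT'    # HindIII -> AAGCTAGCTT
```

For other enzymes, compare the result with the junction sequence stated by the kit or sequencing provider, since this shortcut does not cover blunt or 3'-overhang cutters.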