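Interpreting the output files of the ChIP-seq pipeline
Follow-up to the earlier post: ChIP-seq | ATAC-seq | RNA-seq | data analysis pipelines
The pipeline from the previous post has finished running, but how to interpret its results is still unclear, so this post takes a closer look.
Review: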
- Pipeline: https://github.com/ENCODE-DCC/chip-seq-pipeline2
- Overview flowchart: https://www.encodeproject.org/pipelines/ENCPL272XAE
- ChIP-seq tutorial: Introduction to ChIP-Seq using high-performance computing
- Tutorial lesson list
- Chinese translation of the course (WeChat public account)
Input file: ~/project/epigenetic/analysis/ChIP-seq/encode-pipeline/encc/H3K27ac/encc.chip.full.json
"chip.title" : "hENCC ChIP-seq (H3K27ac)",
"chip.description" : "ENCC-K27-2_1,ENCC-I1_1 (1st); ENCC-K27-1_1,ENCC-I2_1 (2st) ",
"chip.pipeline_type" : "histone",
"chip.aligner" : "bowtie2",
"chip.align_only" : false,
"chip.true_rep_only" : false,
"chip.genome_tsv" : "~/softwares/chip-seq-pipeline2/db/hg19.tsv",
"chip.paired_end" : true,
"chip.ctl_paired_end" : true,
"chip.fastqs_rep1_R1" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-K27-2_1.fastq.gz" ],
"chip.fastqs_rep1_R2" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-K27-2_2.fastq.gz" ],
"chip.fastqs_rep2_R1" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-K27-1_1.fastq.gz" ],
"chip.fastqs_rep2_R2" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-K27-1_2.fastq.gz" ],
"chip.ctl_fastqs_rep1_R1" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-I1_1.fastq.gz" ],
"chip.ctl_fastqs_rep1_R2" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-I1_2.fastq.gz" ],
"chip.ctl_fastqs_rep2_R1" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-I2_1.fastq.gz" ],
"chip.ctl_fastqs_rep2_R2" : [ "~/project2/analysis/ChIP-seq/encode-pipeline/encc/fastq/ENCC-I2_2.fastq.gz" ],
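Output files:
Output specification for chip.wdl: explains what the file with each suffix is
All intermediate files (each directory holds the outputs of one specific script, so you can browse them at your own pace):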
call-align               call-call_peak_pooled  call-filter_ctl          call-macs2_signal_track_pooled  call-pool_ta_pr2
call-align_ctl           call-call_peak_ppr1    call-filter_R1           call-overlap                    call-qc_report
call-align_R1            call-call_peak_ppr2    call-fraglen_mean        call-overlap_ppr                call-read_genome_tsv
call-bam2ta              call-call_peak_pr1     call-gc_bias             call-overlap_pr                 call-reproducibility_overlap
call-bam2ta_ctl          call-call_peak_pr2     call-idr_ppr             call-pool_ta                    call-spr
call-bam2ta_no_dedup_R1  call-choose_ctl        call-jsd                 call-pool_ta_ctl                call-xcor
call-call_peak           call-filter            call-macs2_signal_track  call-pool_ta_pr1                metadata.json
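To browse these directories systematically, a small script can list the files under every `call-*` directory. This is a sketch under assumptions: the function name `list_task_outputs` is made up here, and the nested `shard-N/execution` layout mentioned in the comments is the usual Cromwell convention rather than something shown in the listing above.

```python
import sys
from pathlib import Path


def list_task_outputs(run_dir):
    """Map each task name (directory name without 'call-') to its files.

    Files are returned as POSIX-style paths relative to the task directory,
    e.g. 'shard-0/execution/...' if the run follows the usual Cromwell layout
    (an assumption; adjust if your directories look different).
    """
    run_dir = Path(run_dir)
    summary = {}
    for call_dir in sorted(run_dir.glob("call-*")):
        if not call_dir.is_dir():
            continue
        task = call_dir.name[len("call-"):]
        summary[task] = sorted(
            path.relative_to(call_dir).as_posix()
            for path in call_dir.rglob("*")
            if path.is_file()
        )
    return summary


if __name__ == "__main__":
    # Usage: python list_outputs.py /path/to/workflow/run/directory
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for task, files in list_task_outputs(root).items():
        print(f"{task}: {len(files)} file(s)")
        for name in files:
            print("   ", name)
```

Cross-referencing the printed file suffixes with the output specification for chip.wdl is then a matter of reading down the list.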
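What each step does:
- chip.align: alignment of the reads to the reference genome
- chip.filter: filtering of the alignments
- chip.bam2ta: converts sequence alignments in BAM format into BED (tagAlign); see reference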
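- chip.spr: splits the tags of each replicate into two self-pseudoreplicates
- chip.jsd: Jensen-Shannon distance, a fingerprint-style QC comparing the ChIP sample with its control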
- chip.xcor: cross-correlation; see reference
- chip.call_peak: peak calling with the MACS2 callpeak command
- chip.macs2_signal_track: signal track generation with the MACS2 bdgcmp command
- chip.filter_picard_java
- chip.gc_bias_picard_java
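The idea behind the chip.xcor step can be illustrated on toy data. The sketch below is not the pipeline's implementation (which relies on its own dedicated tooling); it simulates fragments of a fixed length, counts tags per position on each strand, and looks for the shift at which the plus-strand and minus-strand profiles correlate best. All function names and simulation parameters are invented for illustration, and each simulated fragment is simplified to one plus-strand tag at its start and one minus-strand tag exactly `frag_len` bases downstream.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def strand_cross_correlation(plus, minus, max_shift):
    """Correlate plus-strand counts with minus-strand counts shifted left by d."""
    n = len(plus)
    return {d: pearson(plus[: n - d], minus[d:]) for d in range(max_shift + 1)}


def simulate(genome_len=2000, frag_len=150, n_frags=200, seed=0):
    """Toy per-position tag counts for fragments of one fixed length."""
    rng = random.Random(seed)
    plus = [0] * genome_len
    minus = [0] * genome_len
    for _ in range(n_frags):
        start = rng.randrange(genome_len - frag_len)
        plus[start] += 1              # tag on the + strand at the fragment start
        minus[start + frag_len] += 1  # tag on the - strand frag_len bases downstream
    return plus, minus


plus_counts, minus_counts = simulate()
profile = strand_cross_correlation(plus_counts, minus_counts, max_shift=300)
estimated_fragment_length = max(profile, key=profile.get)

if __name__ == "__main__":
    print("estimated fragment length:", estimated_fragment_length)
```

On this noise-free simulation the best shift coincides with the simulated fragment length; on real data the profile is noisier and usually shows an additional peak near the read length.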
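Pipeline flowchart: GitHub backup, HTML
What is a tagAlign.gz file, and what is it used for? It stores the sequencing tags (aligned reads), one per line, in a BED-like six-column layout: chromosome, start, end, name, score, strand.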
chr13 99073542 99073643 N 1000 +
chr13 99073563 99073664 N 1000 -
chr11 122621369 122621470 N 1000 -
chr11 122621361 122621462 N 1000 +
chr8 49450819 49450920 N 1000 +
chr8 49450886 49450987 N 1000 -
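The records above can be read with a few lines of Python. This is a minimal sketch, assuming six whitespace-separated columns (chromosome, start, end, name, score, strand) as in the sample; the `Tag` tuple and `parse_tagalign` function are illustrative names, not part of any library.

```python
from collections import Counter, namedtuple

Tag = namedtuple("Tag", "chrom start end name score strand")


def parse_tagalign(lines):
    """Yield one Tag per well-formed six-column line; skip anything else."""
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue
        chrom, start, end, name, score, strand = fields
        yield Tag(chrom, int(start), int(end), name, int(score), strand)


# The six records shown above.
SAMPLE = """\
chr13 99073542 99073643 N 1000 +
chr13 99073563 99073664 N 1000 -
chr11 122621369 122621470 N 1000 -
chr11 122621361 122621462 N 1000 +
chr8 49450819 49450920 N 1000 +
chr8 49450886 49450987 N 1000 -
"""

tags = list(parse_tagalign(SAMPLE.splitlines()))
strand_counts = Counter(tag.strand for tag in tags)
tag_lengths = {tag.end - tag.start for tag in tags}

if __name__ == "__main__":
    print("tags per strand:", dict(strand_counts))
    print("tag lengths:", sorted(tag_lengths))
```

For a real file, the same parser can be fed from `gzip.open(path, "rt")`, which yields the decompressed lines as text.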