Keywords: variant | epigenome | disease-susceptible gene
Paper: cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes
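Genotype-Tissue Expression Project (GTEx) - eQTL loci across the genome and their effects on the expression of specific genes in specific tissues, along with the LD relationships between different eQTLs. It integrates data across multiple diseases. As the name suggests, the dataset contains genotype, tissue, and gene expression data.
Roadmap Epigenomics Project - profiled epigenomic data (DNA methylation, histone modifications, open chromatin, etc.) in adult tissues and in multiple tissues during embryonic development. It is effectively a complement to ENCODE and can be used to interpret GWAS variant data. browser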
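What is the idea behind this study? How was it conceived? Is it feasible? Predict the effect of regulatory variants on gene regulation in a specific tissue or cell type.
What is the advantage? Some loci have no eQTL, but I can still make a prediction for them based on Roadmap.
What are the inputs and outputs? The input is data for individual SNPs; the output is a regulatory potential score for each SNP in each tissue.
How are the databases integrated? There are two: epigenomic data and eQTLs. Chromatin features are identified and used to predict a variant's regulatory potential (estimating a variant's regulatory probability).
Does Roadmap include disease samples? No, only normal tissue samples.
What does context-dependent mean? It is obviously just a fancy term here; context simply means tissue. Related phrasing: highly context-specific gene regulation, context-dependent manner. Genes are regulated in a highly context-specific manner. Both genetic and epigenetic gene regulations are tissue/cell type-specific and depend on chromatin states and interactions.
How is the method evaluated? significant GWAS signal enrichment; using phenotypically relevant epigenomes to weight the GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test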
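To make the idea of weighting GWAS SNPs in a gene-based test concrete, here is a minimal sketch. The notes do not say which gene-based test the paper uses, so this swaps in a weighted Stouffer Z-score as a stand-in. It assumes one-sided p-values strictly between 0 and 1 and independent SNPs (LD is ignored), and the weights are hypothetical regulatory scores.

```python
from math import sqrt
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def weighted_gene_z(pvalues, weights):
    """Weighted Stouffer Z for one gene.

    pvalues: one-sided SNP p-values, each strictly between 0 and 1.
    weights: non-negative per-SNP weights (hypothetical regulatory scores).
    Assumes SNPs are independent, which real data in LD would violate.
    """
    zs = [_NORM.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, zs))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def gene_pvalue(pvalues, weights):
    """One-sided gene-level p-value from the weighted Z."""
    return 1.0 - _NORM.cdf(weighted_gene_z(pvalues, weights))


# Toy gene with one strong SNP and one null SNP.
snp_pvalues = [1e-4, 0.5]
equal = gene_pvalue(snp_pvalues, [1.0, 1.0])
upweighted = gene_pvalue(snp_pvalues, [0.9, 0.1])
print(equal, upweighted)
```

Putting more weight on the SNP that carries the signal gives a smaller gene-level p-value in this toy case, which is the intuition behind the claimed gain in power. If the weights favoured the null SNP instead, the gene-level signal would be diluted.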
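This tool does not target any particular disease; it simply makes use of the existing GTEx data. In the end you input SNP data, and the epigenomic scoring tells you the probability that gene expression regulation is affected in a given tissue (is that right?).
compute the composite likelihood of a given variant affecting the gene regulation: others have been doing this kind of scoring for a long time.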
In this study, we used epigenomic maps of 127 tissues/cell types from the Roadmap Epigenomics Project [33] to develop a context-dependent model that could examine important chromatin features surrounding an eQTL and predict its regulatory potential.
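Complex traits are usually controlled by many genetic factors that together shape the phenotype. GWAS has identified many SNPs, but they explain only part of the heritability.
How to identify causal variants with a moderate effect size that explain the missing heritability is a current research hotspot. In plain terms, GWAS today only looks at SNPs with p-value < 5x10^-8, but these SNPs explain only a small fraction of the heritability; it is now widely believed that the missing part lies in SNPs whose p-values fall just short of that threshold.
Most of these SNPs sit in non-coding regions that cover a large number of gene regulatory elements, which suggests that these causal SNPs affect the phenotype by affecting gene expression.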
Identifying causal variants with moderate effect size underlying the missing heritability is currently one of the biggest challenges
The majority of GWAS risk loci, as well as loci with subgenome-wide significance (P values between 1 × 10^-5 and 5 × 10^-8), localize to non-coding genomic regions with many gene regulatory signals [3], suggesting that most trait/disease causal SNPs exert their phenotypic effects by altering gene expression
Another piece of evidence is that these SNPs are enriched in eQTLs and open chromatin regions.
This is further supported by GWAS risk loci being enriched in genomic regions with many expression quantitative trait loci (eQTLs) and open chromatins
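The significance bands mentioned above (genome-wide at P < 5 × 10^-8, sub-genome-wide between 5 × 10^-8 and 1 × 10^-5) can be sketched as a small classifier. The function name, the labels, and the SNP IDs are made up for illustration, and which band the exact boundary values fall in is an assumption.

```python
GENOME_WIDE = 5e-8   # conventional genome-wide significance threshold
SUGGESTIVE = 1e-5    # upper bound of the sub-genome-wide band


def classify_snp(pvalue):
    """Assign a GWAS p-value to one of three significance bands."""
    if pvalue < GENOME_WIDE:
        return "genome-wide significant"
    if pvalue < SUGGESTIVE:
        return "sub-genome-wide"
    return "not significant"


# Hypothetical SNP IDs and p-values, for illustration only.
gwas_hits = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}
bands = {snp: classify_snp(p) for snp, p in gwas_hits.items()}
print(bands)
```

SNPs in the middle band are the ones a standard GWAS report would drop, and they are the ones that a regulatory weighting scheme is meant to help rescue.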
Gene regulation is highly specific to tissues and cell types.
Advanced:
Epigenomic features are used as predictors to predict eQTLs; this is what happens at training time.
Tissue matching between the two databases; three methods are used to do the matching.
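As a rough illustration of using epigenomic features as predictors of eQTLs, here is a minimal sketch. The notes do not say which model the paper fits, so plain logistic regression with gradient descent is an assumption. The three binary features (overlap with an active mark, with open chromatin, with a repressive mark) and the training rows are invented toy data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(x, weights, bias):
    """Probability that a variant with feature vector x is an eQTL."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy rows: [active mark, open chromatin, repressive mark]; label 1 = eQTL.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
     [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(X, y)
print(predict([1, 1, 0], weights, bias), predict([0, 0, 1], weights, bias))
```

Once trained, a model of this kind can score any variant from its chromatin features alone. That is why loci without a known eQTL can still receive a regulatory probability.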