Introduction to GTEx | eQTLGen | Blood eQTL
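Contents
What is an eQTL? What data is it computed from, and what is the data format?
Which genomic regions are eQTLs typically enriched in?
Some common eQTL databases
What is GTEx? Which release is it on? What data does GTEx contain?
What are the milestone GTEx papers?
How do most research groups use GTEx data?
Downloading GTEx/eQTLGen data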
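Quick facts
An eQTL links one SNP to one gene. The gene is usually chosen from those whose TSS lies near the SNP, upstream or downstream. Blood is the gold standard.
Because chromosomes are linear, linkage disequilibrium (LD) complicates every genetic analysis: the SNP you find may not be the causal one, and its neighbor may be instead. The same holds for eQTLs. [If the variants in a region are in perfect LD (LD = 1), they can be treated as a single point, even if their functions differ.]
GWAS uses common SNPs. The causal SNP is unknown, but it must have a function [there must be a traceable path from the starting point to the end point].
Risk alleles are enriched among minor alleles: "Our statistical results revealed that risk alleles were enriched in minor alleles, especially for variants with low minor allele frequencies (MAFs < 0.1)."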
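To make the LD note above concrete, here is a minimal sketch that computes r², one common measure of LD, between two SNPs from phased haplotypes. The function name `ld_r2` and the haplotype vectors are invented for illustration; real analyses would use a dedicated tool and a reference panel.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes (one entry per chromosome copy).
snp1 = [0, 0, 1, 1, 0, 1, 0, 1]
snp2 = [0, 0, 1, 1, 0, 1, 0, 1]   # identical to snp1: perfect LD
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]   # only partly correlated with snp1

print(ld_r2(snp1, snp2))
print(ld_r2(snp1, snp3))
```

When r² is 1, as for `snp1` and `snp2`, the two SNPs carry the same statistical information, so an association test cannot tell which of them is causal.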
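Thought experiment
How would inheritance and development work in a haploid organism? Without sexual reproduction there is no recombination or reshuffling; with asexual reproduction, diversity cannot be guaranteed and can only come from somatic mutation. The genotype is then simply the allele, so the unit of analysis for both GWAS and eQTL becomes the allele.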
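What is an eQTL? What data is it computed from, and what is the data format?
Search Google for "eQTL" and go straight to the image results.
The standard figure shows the three genotypes of a SNP against the expression level of a gene. An association with a nearby gene is cis; one with a distant gene is trans.
Three core elements: SNP, gene, tissue.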
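The standard figure described above amounts to regressing expression on genotype. As a minimal sketch, assuming genotypes are coded as allele dosages 0/1/2 and ignoring covariates, normalization, and multiple testing (all of which real pipelines handle), the slope can be computed by ordinary least squares. The function name and the data are invented for illustration.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression on genotype dosage (0/1/2)."""
    n = len(dosages)
    if n < 2 or n != len(expression):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype shows no variation across samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx                      # expression change per extra allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: six individuals, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = eqtl_slope(dosages, expression)
print(slope, intercept)
```

A non-zero slope corresponds to the stepwise shift between the three genotype groups in the usual boxplot.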
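Which genomic regions are eQTLs typically enriched in?
Much like the signal distribution in ATAC-seq, eQTLs are mainly enriched within 50 kbp upstream and downstream of the TSS, with a peak near the TSS.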
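As a small illustration of the window described above, the following sketch computes a strand-aware distance between a SNP and a TSS and checks whether the SNP falls within 50 kbp. The sign convention (negative means upstream of the gene), the function names, and the coordinates are assumptions made for this example.

```python
def tss_distance(snp_pos, tss_pos, strand="+"):
    """Signed SNP-to-TSS distance; negative means the SNP is upstream of the gene."""
    d = snp_pos - tss_pos
    return d if strand == "+" else -d


def within_tss_window(snp_pos, tss_pos, window=50_000):
    """True if the SNP lies within `window` bp of the TSS on either side."""
    return abs(snp_pos - tss_pos) <= window


# Hypothetical coordinates on the same chromosome.
print(tss_distance(1_040_000, 1_000_000, "+"))   # downstream on the + strand
print(tss_distance(1_040_000, 1_000_000, "-"))   # upstream on the - strand
print(within_tss_window(1_040_000, 1_000_000))
print(within_tss_window(1_080_000, 1_000_000))
```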
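Across tissues, the genotype at the same locus can have opposite relationships with gene expression, which highlights the tissue specificity of eQTLs.
Another highlight of eQTLs is the non-coding genome: "most of the susceptible loci were found in non-coding regions of the genome".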
Here we describe “opposite eQTL effects”, i.e., gene expression effects of eQTLs that are in the opposite direction between different tissues, as the biologically meaningful annotations of genes and genetic variants for understanding the GWAS loci.
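A minimal sketch of how such opposite effects could be flagged, assuming you already have one effect size (slope) per tissue for the same SNP-gene pair, all expressed relative to the same assessed allele. The tissue slopes below are invented; in practice you would also require the eQTL to be significant in both tissues.

```python
def opposite_effect_pairs(slopes_by_tissue):
    """Return sorted tissue pairs whose eQTL slopes have opposite signs."""
    tissues = sorted(slopes_by_tissue)
    pairs = []
    for i, first in enumerate(tissues):
        for second in tissues[i + 1:]:
            # A negative product means one slope is positive and the other negative.
            if slopes_by_tissue[first] * slopes_by_tissue[second] < 0:
                pairs.append((first, second))
    return pairs


# Hypothetical slopes for one SNP-gene pair.
slopes = {"Whole_Blood": 0.42, "Liver": -0.31, "Lung": 0.10}
print(opposite_effect_pairs(slopes))
```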
Some common eQTL databases
GTEx
Blood eQTL
eQTLGen
What is GTEx? Which release is it on? What data does GTEx contain?
The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.
In short: the focus is tissue-specific gene expression and regulation; samples come from 54 non-diseased tissue sites across nearly 1000 individuals, assayed with WGS, WES, and RNA-Seq; and the main data products are gene expression and eQTLs.
As of 2020-09-23, the current release is v8.
The samples are post-mortem tissues (collected at autopsy), and all data are human.
complex trait heritability/complex trait genetics
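Majority of trait-associated variation is non-coding. [Coding sequence accounts for only about 1-5% of the genome.]
Using expression and epigenetic data to inform missing heritability. [For most traits the heritability explained so far is low; how do we find the missing part?]
When you have genotype and gene expression data from the same individuals, and many of them, eQTL analysis is the natural next step: test whether the genotype of a SNP is associated with the expression of nearby genes. If an interesting gene turns up, you can dig deeper. [Think of the familiar boxplot of expression by genotype.]
If the sample size is not large enough, you can only do a simple allelic expression analysis: check whether one allele of a SNP is specifically or highly expressed in patients, and follow up from there. [This is a very common GWAS downstream analysis: checking whether the risk allele is specifically expressed in some tissue.]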
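One simple way to test allelic expression at a heterozygous SNP is an exact binomial test on the RNA-Seq reads carrying each allele, with a null hypothesis of equal expression (p = 0.5). The sketch below is a minimal version under that assumption. The read counts are invented, and real analyses also need to deal with reference mapping bias and overdispersion.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    that are no more likely than the observed count k."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point noise.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous SNP in one sample.
ref_reads, alt_reads = 15, 5
p_value = binomial_two_sided_p(alt_reads, ref_reads + alt_reads)
print(p_value)
```

A small p-value suggests that the two alleles are not expressed equally in that sample.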
What are the milestone GTEx papers?
https://gtexportal.org/home/publicationsPage
The GTEx Consortium atlas of genetic regulatory effects across human tissues - Science 11 Sep 2020:
Cell type–specific genetic regulation of gene expression across human tissues - Science 11 Sep 2020:
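These papers are hot off the press. The second covers many cell types and, based on statistical deconvolution methods, identifies additional eQTLs.
How do most research groups use GTEx data?
Reference: Mulin Jun Li
Downloading eQTLGen data
Beginners are advised to practice on this database first, because its data format is relatively simple.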
cis-eQTLs
This page contains the cis-eQTL results. The statistically significant cis-eQTLs and SMR-prioritised genes for several traits are browsable, the other files can be downloaded.
Download the "Significant cis-eQTLs" file.
Pvalue SNP SNPChr SNPPos AssessedAllele OtherAllele Zscore Gene GeneSymbol GeneChr GenePos NrCohorts NrSamples FDR BonferroniP
3.2717E-310 rs12230244 12 10117369 T A 200.7534 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs12229020 12 10117683 G C 200.6568 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs61913527 12 10116198 T C 200.2654 ENSG00000172322 CLEC12A 12 10126104 34 30598 0.0 4.1662E-302
Files
-----
File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz

Column Names
------------
Pvalue - P-value
SNP - SNP rs ID
SNPChr - SNP chromosome
SNPPos - SNP position
AssessedAllele - Assessed allele, the Z-score refers to this allele
OtherAllele - Not assessed allele
Zscore - Z-score
Gene - ENSG name (Ensembl v71) of the eQTL gene
GeneSymbol - HGNC name of the gene
GeneChr - Gene chromosome
GenePos - Centre of gene position
NrCohorts - Total number of cohorts where this SNP-gene combination was tested
NrSamples - Total number of samples where this SNP-gene combination was tested
FDR - False discovery rate estimated based on permutations
BonferroniP - P-value after Bonferroni correction

Additional information
----------------------
These files contain all cis-eQTL results from eQTLGen, accompanying the article.
19,250 genes that showed expression in blood were tested.
Every SNP-gene combination with a distance <1Mb from the center of the gene and tested in at least 2 cohorts was included.
Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
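The column list above maps directly onto a small reader. The sketch below assumes the file is tab-delimited with exactly the header shown in this post; the helper names are invented. The demo parses the three example rows from this post out of an in-memory string, so it runs without downloading anything.

```python
import csv
import gzip
import io

HEADER = ["Pvalue", "SNP", "SNPChr", "SNPPos", "AssessedAllele", "OtherAllele",
          "Zscore", "Gene", "GeneSymbol", "GeneChr", "GenePos", "NrCohorts",
          "NrSamples", "FDR", "BonferroniP"]


def open_eqtlgen(path):
    """Open a gzipped eQTLGen results file as text (not called in this demo)."""
    return gzip.open(path, "rt", newline="")


def read_cis_eqtls(handle, max_fdr=0.05):
    """Yield rows as dicts, keeping those with FDR <= max_fdr."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["FDR"]) > max_fdr:
            continue
        row["Zscore"] = float(row["Zscore"])
        row["SNPPos"] = int(row["SNPPos"])
        row["GenePos"] = int(row["GenePos"])
        yield row


# The three example rows shown in this post, rebuilt as tab-delimited text.
EXAMPLE_ROWS = [
    "3.2717E-310 rs12230244 12 10117369 T A 200.7534 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302",
    "3.2717E-310 rs12229020 12 10117683 G C 200.6568 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302",
    "3.2717E-310 rs61913527 12 10116198 T C 200.2654 ENSG00000172322 CLEC12A 12 10126104 34 30598 0.0 4.1662E-302",
]
demo_text = "\t".join(HEADER) + "\n" + "\n".join(
    "\t".join(r.split()) for r in EXAMPLE_ROWS) + "\n"

rows = list(read_cis_eqtls(io.StringIO(demo_text)))
top = max(rows, key=lambda r: abs(r["Zscore"]))
print(top["SNP"], top["GeneSymbol"], top["GenePos"] - top["SNPPos"])
```

For the real file, `read_cis_eqtls(open_eqtlgen(path))` streams rows one at a time, which matters because the full results file is large.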
Downloading GTEx data
Data available include:
- BAM files for RNA-Seq, Whole Exome Seq, and Whole Genome Seq
- Genotype Calls (.vcf) for OMNI SNP Arrays, WES, and WGS
- OMNI SNP Array Intensity files (.idat and .gtc)
- Affymetrix Expression Array Intensity files (.cel)
- Allele Specific Expression (ASE) tables
- All expression matrices from the Portal, including samples that did not pass the Analysis Freeze QC
- Sample Attributes
- Subject Phenotypes
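Data format
Download GTEx_Analysis_v8_eQTL_EUR.tar, which contains the data for one population.
After extraction there are three folders:
eqtls expression_matrices expression_covariates
eqtls: stored as separate files per tissue, two files for each tissue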
eqtls/Vagina.v8.EUR.egenes.txt.gz:
eqtls/Vagina.v8.EUR.signif_pairs.txt.gz:
Adipose_Subcutaneous.v8.EUR.egenes.txt.gz Adipose_Subcutaneous.v8.EUR.signif_pairs.txt.gz
Adipose_Visceral_Omentum.v8.EUR.egenes.txt.gz Adipose_Visceral_Omentum.v8.EUR.signif_pairs.txt.gz
Adrenal_Gland.v8.EUR.egenes.txt.gz Adrenal_Gland.v8.EUR.signif_pairs.txt.gz
Artery_Aorta.v8.EUR.egenes.txt.gz Artery_Aorta.v8.EUR.signif_pairs.txt.gz
Artery_Coronary.v8.EUR.egenes.txt.gz Artery_Coronary.v8.EUR.signif_pairs.txt.gz
Artery_Tibial.v8.EUR.egenes.txt.gz Artery_Tibial.v8.EUR.signif_pairs.txt.gz
Brain_Amygdala.v8.EUR.egenes.txt.gz Brain_Amygdala.v8.EUR.signif_pairs.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.egenes.txt.gz Brain_Anterior_cingulate_cortex_BA24.v8.EUR.signif_pairs.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Caudate_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.egenes.txt.gz Brain_Cerebellar_Hemisphere.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellum.v8.EUR.egenes.txt.gz Brain_Cerebellum.v8.EUR.signif_pairs.txt.gz
Brain_Cortex.v8.EUR.egenes.txt.gz Brain_Cortex.v8.EUR.signif_pairs.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.egenes.txt.gz Brain_Frontal_Cortex_BA9.v8.EUR.signif_pairs.txt.gz
Brain_Hippocampus.v8.EUR.egenes.txt.gz Brain_Hippocampus.v8.EUR.signif_pairs.txt.gz
Brain_Hypothalamus.v8.EUR.egenes.txt.gz Brain_Hypothalamus.v8.EUR.signif_pairs.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Putamen_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz Brain_Spinal_cord_cervical_c-1.v8.EUR.signif_pairs.txt.gz
Brain_Substantia_nigra.v8.EUR.egenes.txt.gz Brain_Substantia_nigra.v8.EUR.signif_pairs.txt.gz
Breast_Mammary_Tissue.v8.EUR.egenes.txt.gz Breast_Mammary_Tissue.v8.EUR.signif_pairs.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.egenes.txt.gz Cells_Cultured_fibroblasts.v8.EUR.signif_pairs.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.egenes.txt.gz Cells_EBV-transformed_lymphocytes.v8.EUR.signif_pairs.txt.gz
Colon_Sigmoid.v8.EUR.egenes.txt.gz Colon_Sigmoid.v8.EUR.signif_pairs.txt.gz
Colon_Transverse.v8.EUR.egenes.txt.gz Colon_Transverse.v8.EUR.signif_pairs.txt.gz
Esophagus_Gastroesophageal_Junction.v8.EUR.egenes.txt.gz Esophagus_Gastroesophageal_Junction.v8.EUR.signif_pairs.txt.gz
Esophagus_Mucosa.v8.EUR.egenes.txt.gz Esophagus_Mucosa.v8.EUR.signif_pairs.txt.gz
Esophagus_Muscularis.v8.EUR.egenes.txt.gz Esophagus_Muscularis.v8.EUR.signif_pairs.txt.gz
Heart_Atrial_Appendage.v8.EUR.egenes.txt.gz Heart_Atrial_Appendage.v8.EUR.signif_pairs.txt.gz
Heart_Left_Ventricle.v8.EUR.egenes.txt.gz Heart_Left_Ventricle.v8.EUR.signif_pairs.txt.gz
Kidney_Cortex.v8.EUR.egenes.txt.gz Kidney_Cortex.v8.EUR.signif_pairs.txt.gz
Liver.v8.EUR.egenes.txt.gz Liver.v8.EUR.signif_pairs.txt.gz
Lung.v8.EUR.egenes.txt.gz Lung.v8.EUR.signif_pairs.txt.gz
Minor_Salivary_Gland.v8.EUR.egenes.txt.gz Minor_Salivary_Gland.v8.EUR.signif_pairs.txt.gz
Muscle_Skeletal.v8.EUR.egenes.txt.gz Muscle_Skeletal.v8.EUR.signif_pairs.txt.gz
Nerve_Tibial.v8.EUR.egenes.txt.gz Nerve_Tibial.v8.EUR.signif_pairs.txt.gz
Ovary.v8.EUR.egenes.txt.gz Ovary.v8.EUR.signif_pairs.txt.gz
Pancreas.v8.EUR.egenes.txt.gz Pancreas.v8.EUR.signif_pairs.txt.gz
Pituitary.v8.EUR.egenes.txt.gz Pituitary.v8.EUR.signif_pairs.txt.gz
Prostate.v8.EUR.egenes.txt.gz Prostate.v8.EUR.signif_pairs.txt.gz
Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.egenes.txt.gz Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.signif_pairs.txt.gz
Skin_Sun_Exposed_Lower_leg.v8.EUR.egenes.txt.gz Skin_Sun_Exposed_Lower_leg.v8.EUR.signif_pairs.txt.gz
Small_Intestine_Terminal_Ileum.v8.EUR.egenes.txt.gz Small_Intestine_Terminal_Ileum.v8.EUR.signif_pairs.txt.gz
Spleen.v8.EUR.egenes.txt.gz Spleen.v8.EUR.signif_pairs.txt.gz
Stomach.v8.EUR.egenes.txt.gz Stomach.v8.EUR.signif_pairs.txt.gz
Testis.v8.EUR.egenes.txt.gz Testis.v8.EUR.signif_pairs.txt.gz
Thyroid.v8.EUR.egenes.txt.gz Thyroid.v8.EUR.signif_pairs.txt.gz
Uterus.v8.EUR.egenes.txt.gz Uterus.v8.EUR.signif_pairs.txt.gz
Vagina.v8.EUR.egenes.txt.gz Vagina.v8.EUR.signif_pairs.txt.gz
Whole_Blood.v8.EUR.egenes.txt.gz Whole_Blood.v8.EUR.signif_pairs.txt.gz
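Because the file names in the listing above follow one pattern, `<tissue>.v8.EUR.<kind>.txt.gz`, they can be grouped by tissue with a few lines of code. This is a minimal sketch that assumes the pattern holds for every file; the function names are invented for illustration.

```python
def parse_eqtl_filename(name):
    """Split '<tissue>.<release>.<population>.<kind>.txt.gz' into (tissue, kind)."""
    suffix = ".txt.gz"
    if not name.endswith(suffix):
        raise ValueError(f"unexpected file name: {name}")
    parts = name[: -len(suffix)].rsplit(".", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {name}")
    tissue, _release, _population, kind = parts
    return tissue, kind


def group_by_tissue(names):
    """Map each tissue to a dict of {kind: file name}."""
    grouped = {}
    for name in names:
        tissue, kind = parse_eqtl_filename(name)
        grouped.setdefault(tissue, {})[kind] = name
    return grouped


files = [
    "Whole_Blood.v8.EUR.egenes.txt.gz",
    "Whole_Blood.v8.EUR.signif_pairs.txt.gz",
    "Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz",
]
grouped = group_by_tissue(files)
print(sorted(grouped))
print(sorted(grouped["Whole_Blood"]))
```

In practice `files` would come from listing the extracted `eqtls` folder, for example with `os.listdir`.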
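expression_matrices: expression data in BED format. After the first four columns, each column is one individual. The data are again stored as one file per tissue.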
#chr start end gene_id GTEX-111CU GTEX-111FC GTEX-111VG GTEX-111YS GTEX-1122O GTEX-1128S GTEX-11DXX GTEX-11DZ1 GTEX-11EI6 GTEX-11EM3 GTEX
chr1 29552 29553 ENSG00000227232.5 -0.8416212335729142 -0.1573106846101707 -0.6744897501960817 -0.1414683013821586 -0.5244005127080409 0.37970195786468147
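A minimal sketch for reading this layout, assuming the file is tab-delimited with the four fixed columns (`#chr`, `start`, `end`, `gene_id`) followed by one column per individual. The demo uses a shortened copy of the example row above with only the first three individuals; the function name is invented.

```python
import io


def read_expression_bed(handle):
    """Return (sample_ids, {gene_id: [expression values]}) from an expression BED."""
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[4:]                      # columns after chr/start/end/gene_id
    expression = {}
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        values = [float(v) for v in fields[4:]]
        if len(values) != len(samples):
            raise ValueError(f"row for {fields[3]} does not match the header")
        expression[fields[3]] = values
    return samples, expression


# Shortened demo: first three individuals from the example row.
demo = (
    "#chr\tstart\tend\tgene_id\tGTEX-111CU\tGTEX-111FC\tGTEX-111VG\n"
    "chr1\t29552\t29553\tENSG00000227232.5\t"
    "-0.8416212335729142\t-0.1573106846101707\t-0.6744897501960817\n"
)
samples, expression = read_expression_bed(io.StringIO(demo))
print(samples)
print(expression["ENSG00000227232.5"])
```

For a real file, pass `gzip.open(path, "rt")` if the matrix is gzipped.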
expression_covariates: covariates, used to remove the effect of confounders.
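To illustrate what removing a confounder means, the sketch below regresses expression on a single covariate and keeps the residuals. This is a simplification: real pipelines adjust for many covariates jointly inside the association model, and which covariates the GTEx files contain is not covered here. The function name and data are invented.

```python
def residualize(values, covariate):
    """Residuals of `values` after least-squares regression on one covariate."""
    n = len(values)
    if n == 0 or n != len(covariate):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    if scc == 0:
        # A constant covariate explains nothing; just center the values.
        return [v - mean_v for v in values]
    beta = sum((c - mean_c) * (v - mean_v)
               for c, v in zip(covariate, values)) / scc
    return [(v - mean_v) - beta * (c - mean_c)
            for c, v in zip(covariate, values)]


# Hypothetical example: expression shifted by a batch effect (0 or 1).
expression = [1.0, 2.0, 3.0, 4.0]
batch = [0, 0, 1, 1]
residuals = residualize(expression, batch)
print(residuals)
```

After this step the residuals no longer correlate with the covariate, so any remaining association with genotype cannot be attributed to it.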
References:
GTEx introduction.pdf - essential introductory reading
The Genotype-Tissue Expression Project