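GTEx introduction | eQTLGen | Blood eQTL

Contents

What is an eQTL? What data is it computed from, and what is the data format?

Where in the genome are eQTLs usually enriched?

Some common eQTL databases

What is GTEx? Which release is current? What data does GTEx contain?

What are the landmark GTEx papers?

How do most research groups use GTEx data?

Downloading GTEx/eQTLGen data (download GTEx files)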

 

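Quick notes

An eQTL links one SNP with one gene; the gene is usually chosen from those whose TSS lies up- or downstream of the SNP. Blood is treated as the gold standard.

Because chromosomes are linear, linkage disequilibrium (LD) complicates every genetic analysis: the SNP you find may not be causal, and the causal variant may be one of its neighbors. The same is true for eQTLs. (If LD = 1 across a region, those SNPs can be treated as a single point, even if they differ in function.)

GWAS tests common SNPs. The causal SNP is unknown, but it must have some function (there must be a path from the starting point, the variant, to the end point, the trait).

Risk alleles are enriched among minor alleles: "Our statistical results revealed that risk alleles were enriched in minor alleles, especially for variants with low minor allele frequencies (MAFs < 0.1)."

Thought experiment

What would inheritance and development look like in a haploid organism? Without sexual reproduction there is no recombination or reshuffling, and with asexual reproduction alone diversity cannot be maintained except through somatic mutation. The genotype would simply be the allele, so both GWAS and eQTL would be computed per allele.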

 

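What is an eQTL? What data is it computed from, and what is the data format?

Search "eQTL" in Google Images to see the typical figures.

The standard plot has the three genotypes on the x-axis and the expression level of one gene on the y-axis. If the SNP is close to the gene, the eQTL is called cis; if it is far away (or on another chromosome), it is called trans.

The three core elements: SNP, gene, tissue.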
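The standard plot described above is just one gene's expression values grouped by one SNP's genotype, in one tissue. A minimal sketch of that grouping, assuming genotypes arrive as two-letter strings such as "AG"; the function names and toy numbers are illustrative only, not from any GTEx tool:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence


def expression_by_genotype(genotypes: Sequence[str],
                           expr: Sequence[float]) -> Dict[str, List[float]]:
    """Group one gene's expression values by the genotype of one SNP.

    Genotypes are two-letter strings; "GA" and "AG" are treated as the same
    heterozygote by sorting the alleles.
    """
    if len(genotypes) != len(expr):
        raise ValueError("genotypes and expr must be the same length")
    groups: Dict[str, List[float]] = defaultdict(list)
    for g, y in zip(genotypes, expr):
        groups["".join(sorted(g))].append(y)
    return dict(groups)


def median_by_genotype(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-genotype medians: the centre lines of the usual eQTL boxplot."""
    return {g: median(v) for g, v in groups.items()}


# Made-up toy data: one SNP, one gene, one tissue.
groups = expression_by_genotype(["AA", "GA", "GG", "AG"], [1.0, 2.0, 3.0, 4.0])
print(median_by_genotype(groups))  # {'AA': 1.0, 'AG': 3.0, 'GG': 3.0}
```

The three keys of the result correspond to the three boxes of the standard plot.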

 

Where in the genome are eQTLs usually enriched?

Much like the distribution of ATAC-seq signal, eQTLs are mainly enriched within 50 kbp up- and downstream of the TSS, with a peak near the TSS.
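To reproduce this kind of distribution you need a strand-aware SNP-to-TSS distance. A minimal sketch, assuming you already have the SNP position, the TSS position and the gene strand; the helper names and the 50 kbp default (taken from the sentence above) are illustrative:

```python
from typing import Sequence


def signed_tss_distance(snp_pos: int, tss_pos: int, strand: str) -> int:
    """Distance from the TSS to the SNP, oriented by the gene's strand.

    Negative = upstream of the TSS, positive = downstream. On the minus
    strand transcription runs toward smaller coordinates, so the sign flips.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    d = snp_pos - tss_pos
    return d if strand == "+" else -d


def fraction_within(distances: Sequence[int], window_bp: int = 50_000) -> float:
    """Share of SNP-TSS distances falling inside +/- window_bp."""
    if not distances:
        return 0.0
    return sum(abs(d) <= window_bp for d in distances) / len(distances)


# Hypothetical positions: the same SNP is upstream of a "+" gene
# but downstream of a "-" gene with the same TSS coordinate.
print(signed_tss_distance(1_000, 1_500, "+"))  # -500
print(signed_tss_distance(1_000, 1_500, "-"))  # 500
```

Histogramming the signed distances of many eQTLs should show the peak around 0 described above.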

The same locus can show opposite genotype-expression relationships in different tissues, which highlights the tissue specificity of eQTLs.
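Detecting such "opposite" effects amounts to comparing effect signs for the same SNP-gene pair across tissues. A sketch under the assumption that all effects (slope or Z-score) are expressed for the same assessed allele; the function name, threshold parameter and numbers are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Tuple


def opposite_direction_pairs(effects: Dict[str, float],
                             min_abs: float = 0.0) -> List[Tuple[str, str]]:
    """For ONE SNP-gene pair, list tissue pairs whose effects have opposite signs.

    `effects` maps tissue name -> signed effect for the same assessed allele.
    Effects with |effect| <= min_abs are ignored.
    """
    kept = sorted(t for t, e in effects.items() if abs(e) > min_abs)
    return [(a, b) for a, b in combinations(kept, 2) if effects[a] * effects[b] < 0]


# Made-up numbers for illustration only.
print(opposite_direction_pairs({"Whole_Blood": 0.8, "Liver": -0.5, "Lung": 0.3}))
# [('Liver', 'Lung'), ('Liver', 'Whole_Blood')]
```

If the assessed allele differs between sources, flip the sign first; otherwise a spurious "opposite" effect appears.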

Another highlight of eQTLs is that they help interpret the non-coding genome: most of the susceptible loci were found in non-coding regions of the genome.

Here we describe “opposite eQTL effects”, i.e., gene expression effects of eQTLs that are in the opposite direction between different tissues, as the biologically meaningful annotations of genes and genetic variants for understanding the GWAS loci.

Reference: Biological characterization of expression quantitative trait loci (eQTLs) showing tissue-specific opposite directional effects

 

Some common eQTL databases

GTEx

Blood eQTL

eQTLGen

 

What is GTEx? Which release is current? What data does GTEx contain?

The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.

In short: the focus is tissue-specific gene expression and regulation. Samples come from 54 non-diseased tissue sites across nearly 1,000 individuals and were assayed by WGS, WES and RNA-Seq. The main data products are gene expression and QTLs (eQTLs).

As of 23 September 2020, the current release is v8.

The tissues are collected post-mortem, and all data are from humans.

 

complex trait heritability/complex trait genetics 

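The majority of trait-associated variation is non-coding (coding sequence makes up only about 1-5% of the genome).

Using expression and epigenetic data to inform missing heritability (for most traits, the significant GWAS loci explain only a small part of the heritability; how do we find the missing part?)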

 

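When you have genotype and gene expression data for a large number of the same individuals, the natural next step is an eQTL analysis: test whether a SNP's genotype is associated with the expression of nearby genes, and dig deeper into any gene of interest. (Think of the familiar boxplot of expression split by genotype.)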
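The association test behind that boxplot is, in its simplest form, a linear regression of expression on genotype dosage. A minimal sketch with no covariates, assuming dosage is coded as the number of copies of the assessed allele (0, 1, 2); the function name is hypothetical and real pipelines add covariates and multiple-testing control:

```python
import math
from typing import Sequence, Tuple


def eqtl_linear_test(dosage: Sequence[float],
                     expr: Sequence[float]) -> Tuple[float, float, float]:
    """Fit expr = a + b * dosage by ordinary least squares.

    dosage: copies of the assessed allele per individual (0, 1 or 2).
    Returns (b, se_b, t); under the null b = 0, t has n - 2 degrees of freedom.
    """
    n = len(dosage)
    if n != len(expr) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosage) / n
    my = sum(expr) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in these samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expr))
    b = sxy / sxx                      # slope: expression change per allele copy
    a = my - b * mx                    # intercept
    rss = sum((y - a - b * x) ** 2 for x, y in zip(dosage, expr))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se > 0:
        t = b / se
    else:                              # perfect fit: t is undefined or infinite
        t = math.copysign(math.inf, b) if b != 0 else math.nan
    return b, se, t


# Toy data (made up): expression rises by about 1 unit per allele copy.
b, se, t = eqtl_linear_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
# b ≈ 1.0, se ≈ 0.061, t ≈ 16.3
```

The p-value then comes from a t distribution with n - 2 degrees of freedom.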

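If the sample size is too small, you can only do a simple allelic expression analysis: check whether a particular allele of a SNP is specifically or more highly expressed in patients, and follow up from there. (This is a common GWAS downstream analysis: check whether the risk allele is specifically expressed in a given tissue.)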
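At the read level, allelic expression is often checked at a heterozygous site by asking whether the two alleles' read counts deviate from 50:50. A sketch of an exact two-sided binomial test, assuming you already have ref/alt read counts for one site; the function name and counts are hypothetical:

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k out of n reads.

    Fine for typical read depths; very large n (above ~1000) would overflow
    the float conversion of comb().
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical heterozygous site: 30 reads carry one allele, 10 the other.
p_value = binomial_two_sided_p(30, 40)  # ≈ 0.0022
```

With SciPy available, `scipy.stats.binomtest` performs the same exact test.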

 

What are the landmark GTEx papers?

https://gtexportal.org/home/publicationsPage

The GTEx Consortium atlas of genetic regulatory effects across human tissues - Science, 11 Sep 2020

Cell type–specific genetic regulation of gene expression across human tissues - Science, 11 Sep 2020

A freshly published paper: it looks at many cell types and, using statistical deconvolution (estimating cell-type composition from bulk tissue data), identifies additional eQTLs.

 

How do most research groups use GTEx data?

Reference: Mulin Jun Li

 

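Downloading eQTLGen data

Beginners are advised to practice with this database first, because its data format is relatively simple.

cis-eQTLs
This page contains the cis-eQTL results. The statistically significant cis-eQTLs and SMR-prioritised genes for several traits are browsable, the other files can be downloaded.

Download the Significant cis-eQTLs file; its first rows look like this: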

Pvalue  SNP SNPChr  SNPPos  AssessedAllele  OtherAllele Zscore  Gene    GeneSymbol  GeneChr GenePos NrCohorts   NrSamples   FDR BonferroniP
3.2717E-310 rs12230244  12  10117369    T   A   200.7534    ENSG00000172322 CLEC12A 12  10126104    34  30596   0.0 4.1662E-302
3.2717E-310 rs12229020  12  10117683    G   C   200.6568    ENSG00000172322 CLEC12A 12  10126104    34  30596   0.0 4.1662E-302
3.2717E-310 rs61913527  12  10116198    T   C   200.2654    ENSG00000172322 CLEC12A 12  10126104    34  30598   0.0 4.1662E-302
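A minimal sketch for streaming this file and pulling out one gene, assuming it is tab-separated with exactly the header shown above; the function names are hypothetical, and the file name in the comment is the one listed in the download notes:

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Optional, Union

Row = Dict[str, Union[str, int, float]]
INT_COLS = ("SNPPos", "GenePos", "NrCohorts", "NrSamples")
FLOAT_COLS = ("Pvalue", "Zscore", "FDR", "BonferroniP")


def parse_eqtlgen(lines: Iterable[str],
                  gene_symbol: Optional[str] = None) -> Iterator[Row]:
    """Yield eQTLGen cis-eQTL rows as dicts, converting the numeric columns.

    Assumes tab-separated lines whose first line is the header row.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        if gene_symbol is not None and row["GeneSymbol"] != gene_symbol:
            continue
        out: Row = dict(row)
        for c in INT_COLS:
            out[c] = int(row[c])
        for c in FLOAT_COLS:
            out[c] = float(row[c])
        yield out


def read_eqtlgen(path: str, gene_symbol: Optional[str] = None) -> Iterator[Row]:
    """Stream a (possibly gzipped) eQTLGen file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        yield from parse_eqtlgen(fh, gene_symbol)


# Example (path assumed to be the downloaded significant-results file):
# for row in read_eqtlgen(
#         "2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz",
#         "CLEC12A"):
#     print(row["SNP"], row["Zscore"])
```

Streaming row by row avoids loading the full results file into memory.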

  

Files
-----
File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
 
Column Names
------------
Pvalue - P-value
SNP - SNP rs ID
SNPChr - SNP chromosome
SNPPos - SNP position
AssessedAllele - Assessed allele, the Z-score refers to this allele
OtherAllele - Not assessed allele
Zscore - Z-score
Gene - ENSG name (Ensembl v71) of the eQTL gene
GeneSymbol - HGNC name of the gene
GeneChr - Gene chromosome
GenePos - Centre of gene position
NrCohorts - Total number of cohorts where this SNP-gene combination was tested
NrSamples - Total number of samples where this SNP-gene combination was tested
FDR - False discovery rate estimated based on permutations
BonferroniP - P-value after Bonferroni correction
 
Additional information
----------------------
These files contain all cis-eQTL results from eQTLGen, accompanying the article.
19,250 genes that showed expression in blood were tested.
Every SNP-gene combination with a distance <1Mb from the center of the gene and tested in at least 2 cohorts was included.
Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
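Two details in this README matter in practice: the Z-score refers to AssessedAllele, and rows satisfy a distance/cohort inclusion rule. A sketch of both, assuming rows are dicts keyed by the column names above; the function names are hypothetical, and strand flips or palindromic SNPs (A/T, C/G) are deliberately not handled:

```python
from typing import Mapping, Optional, Union

Value = Union[str, int, float]


def meets_cis_criteria(row: Mapping[str, Value],
                       window_bp: int = 1_000_000,
                       min_cohorts: int = 2) -> bool:
    """Re-check the stated inclusion rule: SNP on the gene's chromosome,
    < window_bp from the gene centre (GenePos), tested in >= min_cohorts cohorts."""
    same_chr = str(row["SNPChr"]) == str(row["GeneChr"])
    distance = abs(int(row["SNPPos"]) - int(row["GenePos"]))
    return same_chr and distance < window_bp and int(row["NrCohorts"]) >= min_cohorts


def z_for_allele(z: float, assessed: str, other: str, target: str) -> Optional[float]:
    """Re-express a Z-score (which refers to AssessedAllele) for `target`.

    Returns None when target matches neither allele.
    """
    if target == assessed:
        return z
    if target == other:
        return -z
    return None


# First row of the table above: rs12230244 (T/A) and CLEC12A.
row = {"SNPChr": "12", "SNPPos": 10117369, "GeneChr": "12",
       "GenePos": 10126104, "NrCohorts": 34}
print(meets_cis_criteria(row))                    # True
print(z_for_allele(200.7534, "T", "A", "A"))      # -200.7534
```

Aligning Z-scores this way is what lets you compare an eQTL direction with, for example, a GWAS risk allele.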

  

 

Downloading GTEx data (download GTEx files)

GTEx Analysis V8

Data available include:

  • BAM files for RNA-Seq, Whole Exome Seq, and Whole Genome Seq
  • Genotype Calls (.vcf) for OMNI SNP Arrays, WES, and WGS
  • OMNI SNP Array Intensity files (.idat and .gtc)
  • Affymetrix Expression Array Intensity files (.cel)
  • Allele Specific Expression (ASE) tables
  • All expression matrices from the Portal, including samples that did not pass the Analysis Freeze QC
  • Sample Attributes
  • Subject Phenotypes

Data format

Download GTEx_Analysis_v8_eQTL_EUR.tar, which contains the data for one population (EUR).

After extraction there are three folders:

eqtls
expression_matrices
expression_covariates

  

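eqtls: stored per tissue, two files per tissue (egenes and signif_pairs). For example, for vagina:

eqtls/Vagina.v8.EUR.egenes.txt.gz

eqtls/Vagina.v8.EUR.signif_pairs.txt.gz

The full folder listing (98 files, i.e. 49 tissues):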

Adipose_Subcutaneous.v8.EUR.egenes.txt.gz                         Esophagus_Gastroesophageal_Junction.v8.EUR.signif_pairs.txt.gz
Adipose_Subcutaneous.v8.EUR.signif_pairs.txt.gz                   Esophagus_Mucosa.v8.EUR.egenes.txt.gz
Adipose_Visceral_Omentum.v8.EUR.egenes.txt.gz                     Esophagus_Mucosa.v8.EUR.signif_pairs.txt.gz
Adipose_Visceral_Omentum.v8.EUR.signif_pairs.txt.gz               Esophagus_Muscularis.v8.EUR.egenes.txt.gz
Adrenal_Gland.v8.EUR.egenes.txt.gz                                Esophagus_Muscularis.v8.EUR.signif_pairs.txt.gz
Adrenal_Gland.v8.EUR.signif_pairs.txt.gz                          Heart_Atrial_Appendage.v8.EUR.egenes.txt.gz
Artery_Aorta.v8.EUR.egenes.txt.gz                                 Heart_Atrial_Appendage.v8.EUR.signif_pairs.txt.gz
Artery_Aorta.v8.EUR.signif_pairs.txt.gz                           Heart_Left_Ventricle.v8.EUR.egenes.txt.gz
Artery_Coronary.v8.EUR.egenes.txt.gz                              Heart_Left_Ventricle.v8.EUR.signif_pairs.txt.gz
Artery_Coronary.v8.EUR.signif_pairs.txt.gz                        Kidney_Cortex.v8.EUR.egenes.txt.gz
Artery_Tibial.v8.EUR.egenes.txt.gz                                Kidney_Cortex.v8.EUR.signif_pairs.txt.gz
Artery_Tibial.v8.EUR.signif_pairs.txt.gz                          Liver.v8.EUR.egenes.txt.gz
Brain_Amygdala.v8.EUR.egenes.txt.gz                               Liver.v8.EUR.signif_pairs.txt.gz
Brain_Amygdala.v8.EUR.signif_pairs.txt.gz                         Lung.v8.EUR.egenes.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.egenes.txt.gz         Lung.v8.EUR.signif_pairs.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.signif_pairs.txt.gz   Minor_Salivary_Gland.v8.EUR.egenes.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.egenes.txt.gz                  Minor_Salivary_Gland.v8.EUR.signif_pairs.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.signif_pairs.txt.gz            Muscle_Skeletal.v8.EUR.egenes.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.egenes.txt.gz                  Muscle_Skeletal.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.signif_pairs.txt.gz            Nerve_Tibial.v8.EUR.egenes.txt.gz
Brain_Cerebellum.v8.EUR.egenes.txt.gz                             Nerve_Tibial.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellum.v8.EUR.signif_pairs.txt.gz                       Ovary.v8.EUR.egenes.txt.gz
Brain_Cortex.v8.EUR.egenes.txt.gz                                 Ovary.v8.EUR.signif_pairs.txt.gz
Brain_Cortex.v8.EUR.signif_pairs.txt.gz                           Pancreas.v8.EUR.egenes.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.egenes.txt.gz                     Pancreas.v8.EUR.signif_pairs.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.signif_pairs.txt.gz               Pituitary.v8.EUR.egenes.txt.gz
Brain_Hippocampus.v8.EUR.egenes.txt.gz                            Pituitary.v8.EUR.signif_pairs.txt.gz
Brain_Hippocampus.v8.EUR.signif_pairs.txt.gz                      Prostate.v8.EUR.egenes.txt.gz
Brain_Hypothalamus.v8.EUR.egenes.txt.gz                           Prostate.v8.EUR.signif_pairs.txt.gz
Brain_Hypothalamus.v8.EUR.signif_pairs.txt.gz                     Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.egenes.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.egenes.txt.gz        Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.signif_pairs.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.signif_pairs.txt.gz  Skin_Sun_Exposed_Lower_leg.v8.EUR.egenes.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.egenes.txt.gz                  Skin_Sun_Exposed_Lower_leg.v8.EUR.signif_pairs.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.signif_pairs.txt.gz            Small_Intestine_Terminal_Ileum.v8.EUR.egenes.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz               Small_Intestine_Terminal_Ileum.v8.EUR.signif_pairs.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.signif_pairs.txt.gz         Spleen.v8.EUR.egenes.txt.gz
Brain_Substantia_nigra.v8.EUR.egenes.txt.gz                       Spleen.v8.EUR.signif_pairs.txt.gz
Brain_Substantia_nigra.v8.EUR.signif_pairs.txt.gz                 Stomach.v8.EUR.egenes.txt.gz
Breast_Mammary_Tissue.v8.EUR.egenes.txt.gz                        Stomach.v8.EUR.signif_pairs.txt.gz
Breast_Mammary_Tissue.v8.EUR.signif_pairs.txt.gz                  Testis.v8.EUR.egenes.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.egenes.txt.gz                   Testis.v8.EUR.signif_pairs.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.signif_pairs.txt.gz             Thyroid.v8.EUR.egenes.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.egenes.txt.gz            Thyroid.v8.EUR.signif_pairs.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.signif_pairs.txt.gz      Uterus.v8.EUR.egenes.txt.gz
Colon_Sigmoid.v8.EUR.egenes.txt.gz                                Uterus.v8.EUR.signif_pairs.txt.gz
Colon_Sigmoid.v8.EUR.signif_pairs.txt.gz                          Vagina.v8.EUR.egenes.txt.gz
Colon_Transverse.v8.EUR.egenes.txt.gz                             Vagina.v8.EUR.signif_pairs.txt.gz
Colon_Transverse.v8.EUR.signif_pairs.txt.gz                       Whole_Blood.v8.EUR.egenes.txt.gz
Esophagus_Gastroesophageal_Junction.v8.EUR.egenes.txt.gz          Whole_Blood.v8.EUR.signif_pairs.txt.gz
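To loop over all tissues, it helps to recover tissue names from the file names. A minimal sketch, assuming every file follows the `<Tissue>.v8.EUR.<egenes|signif_pairs>.txt.gz` pattern seen in this listing; the regex and function name are illustrative:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Pattern inferred from the listing: <Tissue>.v8.EUR.<egenes|signif_pairs>.txt.gz
FILE_RE = re.compile(r"^(?P<tissue>.+)\.v8\.EUR\.(?P<kind>egenes|signif_pairs)\.txt\.gz$")


def tissues_from_filenames(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each tissue to the file kinds present for it; other names are skipped."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        m = FILE_RE.match(name)
        if m:
            found[m.group("tissue")].add(m.group("kind"))
    return {t: sorted(kinds) for t, kinds in sorted(found.items())}


print(tissues_from_filenames([
    "Liver.v8.EUR.egenes.txt.gz",
    "Liver.v8.EUR.signif_pairs.txt.gz",
    "Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz",
    "README.txt",
]))
# {'Brain_Spinal_cord_cervical_c-1': ['egenes'], 'Liver': ['egenes', 'signif_pairs']}
```

Called as `tissues_from_filenames(os.listdir("eqtls"))` (after `import os`), the listing above should give 49 tissues, each with both file kinds; a tissue with only one kind would point to an incomplete download.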

  

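expression_matrices: expression data in BED format; after the first four columns (chr, start, end, gene_id), each column is one individual. Again, there is one file per tissue.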

#chr    start   end     gene_id GTEX-111CU      GTEX-111FC      GTEX-111VG      GTEX-111YS      GTEX-1122O      GTEX-1128S      GTEX-11DXX      GTEX-11DZ1      GTEX-11EI6      GTEX-11EM3      GTEX
chr1    29552   29553   ENSG00000227232.5       -0.8416212335729142     -0.1573106846101707     -0.6744897501960817     -0.1414683013821586     -0.5244005127080409     0.37970195786468147
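To pair expression with genotypes you usually need one gene's values keyed by sample ID. A sketch, assuming tab-separated lines with the `#chr start end gene_id <samples...>` header shown (the lines above are truncated in these notes). Note that gene IDs here carry a version suffix (ENSG00000227232.5), whereas eQTLGen uses unversioned Ensembl IDs, so strip the suffix before joining the two. The values appear to be already normalized (-0.8416 is roughly the 20th percentile of a standard normal), but check the release documentation. Function names are hypothetical:

```python
import gzip
from typing import Dict, Iterable, Optional


def expression_for_gene(lines: Iterable[str], gene_id: str) -> Optional[Dict[str, float]]:
    """Return {sample_id: value} for one gene from a GTEx-style expression BED.

    gene_id must match the file exactly, including any version suffix.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return None
    samples = first.rstrip("\n").split("\t")[4:]  # columns 1-4: #chr, start, end, gene_id
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3 and fields[3] == gene_id:
            return dict(zip(samples, map(float, fields[4:])))
    return None


def expression_for_gene_file(path: str, gene_id: str) -> Optional[Dict[str, float]]:
    """Same lookup on a gzipped BED file; `path` is whatever file you downloaded."""
    with gzip.open(path, "rt") as fh:
        return expression_for_gene(fh, gene_id)
```

A linear scan is enough for a one-off lookup; for many genes, load the matrix once instead.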

  

expression_covariates: covariates, used to remove confounders.
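One common way to use covariates is to regress them out of each gene's expression before testing the SNP; the other usual option is to include them as extra terms in the eQTL regression itself. A sketch of the first approach with NumPy, assuming the covariates are already loaded into an array whose rows are in the same sample order as the expression vector (no particular covariate file layout is assumed); the function name is hypothetical:

```python
import numpy as np


def regress_out(expr: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Residuals of expr after ordinary least squares on an intercept plus covariates.

    expr: shape (n_samples,); covariates: shape (n_samples,) or (n_samples, k).
    """
    y = np.asarray(expr, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]                       # single covariate -> one column
    if cov.shape[0] != y.shape[0]:
        raise ValueError("covariates and expr must have the same number of samples")
    X = np.column_stack([np.ones(y.shape[0]), cov])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta                          # part of expr not explained by covariates


# Made-up example: expression fully explained by one covariate leaves ~0 residuals.
c = np.array([1.0, 2.0, 3.0, 4.0])
print(np.round(regress_out(2 + 3 * c, c), 6))
```

The residuals can then go into a genotype regression in place of raw expression.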

 

 

 

References:

GTEx introduction.pdf - a must-read introduction

The Genotype-Tissue Expression Project

 

posted @ Life·Intelligence