Databases available for GWAS research (updated 2021-10-08)
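1、The list covers each database's name, its phenotypes, whether genotypes can be downloaded, and whether GWAS result files (P values, effect sizes, SNP loci) can be downloaded. The databases collected so far are listed below.
A paper that draws on these databases: Genome-wide association study identifies 74 loci associated with educational attainment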
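As a rough illustration of what a GWAS result file with P values, effect sizes, and SNP IDs looks like in practice, the sketch below filters a tab-separated summary-statistics table for genome-wide significant hits. The column names (`SNP`, `BETA`, `SE`, `P`) and the example rows are assumptions made up for this sketch; real files from the databases below use varying headers, so adjust `p_col` accordingly. The 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import csv
import io

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def significant_hits(handle, p_col="P", threshold=GENOME_WIDE_P):
    """Yield rows (as dicts) whose P value is below the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        try:
            p = float(row[p_col])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric P value
        if p < threshold:
            yield row


# Made-up example data; the column names are assumptions, not a fixed standard.
example = io.StringIO(
    "SNP\tBETA\tSE\tP\n"
    "rs0000001\t0.021\t0.003\t2.5e-12\n"
    "rs0000002\t-0.004\t0.003\t0.18\n"
    "rs0000003\t0.015\t0.004\tNA\n"
)
hits = list(significant_hits(example))
```

For a real file, pass an open file handle (or a `gzip.open(path, "rt")` handle for compressed files) instead of the in-memory example.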
2、The Japanese Genotype-phenotype Archive (JGA): holds individual-level genotype and phenotype data; access requires an application. GWAS have already been run on these data. Database link: https://www.ddbj.nig.ac.jp/jga/index-e.html
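3、ExAC: does not provide individual-level genotypes, but does provide VCF, CNV, coverage, and other files. Phenotypes are limited to those already published, for example type 2 diabetes.
Populations and sample counts in ExAC: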
Population | Male Samples | Female Samples | Total
African/African American (AFR) | 1,888 | 3,315 | 5,203
Latino (AMR) | 2,254 | 3,535 | 5,789
East Asian (EAS) | 2,016 | 2,311 | 4,327
Finnish (FIN) | 2,084 | 1,223 | 3,307
Non-Finnish European (NFE) | 18,740 | 14,630 | 33,370
South Asian (SAS) | 6,387 | 1,869 | 8,256
Other (OTH) | 275 | 179 | 454
Total | 33,644 | 27,062 | 60,706
Data available for download from ExAC:
FTP Link | Description
Sites VCF | VCF of Variant Sites
CNV | CNV Counts and Intolerance Scores
Coverage | Per Base Coverage
Functional Gene Constraint | Functional Gene Constraint Scores for ExAC and Subsets
Manuscript Data | Variant Tables Used in Manuscript
Resources | Exome Calling and Purcell5k Intervals
Subsets | Non-TCGA VCF Subset
Database link: http://exac.broadinstitute.org/downloads
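Because a sites VCF carries allele counts rather than individual genotypes, allele frequencies have to be derived from the INFO column. The sketch below assumes the standard VCF INFO keys `AC` (alternate allele count, one value per alternate allele) and `AN` (total allele number); the INFO string in the example is made up, and the exact keys present in a given ExAC release should be checked against the file header.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=3,1;AN=100;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags carry no value
    return fields


def allele_frequencies(info):
    """Return one AC/AN frequency per alternate allele, or [] if unavailable."""
    fields = parse_info(info)
    try:
        an = int(fields["AN"])
        counts = [int(c) for c in fields["AC"].split(",")]
    except (KeyError, ValueError, AttributeError):
        return []  # AC or AN missing or malformed
    if an == 0:
        return []  # no called alleles at this site
    return [c / an for c in counts]


# Made-up INFO string with two alternate alleles.
freqs = allele_frequencies("AC=3,1;AN=100;DB")
```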
4、Simons Genome Diversity Project (SGDP)
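Provides 279 samples from populations in the Americas, Africa, East Asia, South Asia, Western Europe, and Oceania. Available data: VCF, phased genotypes, STRs, and BAMs for Y chromosomes.
Link: http://reichdata.hms.harvard.edu/pub/datasets/sgdp/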
5、CHINESE MILLIONOME DATABASE
Website: https://db.cngb.org/cmdb/
The Chinese Millionome Database (CMDB) is a unique large-scale Chinese genomics database produced by BGI and hosted in the National GeneBank. The CMDB delivers periodic and useful variation information and scientific insights derived from the analysis of millions of Chinese sequencing data. The results aim to promote genetic research and precision medicine actions in China.
The information delivered includes the detected variants and their corresponding allele frequencies, annotation, frequency comparison with global populations from existing databases, etc.
6、UK Biobank
UK Biobank GWAS summary data: https://ctg.cncr.nl/documents/p1651/ukb2_sumstats.tar.gz
This file is very large, so download it with care.
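One cautious way to handle a large archive like this is to ask the server for its size before committing to the download, then stream it to disk in chunks. The following is a minimal sketch using only the Python standard library; the helper names (`human_size`, `remote_size`, `download`) are hypothetical, and it assumes the server answers HEAD requests and reports a `Content-Length` header, which is not guaranteed.

```python
import shutil
import urllib.request


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def remote_size(url, timeout=30):
    """Return the Content-Length reported for url, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def download(url, dest, timeout=30):
    """Stream url to dest in 1 MiB chunks so the file is never held in memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)
```

A typical use would be to call `remote_size(url)`, print the result with `human_size`, confirm there is enough free disk space, and only then call `download(url, "ukb2_sumstats.tar.gz")`.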
7、Summary-statistics database for insomnia, Alzheimer's disease, various psychiatric disorders, intelligence, and more
https://ctg.cncr.nl/software/summary_statistics
8、Japan's public database: National Bioscience Database Centre (NBDC) Human Database
https://humandbs.biosciencedbc.jp/
9、CVDKP Datasets
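Phenotypes: anthropometric traits, cardiovascular disease, electrocardiogram (ECG), atrial fibrillation, blood lipids, blood glucose, psychiatric disorders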
http://www.kp4cd.org/datasets/mi
10、CARDIoGRAMplusC4D Consortium
Phenotypes: coronary artery disease, cardiovascular disease
http://www.cardiogramplusc4d.org/data-downloads/
11、DIAGRAM Consortium
Phenotype: type 2 diabetes (T2D)
http://diagram-consortium.org/downloads.html
12、Repository of public GWAS data and code
https://data.mendeley.com/research-data/
13、Japanese GWAS summary data
http://jenger.riken.jp/en/result
14、GWAS Catalog
https://www.ebi.ac.uk/gwas/
15、SAIGE-based UKBB summary data
https://www.leelabsg.org/resources
16、Summary data for glycemic traits (N = 281,416)
https://www.magicinvestigators.org/
17、FinnGen GWAS Summary Statistics
https://www.finngen.fi/en/access_results
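18、GWAS-VCF
1) The site hosts 34,513 GWAS summary-statistics datasets, of which more than 10,000 are complete;
2) All GWAS summary statistics are harmonized into VCF format (GWAS-VCF for short), which makes post-GWAS analysis easier.
3) GWAS summary statistics can be downloaded online or retrieved through an API.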
https://gwas.mrcieu.ac.uk/
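To give a feel for how summary statistics stored in VCF form can be read for post-GWAS analysis, here is a minimal sketch that parses one single-sample data line. It assumes, based on my reading of the GWAS-VCF specification, that the FORMAT column uses the keys `ES` (effect size), `SE` (standard error), and `LP` (-log10 P value); verify these against the header of the file you download. The function name and the example line are made up for illustration.

```python
def parse_gwas_vcf_record(line):
    """Parse one single-sample GWAS-VCF data line into a flat dict."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = cols[:5]
    # Column 9 (FORMAT) names the per-sample fields; column 10 holds the values.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))

    def number(key):
        value = sample.get(key, ".")
        return None if value in (".", "") else float(value)

    lp = number("LP")  # assumed to be -log10(P)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": rsid,
        "ref": ref,
        "alt": alt,
        "beta": number("ES"),
        "se": number("SE"),
        "p": None if lp is None else 10 ** (-lp),
    }


# Made-up record; real files start with header lines beginning with '#'.
line = (
    "1\t10000\trs0000001\tA\tG\t.\tPASS\tAF=0.25\t"
    "ES:SE:LP:AF:ID\t0.02:0.004:8:0.25:rs0000001"
)
record = parse_gwas_vcf_record(line)
```

Storing -log10(P) rather than P itself avoids underflow for very small P values, so for extreme signals it is safer to work with `LP` directly than to convert back.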
This article is from cnblogs (博客园). Author: 橙子牛奶糖 (陈文燕). When reposting, please cite the original link: https://www.cnblogs.com/chenwenyan/p/8969440.html