[Translation] README of the RAINBOWR GitHub Repo

Preface

Original: https://github.com/KosukeHamazaki/RAINBOWR/blob/master/README.md
I have been reading up on how to use this package and translated the README along the way. I will take this post down if it infringes any rights.

Main text

Reliable Association INference By Optimizing Weights with R (RAINBOWR)

Author: Kosuke Hamazaki

Date: 2019/03/25 (Last update: 2020/10/26)

NOTE!!

The paper for `RAINBOWR` has been published in PLOS Computational Biology ( https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007663 ). If you use `RAINBOWR` in your paper, please cite it as follows:

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.

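The stable version of the `RAINBOWR` package is now available on CRAN (Comprehensive R Archive Network).

The older version of `RAINBOWR` is named `RAINBOW` and is located at https://github.com/KosukeHamazaki/RAINBOW .

We changed the package name from RAINBOW to RAINBOWR because the original name RAINBOW conflicted with the rainbow package ( https://cran.r-project.org/package=rainbow ) when we submitted our package to CRAN.

This repository contains the code for the R package RAINBOWR. Below, we describe how to install and how to use RAINBOWR.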

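What is `RAINBOWR`?

RAINBOWR (Reliable Association INference By Optimizing Weights with R) is a package for performing the following types of GWAS.

Single-SNP GWAS with the RGWAS.normal function;
SNP-set (or gene-set) GWAS with the RGWAS.multisnp function, which tests multiple SNPs at the same time;
Tests of epistatic (SNP-set x SNP-set interaction) effects with the RGWAS.epistasis function (very slow and less reliable).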

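RAINBOWR also offers some functions to solve linear mixed-effects models.

The EMM.cpp function solves a single-kernel linear mixed-effects model;
The EM3.cpp function solves a multi-kernel linear mixed-effects model (for general kernels, not so fast);
The EM3.linker.cpp function solves a multi-kernel linear mixed-effects model (for linear kernels, fast).

With these functions, you can estimate genomic heritability and perform genomic prediction (GP).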
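
As a minimal sketch of what "genomic heritability" means here: it is commonly computed as the share of the total variance explained by the genetic variance component. The values below are made up for illustration, and the names `Vu` and `Ve` are assumptions about how a fitted model reports its variance components, so check the output of the solver you actually use.

```r
# Hypothetical variance component estimates; in practice these would come
# from a fitted linear mixed-effects model (the names Vu and Ve are assumptions).
Vu <- 12.5  # genetic (random effect) variance
Ve <- 7.5   # residual variance

# Genomic heritability as the proportion of variance that is genetic
h2 <- Vu / (Vu + Ve)
print(h2)
```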

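Finally, RAINBOWR offers other useful functions.

The qq and manhattan functions draw Q-Q plots and Manhattan plots;
The modify.data function matches phenotype data and marker genotype data;
The CalcThreshold function calculates thresholds for GWAS results;
The See function gives a brief view of data (like the head function, but more useful);
The genetrait function generates pseudo phenotypic values from marker genotypes;
The SS_GWAS function summarizes GWAS results (only for simulation studies);
The estPhylo and estNetwork functions estimate a phylogenetic tree or haplotype network, along with haplotype effects, using non-linear kernels for haplotype blocks of interest.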

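Installation

The stable version of RAINBOWR is now available on CRAN (Comprehensive R Archive Network). The latest version is also available in the KosukeHamazaki/RAINBOWR repository on GitHub. To install either one, run the following code in the R console.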

#### Stable version of RAINBOWR ####
install.packages("RAINBOWR")  


#### Latest version of RAINBOWR ####
### If you have not installed yet, ...
install.packages("devtools")  

### Install RAINBOWR from GitHub
devtools::install_github("KosukeHamazaki/RAINBOWR")

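If you run into errors during installation, please check that the following packages are correctly installed. (We removed the dependency on the rgl package!)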

Rcpp,      # install `Rtools` for Windows users
plotly,
Matrix,
cluster,
MASS,
pbmcapply,
optimx,
methods,
ape,
stringr,
pegas,
ggplot2,
ggtree,      # install from Bioconductor with `BiocManager::install("ggtree")`
scatterpie,
phylobase,
haplotypes,
rrBLUP,
expm,
here,
htmlwidgets,
Rfast
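
One way to check the list above is to ask R which of these packages cannot be loaded. The following is only a sketch: the vector name `deps` and the result name `missing_deps` are made up for illustration, and it uses base R's `requireNamespace` rather than anything provided by RAINBOWR.

```r
# Dependencies as listed above
deps <- c("Rcpp", "plotly", "Matrix", "cluster", "MASS", "pbmcapply", "optimx",
          "methods", "ape", "stringr", "pegas", "ggplot2", "ggtree",
          "scatterpie", "phylobase", "haplotypes", "rrBLUP", "expm", "here",
          "htmlwidgets", "Rfast")

# TRUE for each package whose namespace can be loaded, FALSE otherwise
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

# Packages that still need to be installed (empty if everything is in place)
missing_deps <- deps[!is_installed]
print(missing_deps)
```

Anything printed here can be installed with `install.packages()`, except ggtree, which comes from Bioconductor as noted in the list.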

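Part of the RAINBOWR code is written with Rcpp (C++ in R), so please check that you can use C++ from R. Windows users should install Rtools.

If you have any questions about installation, please contact us by e-mail (hamazaki@ut-biomet.org).

Usage

First, import the RAINBOWR package and load the example datasets. These example datasets consist of marker genotypes (scored as {-1, 0, 1}; 1,536-SNP chip (Zhao et al., 2010; PLoS One 5(5): e10780)), a map with physical positions, and phenotypic data (Zhao et al., 2011; Nature Communications 2:467). Both datasets can be downloaded from the Rice Diversity homepage ( http://www.ricediversity.org/data/ ).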

### Import RAINBOWR
require(RAINBOWR)

### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno

### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)

You can check the original data format with the See function. Then, select one trait (here, Flowering.time.at.Arkansas) as an example.

### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]

For GWAS, first remove SNPs whose MAF <= 0.05 with the MAF.cut function. (Translator's note: this step is part of QC.)

### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
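
To make the filtering step concrete, here is a base-R sketch of the idea on a toy matrix. It is not the implementation of MAF.cut, whose internals the README does not describe. It assumes rows are lines and columns are markers scored as {-1, 0, 1}, and the names `geno_toy`, `maf` and `geno_kept` are made up for illustration.

```r
# Toy genotype matrix: 10 lines (rows) x 4 markers (columns), scored -1/0/1
geno_toy <- cbind(
  snp1 = rep(-1, 10),                 # monomorphic marker
  snp2 = c(rep(-1, 5), rep(1, 5)),    # balanced marker
  snp3 = c(rep(-1, 9), 1),            # rare allele coded as 1
  snp4 = c(rep(1, 8), 0, 0)           # rare allele coded as -1
)
rownames(geno_toy) <- paste0("line", 1:10)

# Frequency of the allele coded as 1: a score of -1 maps to 0 and 1 maps to 1
allele_freq <- (colMeans(geno_toy, na.rm = TRUE) + 1) / 2

# Minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(allele_freq, 1 - allele_freq)

# Keep only markers whose MAF is above the cutoff
geno_kept <- geno_toy[, maf > 0.05, drop = FALSE]
print(round(maf, 3))
print(colnames(geno_kept))
```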

Next, we estimate the additive genomic relationship matrix (additive GRM) with the calcGRM function.

### Estimate genomic relationship matrix (GRM) 
K.A <- calcGRM(genoMat = x)
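
For intuition, one common way to build an additive relationship matrix is to center each marker and take the scaled cross-product between lines. The sketch below shows that construction in base R on a toy matrix. It is an assumption-laden illustration, not a description of calcGRM, which may use a different method or normalization, and the names `geno_toy` and `grm_toy` are hypothetical.

```r
# Toy genotype matrix: 3 lines (rows) x 4 markers (columns), scored -1/0/1
geno_toy <- rbind(
  line1 = c(-1, 0, 1, 1),
  line2 = c(1, 0, -1, 1),
  line3 = c(0, 1, 1, -1)
)
colnames(geno_toy) <- paste0("snp", 1:4)

# Center each marker so that its mean across lines is zero
geno_centered <- scale(geno_toy, center = TRUE, scale = FALSE)

# Lines x lines similarity: cross-product of centered genotypes,
# divided by the number of markers (one simple choice of scaling)
grm_toy <- tcrossprod(geno_centered) / ncol(geno_toy)
print(round(grm_toy, 3))
```

The result is a symmetric lines-by-lines matrix, which is the shape expected of a GRM such as `K.A` above.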

Then, we use the modify.data function to convert these data into the GWAS format used by RAINBOWR.

### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                               return.ZETA = TRUE, return.GWAS.format = TRUE)
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA

### View each data for RAINBOWR
See(pheno.GWAS)
See(geno.GWAS)
str(ZETA)

ZETA is a list containing the genomic relationship matrix (GRM) and its design matrix.
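
The sketch below builds a toy object of this shape by hand, to show what `str(ZETA)` is expected to reveal. The element names `Z` (design matrix) and `K` (relationship matrix) and the outer name `A` are assumptions based on the usual convention for this kind of list, so compare them with the actual output of modify.data before relying on them.

```r
line_ids <- paste0("line", 1:3)

# Toy relationship matrix between 3 lines (identity used only as a placeholder)
K_toy <- diag(3)
dimnames(K_toy) <- list(line_ids, line_ids)

# Toy design matrix mapping each phenotype record to one line;
# with one record per line this is simply an identity matrix
Z_toy <- diag(3)
dimnames(Z_toy) <- list(line_ids, line_ids)

# One random effect, named "A" here for the additive effect
ZETA_toy <- list(A = list(Z = Z_toy, K = K_toy))
str(ZETA_toy)
```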

Finally, we can perform GWAS using these data.

First, we perform single-SNP GWAS with the RGWAS.normal function as follows:

### Perform single-SNP GWAS
normal.res <- RGWAS.normal(pheno = pheno.GWAS, geno = geno.GWAS,
                           ZETA = ZETA, n.PC = 4, P3D = TRUE)
See(normal.res$D)  ### Column 4 contains -log10(p) values for markers
### Automatically draw Q-Q plot and Manhattan by default.
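
Once -log10(p) values are available, a simple way to call significant markers is a Bonferroni threshold. The sketch below works on made-up values and uses only base R. It is not the CalcThreshold function from the package, and the names `neg_log10_p` and `n_tests` are hypothetical.

```r
# Made-up -log10(p) values for five markers, standing in for GWAS results
neg_log10_p <- c(snp1 = 0.8, snp2 = 5.2, snp3 = 2.1, snp4 = 4.4, snp5 = 1.3)

# Suppose 1000 markers were tested in total (assumed value)
n_tests <- 1000
alpha <- 0.05

# Bonferroni threshold on the -log10(p) scale
threshold <- -log10(alpha / n_tests)

# Markers that exceed the threshold
significant <- names(neg_log10_p)[neg_log10_p > threshold]
print(threshold)
print(significant)
```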

Next, we perform SNP-set GWAS with the RGWAS.multisnp function.

### Perform SNP-set GWAS (by regarding 11 SNPs as one SNP-set)
SNP_set.res <- RGWAS.multisnp(pheno = pheno.GWAS, 
                              geno = geno.GWAS, 
                              ZETA = ZETA, 
                              n.PC = 4, 
                              test.method = "LR", 
                              kernel.method = "linear",
                              gene.set = NULL,
                              test.effect = "additive", 
                              window.size.half = 5, 
                              window.slide = 11)
See(SNP_set.res$D)  ### Column 4 contains -log10(p) values for markers
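
The following sketch illustrates how window.size.half and window.slide might translate into SNP-sets: each window covers its center marker plus window.size.half markers on each side, and centers are window.slide markers apart. The helper `make_windows` is hypothetical, and the package's handling of chromosome boundaries and edges may differ.

```r
# Hypothetical helper: marker indices of each window along one chromosome
make_windows <- function(n_markers, window_size_half, window_slide) {
  centers <- seq(from = window_size_half + 1, to = n_markers, by = window_slide)
  lapply(centers, function(center) {
    # Clip the window so it stays within 1..n_markers
    max(1, center - window_size_half):min(n_markers, center + window_size_half)
  })
}

# Settings from the example above: 11 SNPs per set, no overlap between sets
windows <- make_windows(n_markers = 33, window_size_half = 5, window_slide = 11)
print(windows)

# A slide of 1 gives heavily overlapping windows, one per center marker
sliding <- make_windows(n_markers = 33, window_size_half = 5, window_slide = 1)
print(length(sliding))
```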

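You can perform SNP-set GWAS with a sliding window by setting window.slide = 1. You can also perform gene-set (or haplotype-based) GWAS by passing a dataset like the following to the gene.set argument.

The input data look like this: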

gene (or haplotype block)	marker
gene_1	id1000556
gene_1	id1000673
gene_2	id1000830
gene_2	id1000955
gene_2	id1001516
...	...
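
A table like the one above can be written in R as a two-column data frame. This is a sketch only: the column names `gene` and `marker` mirror the table header, but they are an assumption about what the gene.set argument accepts, so check the function's help page for the exact requirements.

```r
# Gene (or haplotype block) membership for the markers shown above
gene_set_toy <- data.frame(
  gene = c("gene_1", "gene_1", "gene_2", "gene_2", "gene_2"),
  marker = c("id1000556", "id1000673", "id1000830", "id1000955", "id1001516"),
  stringsAsFactors = FALSE
)

# Group marker IDs by gene to see which SNPs form each set
markers_by_gene <- split(gene_set_toy$marker, gene_set_toy$gene)
print(markers_by_gene)
```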

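Help

If you need help before performing GWAS with RAINBOWR, see the help page of each function with ?{function_name}.

You can also check how to choose each argument with the following call.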

RGWAS.menu()

The RGWAS.menu function asks a series of questions. Based on your answers, it tells you which function to use and how to set its arguments.

References

Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.

Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.

Addendum

2021.7.22

I am translating the paper behind this package (RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method). It is fairly long, so I am posting the abstract and introduction first: link

Posted 2021-07-02 10:42 by Minerw
