Summary | GSVA | Gene Set Variation Analysis | Gene Set and Pathway Scoring

 

Previous posts

 

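Several common scoring methods:

  • mean: the simplest approach
  • PercentageFeatureSet
  • Seurat's AddModuleScore: adds a background (control) gene set to remove background effects. In my view this adds little; it amounts to defining a housekeeping gene set and normalizing the score against it.
  • PROGENy: weight-based scoring, with weights derived from known perturbation datasets
  • AUCell
  • GSVA
  • ssGSEA: similar to GSVA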
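
The two simplest entries in the list can be sketched in base R. This is a minimal illustration on made-up data, not Seurat's implementation: the `counts` matrix, the gene names and the `gene_set` vector are all hypothetical, and the log-normalization is just one common choice.

```r
# Toy data: 6 genes x 4 cells of raw counts (hypothetical values).
set.seed(1)
counts <- matrix(rpois(24, lambda = 5), nrow = 6,
                 dimnames = list(paste0("G", 1:6), paste0("cell", 1:4)))
gene_set <- c("G1", "G2", "G3")  # hypothetical gene set

# Library-size normalization followed by log1p (an assumed preprocessing step).
lognorm <- log1p(t(t(counts) / colSums(counts)) * 1e4)

# 1) Mean score: average normalized expression of the set in each cell.
mean_score <- colMeans(lognorm[gene_set, , drop = FALSE])

# 2) Percentage score: share of each cell's raw counts that falls in the set,
#    the kind of quantity PercentageFeatureSet reports.
pct_score <- 100 * colSums(counts[gene_set, , drop = FALSE]) / colSums(counts)
```

Both scores are one number per cell; the mean works on normalized values, while the percentage works on raw counts.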

 

Seurat's AddModuleScore

Calculate the average expression levels of each program (cluster) on single cell level, subtracted by the aggregated expression of control feature sets. All analyzed features are binned based on averaged expression, and the control features are randomly selected from each bin.

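Take the genes of interest and, for each cell, compute the mean expression of those genes.

For the background, find the expression bin that each target gene falls into and randomly draw `ctrl` genes from that bin as controls.

Finally, average all target genes, average all control genes, and subtract the second from the first; the difference is the score for that gene set.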
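
The three steps above can be sketched in base R. This is a simplified illustration, not Seurat's actual code: the function name `module_score`, the rank-based binning and the toy matrix are assumptions made for the example, and details such as how Seurat breaks ties or builds its bins are not reproduced.

```r
# Simplified module score: mean of target genes minus mean of
# expression-matched control genes (hypothetical helper, not Seurat's code).
module_score <- function(expr, genes, nbin = 5, ctrl = 3, seed = 1) {
  set.seed(seed)
  genes <- intersect(genes, rownames(expr))
  avg <- rowMeans(expr)
  # Assign every gene to one of `nbin` equal-sized bins by average expression.
  bins <- ceiling(rank(avg, ties.method = "first") * nbin / length(avg))
  names(bins) <- rownames(expr)
  # For each target gene, draw up to `ctrl` control genes from its bin.
  ctrl_genes <- unique(unlist(lapply(genes, function(g) {
    candidates <- names(bins)[bins == bins[[g]]]
    candidates[sample.int(length(candidates), min(ctrl, length(candidates)))]
  })))
  # Per-cell score: target mean minus control mean.
  colMeans(expr[genes, , drop = FALSE]) -
    colMeans(expr[ctrl_genes, , drop = FALSE])
}

# Toy expression matrix: 100 genes x 8 cells (made-up values).
set.seed(42)
expr <- matrix(rexp(100 * 8), nrow = 100,
               dimnames = list(paste0("G", 1:100), paste0("cell", 1:8)))
score <- module_score(expr, genes = paste0("G", 1:10))
```

Because the control term is itself an average over many genes, it tends to vary less between cells than the target mean, which is one plausible reason the final score can track a plain per-cell mean closely.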

As for the biological meaning, that is open to interpretation.

In my tests, the result was essentially no different from colMeans.

http://localhost:17435/notebooks/projects/Perturb_seq/1.Mixscape.ipynb

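for (i in names(geneset.list)) {
    print(i)
    # Upper-case the symbols (assumes the object uses upper-case gene symbols)
    tmp.genes <- toupper(geneset.list[[i]])
    # Keep only genes present in the RNA assay
    tmp.genes <- tmp.genes[tmp.genes %in% rownames(sub2[["RNA"]])]
    # Skip gene sets with fewer than 10 detected genes
    if (length(tmp.genes) < 10) next
    # Alternative: plain mean of scaled expression
    # sub2@meta.data[[i]] <- colMeans(as.matrix(sub2@assays$RNA@scale.data[tmp.genes,]))
    # AddModuleScore appends the set index to `name`, so the column is paste0(i, "1")
    sub2 <- AddModuleScore(sub2, features = list(tmp.genes), ctrl = 5, name = i, assay = "RNA")
}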

  

 

 

Official tutorial: GSVA: gene set variation analysis

http://localhost:17435/notebooks/projects/10x_spatial_mouse/ST-Seurat-visium.ipynb

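library(GSVA)

# Classic interface; GSVA >= 1.50 moved to parameter objects such as gsvaParam()
gsva_mat <- gsva(expr = dat,
                 gset.idx.list = geneset.list,
                 kcdf = "Poisson", # "Gaussian" for logCPM, logRPKM, logTPM; "Poisson" for counts
                 verbose = TRUE,
                 parallel.sz = parallel::detectCores()) # use all cores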
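
Before calling `gsva`, it can help to check the inputs. The sketch below is a hedged example in base R: the toy `dat` matrix and the `geneset.list` contents are made up, the 10-gene cutoff mirrors the filter used in the AddModuleScore loop earlier, and the integer check for choosing `kcdf` is a heuristic assumed here, not part of GSVA.

```r
# Hypothetical inputs: a genes x samples count matrix and a named list of gene sets.
set.seed(1)
dat <- matrix(rpois(50 * 6, lambda = 3), nrow = 50,
              dimnames = list(paste0("GENE", 1:50), paste0("s", 1:6)))
geneset.list <- list(
  setA = paste0("gene", 1:15),                   # lower-case on purpose
  setB = c("GENE20", "GENE21", "NOT_IN_DATA"),   # too few genes after filtering
  setC = paste0("GENE", 30:45)
)

# Upper-case the symbols and keep only genes present in the matrix.
geneset.list <- lapply(geneset.list,
                       function(g) intersect(toupper(g), rownames(dat)))
# Drop gene sets with fewer than 10 matched genes.
geneset.list <- geneset.list[lengths(geneset.list) >= 10]

# Heuristic (an assumption): non-negative integers look like raw counts -> "Poisson",
# anything else (e.g. log-transformed values) -> "Gaussian".
is_count <- all(dat >= 0) && all(dat == round(dat))
kcdf <- if (is_count) "Poisson" else "Gaussian"

# The filtered inputs would then be passed on, e.g.:
# gsva_mat <- GSVA::gsva(expr = dat, gset.idx.list = geneset.list, kcdf = kcdf)
```

With these toy inputs, `setB` is dropped and `kcdf` resolves to "Poisson".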

  

 

 

 


 

posted @ 2023-06-30 02:00  Life·Intelligence