Single-cell imputation | MAGIC
quick code
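library(Rmagic)
library(Seurat)   # VlnPlot
library(ggplot2)  # stat_summary

# MAGIC expects cells in rows and genes in columns, so transpose the Seurat count matrix
bmmsc <- t(GBM.pair@assays$RNA@counts)
bmmsc <- library.size.normalize(bmmsc)
bmmsc <- sqrt(bmmsc)
# bmmsc[1:5,1:5]

# run MAGIC on selected genes only
# bmmsc_MAGIC <- magic(bmmsc, genes=c("Jarid2","Sox9","Krt20"))

# run MAGIC on all genes
bmmsc_MAGIC <- magic(bmmsc, genes="all_genes")

# store the imputed values in the Seurat metadata, matched by cell name
GBM.pair$JARID2_imputation <- bmmsc_MAGIC$result[colnames(GBM.pair),"JARID2"]

options(repr.plot.width=10, repr.plot.height=5)
# median.stat is a user-defined helper that is not shown in this post
# ggplot2 >= 3.3.0 names this argument `fun`; older versions call it `fun.y`
VlnPlot(GBM.pair, features = c("JARID2_imputation"), group.by = "sampleID") +
    stat_summary(fun = median.stat, geom='point', size = 5, colour = "black", shape = 21)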
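The quick code writes the imputed values back into the object by indexing the MAGIC result with cell names. Below is a minimal base-R sketch of that name-based alignment on toy data. The names `imputed`, `object_cells`, and `jarid2_aligned` are made up for illustration, and `object_cells` merely stands in for `colnames(GBM.pair)`. It shows that matching by name stays correct even when the cell order differs or some cells were dropped before imputation.

```r
# Toy stand-in for bmmsc_MAGIC$result: cells in rows, one imputed gene column
imputed <- data.frame(
  JARID2 = c(0.8, 0.1, 0.5),
  row.names = c("cellA", "cellB", "cellC")
)

# Toy stand-in for colnames(seurat_object): different order, plus one cell
# ("cellD") that is absent from the imputed table (e.g. removed during QC)
object_cells <- c("cellC", "cellA", "cellD")

# match() returns, for each object cell, its row position in the imputed table
# (NA when the cell is missing), so values are aligned by name, not by position
idx <- match(object_cells, rownames(imputed))
jarid2_aligned <- imputed$JARID2[idx]
names(jarid2_aligned) <- object_cells

print(jarid2_aligned)
# Cells missing from the imputed table end up as NA
print(sum(is.na(jarid2_aligned)))
```

If cells were filtered out before running MAGIC, the NA entries make that visible instead of silently shifting values onto the wrong cells.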
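August 21, 2023
This time I processed published Sox9 single-cell data and found that, in the same Lgr5 mouse model, the fraction of cells expressing Jarid2 and Sox9 is very low. After asking, I confirmed that this batch of cells was sequenced at low depth, which leads to a low gene capture rate. In addition, day 28 is already the survival limit for these mice; beyond that they get sick and die.
Because the expressing fraction is so low, differential expression significance testing cannot be carried out, so imputation is required. This analysis confirmed once again that MAGIC works well.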
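To make "fraction of cells expressing a gene" concrete, here is a minimal base-R sketch on a made-up count matrix. The values and the `counts` / `detection_rate` names are illustrative only, not taken from the Sox9 dataset. The matrix is oriented the same way as the MAGIC input (cells in rows, genes in columns).

```r
# Toy UMI count matrix: 4 cells (rows) x 3 genes (columns); zeros mimic dropouts
counts <- matrix(
  c(0, 3, 0,
    0, 1, 0,
    2, 0, 0,
    0, 4, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("Jarid2", "Sox9", "Krt20"))
)

# Fraction of cells with at least one UMI for each gene
detection_rate <- colMeans(counts > 0)
print(detection_rate)
```

A gene detected in only a small fraction of cells leaves most cells at exactly zero, which is the situation described above.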
Re-download and install: https://github.com/KrishnaswamyLab/MAGIC
Installation from GitHub [both the R and the Python versions must be installed]
git clone https://github.com/KrishnaswamyLab/MAGIC.git
cd MAGIC/python
python setup.py install --user
cd ../Rmagic
R CMD INSTALL .
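If the R code fails with AttributeError: module 'magic' has no attribute 'MAGIC',
restart the kernel, or test from the command line. It is almost always a version problem: with so many versions installed, things get mixed up.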
Latest code example: http://localhost:17435/notebooks/data_center/2023_SOX9_Sci_Adv_PB/SOX9_Sci_Adv.ipynb
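In some cases, UMI-based single-cell data need imputation to fill in the missing values.
There are many tools, but this paper has already benchmarked them, so you can simply use the one it recommends.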
A systematic evaluation of single-cell RNA-sequencing imputation methods
The top-ranked single-cell imputation tool:
https://github.com/KrishnaswamyLab/MAGIC
Tutorial: Rmagic Bone Marrow Tutorial
UMI datasets are usually large, so the run is fairly time-consuming.
Installation
library(Rmagic)
library(ggplot2)
library(readr)
library(viridis)
library(phateR)

# check that the Python backend is visible from R
# don't "source activate py38", otherwise the python package cannot be loaded
pymagic_is_available()
Test data
# # load data
# bmmsc <- read_csv("https://github.com/KrishnaswamyLab/PHATE/raw/master/data/BMMC_myeloid.csv.gz")
Real data
# cells in rows, genes in columns
bmmsc <- t(integrated.org@assays$RNA@counts)
bmmsc[1:5,1:5]
QC
# keep genes expressed in more than 10 cells
keep_cols <- colSums(bmmsc > 0) > 10
bmmsc <- bmmsc[,keep_cols]

# look at the distribution of library sizes
ggplot() +
    geom_histogram(aes(x=rowSums(bmmsc)), bins=50) +
    geom_vline(xintercept = 1000, color='red')

# keep cells with more than 1000 UMIs
keep_rows <- rowSums(bmmsc) > 1000
bmmsc <- bmmsc[keep_rows,]

# normalize by library size, then square-root transform
bmmsc <- library.size.normalize(bmmsc)
bmmsc <- sqrt(bmmsc)
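To see what the normalization step does to the numbers, here is a small base-R sketch on toy counts. It assumes that `library.size.normalize` rescales every cell to the median library size; that is my understanding of the function, not something verified against its source here, so treat this as an illustration and not as the Rmagic implementation. All variable names are made up.

```r
# Toy count matrix: 3 cells (rows) x 3 genes (columns)
counts <- matrix(
  c(1, 1, 2,
    2, 2, 4,
    0, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:3))
)

lib_size <- rowSums(counts)        # total UMIs per cell
target   <- median(lib_size)       # assumed common target: median library size

# Dividing a matrix by a vector of length nrow() recycles down the columns,
# so every row i is divided by lib_size[i]
norm <- counts / lib_size * target

# Square-root transform to dampen the influence of highly expressed genes
norm_sqrt <- sqrt(norm)

print(rowSums(norm))   # every cell now has the same total
print(round(norm_sqrt, 3))
```

After rescaling, cell1 and cell2 have identical profiles even though cell2 was sequenced twice as deeply, which is the point of removing library-size differences before imputation.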
Test a few genes
# run MAGIC
# bmmsc_MAGIC <- magic(bmmsc, genes=c("Mpo", "Klf1", "Ifitm1"))
bmmsc_MAGIC <- magic(bmmsc, genes=c("NEUROG2", "NEAT1", "TFAP2A"))
Impute all genes
bmmsc_MAGIC_all <- magic(bmmsc, genes="all_genes", t=4, init=bmmsc_MAGIC)
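The `t=4` argument is the number of diffusion steps. The following base-R sketch illustrates the general idea of diffusion-based smoothing on a toy matrix: build cell-cell affinities, row-normalize them into a Markov matrix, raise it to the power `t`, and multiply it with the data. This is a deliberately simplified illustration and not what Rmagic runs internally; the fixed Gaussian bandwidth `sigma`, the helper `mat_pow`, and all other names are assumptions made for this example.

```r
# Toy expression matrix: 6 cells (rows) x 2 genes (columns); zeros mimic dropouts
x <- matrix(
  c(5, 0,
    4, 1,
    0, 1,
    0, 6,
    1, 0,
    1, 5),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), c("geneA", "geneB"))
)

# Cell-cell affinities from Euclidean distances (fixed bandwidth: an assumption)
sigma <- 3
d <- as.matrix(dist(x))
A <- exp(-(d / sigma)^2)

# Row-normalize so each row sums to 1 (a Markov transition matrix)
M <- A / rowSums(A)

# Hypothetical helper: repeated matrix multiplication, M %*% M %*% ... (t times)
mat_pow <- function(m, t) {
  out <- diag(nrow(m))
  for (i in seq_len(t)) out <- out %*% m
  out
}

# Diffuse for t steps, then average each cell's expression over its neighbours
t_steps <- 4
x_imputed <- mat_pow(M, t_steps) %*% x
rownames(x_imputed) <- rownames(x)

print(round(x_imputed, 2))
```

Each imputed value is a weighted average of that gene across cells, so it always stays within the gene's observed range; a larger `t` spreads the weights over more distant cells and smooths more strongly.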
Visualization
# before MAGIC
# as.matrix() makes the conversion work when bmmsc is a sparse Matrix from a Seurat object
ggplot(as.data.frame(as.matrix(bmmsc[,c("NEUROG2", "NEAT1", "TFAP2A")]))) +
    geom_point(aes(NEUROG2, NEAT1, color=TFAP2A)) +
    scale_color_viridis(option="B")

# after MAGIC
ggplot(as.data.frame(bmmsc_MAGIC$result[,c("NEUROG2", "NEAT1", "TFAP2A")])) +
    geom_point(aes(NEUROG2, NEAT1, color=TFAP2A)) +
    scale_color_viridis(option="B")