Single-cell imputation | MAGIC

 

quick code

library(Rmagic)
library(Seurat)   # VlnPlot
library(ggplot2)  # stat_summary
# cells x genes matrix of raw counts (slot access as in Seurat v3/v4 objects)
bmmsc <- t(GBM.pair@assays$RNA@counts)
bmmsc <- library.size.normalize(bmmsc)
bmmsc <- sqrt(bmmsc)
# bmmsc[1:5,1:5]
# run MAGIC on a few genes only:
# bmmsc_MAGIC <- magic(bmmsc, genes=c("Jarid2","Sox9","Krt20"))
# run MAGIC on all genes
bmmsc_MAGIC <- magic(bmmsc, genes="all_genes")

GBM.pair$JARID2_imputation <- bmmsc_MAGIC$result[colnames(GBM.pair),"JARID2"]

options(repr.plot.width=10, repr.plot.height=5)
# mark the per-sample median; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
VlnPlot(GBM.pair, features = c("JARID2_imputation"), group.by = "sampleID") +
    stat_summary(fun = median, geom='point', size = 5, colour = "black", shape = 21)

  

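August 21, 2023

This time I processed the published Sox9 single-cell data and found that, in the same Lgr5 mouse model, the fraction of cells expressing Jarid2 and Sox9 is very low. After asking, it was confirmed that this batch of cells was sequenced at very low depth, which leads to a low gene capture rate. In addition, 28 days is already the survival limit for these mice; after that they become sick and die.

Because the fraction of expressing cells is too low, significance testing for differential expression cannot be carried out, so imputation is required. This analysis confirmed once again that MAGIC works well.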
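To make the low-detection problem concrete, here is a minimal base R sketch that computes, for each gene, the fraction of cells with at least one count. The toy matrix and its values are invented for illustration; with a real dataset one would start from the cells x genes count matrix instead.

```r
# Toy cells x genes count matrix (values are invented for illustration)
counts <- matrix(
  c(0, 0, 3,
    0, 1, 5,
    0, 0, 2,
    1, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("Jarid2", "Sox9", "Krt20"))
)

# Fraction of cells in which each gene has at least one count
detection_rate <- colMeans(counts > 0)
print(detection_rate)
```

A gene detected in only a small fraction of cells leaves most cells at zero, which is why a test comparing groups has little to work with before imputation.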

 

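Re-download and install: https://github.com/KrishnaswamyLab/MAGIC

Installation from GitHub [both the R and the Python versions must be installed]

# GitHub no longer serves the unencrypted git:// protocol, so clone over https
git clone https://github.com/KrishnaswamyLab/MAGIC.git
cd MAGIC/python
python setup.py install --user
cd ../Rmagic
R CMD INSTALL .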

  

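If the R code fails with: AttributeError: module 'magic' has no attribute 'MAGIC'

then restart the kernel, or test from the command line. It is simply a version problem: with too many versions installed, things get mixed up.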
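A quick way to narrow this down is to check which Python the R session actually uses. The sketch below is only a diagnostic suggestion, not part of the original notes. One possible cause (an assumption) is that R picks up a different Python environment than the one where MAGIC was installed, or that another module named magic shadows it. It uses `py_config()` and `py_module_available()` from reticulate and is guarded so it does nothing harmful if the packages are missing.

```r
# Diagnostic sketch: which Python does R use, and can it find a module named "magic"?
magic_found <- NA
if (requireNamespace("reticulate", quietly = TRUE)) {
  # Print the Python binary and library paths in use (wrapped in try in case no Python is found)
  try(print(reticulate::py_config()), silent = TRUE)
  # TRUE if a module called "magic" can be imported from that Python, FALSE otherwise
  magic_found <- tryCatch(
    reticulate::py_module_available("magic"),
    error = function(e) FALSE
  )
} else {
  message("reticulate is not installed")
}
print(magic_found)

# Rmagic's own check, if the package is installed
if (requireNamespace("Rmagic", quietly = TRUE)) {
  print(Rmagic::pymagic_is_available())
}
```

If `magic_found` is TRUE but Rmagic still fails, the module that was found is probably not the MAGIC package, and the paths printed by `py_config()` show where to look.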

 

Latest code example: http://localhost:17435/notebooks/data_center/2023_SOX9_Sci_Adv_PB/SOX9_Sci_Adv.ipynb

 


 

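In some cases, UMI-based single-cell data needs imputation to fill in missing values.

 

There are many tools. This paper has already benchmarked them, so you can simply use the ones it recommends.

A systematic evaluation of single-cell RNA-sequencing imputation methods

 

The top-ranked single-cell imputation tool:

https://github.com/KrishnaswamyLab/MAGIC

Tutorial: Rmagic Bone Marrow Tutorial

 

UMI datasets are usually large, so a run takes a fair amount of time.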

 

Setup and installation check

library(Rmagic)
library(ggplot2)
library(readr)
library(viridis)
library(phateR)

# check
# don't "source activate py38", otherwise the python package cannot be loaded
pymagic_is_available()

  

Test data

# # load data
# bmmsc <- read_csv("https://github.com/KrishnaswamyLab/PHATE/raw/master/data/BMMC_myeloid.csv.gz")

  

Real data

bmmsc <- t(integrated.org@assays$RNA@counts)
bmmsc[1:5,1:5]

  

QC

# keep genes expressed in more than 10 cells
keep_cols <- colSums(bmmsc > 0) > 10
bmmsc <- bmmsc[,keep_cols]
# look at the distribution of library sizes
ggplot() +
  geom_histogram(aes(x=rowSums(bmmsc)), bins=50) +
  geom_vline(xintercept = 1000, color='red')

  

# keep cells with more than 1000 UMIs
keep_rows <- rowSums(bmmsc) > 1000
bmmsc <- bmmsc[keep_rows,]

  

bmmsc <- library.size.normalize(bmmsc)
bmmsc <- sqrt(bmmsc)
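
For readers who want to see what these two lines do numerically, here is a minimal base R sketch on a toy matrix. It assumes that `library.size.normalize` rescales each cell to the median library size, which is my understanding of the function rather than something stated in these notes; the matrix values are invented.

```r
# Toy cells x genes count matrix (invented values)
counts <- matrix(
  c(10, 0, 30,
     2, 2,  4,
     0, 5, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), c("geneA", "geneB", "geneC"))
)

lib_size <- rowSums(counts)                   # total counts per cell
norm <- counts / lib_size * median(lib_size)  # rescale every cell to the median library size
norm_sqrt <- sqrt(norm)                       # square-root transform to damp high counts

print(rowSums(norm))  # after rescaling, all cells have the same total
```

The rescaling removes differences in sequencing depth between cells, and the square root plays a role similar to a log transform while being defined at zero.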

  

Test on a few genes

# run MAGIC
# bmmsc_MAGIC <- magic(bmmsc, genes=c("Mpo", "Klf1", "Ifitm1"))
bmmsc_MAGIC <- magic(bmmsc, genes=c("NEUROG2", "NEAT1", "TFAP2A"))

  

Impute all genes

# init=bmmsc_MAGIC reuses the previous run so intermediate steps are not recomputed
bmmsc_MAGIC_all <- magic(bmmsc, genes="all_genes", t=4, init=bmmsc_MAGIC)
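
The `t` argument is the diffusion time. As a rough intuition for what that means, the sketch below builds a cell-cell Markov matrix on toy data and applies it `t_steps` times. This is a deliberately simplified illustration: it uses a plain Gaussian kernel with an arbitrary bandwidth on raw values, not the adaptive nearest-neighbour kernel on PCA coordinates that MAGIC itself uses, and all names and values here are made up.

```r
set.seed(1)
# Toy data: 6 cells x 2 genes, two groups of cells (invented values)
x <- rbind(
  matrix(rnorm(6, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(6, mean = 3, sd = 0.3), ncol = 2)
)

d <- as.matrix(dist(x))   # pairwise Euclidean distances between cells
k <- exp(-d^2)            # Gaussian kernel affinity (bandwidth 1 is arbitrary)
m <- k / rowSums(k)       # row-normalise into a Markov transition matrix

# Diffuse for t_steps steps by multiplying the transition matrix with itself
t_steps <- 4
m_t <- diag(nrow(m))
for (i in seq_len(t_steps)) m_t <- m_t %*% m

x_imputed <- m_t %*% x    # each cell becomes a weighted average of similar cells
print(round(x_imputed, 2))
```

A larger `t_steps` averages over a wider neighbourhood, so values are smoothed more strongly; too large a value blurs real differences between cells.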

  

Visualization

# before MAGIC (as.matrix so that a sparse count matrix can be turned into a data frame)
ggplot(as.data.frame(as.matrix(bmmsc[,c("NEUROG2", "NEAT1", "TFAP2A")]))) +
  geom_point(aes(NEUROG2, NEAT1, color=TFAP2A)) +
  scale_color_viridis(option="B")

  

# after MAGIC
ggplot(as.data.frame(bmmsc_MAGIC$result[,c("NEUROG2", "NEAT1", "TFAP2A")])) +
  geom_point(aes(NEUROG2, NEAT1, color=TFAP2A)) +
  scale_color_viridis(option="B")

  

 

posted @ 2021-06-23 15:53  Life·Intelligence