Single-cell cell type annotation | scAnnotatR | cell type markers

January 19, 2024

Today I came across a very handy annotation tool: scAnnotatR.

scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data - 2022 - BMC Bioinformatics

https://github.com/grisslab/scAnnotatR

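The basic workflow is:

Following the Seurat workflow, identify the top markers for each cell type.
Write a loop that builds one separate model per cell type.
Then predict with the models individually or all together.
Under the hood it is just a simple SVM model; what makes it impressive is that its predictions are fairly accurate and that it can evaluate the accuracy of each model.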
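
The "one model per cell type" loop above can be sketched in plain Python. This is not scAnnotatR's API (scAnnotatR is an R package), and to keep the example dependency-free I swap the SVM for a much simpler nearest-centroid classifier. All function names, the toy expression values, and the marker choices here are my own assumptions for illustration.

```python
from statistics import mean


def train_binary_model(cells, labels, cell_type, markers):
    """Build one binary model (cell_type vs. rest) restricted to its marker genes.

    Stand-in for the SVM: store the mean marker profile of the positive and
    the negative cells, and later assign a cell to the closer centroid.
    """
    pos = [c for c, l in zip(cells, labels) if l == cell_type]
    neg = [c for c, l in zip(cells, labels) if l != cell_type]

    def centroid(group):
        return {g: mean(c.get(g, 0.0) for c in group) for g in markers}

    return {"cell_type": cell_type, "markers": markers,
            "pos": centroid(pos), "neg": centroid(neg)}


def is_positive(model, cell):
    """True if the cell is closer to the positive centroid than to the negative one."""
    def dist(centroid):
        return sum((cell.get(g, 0.0) - centroid[g]) ** 2 for g in model["markers"])
    return dist(model["pos"]) < dist(model["neg"])


def accuracy(model, cells, labels):
    """Fraction of cells whose positive/negative call matches the known label."""
    hits = sum(is_positive(model, c) == (l == model["cell_type"])
               for c, l in zip(cells, labels))
    return hits / len(cells)


def predict(models, cell):
    """Run every binary model; accept the label only if exactly one model fires."""
    hits = [m["cell_type"] for m in models if is_positive(m, cell)]
    return hits[0] if len(hits) == 1 else "unknown"


# Toy, made-up expression values (gene -> normalized expression).
cells = [
    {"CD3D": 5.0, "CD79A": 0.1, "CD14": 0.0},
    {"CD3D": 4.0, "CD79A": 0.0, "CD14": 0.2},
    {"CD3D": 0.1, "CD79A": 6.0, "CD14": 0.0},
    {"CD3D": 0.0, "CD79A": 5.0, "CD14": 0.1},
    {"CD3D": 0.2, "CD79A": 0.0, "CD14": 4.5},
    {"CD3D": 0.0, "CD79A": 0.3, "CD14": 5.5},
]
labels = ["T", "T", "B", "B", "Mono", "Mono"]
top_markers = {"T": ["CD3D"], "B": ["CD79A"], "Mono": ["CD14"]}

# One separate model per cell type, as in the workflow above.
models = [train_binary_model(cells, labels, ct, genes)
          for ct, genes in top_markers.items()]

# Accuracy on the training cells only; a real evaluation needs held-out data.
accuracies = {m["cell_type"]: accuracy(m, cells, labels) for m in models}

new_cell = {"CD3D": 4.2, "CD79A": 0.1, "CD14": 0.1}
predicted = predict(models, new_cell)
print(accuracies, predicted)
```

Keeping the models independent is what makes it possible to predict "individually or all together": each model can be run on its own, and a cell that no model (or more than one model) claims is simply left as "unknown" in this sketch.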

Reference code:

http://localhost:17435/notebooks/shared_dataset/zhixin/2023_Hickey_Snyder_human_Intestine_Nature/data_preparation.ipynb#scAnnotatR

Personally, though, I still find Seurat's built-in reference-based prediction the most powerful approach.

http://localhost:17435/notebooks/shared_dataset/zhixin/2023_Hickey_Snyder_human_Intestine_Nature/data_preparation.ipynb#cell-type-find-mapping

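I call this fine mapping.

When an old model (old markers) is used to predict a new dataset, some of the predictions will most likely be wrong.
We can use those predictions to re-identify new markers in the new data, and then predict the new data with markers derived from the new data itself; in theory this should work better.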
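
The fine-mapping idea can be illustrated with a small, dependency-free Python sketch. It is only a toy under my own assumptions: cells are scored by mean marker expression rather than by Seurat's label transfer, markers are re-derived by a simple difference of group means, and the function names and expression values are hypothetical.

```python
from statistics import mean


def predict_by_markers(cells, markers):
    """Label each cell with the type whose marker genes have the highest mean expression."""
    labels = []
    for cell in cells:
        scores = {ct: mean(cell.get(g, 0.0) for g in genes)
                  for ct, genes in markers.items()}
        labels.append(max(scores, key=scores.get))
    return labels


def group_mean(group, gene):
    """Mean expression of a gene in a group of cells (0.0 for an empty group)."""
    return mean(c.get(gene, 0.0) for c in group) if group else 0.0


def rederive_markers(cells, labels, n_top=1):
    """For each predicted type, keep the genes with the largest in-group minus out-group mean."""
    genes = sorted({g for c in cells for g in c})
    markers = {}
    for ct in sorted(set(labels)):
        inside = [c for c, l in zip(cells, labels) if l == ct]
        outside = [c for c, l in zip(cells, labels) if l != ct]
        ranked = sorted(genes,
                        key=lambda g: group_mean(inside, g) - group_mean(outside, g),
                        reverse=True)
        markers[ct] = ranked[:n_top]
    return markers


# Old markers taken from a previous dataset (toy choice).
old_markers = {"Enterocyte": ["ALPI"], "Goblet": ["MUC2"]}

# New dataset with made-up values; the third cell is an enterocyte with ALPI dropout.
new_cells = [
    {"ALPI": 3.0, "MUC2": 0.2, "FABP2": 6.0, "TFF3": 0.1},
    {"ALPI": 2.5, "MUC2": 0.1, "FABP2": 5.5, "TFF3": 0.0},
    {"ALPI": 0.0, "MUC2": 0.5, "FABP2": 5.8, "TFF3": 0.2},
    {"ALPI": 0.1, "MUC2": 4.0, "FABP2": 0.2, "TFF3": 6.0},
    {"ALPI": 0.0, "MUC2": 3.5, "FABP2": 0.1, "TFF3": 5.5},
    {"ALPI": 0.2, "MUC2": 4.2, "FABP2": 0.0, "TFF3": 6.2},
]

first_pass = predict_by_markers(new_cells, old_markers)    # old markers on new data
new_markers = rederive_markers(new_cells, first_pass)      # markers from the new data
second_pass = predict_by_markers(new_cells, new_markers)   # new data predicts new data
print(first_pass, new_markers, second_pass)
```

In this toy case the first pass mislabels the dropout cell, but the markers re-derived from the mostly correct first pass recover it in the second pass. Whether that holds on real data depends on how good the first pass is, so I would treat it as a heuristic rather than a guarantee.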

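Comments:

A lot of care clearly went into scAnnotatR's engineering design!

It generalizes very well and can be applied quickly to almost any Seurat dataset, which makes it a good fit for devoted R users.
It takes the hierarchical structure of cell types into account.
The model class is very sensibly designed and quick to use; a collection of model files could even serve as a database!
It is very fast: in my hands even atlas-scale data was processed almost instantly.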

Naming convention for model files

hs_colon_major_type
mm_SI_epith_celltype

This can be subdivided even further: build a separate model for each region of the colon, or build a model from the markers common to all regions.
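
A small Python helper can keep such names consistent. This is just a sketch under my own reading of the two example names as `species_tissue_level` (with `hs` for human, `mm` for mouse and `SI` for small intestine); the class and function names are hypothetical and not part of scAnnotatR.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    species: str  # assumed: "hs" = human, "mm" = mouse
    tissue: str   # assumed: e.g. "colon", "SI" = small intestine
    level: str    # annotation level, e.g. "major_type", "epith_celltype"


def format_model_name(species, tissue, level):
    """Join the three parts into a model file name stem."""
    if "_" in species or "_" in tissue:
        raise ValueError("species and tissue must not contain underscores")
    return f"{species}_{tissue}_{level}"


def parse_model_name(name):
    """Split on the first two underscores; the rest is the annotation level."""
    species, tissue, level = name.split("_", 2)
    return ModelName(species, tissue, level)


parsed = [parse_model_name(n) for n in ("hs_colon_major_type", "mm_SI_epith_celltype")]
print(parsed)
```

Only the first two underscores are treated as separators, so the level part can itself contain underscores, as in `epith_celltype`.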

I will probably use this as my main annotation tool from now on.

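Routine single-cell analysis has a few major open problems:

How many clusters are appropriate, and how should the parameters be chosen? There is no definitive answer; I suggest working from coarse to fine, increasing the number of clusters level by level.
Once the clusters are fixed, how should they be annotated? If only a few cell populations are analyzed, an experimentalist can annotate them from experience, but once there are more than about 5 clusters, annotation becomes difficult (especially for organoids, which have at least 10 clusters and where manual annotation is very subjective).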

SingleR

Assigning cell types with SingleR

SciBet

SciBet as a portable and fast single cell type identifier - 2020 - Nature Communications

https://github.com/PaulingLiu/scibet

ScType

Ianevski, A., Giri, A.K. & Aittokallio, T.

Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat Commun 13, 1246 (2022).

https://doi.org/10.1038/s41467-022-28803-w

CellTypist

https://github.com/Teichlab/celltypist

Published in Science (2022): "Cross-tissue immune cell analysis reveals tissue-specific features in humans", DOI: 10.1126/science.abl5197

scAnno

Hongjia Liu and others,

scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets. Briefings in Bioinformatics, 2023, bbad179.

https://doi.org/10.1093/bib/bbad179

https://github.com/liuhong-jia/scAnno

MACA

Database

PanglaoDB: https://panglaodb.se/index.html

CellMarker: http://xteam.xbio.top/CellMarker/index.jsp

# T cells (CD3D, CD3E, CD8A),
# B cells (CD19, CD79A, MS4A1 [CD20]),
# Plasma cells (IGHG1, MZB1, SDC1, CD79A),
# Monocytes and macrophages (CD68, CD163, CD14),
# NK cells (FGFBP2, FCGR3A, CX3CR1),
# Photoreceptor cells (RCVRN),
# Fibroblasts (FGF7, MME),
# Endothelial cells (PECAM1, VWF),
# Epithelial or tumor cells (EPCAM, KRT19, PROM1, ALDH1A1, CD24).
# Immune (CD45+, PTPRC), epithelial/cancer (EpCAM+, EPCAM),
# stromal (CD10+, MME, fibroblasts; or CD31+, PECAM1, endothelial).

# TER-119 is a lineage marker for erythroid cells
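
A marker list like the one above can be turned into a quick first-pass annotator. The sketch below is plain Python under my own assumptions: a cluster is summarized by its mean expression per gene, the score is the average over a type's markers, and the `min_score` cutoff and the example values are made up. It uses the official symbol FCGR3A for the CD16 gene.

```python
MARKERS = {
    "T cells": ["CD3D", "CD3E", "CD8A"],
    "B cells": ["CD19", "CD79A", "MS4A1"],
    "Plasma cells": ["IGHG1", "MZB1", "SDC1", "CD79A"],
    "Monocytes and macrophages": ["CD68", "CD163", "CD14"],
    "NK cells": ["FGFBP2", "FCGR3A", "CX3CR1"],
    "Photoreceptor cells": ["RCVRN"],
    "Fibroblasts": ["FGF7", "MME"],
    "Endothelial cells": ["PECAM1", "VWF"],
    "Epithelial or tumor cells": ["EPCAM", "KRT19", "PROM1", "ALDH1A1", "CD24"],
}


def score_cluster(mean_expression, markers=MARKERS):
    """Average expression of each cell type's markers in one cluster."""
    return {ct: sum(mean_expression.get(g, 0.0) for g in genes) / len(genes)
            for ct, genes in markers.items()}


def annotate_cluster(mean_expression, markers=MARKERS, min_score=1.0):
    """Return the best-scoring cell type, or "unknown" below the (arbitrary) cutoff."""
    scores = score_cluster(mean_expression, markers)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "unknown"


# Made-up cluster-level mean expression values.
cluster_a = {"CD79A": 3.0, "MS4A1": 2.5, "CD19": 1.5, "CD3D": 0.1}
cluster_b = {"CD79A": 2.0, "MZB1": 4.0, "IGHG1": 6.0, "SDC1": 2.0}

print(annotate_cluster(cluster_a), "|", annotate_cluster(cluster_b))
```

Averaging over all markers of a type is what lets the shared gene CD79A be resolved: a cluster that expresses only CD79A among the plasma cell markers scores higher as B cells, while one that also expresses IGHG1, MZB1 and SDC1 scores higher as plasma cells.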

Hands-on: how to isolate epithelial cells from intestinal cells using FACS

References:

posted @ 2022-02-14 12:47 Life·Intelligence

