Notes on the functions used in the Seurat workflow in R

1.CreateSeuratObject

Reposted from: https://cloud.tencent.com/developer/article/1055892

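library(Seurat)

# Initialize the Seurat object with the raw (non-normalized) data. pbmc.data is
# assumed to be a gene-by-cell count matrix, e.g. one read with Read10X().
# Keep all genes expressed in >= 3 cells (~0.1% of the data). Keep all cells
# with at least 200 detected genes.
# Seurat v3+ argument names; Seurat v2 used raw.data = and min.genes = instead.
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "10X_PBMC",
                           min.cells = 3, min.features = 200)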

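Here min.genes (renamed min.features in Seurat v3) is a filter on cells: only cells with at least 200 detected genes are kept.

min.cells = 3 is a filter on genes: only genes detected in at least 3 cells are kept.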
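To make the two filters concrete, here is a minimal base-R sketch (not Seurat's own code) of what min.features/min.genes and min.cells do to a gene-by-cell count matrix. The helper name filter_counts and the toy matrix are hypothetical, and applying the cell filter before the gene filter is an assumption about ordering.

```r
# Minimal base-R sketch of the CreateSeuratObject filters (hypothetical helper,
# not Seurat's implementation). Assumption: cells are filtered first, then genes.
filter_counts <- function(counts, min.cells = 0, min.features = 0) {
  # Cells (columns): keep those with at least `min.features` detected genes (count > 0)
  n.features <- colSums(counts > 0)
  counts <- counts[, n.features >= min.features, drop = FALSE]
  # Genes (rows): keep those detected in at least `min.cells` of the remaining cells
  n.cells <- rowSums(counts > 0)
  counts[n.cells >= min.cells, , drop = FALSE]
}

# Hypothetical toy matrix: 3 genes x 4 cells
counts <- matrix(c(1, 0, 0,   # cell1: 1 detected gene
                   1, 1, 0,   # cell2: 2 detected genes
                   1, 1, 1,   # cell3: 3 detected genes
                   0, 0, 0),  # cell4: 0 detected genes
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
filtered <- filter_counts(counts, min.cells = 2, min.features = 2)
dim(filtered)  # 2 genes (gene1, gene2) x 2 cells (cell2, cell3)
```

With min.features = 2, cell1 and cell4 are dropped; gene3 is then seen in only one remaining cell, so min.cells = 2 drops it.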

Default arguments of CreateSeuratObject in the Seurat package (v3 signature):

CreateSeuratObject(
  counts,
  project = "SeuratProject",
  assay = "RNA",
  min.cells = 0,
  min.features = 0,
  names.field = 1,
  names.delim = "_",
  meta.data = NULL
)

https://www.rdocumentation.org/packages/Seurat/versions/3.1.4/topics/CreateSeuratObject

2.NormalizeData

https://satijalab.org/seurat/reference/normalizedata

# S3 method for Seurat
NormalizeData(
  object,
  assay = NULL,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  margin = 1,
  verbose = TRUE,
  ...
)
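scale.factor: the total that each cell is normalized to. By default, each cell's counts are rescaled so that they sum to 10000 (before the log transform).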

The default normalization method normalizes first and then takes the log:

  • LogNormalize: Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. This is then natural-log transformed using log1p. (That is: divide by the cell's total count, multiply by 10000, then apply log1p.)
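The LogNormalize arithmetic can be sketched in a few lines of base R. This is a hypothetical helper illustrating the formula described above, not Seurat's implementation; a cell with a total count of 0 would produce NaN here.

```r
# Minimal base-R sketch of the LogNormalize arithmetic (hypothetical helper,
# not Seurat's implementation). counts: genes x cells.
log_normalize <- function(counts, scale.factor = 10000) {
  # Divide each cell (column) by its total count, then multiply by scale.factor
  scaled <- sweep(counts, 2, colSums(counts), FUN = "/") * scale.factor
  # Natural-log transform: log1p(x) = log(1 + x)
  log1p(scaled)
}

# Hypothetical toy counts: 3 genes x 2 cells, each cell has 10 counts in total
counts <- matrix(c(1, 3, 6,
                   0, 5, 5), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))
norm <- log_normalize(counts)
norm["geneA", "cell1"]  # log1p(1 / 10 * 10000) = log1p(1000)
colSums(expm1(norm))    # undoing log1p: each cell sums to scale.factor (10000)
```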

3. ScaleData

https://satijalab.org/seurat/reference/scaledata

# S3 method for Seurat
ScaleData(
  object,
  features = NULL,
  assay = NULL,
  vars.to.regress = NULL,
  split.by = NULL,
  model.use = "linear",
  use.umi = FALSE,
  do.scale = TRUE,
  do.center = TRUE,
  scale.max = 10,
  block.size = 1000,
  min.cells.to.block = 3000,
  verbose = TRUE,
  ...
)

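do.center centers the data: the mean of each feature is subtracted, moving the data so that it is centered around the origin (mean 0). do.scale then divides each centered feature by its standard deviation.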

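scale.max is the maximum value returned for the scaled data; the default is 10. Setting it helps reduce the influence of features that are expressed in only a very small number of cells.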
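A minimal base-R sketch of centering, scaling and capping each feature, assuming the default do.center = TRUE and do.scale = TRUE. The helper scale_features is hypothetical, and clipping only values above scale.max is an assumption about Seurat's behavior. The toy "rare" gene shows why the cap matters: a gene expressed in 1 of n cells gets a z-score of (n - 1) / sqrt(n), which grows with the number of cells.

```r
# Minimal base-R sketch of per-feature scaling with a cap (hypothetical helper,
# not Seurat's implementation). norm: genes x cells, e.g. log-normalized data.
scale_features <- function(norm, scale.max = 10) {
  # scale() standardizes columns, so transpose to center each gene (row) on its
  # mean and divide by its standard deviation across cells, then transpose back
  z <- t(scale(t(norm), center = TRUE, scale = TRUE))
  # Cap large values at scale.max (assumption: only the upper side is clipped)
  z[z > scale.max] <- scale.max
  z
}

set.seed(42)
# Hypothetical data: 2 genes x 200 cells; "rare" is expressed in a single cell
norm <- rbind(common = log1p(rpois(200, lambda = 5)),
              rare   = c(3, rep(0, 199)))
scaled <- scale_features(norm, scale.max = 10)
max(scaled["rare", ])  # uncapped z-score would be 199 / sqrt(200), about 14.07; capped to 10
```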

4.FindNeighbors

# S3 method for Seurat
FindNeighbors(
  object,
  reduction = "pca",
  dims = 1:10,
  assay = NULL,
  features = NULL,
  k.param = 20,
  return.neighbor = FALSE,
  compute.SNN = !return.neighbor,
  prune.SNN = 1/15,
  nn.method = "annoy",
  n.trees = 50,
  annoy.metric = "euclidean",
  nn.eps = 0,
  verbose = TRUE,
  force.recalc = FALSE,
  do.plot = FALSE,
  graph.name = NULL,
  l2.norm = FALSE,
  cache.index = FALSE,
  ...
)

By default it uses dimensions 1 to 10 (dims = 1:10) of the PCA reduction.

k.param is the k in the k-nearest-neighbor algorithm, i.e. how many nearest neighbors are computed for each cell.
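The roles of k.param and prune.SNN can be illustrated with a brute-force base-R sketch: build a kNN graph, then weight each pair of cells by the Jaccard overlap of their neighbor sets and zero out weak edges. This is a simplified stand-in, not Seurat's code: it uses exact Euclidean distances instead of Annoy (nn.method = "annoy"), the helper snn_graph is hypothetical, and counting each cell as one of its own neighbors is an assumption. The matrix emb stands in for the first dims of the PCA embedding (in Seurat, something like Embeddings(pbmc, "pca")[, 1:10]).

```r
# Brute-force sketch of a kNN graph and a Jaccard-based SNN graph (hypothetical
# helper). Assumptions: Euclidean distance, each cell counts itself among its
# k.param neighbors, and SNN edges with Jaccard <= prune.SNN are set to 0.
snn_graph <- function(emb, k.param = 20, prune.SNN = 1/15) {
  n <- nrow(emb)
  d <- as.matrix(dist(emb))  # cell-by-cell Euclidean distances
  # Row i of nn holds the indices of the k.param cells closest to cell i
  nn <- t(apply(d, 1, function(row) order(row)[seq_len(k.param)]))
  # Binary kNN adjacency: knn[i, j] = 1 if cell j is one of cell i's neighbors
  knn <- matrix(0, n, n)
  knn[cbind(rep(seq_len(n), each = k.param), as.vector(t(nn)))] <- 1
  # shared[i, j] = number of neighbors that cells i and j have in common
  shared <- knn %*% t(knn)
  # Jaccard index: intersection / union, where each set has k.param members
  snn <- shared / (2 * k.param - shared)
  snn[snn <= prune.SNN] <- 0
  dimnames(snn) <- list(rownames(emb), rownames(emb))
  snn
}

set.seed(7)
# Hypothetical 2-D embedding: two well-separated groups of 30 cells each
emb <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))
rownames(emb) <- paste0("cell", 1:60)
g <- snn_graph(emb, k.param = 10)
sum(g[1:30, 31:60])  # no SNN edges between the two groups in this toy example
```

A larger k.param gives each cell more neighbors and a denser graph; prune.SNN removes pairs whose neighbor sets barely overlap.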

5.RunPCA

# S3 method for Seurat
RunPCA(
  object,
  assay = NULL,
  features = NULL,
  npcs = 50,
  rev.pca = FALSE,
  weight.by.var = TRUE,
  verbose = TRUE,
  ndims.print = 1:5,
  nfeatures.print = 30,
  reduction.name = "pca",
  reduction.key = "PC_",
  seed.use = 42,
  ...
)

Features used to run PCA:

features

Features to compute PCA on. If features=NULL, PCA will be run using the variable features for the Assay. Note that the features must be present in the scaled data. Any requested features that are not scaled or have 0 variance will be dropped, and the PCA will be run using the remaining features.

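In other words: if features is not given, the assay's variable features are used (normally set beforehand with FindVariableFeatures). Whatever features are used must be present in the scaled data, so they need to have gone through ScaleData, and zero-variance features are dropped.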
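The feature handling described above can be sketched in base R: keep only requested features that exist in the scaled data, drop zero-variance ones, and run PCA with cells as observations. This is an illustration under assumptions, not RunPCA itself: prcomp() stands in for the truncated PCA Seurat computes, so signs and numerical details can differ, and the helper run_pca_sketch is hypothetical.

```r
# Sketch of PCA on scaled data (hypothetical helper). Base R's prcomp() stands in
# for the truncated PCA Seurat computes; results can differ in sign and precision.
run_pca_sketch <- function(scaled, features = rownames(scaled), npcs = 50) {
  # Use only requested features that are actually present in the scaled data
  x <- scaled[intersect(features, rownames(scaled)), , drop = FALSE]
  # Drop zero-variance features, as the RunPCA documentation describes
  x <- x[apply(x, 1, var) > 0, , drop = FALSE]
  # Cells are observations and features are variables, so transpose to
  # cells x features; the data are already centered by the scaling step
  pca <- prcomp(t(x), center = FALSE, scale. = FALSE, rank. = npcs)
  pca$x  # cell embeddings: one row per cell, one column per PC
}

set.seed(3)
# Hypothetical scaled data: 8 genes x 40 cells, plus one constant (zero-variance) gene
scaled <- t(scale(matrix(rnorm(40 * 8), nrow = 40)))
rownames(scaled) <- paste0("gene", 1:8)
scaled <- rbind(scaled, flat = 0)
emb <- run_pca_sketch(scaled, npcs = 3)
dim(emb)  # 40 cells x 3 PCs
```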


posted @ 2018-11-09 03:38  lypbendlf