How to understand AnnData?

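anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. It offers a broad range of computationally efficient features, including sparse data support, lazy operations, and a PyTorch interface.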

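Inside AnnData

1. What is obs?

obs holds the annotations of the rows, that is, the cells (observations). For example, Xiao Ming is annotated as A, Xiao Hong as B, and so on; obs as a whole is a pandas DataFrame with one row per cell.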

We can also subset the AnnData using these randomly generated cell types:
bdata = adata[adata.obs.cell_type == "B"]
bdata

2. What is var?

var corresponds to the columns, that is, the genes (variables); it holds the annotations of the genes.

3. What is uns?

Unstructured metadata

AnnData has .uns, which allows for any unstructured metadata. This can be anything, like a list or a dictionary with some general information that was useful in the analysis of our data.

4. What is obsm?

Observation/variable-level matrices

We might also have metadata at either level that has many dimensions to it, such as a UMAP embedding of the data. For this type of metadata, AnnData has the .obsm/.varm attributes. We use keys to identify the different matrices we insert. The restriction on .obsm/.varm is that .obsm matrices must have a length equal to the number of observations (.n_obs) and .varm matrices must have a length equal to .n_vars. They can each independently have a different number of dimensions.

Let’s start with a randomly generated matrix that we can interpret as a UMAP embedding of the data we’d like to store, as well as some random gene-level metadata:

adata.obsm["X_umap"] = np.random.normal(0, 1, size=(adata.n_obs, 2))

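After UMAP dimensionality reduction, every cell has a position in the embedding space, and that position is 2-dimensional. X_umap is therefore a piece of metadata with two values per cell. adata can also carry another piece of metadata, such as a 3-dimensional embedding, so that every cell gets a 3-dimensional annotation; it is stored under a new key in obsm (for example X_umap_3d), not in obsp, which is reserved for cell-by-cell matrices.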

adata

 

 

5. What is varm?

Multi-dimensional metadata for the genes (variables).

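6. What is obsp?

obsp is related to obsm but is not the same thing under a different name. It stores pairwise matrices between observations, each of shape n_obs x n_obs, such as a cell-by-cell distance matrix or neighbor-graph connectivities. Like obsm, it is keyed, multi-dimensional metadata attached to the cells.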

However, obsm comes with a restriction:

The restriction on .obsm/.varm is that .obsm matrices must have a length equal to the number of observations (.n_obs) and .varm matrices must have a length equal to .n_vars. They can each independently have a different number of dimensions.

As shown in the figure:

 

7. Layers
Finally, we may have different forms of our original core data, perhaps one that is normalized and one that is not. These can be stored in different layers in AnnData. For example, let’s log transform the original data and store it in a layer:

-------------

The tutorial for this data structure:

https://anndata-tutorials.readthedocs.io/en/latest/getting-started.html

posted @ 2022-08-15 13:50  bH1pJ