Dimensionality Reduction


Principal Component Analysis, PCA

Preprocess the data by zero-centering (subtracting the mean).

Then perform a singular value decomposition (SVD), or an eigendecomposition, of the covariance matrix.

Let the zero-centered data matrix be \(X\in \mathbb R^{m\times n}\) (\(m\) samples, \(n\) features).

\[\begin{aligned} &\Sigma=X^TX && \text{covariance matrix (up to a factor of } 1/m\text{)} \\ &[U,S,V]=\mathrm{svd}(\Sigma) && \text{SVD yields the eigenvectors } U \\ &P=U(:, 1:K) && \text{principal components: the first } K \text{ columns} \\ &\text{variance loss ratio}=\frac{\frac1m \sum_i^m \|\bm x^{(i)}-\bm x^{(i)}_{approx}\|^2}{\frac1m \sum_i^m\|\bm x^{(i)}\|^2} = 1 - \frac{\sum_i^k S_{ii}}{\sum_i^n S_{ii}} \end{aligned}\]

The reduced-dimensional data are \(XP\); the reduced data can be approximately reconstructed as \((XP)P^T\).

PCA works well when the structure of the data is (approximately) linear.
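
A minimal NumPy sketch of the steps above (zero-centering, SVD of the covariance, projecting onto the first \(K\) columns, reconstruction, variance loss); the function name `pca_reduce` is just for illustration:

```python
import numpy as np

def pca_reduce(X, K):
    """PCA on an (m x n) data matrix X, keeping the first K principal components."""
    X = X - X.mean(axis=0)              # zero-mean preprocessing
    Sigma = X.T @ X / X.shape[0]        # covariance matrix (n x n)
    U, S, _ = np.linalg.svd(Sigma)      # eigenvectors are the columns of U
    P = U[:, :K]                        # principal components: first K columns
    Z = X @ P                           # reduced data XP, shape (m, K)
    X_approx = Z @ P.T                  # reconstruction (XP)P^T, back to n dimensions
    loss = 1 - S[:K].sum() / S.sum()    # variance loss ratio
    return Z, X_approx, loss
```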

kPCA (kernel PCA)

A method that combines PCA with the kernel trick.
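
For example, scikit-learn provides an implementation; the random `X` and the RBF kernel below are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.rand(100, 10)                                          # placeholder data: 100 samples, 10 features
X_kpca = KernelPCA(n_components=2, kernel='rbf').fit_transform(X)    # kernel trick + PCA, down to 2-D
```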

Non-negative Matrix Factorization, NMF

Like PCA, except that the coefficients in the linear combination must be non-negative.

It converges to a local minimum (so the starting point of the optimization matters).
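
A small scikit-learn sketch, assuming `X` is a non-negative data matrix; the fixed `random_state` is there only because the result depends on the initialization:

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.random.rand(100, 10)                      # placeholder non-negative data
nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(X)                         # non-negative coefficients (reduced representation)
H = nmf.components_                              # non-negative basis components, X ≈ W @ H
```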

Random Projection

The original \(d\)-dimensional data are projected onto a \(k\)-dimensional subspace (\(k\ll d\)).

\(X\in\mathbb R^{d\times N}\): original matrix of \(N\) samples with \(d\) features
\(R\in\mathbb R^{k\times d}\): random transformation matrix
The resulting matrix in the lower-dimensional space is \(RX\in\mathbb R^{k\times N}\).

Generating the random matrix: for a Gaussian random projection, the rows of \(R\) are drawn from a Gaussian distribution, scaled to unit length, and are (approximately) orthogonal to each other.
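
A minimal NumPy sketch under the notation above (the sizes are arbitrary; scikit-learn's `sklearn.random_projection.GaussianRandomProjection` offers a ready-made alternative, though it expects samples as rows):

```python
import numpy as np

d, N, k = 1000, 500, 50                          # illustrative sizes
X = np.random.randn(d, N)                        # original data: d features x N samples

R = np.random.randn(k, d)                        # Gaussian random matrix
R /= np.linalg.norm(R, axis=1, keepdims=True)    # scale each row to unit length
                                                 # (rows are nearly orthogonal when d is large)
X_low = R @ X                                    # projected data: k x N
```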

Factor Analysis

Assume there is a set of unobservable latent factors \(z_j\ (j=1,\dots,k)\).
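
A quick scikit-learn sketch, assuming `X` is an \(N\times d\) data array and `k` latent factors (the data here are random placeholders):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.randn(200, 10)                           # placeholder data: 200 samples, 10 features
k = 3                                                  # number of latent factors
Z = FactorAnalysis(n_components=k).fit_transform(X)    # latent factor scores, shape (200, k)
```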

Multi-Dimensional Scaling, MDS

Typical setting: the pairwise distances \(d_{ij}\) between \(N\) points are known, but the coordinates of the points, their dimensionality, and the distance metric used to compute \(d_{ij}\) are all unknown. MDS maps such points into a low-dimensional space.
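
A sketch with scikit-learn, assuming `D` is the \(N\times N\) matrix of known pairwise distances \(d_{ij}\) (the hidden coordinates below exist only to fabricate a valid `D`):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

points = np.random.rand(50, 5)                   # hidden coordinates (unknown in the real scenario)
D = squareform(pdist(points))                    # the only available input: N x N pairwise distances
Y = MDS(n_components=2, dissimilarity='precomputed').fit_transform(D)   # 2-D embedding
```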

Linear Discriminant Analysis, LDA

Take the number of classes \(K=2\) (binary classification) as an example, with class labels \(C_1, C_2\). The expression \(z=\bm w^T \bm x\) is the projection of \(\bm x\) onto \(\bm w\). Let \(\bm m_1\) and \(m_1\) be the means of the class-\(C_1\) samples before and after projection, so \(\bm m_1\in \mathbb R^d,\ m_1\in\mathbb R\) (\(d\) is the feature dimension, and likewise \(\bm w\in\mathbb R^d\)). Denote the sample set by \(\mathcal X=\{\bm x^{(t)}, y^{(t)}\}\), where the label \(y^{(t)}=1\) means sample \(\bm x^{(t)}\) belongs to class \(C_1\) and \(y^{(t)}=0\) means it belongs to class \(C_2\). Then

\[m_1 = \frac{\sum_t \bm w^T \bm x^{(t)} y^{(t)}}{\sum_t y^{(t)}}=\bm w^T\bm m_1 \\ m_2 = \frac{\sum_t \bm w^T \bm x^{(t)} (1-y^{(t)})}{\sum_t (1-y^{(t)})}=\bm w^T \bm m_2 \]

The scatter of the projected samples from \(C_1\) and \(C_2\) is

\[s_1^2 = \sum_t(\bm w^T \bm x^{(t)} - m_1)^2\, y^{(t)} \\ s_2^2 = \sum_t(\bm w^T \bm x^{(t)} - m_2)^2\, (1-y^{(t)}) \]

For the classes to be well separated after projection, we want the class means to be as far apart as possible and the samples of each class to be scattered over as small a region as possible; that is, \(|m_1-m_2|\) should be large while \(s_1^2+s_2^2\) should be small. Fisher's linear discriminant finds the \(\bm w\) that maximizes

\[J(\bm w)=\frac{(m_1 - m_2)^2}{s_1^2+s_2^2} \]
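
For reference (not spelled out in the original notes), the maximizer has the standard closed-form solution in terms of the within-class scatter matrix \(\bm S_W\):

\[\bm S_W = \sum_t y^{(t)}(\bm x^{(t)}-\bm m_1)(\bm x^{(t)}-\bm m_1)^T + \sum_t (1-y^{(t)})(\bm x^{(t)}-\bm m_2)(\bm x^{(t)}-\bm m_2)^T, \qquad \bm w \propto \bm S_W^{-1}(\bm m_1-\bm m_2)\]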

Isometric Feature Mapping (Isomap)

Distances are computed along a manifold in the feature space (e.g., the manifold formed by a burst of photos taken while a face rotates).
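
A quick scikit-learn sketch (the random `X` and the neighborhood size are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.manifold import Isomap

X = np.random.rand(200, 20)                                   # placeholder data: 200 samples, 20 features
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # along-manifold distances via a k-NN graph
```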

Locally Linear Embedding, LLE

Sammon Mapping

(non-linear) (Sammon, 1969; see also Lee and Verleysen, 2007)

……

Curvilinear Components Analysis, CCA

(non-linear) (by Demartines and Herault, 1997)

Stochastic Neighbor Embedding, SNE

(non-linear) (by Hinton and Roweis, 2002)

t-distributed Stochastic Neighbor Embedding, t-SNE


(by L. van der Maaten & G. Hinton, 2008)

(The paper contains many ideas about handling practical data issues that are worth studying closely.)

It's a non-linear dimensionality reduction algorithm.

\(N\) high-dimensional data points \(\bm x_1,...,\bm x_N\), and their low-dimensional counterparts \(\bm Y=[\bm y_1\ ...\ \bm y_N]^T\).

conditional probability for \(j\ne i\) :

\[p_{j|i} := \frac{\exp(-\|\bm x_i-\bm x_j\|^2/\sigma_i^2)}{\sum_{k \ne i}\exp(-\|\bm x_i-\bm x_k\|^2/\sigma_i^2)} \\ q_{j|i} := \frac{\exp (-\| \bm y_i -\bm y_j\|^2)}{\sum_{k\ne i} \exp (-\| \bm y_i - \bm y_k \|^2)} \]

\[p_{i|i} := 0 \\ q_{i|i} := 0 \]

Note that \(\sum_j p_{j|i}=1, \sum_j q_{j|i}=1, \forall i\) .

Loss function: the sum of the Kullback-Leibler divergences of the conditional probabilities over all data points:

\[J(\bm Y)=\sum_i \mathrm{KL}(P_i \| Q_i)=\sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{ q_{j|i}} \]
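
A small NumPy sketch of these quantities (the helper names `conditional_probs` and `kl_loss` are illustrative; `D2` is the \(N\times N\) matrix of squared pairwise distances and `sigmas` holds the per-point bandwidths \(\sigma_i\)):

```python
import numpy as np

def conditional_probs(D2, sigmas):
    """p_{j|i} from squared pairwise distances D2 (N x N) and per-point bandwidths sigmas (N,)."""
    P = np.exp(-D2 / sigmas[:, None] ** 2)
    np.fill_diagonal(P, 0.0)                     # p_{i|i} := 0
    return P / P.sum(axis=1, keepdims=True)      # each row sums to 1

def kl_loss(P, Q, eps=1e-12):
    """J(Y) = sum_i KL(P_i || Q_i), summed over all matrix entries."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))
```

Applying `conditional_probs` to the squared distances between the \(\bm y_i\) with all \(\sigma_i = 1\) gives \(q_{j|i}\).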

An alternative loss function is a single KL divergence on joint probabilities (again minimized over \(\bm Y\)):

\[J(\bm Y)=\mathrm{KL}(P \| Q) = \sum_{i\ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \]

Symmetric SNE:

joint probabilities:

\[p_{ij} := \frac{\exp(-\|\bm x_i-\bm x_j\|^2/\sigma^2)}{\sum_{k \ne l}\exp(-\|\bm x_k-\bm x_l\|^2/\sigma^2)} \\ q_{ij} := \frac{\exp (-\| \bm y_i -\bm y_j\|^2)}{\sum_{k\ne l} \exp (-\| \bm y_k - \bm y_l \|^2)} \]

But the above \(p_{ij}\) causes problems when a high-dimensional data point \(\bm x_i\) is an outlier (i.e., all pairwise distances \(\|\bm x_i - \bm x_j\|^2\) are large for \(\bm x_i\)). For such an outlier, the values of \(p_{ij}\) are extremely small for all \(j\), so the location of its low-dimensional map point \(\bm y_i\) has very little effect on the cost function. As a result, the position of the map point is not well determined by the positions of the other map points.

Instead, the joint probabilities are defined as the symmetrized conditional probabilities:

\[p_{ij} := \frac{p_{i|j}+p_{j|i}}{2N} \]

which ensures that \(\sum_j p_{ij}> \frac{1}{2N}\) for all data points \(\bm x_i\), as a result of which each data point \(\bm x_i\) makes a significant contribution to the cost function.

t-SNE:

\[q_{ij} := \frac{(1+\| \bm y_i - \bm y_j \|^2)^{-1}}{\sum_k\sum_{l\ne k}(1+ \| \bm y_k - \bm y_l \|^2)^{-1}} \]

\[q_{ii} := 0 \]
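
For reference, the gradient of the cost with respect to each map point (as derived in the t-SNE paper) is:

\[\frac{\partial J}{\partial \bm y_i} = 4\sum_j (p_{ij}-q_{ij})(\bm y_i-\bm y_j)\left(1+\|\bm y_i-\bm y_j\|^2\right)^{-1}\]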

Python tool:

```python
# sklearn computes on the CPU; it can be slow for large datasets
from sklearn.manifold import TSNE
X_2d = TSNE(n_components=2).fit_transform(X)   # reduce to 2 dimensions

# The `cuML` library computes on the GPU and is much faster:
# https://docs.rapids.ai/api/cuml/stable/api.html#tsne
```

Independent Component Analysis, ICA

Linear independent component analysis, linear ICA

general definition of linear independent component analysis:

linear noiseless ICA

linear noisy ICA
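
In the standard linear ICA formulation (stated here as a reminder, since the notes leave these as stubs), the observed data \(\bm x\) are modeled as a mixture of statistically independent sources \(\bm s\) through a mixing matrix \(A\), with an optional noise term \(\bm n\):

\[\text{noiseless: } \bm x = A\bm s \qquad\qquad \text{noisy: } \bm x = A\bm s + \bm n\]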
