Dimensionality Reduction
Principal Component Analysis, PCA
Preprocess the data by zero-centering it.
Apply singular value decomposition (or, equivalently, eigendecomposition) to the covariance matrix.
For the zero-centered data matrix \(X\in \mathbb R^{m\times n}\), the covariance matrix is \(\frac{1}{m-1}X^TX\); let \(P\in\mathbb R^{n\times k}\) hold its top-\(k\) eigenvectors as columns.
The reduced data is then \(XP\), and the reconstruction from the reduced data is \((XP)P^T\).
PCA works well on data with approximately linear structure.
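A minimal numpy sketch of the steps above (eigendecomposition of the covariance matrix; the target dimension `k` and the helper name `pca` are illustrative, not from the source):

```python
import numpy as np

def pca(X, k):
    """Reduce X (m samples x n features) to k dimensions."""
    X = X - X.mean(axis=0)                # zero-center each feature
    C = np.cov(X, rowvar=False)           # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :k]           # top-k eigenvectors as columns, n x k
    Z = X @ P                             # reduced data XP, m x k
    X_rec = Z @ P.T                       # reconstruction (XP)P^T of the centered data
    return Z, X_rec
```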
kPCA (kernel PCA)
A method combining PCA with the kernel trick: PCA is performed implicitly in a high-dimensional feature space, which allows non-linear structure to be captured.
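As a quick illustration, scikit-learn's `KernelPCA` can unfold data that plain PCA cannot (the RBF kernel and its `gamma` value here are just one common choice):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# concentric circles are not linearly separable, so linear PCA cannot unfold them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
```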
Non-negative Matrix Factorization, NMF
Like PCA, except the coefficients in the linear combination must be non-negative.
It converges to a local minimum (so the starting point of the optimization matters).
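A short scikit-learn sketch; the `init` argument matters precisely because of the local-minimum issue noted above (the matrix sizes are arbitrary):

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.rand(100, 20))  # NMF requires non-negative input
model = NMF(n_components=5, init="nndsvd", max_iter=500)
W = model.fit_transform(X)           # 100 x 5 non-negative coefficients
H = model.components_                # 5 x 20 non-negative basis vectors
X_approx = W @ H                     # X is approximated by WH
```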
Random Projection
The original \(d\)-dimensional data are projected onto a \(k\)-dimensional subspace (\(k\ll d\)).
\(X\in\mathbb R^{d\times N}\): original data matrix for \(N\) samples with \(d\) features
\(R\in\mathbb R^{k\times d}\): random transformation matrix
The resulting matrix in the lower-dimensional space is \(RX\in\mathbb R^{k\times N}\).
Generating the random matrix: for a Gaussian random projection, draw \(k\) vectors from a Gaussian distribution, with unit length and (approximately) orthogonal to each other; these form the rows of \(R\).
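A minimal numpy sketch under the definitions above (the dimensions are arbitrary; scikit-learn's `GaussianRandomProjection` packages the same idea as a transformer):

```python
import numpy as np

d, N, k = 1000, 50, 20
X = np.random.rand(d, N)        # original data: d features x N samples

R = np.random.randn(k, d)       # k x d matrix with Gaussian entries
R /= np.linalg.norm(R, axis=1, keepdims=True)  # normalize rows to unit length
# (rows of high-dimensional Gaussian vectors are already nearly orthogonal)

Y = R @ X                       # k x N data in the lower-dimensional space
```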
Factor Analysis
Assumes a set of unobservable latent factors \(z_j\ (j=1,\dots,k)\); the observed variables are modeled as linear combinations of these factors plus noise.
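A brief scikit-learn sketch (two latent factors is an arbitrary choice for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = load_iris().data
Z = FactorAnalysis(n_components=2).fit_transform(X)  # estimated latent factors z
```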
Multi-Dimensional Scaling, MDS
Use case: the pairwise distances \(d_{ij}\) between \(N\) points are known, but the coordinates of the points, their dimensionality, and the distance metric used are all unknown. MDS maps the points into a low-dimensional space under these conditions.
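scikit-learn's `MDS` supports exactly this setting via `dissimilarity="precomputed"`; a sketch where only the distance matrix `D` is handed to the algorithm:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

points = np.random.rand(30, 5)  # hidden coordinates (unknown to MDS)
D = squareform(pdist(points))   # the only known quantity: N x N distances d_ij

Y = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)  # 2-D embedding
```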
Linear Discriminant Analysis, LDA
Take the number of classes \(K=2\) (binary classification) as an example, with class labels \(C_1, C_2\). The expression \(z=\bm w^T \bm x\) gives the projection of \(\bm x\) onto \(\bm w\). Let \(\bm m_1\) and \(m_1\) be the mean of the \(C_1\) samples before and after projection, so \(\bm m_1\in \mathbb R^d,\ m_1\in\mathbb R\) (\(d\) is the feature dimension, and likewise \(\bm w\in\mathbb R^d\)). Write the sample set as \(\mathcal X=\{\bm x^{(t)}, y^{(t)}\}\), where \(y^{(t)}=1\) means sample \(\bm x^{(t)}\) belongs to class \(C_1\) and \(y^{(t)}=0\) means it belongs to class \(C_2\). Then
\(m_1=\frac{\sum_t \bm w^T\bm x^{(t)}\, y^{(t)}}{\sum_t y^{(t)}},\qquad m_2=\frac{\sum_t \bm w^T\bm x^{(t)}\,(1-y^{(t)})}{\sum_t (1-y^{(t)})}.\)
The scatter of the projected samples from \(C_1\) and \(C_2\) is
\(s_1^2=\sum_t (\bm w^T\bm x^{(t)}-m_1)^2\, y^{(t)},\qquad s_2^2=\sum_t (\bm w^T\bm x^{(t)}-m_2)^2\,(1-y^{(t)}).\)
After projection, for the classes to be well separated, we want the means to be as far apart as possible and the instances of each class to be scattered over as small a region as possible. That is, we want \(|m_1-m_2|\) to be large and \(s_1^2+s_2^2\) to be small. Fisher's linear discriminant is the \(\bm w\) that maximizes
\(J(\bm w)=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}.\)
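The maximizer has a known closed form, \(\bm w\propto(\bm S_1+\bm S_2)^{-1}(\bm m_1-\bm m_2)\), where \(\bm S_i\) is the within-class scatter matrix of class \(C_i\); a numpy sketch (the data and the helper name are illustrative):

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher's direction w for two classes (rows are samples)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)           # within-class scatter of C1
    S2 = (X2 - m2).T @ (X2 - m2)           # within-class scatter of C2
    w = np.linalg.solve(S1 + S2, m1 - m2)  # w ∝ (S1 + S2)^{-1} (m1 - m2)
    return w / np.linalg.norm(w)

X1 = np.random.randn(50, 3) + [2, 0, 0]  # class C1
X2 = np.random.randn(50, 3)              # class C2
w = fisher_lda(X1, X2)
z1, z2 = X1 @ w, X2 @ w                  # 1-D projections z = w^T x
```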
Isometric Feature Mapping (Isomap)
Computes distances along a manifold in the vector space (e.g., the manifold formed by a burst of photos of a face as it rotates).
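A quick scikit-learn example on the classic S-curve, where geodesic (along-manifold) distances differ greatly from straight-line ones:

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=500)  # 3-D points lying on a 2-D manifold
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # unrolled 2-D embedding
```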
Locally Linear Embedding, LLE
Sammon Mapping
(non-linear) (Sammon, 1969; discussed in Lee and Verleysen, 2007)
……
Curvilinear Component Analysis, CCA
(non-linear) (by Demartines and Herault, 1997)
Stochastic Neighbor Embedding, SNE
(non-linear) (by Hinton and Roweis, 2002)
t-distributed Stochastic Neighbor Embedding, t-SNE
(by L. van der Maaten & G. Hinton, 2008)
(The paper contains many ideas worth studying for tackling practical data problems.)
It's a non-linear dimensionality reduction algorithm.
Given \(N\) high-dimensional data points \(\bm x_1,\dots,\bm x_N\) and their low-dimensional counterparts \(\bm Y=[\bm y_1 \dots \bm y_N]^T\).
The conditional probabilities, for \(j\ne i\):
\(p_{j|i}=\frac{\exp(-\|\bm x_i-\bm x_j\|^2/2\sigma_i^2)}{\sum_{k\ne i}\exp(-\|\bm x_i-\bm x_k\|^2/2\sigma_i^2)},\qquad q_{j|i}=\frac{\exp(-\|\bm y_i-\bm y_j\|^2)}{\sum_{k\ne i}\exp(-\|\bm y_i-\bm y_k\|^2)}\)
where \(\sigma_i\) is the bandwidth of the Gaussian centered on \(\bm x_i\) (chosen via a perplexity parameter).
Note that \(\sum_j p_{j|i}=1, \sum_j q_{j|i}=1, \forall i\) .
Loss function: the sum of Kullback-Leibler divergences between the conditional distributions over all data points:
\(C=\sum_i \mathrm{KL}(P_i\,\|\,Q_i)=\sum_i\sum_j p_{j|i}\log\frac{p_{j|i}}{q_{j|i}}\)
An alternative loss function, a single KL divergence between joint distributions:
\(C=\mathrm{KL}(P\,\|\,Q)=\sum_i\sum_j p_{ij}\log\frac{p_{ij}}{q_{ij}}\)
Symmetric SNE:
joint probabilities, defined directly from pairwise distances:
\(p_{ij}=\frac{\exp(-\|\bm x_i-\bm x_j\|^2/2\sigma^2)}{\sum_{k\ne l}\exp(-\|\bm x_k-\bm x_l\|^2/2\sigma^2)},\qquad q_{ij}=\frac{\exp(-\|\bm y_i-\bm y_j\|^2)}{\sum_{k\ne l}\exp(-\|\bm y_k-\bm y_l\|^2)}\)
But this \(p_{ij}\) causes problems when a high-dimensional datapoint \(\bm x_i\) is an outlier (i.e., all pairwise distances \(\|\bm x_i-\bm x_j\|^2\) are large for \(\bm x_i\)). For such an outlier, the values of \(p_{ij}\) are extremely small for all \(j\), so the location of its low-dimensional map point \(\bm y_i\) has very little effect on the cost function. As a result, the position of the map point is not well determined by the positions of the other map points.
Instead, \(p_{ij}\) is defined by symmetrizing the conditionals:
\(p_{ij}=\frac{p_{j|i}+p_{i|j}}{2N},\)
which ensures that \(\sum_j p_{ij}>\frac{1}{2N}\) for every datapoint \(\bm x_i\), so that each datapoint makes a significant contribution to the cost function.
t-SNE:
t-SNE keeps the symmetrized \(p_{ij}\) above, but in the low-dimensional map it replaces the Gaussian with a Student t-distribution with one degree of freedom, whose heavier tails alleviate the crowding problem:
\(q_{ij}=\frac{(1+\|\bm y_i-\bm y_j\|^2)^{-1}}{\sum_{k\ne l}(1+\|\bm y_k-\bm y_l\|^2)^{-1}}\)
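A numpy sketch of these two formulas (`P_cond` is assumed to hold the conditionals \(p_{j|i}\) with row index \(i\); both helper names are illustrative):

```python
import numpy as np

def joint_p(P_cond):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2N)."""
    N = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2 * N)

def joint_q(Y):
    """Student-t (1 d.o.f.) similarities q_ij between map points y_i."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)  # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)    # q_ii is defined to be 0
    return inv / inv.sum()
```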
Python tool
# sklearn computes on the CPU, which is slow for large datasets
from sklearn.manifold import TSNE
X_2d = TSNE(n_components=2).fit_transform(X)  # reduce the data matrix X to 2 dimensions
# the `cuML` library runs on the GPU and is much faster: https://docs.rapids.ai/api/cuml/stable/api.html#tsne
Independent Component Analysis, ICA
Linear independent component analysis, linear ICA
General definition of linear independent component analysis: find a linear transform \(\bm s = W\bm x\) such that the components \(s_i\) are as statistically independent as possible.
Linear noiseless ICA: the generative model \(\bm x = A\bm s\), i.e. \(x_i=\sum_j a_{ij}s_j\), where the sources \(s_j\) are independent and \(A\) is the mixing matrix.
Linear noisy ICA: the same model with additive noise, \(\bm x = A\bm s + \bm n\).
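Under the noiseless model, scikit-learn's `FastICA` can recover the sources up to scaling and permutation; a sketch with two synthetic signals:

```python
import numpy as np
from sklearn.decomposition import FastICA

# two independent sources s, observed only as linear mixtures x = A s
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # sine wave + square wave
A = np.array([[1.0, 0.5], [0.5, 1.0]])            # mixing matrix
X = S @ A.T                                       # observed mixed signals

S_est = FastICA(n_components=2).fit_transform(X)  # recovered sources
```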