Principal Components Analysis

Intuition

PCA tries to identify the subspace in which the data approximately lies.

Intuitively, we choose a direction to project the data onto, picking the direction that preserves as much of the data's variance as possible.
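As a minimal sketch of this idea (using NumPy; the data and the candidate directions are made up for illustration), projecting centered 2-D data onto a unit direction and measuring the variance of the projections shows that some directions preserve far more variance than others:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2-D data: stretched along one axis, then rotated by 45 degrees.
X = np.diag([3.0, 0.5]) @ rng.normal(size=(2, 500))
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)
X = R @ X
X = X - X.mean(axis=1, keepdims=True)   # center: zero mean per coordinate

def projected_variance(X, u):
    """(1/m) * sum_i (x_i^T u)^2 for a unit direction u."""
    u = u / np.linalg.norm(u)
    return np.mean((X.T @ u) ** 2)

print(projected_variance(X, np.array([1.0, 1.0])))   # large: along the spread
print(projected_variance(X, np.array([1.0, -1.0])))  # small: across the spread
```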

Formalization

\[\frac{1}{m}\sum_{i=1}^m \left({x^{(i)}}^T u\right)^2 = u^T\left(\frac{1}{m}\sum_{i=1}^m x^{(i)}{x^{(i)}}^T\right)u = u^T\left(\frac{1}{m}XX^T\right)u \]

\[X = (x^{(1)}, x^{(2)}, \ldots, x^{(m)}) \]

where the examples \(x^{(i)} \in \mathbb{R}^n\) are stacked as the columns of \(X\) and are assumed to be centered (zero mean), and \(u\) is a unit-length projection direction.

\(\frac{1}{m}XX^T\) is the empirical covariance matrix, and PCA tries to diagonalize it so that the coordinates of the new low-dimensional data are uncorrelated.
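A quick numerical check of the identity above (a sketch; the data here is random and only for verification):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 1000
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)   # center so (1/m) X X^T is the covariance
u = rng.normal(size=n)
u = u / np.linalg.norm(u)               # unit projection direction

lhs = np.mean((X.T @ u) ** 2)           # (1/m) * sum_i (x_i^T u)^2
rhs = u @ ((X @ X.T) / m) @ u           # u^T ((1/m) X X^T) u
print(np.isclose(lhs, rhs))             # True
```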

So the problem reduces to choosing the unit eigenvector of \(\frac{1}{m}XX^T\) whose eigenvalue is largest.
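One way to see this step: maximize the projected variance subject to \(\|u\|_2 = 1\). Writing \(\Sigma = \frac{1}{m}XX^T\) and using a Lagrange multiplier,

\[\max_{u^Tu=1} u^T\Sigma u, \qquad \mathcal{L}(u,\lambda) = u^T\Sigma u - \lambda(u^Tu - 1), \qquad \nabla_u\mathcal{L} = 2\Sigma u - 2\lambda u = 0 \;\Rightarrow\; \Sigma u = \lambda u. \]

At such a \(u\), the objective equals \(u^T\Sigma u = \lambda u^Tu = \lambda\), so the best direction is the eigenvector with the largest eigenvalue.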

Choosing the eigenvectors corresponding to the top \(k\) eigenvalues reduces the data dimension from \(\mathbb{R}^n\) down to \(\mathbb{R}^k\).
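Putting the steps together, here is a minimal PCA sketch (NumPy; the function and variable names are illustrative, and the input is assumed to be already centered):

```python
import numpy as np

def pca(X, k):
    """Project n-dimensional data down to k dimensions.

    X : (n, m) array whose columns are centered examples x^(i).
    Returns the (k, m) projected data and the (n, k) top-k eigenvector basis.
    """
    n, m = X.shape
    sigma = (X @ X.T) / m                     # empirical covariance, n x n
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh: ascending eigenvalues
    top_k = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    U_k = eigvecs[:, top_k]                   # columns: top-k eigenvectors
    return U_k.T @ X, U_k

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)         # center before applying PCA
Z, U_k = pca(X, k=2)
print(Z.shape)                                # (2, 200): R^5 reduced to R^2
print(np.round(np.cov(Z, bias=True), 3))      # near-diagonal covariance
```

The near-diagonal covariance of \(Z\) illustrates the diagonalization claim above: the coordinates of the reduced data are uncorrelated.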
