【智应数】Singular Value Decomposition
SVD
Def (eigenvalue/eigenvector). An eigenvalue \(\lambda\) and eigenvector \(\bm{v}\ne\vec{0}\) of a matrix \(A\) satisfy $$A\bm{v}=\lambda\bm{v}.$$
Lem 1 (spectral decomposition). Let \(M\in\mathbb{R}^{n\times n}\) be a symmetric matrix. Let \(\lambda_i\) and \(\bm{u}_i\) be the eigenvalues and eigenvectors of \(M\),
i.e., let \(U\) be the orthonormal matrix whose columns are the eigenvectors and \(D\) the diagonal matrix of the eigenvalues. Then $$M=UDU^T.$$
Pf. It suffices to prove \(U^TMU=D\). Take a unit eigenvector \(\bm{u}_1\) and let \(W_1\) be a matrix whose columns extend \(\bm{u}_1\) to an orthonormal basis. By definition, \(\bm{u}_1^TM\bm{u}_1=\lambda_1\) and \(W_1^TM\bm{u}_1=\lambda_1W_1^T\bm{u}_1=\vec{0}\). Applying the same argument to the symmetric matrix \(W_1^TMW_1\), induction immediately proves it.
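A minimal numerical check of Lem 1 (the matrix `M` below is a hypothetical random example, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
M = (X + X.T) / 2                      # make M symmetric

lam, U = np.linalg.eigh(M)             # eigendecomposition for symmetric matrices
D = np.diag(lam)

print(np.allclose(U @ D @ U.T, M))     # True: M = U D U^T
print(np.allclose(U.T @ U, np.eye(4))) # True: U is orthonormal
```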
Lem 2. Suppose \(\bm{v}\) is an eigenvector of \(A^TA\) with eigenvalue \(\lambda\). If \(A\bm{v}\ne\vec{0}\), then \(A\bm{v}\) is an eigenvector of \(AA^T\) with the same eigenvalue. Hence \(A^TA\) and \(AA^T\) share the same nonzero eigenvalues, and all eigenvalues of both are non-negative.
Pf. \(AA^T(A\bm{v})=A(A^TA\bm{v})=A(\lambda\bm{v})=\lambda(A\bm{v})\). Non-negativity: \(\lambda\Vert\bm{v}\Vert^2=\bm{v}^TA^TA\bm{v}=\Vert A\bm{v}\Vert^2\ge 0\).
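A quick numerical check of Lem 2 on a hypothetical random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))        # rectangular, so the spectra differ only in zeros

lam, V = np.linalg.eigh(A.T @ A)       # eigenpairs of A^T A (3 eigenvalues)
mu, _ = np.linalg.eigh(A @ A.T)        # eigenpairs of A A^T (5 eigenvalues)

print(np.allclose(np.sort(mu)[-3:], np.sort(lam)))  # shared nonzero eigenvalues

v, l = V[:, -1], lam[-1]               # top eigenpair of A^T A
print(np.allclose(A @ A.T @ (A @ v), l * (A @ v)))  # A v is an eigenvector of A A^T
```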
Thm (SVD). Let \(A\in\mathbb{R}^{m\times n}\) be an arbitrary matrix. Let \(\sigma_i\) be the singular values (the square roots of the eigenvalues of \(A^TA\)), and \(\bm{u}_i,\bm{v}_i\) the left and right singular vectors,
i.e., let \(S,U,V\) be the corresponding matrices (\(U\) and \(V\) are orthonormal, \(S\) is diagonal). Then $$A=USV^T.$$
Pf. By intuition, \(A^TA=VS^2V^T\) and \(AA^T=US^2U^T\). Thus \(V\), \(U\), and \(S\) should consist of the eigenvectors of \(A^TA\), the eigenvectors of \(AA^T\), and the square roots of the eigenvalues, respectively.
By Lem 2, for an eigenvector \(\bm{v}\) of \(A^TA\) with eigenvalue \(\lambda\), $$\Vert A\bm{v}\Vert=\sqrt{\bm{v}^T A^TA\bm{v}}= \sqrt{\lambda\,\bm{v}^T\bm{v}}=\sqrt{\lambda} \Vert\bm{v}\Vert.$$
It means that if \(V\) is the orthonormal matrix whose columns are eigenvectors of \(A^TA\) (and \(S\) is invertible), then \(U=AVS^{-1}\) is the orthonormal matrix whose columns are eigenvectors of \(AA^T\). It immediately gives $$U=AVS^{-1}\Rightarrow A=USV^T.$$
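A sketch of this construction in numpy, assuming \(A\) has full column rank so that \(S\) is invertible (the example matrix is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

lam, V = np.linalg.eigh(A.T @ A)       # eigenvectors of A^T A
lam, V = lam[::-1], V[:, ::-1]         # sort eigenvalues in decreasing order
S = np.diag(np.sqrt(lam))              # sigma_i = sqrt(lambda_i)
U = A @ V @ np.linalg.inv(S)           # U = A V S^{-1} as in the proof

print(np.allclose(U @ S @ V.T, A))     # True: A = U S V^T
print(np.allclose(U.T @ U, np.eye(4))) # True: columns of U are orthonormal
```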
Prop. Let \(r=\text{rank}(A)\). We have the following (a numerical check appears after the list).
- The first \(r\) columns of \(U\), \(\bm{u}_1,...,\bm{u}_r\), form an orthonormal basis of \(\text{Col}(A)\).
- The first \(r\) columns of \(V\), \(\bm{v}_1,...,\bm{v}_r\), form an orthonormal basis of \(\text{Row}(A)\).
- The last \(m-r\) columns of \(U\) form an orthonormal basis of \(\text{Null}(A^T)\).
- The last \(n-r\) columns of \(V\) form an orthonormal basis of \(\text{Null}(A)\).
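A numerical check of the Prop on a hypothetical low-rank example (note `numpy` returns \(V^T\), so right singular vectors appear as rows of `Vt`):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank r = 2

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)
print(r)                                # 2

print(np.allclose(A @ Vt[r:].T, 0))     # last n-r right singular vectors lie in Null(A)
print(np.allclose(A.T @ U[:, r:], 0))   # last m-r left singular vectors lie in Null(A^T)
```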
Best Rank-\(k\) Approximations
Let \(\sigma_1\ge \sigma_2\ge...\ge\sigma_r\) be the singular values of \(A\) (the square roots of the eigenvalues of \(A^TA\)). Define $$A_k=\sum\limits_{i=1}^k\sigma_i\bm{u}_i\bm{v}_i^T.$$
Lem. $$\Vert M\Vert_F ^2 = \text{trace}(M^TM)=\sum\limits_i\sigma_i(M) ^2.$$
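A quick check of this identity (hypothetical random `M`):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 3))

fro2 = np.sum(M**2)                                  # ||M||_F^2 = sum of squared entries
tr = np.trace(M.T @ M)
sv2 = np.sum(np.linalg.svd(M, compute_uv=False)**2)  # sum of squared singular values

print(np.allclose(fro2, tr), np.allclose(tr, sv2))   # True True
```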
Lem. For any \(M\) with \(\text{rank}(M)\le k\), $$\sigma_{k+1}(A)\le\Vert A-M\Vert_2,$$ and more generally \(\sigma_{k+i}(A)\le\sigma_i(A-M)\) for each \(i\).
Pf. Since \(\text{rank}(M)\le k\), \(\dim(\text{Null}(M))\ge n-k\), so \(\exists\) a unit vector \(\bm{w}\in \text{Null}(M)\cap \text{Span}(\bm{v}_1,...,\bm{v}_{k+1})\) (the dimensions sum to more than \(n\)). Then $$\Vert(A-M)\bm{w}\Vert=\Vert A\bm{w}\Vert\ge\sigma_{k+1}(A).$$ The general case follows from the same dimension count.
Thm. Let \(\Vert M\Vert_F=\sqrt{\sum\limits_{i,j}m_{i,j}^2}\). For any matrix \(B\) of rank at most \(k\), we have $$\Vert A-A_k\Vert_F\le \Vert A-B\Vert_F.$$
Pf. Applying the lemma with \(M=B\), $$\Vert A-A_k\Vert_F^2=\sum\limits_{i=k+1} ^r \sigma_i(A)^2\le \sum\limits_{i=1} ^{r-k}\sigma_i(A-B) ^2\le \Vert A-B\Vert_F^2.$$
Thm. Let \(\Vert M\Vert_2=\sup\limits_{\Vert\bm{x}\Vert=1}\Vert M\bm{x}\Vert\). For any matrix \(B\) of rank at most \(k\), we have $$\Vert A-A_k\Vert_2\le \Vert A-B\Vert_2.$$
Pf. \(\Vert A-A_k\Vert_2=\sigma_{k+1}(A)\le\Vert A-B\Vert_2\) by the lemma.
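A sketch verifying both theorems' equality cases for the truncated SVD \(A_k\) (hypothetical random `A`; `k` chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]       # A_k = sum_{i<=k} sigma_i u_i v_i^T

# ||A - A_k||_F^2 = sum_{i>k} sigma_i^2 and ||A - A_k||_2 = sigma_{k+1}
print(np.allclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(s[k:]**2)))
print(np.allclose(np.linalg.norm(A - A_k, 2), s[k]))
```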
Principal Component Analysis (PCA): \(A\rightarrow AV_k\), i.e., project each data row of \(A\) onto the top-\(k\) right singular vectors \(V_k=(\bm{v}_1,...,\bm{v}_k)\).
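A minimal PCA sketch via SVD; the centering step is an assumption, since the one-line description \(A\rightarrow AV_k\) leaves preprocessing implicit:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 5))      # hypothetical data: rows are points
k = 2

A = A - A.mean(axis=0)                 # center the columns (assumed preprocessing)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k].T                         # top-k right singular vectors as columns
Z = A @ V_k                            # projected data: A -> A V_k

print(Z.shape)                         # (100, 2)
```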
Power Method
Let \(B=A^TA=\sum\limits_{i=1}^n\sigma_i^2\bm{v}_i\bm{v}_i^T\). Since the \(\bm{v}_i\) are orthonormal,$$B^k=\sum\limits_{i=1} ^n \sigma_i^{2k} \bm{v}_i\bm{v}_i^T.$$
Let \(\bm{x}=\sum\limits_{i=1}^nc_i\bm{v}_i\) be any vector with \(c_1\ne0\). When \(k\) is large and \(\sigma_1>\sigma_2\), $$B^k\bm{x}=\sum\limits_{i=1}^nc_i\sigma_i^{2k}\bm{v}_i\approx c_1\sigma_1^{2k}\bm{v}_1,$$ so normalizing \(B^k\bm{x}\) yields \(\bm{v}_1\).
By following the above process, we can obtain \(\sigma_1,\bm{u}_1,\bm{v}_1\). Then let \(B'=B-\sigma_1^2\bm{v}_1\bm{v}_1^T\) and repeat the process to get \(\sigma_2,...,\sigma_r\) and the corresponding singular vectors; a sketch of this procedure follows.
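A sketch of the power method with deflation (the helper name `power_method` and the random instance are hypothetical; the iteration count is chosen generously rather than by the bound below):

```python
import numpy as np

def power_method(B, iters=200, seed=0):
    """Approximate the top eigenpair of the symmetric PSD matrix B = A^T A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(B.shape[0])
    for _ in range(iters):
        x = B @ x
        x /= np.linalg.norm(x)          # normalize so B^k x does not blow up
    return x @ B @ x, x                 # Rayleigh quotient ~ sigma_1^2, and v_1

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
B = A.T @ A

lam1, v1 = power_method(B)
B2 = B - lam1 * np.outer(v1, v1)        # deflate: B' = B - sigma_1^2 v_1 v_1^T
lam2, v2 = power_method(B2)

# should print True for this random instance
print(np.allclose(np.sqrt([lam1, lam2]),
                  np.linalg.svd(A, compute_uv=False)[:2]))
```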
Thm 3.11. Let \(A\) be an \(n\times d\) matrix and \(\bm{x}\) a unit length vector in \(\mathbb{R}^d\) with \(|\bm{x}^T \bm{v}_1|\ge\delta\), where \(\delta>0\). Let \(V\) be the space spanned by the right singular vectors of \(A\) corresponding to singular values greater than \((1-\varepsilon)\sigma_1\). Let \(\bm{w}\) be the unit vector after \(k = \frac{\ln(1/(\varepsilon\delta))}{2\varepsilon}\) iterations of the power method, namely, $$\bm{w}=\frac{(A^TA)^k\bm{x}}{\Vert(A^TA)^k\bm{x}\Vert}.$$
Then \(\bm{w}\) has a component of at most \(\varepsilon\) perpendicular to \(V\).
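An illustrative check of this bound on a hypothetical random instance, taking \(\delta=|\bm{x}^T\bm{v}_1|\) for the sampled \(\bm{x}\):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((50, 20))
eps, x = 0.1, rng.standard_normal(20)
x /= np.linalg.norm(x)

_, s, Vt = np.linalg.svd(A)
P = Vt[s > (1 - eps) * s[0]]            # orthonormal basis of V (as rows)

delta = abs(x @ Vt[0])                  # |x^T v_1| = delta
k = int(np.ceil(np.log(1 / (eps * delta)) / (2 * eps)))

w = np.linalg.matrix_power(A.T @ A, k) @ x
w /= np.linalg.norm(w)
perp = np.linalg.norm(w - P.T @ (P @ w))  # component of w perpendicular to V
print(perp <= eps)                        # True, as the theorem guarantees
```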