【智应数】Singular Value Decomposition

SVD

Def (eigenvalue/eigenvector). An eigenvalue \(\lambda\) and a (non-zero) eigenvector \(\bm{v}\) of a matrix \(A\) satisfy $$A\bm{v}=\lambda\bm{v}.$$
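A quick numerical check of the definition — a sketch using numpy, which these notes do not otherwise assume:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

# Each column v of eigvecs satisfies A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```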

Lem 1. Let \(M\in\mathbb{R}^{n\times n}\) be a symmetric matrix. Let \(\lambda_i\) and \(\bm{u}_i\) be the eigenvalues and orthonormal eigenvectors of \(M\). Then

\[M=\sum_i\lambda_i\bm{u}_i\bm{u}_i^T. \]

i.e., let \(U\) be the orthonormal matrix spanned by eigenvectors, \(D\) be the diagonal matrix generated by eigenvalues,

\[M=UDU^T. \]

Pf. It suffices to prove \(U^TMU=D\). By definition, \(\bm{u}_1^TM\bm{u}_1=\lambda_1\) and \(W_1^TM\bm{u}_1=\vec{0}\), where \(W_1\) denotes the matrix whose columns are the remaining eigenvectors, all orthogonal to \(\bm{u}_1\). Induction on the dimension completes the proof.
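Lem 1 can be verified numerically; this numpy sketch checks both the rank-one-sum form and \(M=UDU^T\):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
M = S + S.T                          # make M symmetric

lam, U = np.linalg.eigh(M)           # eigh: eigensolver for symmetric matrices
D = np.diag(lam)

assert np.allclose(U.T @ U, np.eye(4))   # U is orthonormal
assert np.allclose(U @ D @ U.T, M)       # M = U D U^T

# Rank-one form: M = sum_i lambda_i u_i u_i^T
M_sum = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
assert np.allclose(M_sum, M)
```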

Lem 2. Suppose \(\bm{v}\) is an eigenvector of \(A^TA\) with eigenvalue \(\lambda\ne 0\); then \(A\bm{v}\) is an eigenvector of \(AA^T\) with the same eigenvalue. Hence \(A^TA\) and \(AA^T\) share the same non-zero eigenvalues, all of which are non-negative.

Pf.

\[A^TA\bm{v}=\lambda\bm{v} \]

\[\Rightarrow AA^TA\bm{v}=A\lambda\bm{v} \]

\[\Rightarrow (AA^T)(A\bm{v})=\lambda (A\bm{v}). \]
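A numerical illustration of Lem 2 (a numpy sketch; the matrix shapes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

lam1 = np.sort(np.linalg.eigvalsh(A.T @ A))   # 3 eigenvalues (all > 0 generically)
lam2 = np.sort(np.linalg.eigvalsh(A @ A.T))   # 5 eigenvalues
assert np.allclose(lam2[:2], 0.0)             # the larger spectrum is padded with zeros
assert np.allclose(lam2[2:], lam1)            # same non-zero eigenvalues

# A v is an eigenvector of A A^T with the same eigenvalue.
lam, V = np.linalg.eigh(A.T @ A)
v = V[:, -1]                                  # top eigenvector of A^T A
assert np.allclose((A @ A.T) @ (A @ v), lam[-1] * (A @ v))
```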

Thm (SVD). Let \(M\in\mathbb{R}^{m\times n}\) be an arbitrary matrix. Let \(\sigma_i,\bm{u}_i,\bm{v}_i\) be the singular values (the square roots of the eigenvalues of \(M^TM\)), left singular vectors, and right singular vectors. Then

\[M=\sum\limits_{i}\sigma_i\bm{u}_i\bm{v}_i^T. \]

i.e., let \(S,U,V\) be the corresponding matrices ( \(U\) and \(V\) are orthonormal, \(S\) is diagonal); then

\[M=USV^T. \]

Pf. Intuitively, \(M^TM=VS^2V^T\) and \(MM^T=US^2U^T\). Thus \(V\), \(U\), and \(S\) should consist of the eigenvectors of \(M^TM\), the eigenvectors of \(MM^T\), and the square roots of the eigenvalues, respectively.

By the computation in Lem 2 (applied with \(A=M\)), $$\Vert M\bm{v}\Vert=\sqrt{\bm{v}^T M^TM\bm{v}}= \sqrt{\bm{v}^T\lambda \bm{v}}=\sqrt{\lambda} \Vert\bm{v}\Vert.$$

It means that if \(V\) is the orthonormal matrix spanned by the eigenvectors of \(M^TM\), then \(U=MVS^{-1}\) is the orthonormal matrix spanned by the eigenvectors of \(MM^T\) (assuming \(M\) has full rank, so that \(S\) is invertible). This immediately gives $$U=MVS^{-1}\Rightarrow M=USV^T.$$
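The construction in this proof can be carried out directly in numpy — a sketch assuming \(M\) has full column rank, so \(S\) is invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 3))          # generically full column rank

lam, V = np.linalg.eigh(M.T @ M)         # eigenvalues in ascending order
order = np.argsort(lam)[::-1]            # re-sort descending (SVD convention)
lam, V = lam[order], V[:, order]
S = np.diag(np.sqrt(lam))

U = M @ V @ np.linalg.inv(S)             # U = M V S^{-1}

assert np.allclose(U.T @ U, np.eye(3))   # columns of U are orthonormal
assert np.allclose(U @ S @ V.T, M)       # M = U S V^T
```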

Prop. Let \(r=\text{rank}(A)\). We have

  • The first \(r\) columns of \(U\), \(\bm{u}_1,...,\bm{u}_r\), form an orthonormal basis of \(\text{Col}(A)\).
  • The first \(r\) columns of \(V\), \(\bm{v}_1,...,\bm{v}_r\), form an orthonormal basis of \(\text{Row}(A)\).
  • The last \(m-r\) columns of \(U\) form an orthonormal basis of \(\text{Null}(A^T)\).
  • The last \(n-r\) columns of \(V\) form an orthonormal basis of \(\text{Null}(A)\).
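The four statements can be checked numerically; this numpy sketch builds an explicitly rank-deficient \(A\) (note `Vt` stores \(V^T\), so its rows are the columns of \(V\)):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank r = 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
assert r == 2

# First r columns of U span Col(A): projecting A's columns onto them is a no-op.
assert np.allclose(U[:, :r] @ U[:, :r].T @ A, A)
# Last n - r columns of V (rows of Vt) lie in Null(A).
assert np.allclose(A @ Vt[r:].T, 0.0)
# Last m - r columns of U lie in Null(A^T).
assert np.allclose(A.T @ U[:, r:], 0.0)
```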

Best Rank-\(k\) Approximations

Let \(\sigma_1\ge \sigma_2\ge...\) be the singular values of \(A\). Define

\[A_k=\sum\limits_{i=1}^k \sigma_i \bm{u}_i\bm{v}_i^T. \]

Lem. $$\Vert M\Vert_F ^2 = \text{trace}(M^TM)=\sum\limits_i\sigma_i(M) ^2.$$
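A quick numpy check of this identity:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 6))

fro2 = np.sum(M ** 2)                                   # ||M||_F^2
assert np.isclose(fro2, np.trace(M.T @ M))              # = trace(M^T M)
sv = np.linalg.svd(M, compute_uv=False)                 # singular values only
assert np.isclose(fro2, np.sum(sv ** 2))                # = sum of sigma_i^2
```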

Lem. For any \(M\) with \(\text{rank}(M)\le k\),

\[\forall i,\quad \sigma_{k+i}(A)\le \sigma_i(A-M). \]

Pf. Since \(\text{rank}(M)\le k\), \(\dim(\text{Null}(M))\ge n-k\), so by a dimension count there exists a non-zero \(\bm{w}\in \text{Null}(M)\cap \text{Span}(\bm{v}_1,...,\bm{v}_{k+1})\). Since \(M\bm{w}=\vec{0}\),

\[\sigma_{k+1}(A)\Vert \bm{w}\Vert \le \Vert A\bm{w}\Vert= \Vert (A-M)\bm{w}\Vert\le\sigma_{1}(A-M)\Vert \bm{w}\Vert. \]

This proves the case \(i=1\); the general case follows by the same argument, additionally intersecting with the orthogonal complement of the top \(i-1\) right singular vectors of \(A-M\).

Thm. Let \(\Vert M\Vert_F=\sqrt{\sum\limits_{i,j}m_{i,j}^2}\). For any matrix \(B\) of rank at most \(k\), we have $$\Vert A-A_k\Vert_F\le \Vert A-B\Vert_F.$$

Pf. Applying the previous lemma to \(M=B\), $$\Vert A-A_k\Vert_F^2=\sum\limits_{i=k+1} ^r \sigma_i(A)^2\le \sum\limits_{i=1} ^{r-k}\sigma_i(A-B) ^2\le \Vert A-B\Vert_F^2.$$
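A small numpy experiment illustrating (not proving) the theorem, comparing \(A_k\) against an arbitrary rank-\(k\) matrix \(B\):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 5))
k = 2

U, s, Vt = np.linalg.svd(A)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]                 # truncated SVD
# The residual is exactly the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))

# An arbitrary rank-k competitor does no better.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
assert np.linalg.norm(A - A_k, 'fro') <= np.linalg.norm(A - B, 'fro')
```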

Thm. Let \(\Vert M\Vert_2=\sup\limits_{\Vert\bm{x}\Vert=1}\Vert M\bm{x}\Vert\). For any matrix \(B\) of rank at most \(k\), we have $$\Vert A-A_k\Vert_2\le \Vert A-B\Vert_2.$$

Principal Component Analysis (PCA): \(A\rightarrow AV_k\), where \(V_k\) consists of the first \(k\) columns of \(V\).
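As a sketch, PCA via the SVD (the data is centered first, which standard PCA assumes; shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 5))
A = A - A.mean(axis=0)                   # center the columns

k = 2
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k].T                           # d x k: top-k right singular vectors
Z = A @ V_k                              # n x k reduced representation

assert Z.shape == (100, 2)
```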

Power Method

Let \(B=A^TA=\sum\limits_{i=1}^n\sigma_i^2\bm{v}_i\bm{v}_i^T\). Since the \(\bm{v}_i\) are orthonormal, $$B^k=\sum\limits_{i=1} ^n \sigma_i^{2k} \bm{v}_i\bm{v}_i^T.$$

Let \(\bm{x}=\sum\limits_{i=1}^nc_i\bm{v}_i\) be any vector with \(c_1\ne 0\). When \(k\) is large (and \(\sigma_1>\sigma_2\)),

\[B^k\bm{x}=\sum\limits_{i=1} ^n \sigma_i^{2k} c_i\bm{v}_i\approx\sigma_1^{2k}c_1\bm{v}_1. \]

Normalizing \(B^k\bm{x}\) thus yields \(\bm{v}_1\); then \(\sigma_1=\Vert A\bm{v}_1\Vert\) and \(\bm{u}_1=A\bm{v}_1/\sigma_1\). Next, let \(B'=B-\sigma_1^2\bm{v}_1\bm{v}_1^T\) and repeat the process to get \(\sigma_2,...,\sigma_r\).
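The whole recipe, as a minimal numpy sketch (iteration count and seed are arbitrary choices; normalizing every step avoids overflow without changing the direction):

```python
import numpy as np

def top_singular(A, iters=200, seed=0):
    """Power method for the top singular triple (sigma_1, u_1, v_1)."""
    rng = np.random.default_rng(seed)
    B = A.T @ A
    x = rng.standard_normal(A.shape[1])  # random start: c_1 != 0 generically
    for _ in range(iters):
        x = B @ x
        x /= np.linalg.norm(x)           # keep the iterate unit length
    v = x                                # approximates v_1
    sigma = np.linalg.norm(A @ v)        # sigma_1 = ||A v_1||
    u = A @ v / sigma                    # u_1 = A v_1 / sigma_1
    # For sigma_2, deflate with B - sigma**2 * np.outer(v, v) and repeat.
    return sigma, u, v

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
sigma, u, v = top_singular(A)
assert np.isclose(sigma, np.linalg.svd(A, compute_uv=False)[0])
```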

Thm 3.11. Let \(A\) be an \(n\times d\) matrix and \(\bm{x}\) a unit length vector in \(\mathbb{R}^d\) with \(|\bm{x}^T \bm{v}_1|\ge\delta\), where \(\delta>0\). Let \(V\) be the space spanned by the right singular vectors of \(A\) corresponding to singular values greater than \((1-\varepsilon)\sigma_1\). Let \(\bm{w}\) be the unit vector after \(k = \frac{\ln(1/(\varepsilon\delta))}{2\varepsilon}\) iterations of the power method, namely,

\[\bm{w}=\frac{(A^TA)^k\bm{x}}{\Vert(A^TA)^k\bm{x}\Vert}. \]

Then \(\bm{w}\) has a component of at most \(\varepsilon\) perpendicular to \(V\).

posted @ 2024-05-14 20:10  xcyle