【智应数】High Dimensional Space
The Geometry of High Dimensions
High-dimensional geometry differs from low-dimensional geometry in surprising ways.
- Example 1: Most of the volume of a high-dimensional ball concentrates in a thin shell near its surface.
- Example 2: As \(d\rightarrow \infty\), both the surface area and the volume of the \(d\)-dimensional unit ball tend to \(0\).
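Both facts can be checked numerically. The sketch below (plain Python; the function names are illustrative) uses the standard formula \(V(d)=\pi^{d/2}/\Gamma(\frac d2+1)\) for the volume of the unit ball, the relation \(A(d)=d\,V(d)\) for its surface area, and the fact that shrinking the ball by a factor \(1-\varepsilon\) multiplies its volume by \((1-\varepsilon)^d\), so the shell \(1-\varepsilon\le\Vert x\Vert\le 1\) holds a \(1-(1-\varepsilon)^d\ge 1-e^{-\varepsilon d}\) fraction of the volume.

```python
import math

def shell_fraction(d, eps):
    """Fraction of the unit d-ball's volume lying in the shell 1 - eps <= |x| <= 1.

    Scaling the ball by (1 - eps) multiplies its volume by (1 - eps)**d.
    """
    return 1.0 - (1.0 - eps) ** d

def unit_ball_volume(d):
    """Volume of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1).

    Computed in log space with lgamma, because math.gamma overflows for d > ~340.
    """
    return math.exp((d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1))

def unit_sphere_area(d):
    """Surface area of the unit sphere in R^d, which equals d times the volume."""
    return d * unit_ball_volume(d)

for d in (2, 5, 20, 100, 300):
    print(f"d={d:4d}  shell(eps=0.01)={shell_fraction(d, 0.01):.4f}  "
          f"volume={unit_ball_volume(d):.3e}  area={unit_sphere_area(d):.3e}")
```

The volume peaks around \(d=5\) and then decays faster than exponentially, while the shell fraction for a fixed \(\varepsilon\) approaches 1.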
- Example 3: Most of the volume of the ball lies near the equator.
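Thm 2.7. For \(c\ge 1\) and \(d\ge 3\), let \(A\) denote the portion of the ball with \(x_1\ge \frac{c}{\sqrt{d-1}}\) and let \(H\) denote the upper hemisphere \(\{x: x_1\ge 0\}\). Then
\[\frac{\mathrm{Vol}(A)}{\mathrm{Vol}(H)}\le\frac{2}{c}e^{-c^2/2}.\]
Consequently, at least a \(1-\frac{2}{c}e^{-c^2/2}\) fraction of the volume of the unit ball satisfies \(|x_1|\le\frac{c}{\sqrt{d-1}}\).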
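A numerical check of Thm 2.7 (not a proof), sketched in plain Python: the slice of the unit ball at height \(x_1=t\) is a \((d-1)\)-dimensional ball of radius \(\sqrt{1-t^2}\), so both \(\mathrm{Vol}(A)\) and \(\mathrm{Vol}(H)\) are one-dimensional integrals of \((1-t^2)^{(d-1)/2}\). The midpoint-rule integrator and the function names are illustrative choices.

```python
import math

def cap_to_hemisphere_ratio(d, c, steps=20_000):
    """Vol(A) / Vol(H) for the unit ball in R^d (d >= 3), where
    A = {x : x_1 >= c / sqrt(d - 1)} and H = {x : x_1 >= 0}.

    The slice of the ball at height x_1 = t is a (d-1)-ball of radius
    sqrt(1 - t^2), so its volume is proportional to (1 - t^2)^((d - 1) / 2).
    """
    a = c / math.sqrt(d - 1)
    if a >= 1.0:
        return 0.0  # the cap A is empty (or has zero volume)

    def slice_volume(t):
        return (1.0 - t * t) ** ((d - 1) / 2)

    def integrate(lo, hi):
        # Midpoint rule with `steps` equal subintervals on [lo, hi].
        h = (hi - lo) / steps
        return h * sum(slice_volume(lo + (i + 0.5) * h) for i in range(steps))

    return integrate(a, 1.0) / integrate(0.0, 1.0)

def thm_2_7_bound(c):
    """Right-hand side of Thm 2.7: (2 / c) * exp(-c^2 / 2)."""
    return (2.0 / c) * math.exp(-c * c / 2.0)

for d in (3, 10, 100, 1000):
    for c in (1.0, 2.0, 3.0):
        ratio = cap_to_hemisphere_ratio(d, c)
        print(f"d={d:4d} c={c}: ratio={ratio:.5f}  bound={thm_2_7_bound(c):.5f}")
```

For \(d=1000\) and \(c=2\) the ratio comes out at about \(0.045\), close to \(2\,\mathbb{P}(Z\ge 2)\) for a standard normal \(Z\) and well under the bound \(e^{-2}\approx 0.135\).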
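- Example 4: Random points drawn from a high-dimensional ball lie near its surface and are nearly orthogonal to one another.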
Thm 2.8. Draw \(n\) points \(x_1,...,x_n\) i.i.d. uniformly at random from the \(d\)-dimensional unit ball. Then with probability \(1-O(\frac{1}{n})\) we have
- \(\Vert x_i\Vert\ge 1-\frac{2\ln n}{d}\) for all \(i\).
- \(|x_i^Tx_j|\le\frac{\sqrt{6\ln n}}{\sqrt{d-1}}\) for all \(i\neq j\).
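A quick simulation of Thm 2.8, sketched in plain Python. It samples uniformly from the ball by the standard trick of normalizing a Gaussian vector to get a uniform direction and scaling it by \(U^{1/d}\) with \(U\) uniform on \([0,1]\), so that the radius has density proportional to \(r^{d-1}\). The helper names and the choice \(n=50\), \(d=1000\) are illustrative.

```python
import math
import random

def sample_unit_ball(d, rng):
    """Uniform sample from the unit ball in R^d: a uniform direction
    (normalized Gaussian) times a radius U^(1/d)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]

def thm_2_8_stats(n, d, seed=0):
    """Return (min norm, max |x_i . x_j|) over n sampled points, together with
    the thresholds 1 - 2 ln n / d and sqrt(6 ln n) / sqrt(d - 1) from Thm 2.8."""
    rng = random.Random(seed)
    pts = [sample_unit_ball(d, rng) for _ in range(n)]
    min_norm = min(math.sqrt(sum(x * x for x in p)) for p in pts)
    max_dot = max(
        abs(sum(a * b for a, b in zip(pts[i], pts[j])))
        for i in range(n) for j in range(i + 1, n)
    )
    norm_threshold = 1 - 2 * math.log(n) / d
    dot_threshold = math.sqrt(6 * math.log(n)) / math.sqrt(d - 1)
    return min_norm, max_dot, norm_threshold, dot_threshold

print(thm_2_8_stats(n=50, d=1000))
```

Since the theorem holds only with probability \(1-O(\frac1n)\), a single run may occasionally cross a threshold; typically the minimum norm sits just below 1 and the largest inner product is a small multiple of \(1/\sqrt{d}\).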
Gaussians in High Dimension
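Def (Sub-gaussian variable). A random variable \(X\) is sub-gaussian if for some \(k>0\),
\[\mathbb{P}(|X|\ge t)\le 2e^{-t^2/k^2}\quad\forall t\ge 0.\]
Equivalent statement (for centered \(X\), with \(\sigma\) and \(k\) equal up to absolute constants): For some \(\sigma^2>0\),
\[\mathbb{E}\,e^{\lambda X}\le e^{\sigma^2\lambda^2/2}\quad\forall\lambda\in\mathbb{R}.\]
Def (Sub-gaussian norm).
\[\Vert X\Vert_{\psi_2}=\inf\{t>0:\mathbb{E}\,e^{X^2/t^2}\le 2\}.\]
e.g. \(X\sim N(0,\sigma^2)\), \(\Vert X\Vert_{\psi_2}=\sqrt{8/3}\,\sigma\); the norm scales linearly in \(\sigma\), not as \(\sigma^2\).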
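The value \(\sqrt{8/3}\,\sigma\) comes from the moment generating function of \(X^2\). A short derivation, assuming the definition \(\Vert X\Vert_{\psi_2}=\inf\{t>0:\mathbb{E}\,e^{X^2/t^2}\le 2\}\):

```latex
\[
\mathbb{E}\,e^{sX^2}
=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{sx^2-\frac{x^2}{2\sigma^2}}\,dx
=\frac{1}{\sqrt{1-2s\sigma^2}}\qquad\Big(s<\tfrac{1}{2\sigma^2}\Big),
\]
\[
\mathbb{E}\,e^{X^2/t^2}=\Big(1-\frac{2\sigma^2}{t^2}\Big)^{-1/2}=2
\iff 1-\frac{2\sigma^2}{t^2}=\frac14
\iff t=\sqrt{8/3}\,\sigma .
\]
```

Because \(\mathbb{E}\,e^{X^2/t^2}\) is decreasing in \(t\), the infimum in the definition is attained at exactly this \(t\).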
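Thm (Hoeffding's Inequality). Let \(X_1,...,X_n\) be i.i.d. sub-gaussian with \(\mathbb{E}(X_i)=0\) and \(K=\Vert X_i\Vert_{\psi_2}\). Then \(\forall t>0\),
\[\mathbb{P}\Big(\Big|\sum_{i=1}^n X_i\Big|\ge t\Big)\le 2\exp\Big(-\frac{ct^2}{nK^2}\Big),\]
where \(c>0\) is an absolute constant.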
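To see this in action with explicit constants, the sketch below uses Rademacher variables (\(X_i=\pm1\) with probability \(\frac12\) each), for which the classical Hoeffding bound for bounded summands gives \(\mathbb{P}(|\sum_i X_i|\ge t)\le 2e^{-t^2/(2n)}\), a concrete instance of the sub-gaussian bound. The Monte Carlo estimator, its name, and the trial count are illustrative assumptions.

```python
import math
import random

def rademacher_tail(n, t, trials=20_000, seed=0):
    """Monte Carlo estimate of P(|X_1 + ... + X_n| >= t) for i.i.d. X_i uniform on {-1, +1}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each of the n random bits is one sign: S = (#ones) - (#zeros) = 2 * ones - n.
        ones = bin(rng.getrandbits(n)).count("1")
        if abs(2 * ones - n) >= t:
            hits += 1
    return hits / trials

def hoeffding_bound(n, t):
    """Hoeffding bound for summands in [-1, 1]: 2 exp(-t^2 / (2n))."""
    return 2 * math.exp(-t * t / (2 * n))

for t in (10, 20, 30, 40):
    print(f"t={t}: empirical {rademacher_tail(100, t):.5f}  bound {hoeffding_bound(100, t):.5f}")
```

The empirical tail sits well below the bound; the bound captures the \(e^{-\Theta(t^2/n)}\) decay rate, not the exact probability.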
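Def (Sub-exponential variable). A random variable \(X\) is sub-exponential if for some \(k>0\),
\[\mathbb{P}(|X|\ge t)\le 2e^{-t/k}\quad\forall t\ge 0.\]
Def (Sub-exponential norm).
\[\Vert X\Vert_{\psi_1}=\inf\{t>0:\mathbb{E}\,e^{|X|/t}\le 2\}.\]
e.g. \(X\sim N(0,\sigma^2)\), \(\Vert X^2\Vert_{\psi_1}=\Vert X\Vert_{\psi_2}^2=\frac{8}{3}\sigma^2\); this norm scales as \(\sigma^2\).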
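A short derivation of the Gaussian example, assuming the infimum definitions \(\Vert X\Vert_{\psi_2}=\inf\{t>0:\mathbb{E}\,e^{X^2/t^2}\le 2\}\) and \(\Vert Y\Vert_{\psi_1}=\inf\{t>0:\mathbb{E}\,e^{|Y|/t}\le 2\}\). Substituting \(t=s^2\) relates the two norms for any \(X\), and the Gaussian formula \(\mathbb{E}\,e^{sX^2}=(1-2s\sigma^2)^{-1/2}\) gives the value directly:

```latex
\[
\Vert X^2\Vert_{\psi_1}
=\inf\{s^2>0:\mathbb{E}\,e^{X^2/s^2}\le 2\}
=\Vert X\Vert_{\psi_2}^2,
\]
\[
X\sim N(0,\sigma^2):\quad
\mathbb{E}\,e^{X^2/t}=\Big(1-\frac{2\sigma^2}{t}\Big)^{-1/2}=2
\iff t=\frac{8}{3}\sigma^2 .
\]
```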
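Thm (Bernstein's Inequality). Let \(X_1,...,X_n\) be i.i.d. sub-exponential with \(\mathbb{E}(X_i)=0\) and \(K=\Vert X_i\Vert_{\psi_1}\). Then \(\forall t>0\),
\[\mathbb{P}\Big(\Big|\sum_{i=1}^n X_i\Big|\ge t\Big)\le 2\exp\Big(-c\min\Big(\frac{t^2}{nK^2},\frac{t}{K}\Big)\Big),\]
where \(c>0\) is an absolute constant.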
We now show that a high-dimensional Gaussian concentrates in a thin annulus (a "shell").
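Consider \(X\sim N(0,I_d)\). The variables \(X_i^2-1\) are i.i.d., centered, and sub-exponential with \(\Vert X_i^2-1\Vert_{\psi_1}\le C\) for an absolute constant \(C\), so applying Bernstein's Inequality to \(\Vert X\Vert^2-d=\sum_{i=1}^d(X_i^2-1)\) gives
\[\mathbb{P}\big(\big|\Vert X\Vert^2-d\big|\ge t\big)\le 2\exp\Big(-c\min\Big(\frac{t^2}{d},t\Big)\Big).\]
Let \(t=\sqrt{d\log\frac{1}{\delta}}\) and assume \(\log\frac{1}{\delta}\le d\), so that \(t\le d\) and hence \(\min(\frac{t^2}{d},t)=\frac{t^2}{d}=\log\frac{1}{\delta}\). Then
\[\mathbb{P}\Big(\big|\Vert X\Vert^2-d\big|\ge\sqrt{d\log\tfrac{1}{\delta}}\Big)\le 2\delta^{c}.\]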
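It means that if \(X'=X/\sqrt{d}\sim N(0,\frac{I_d}{d})\), then with high probability \(\big|\Vert X'\Vert^2-1\big|=O\big(\sqrt{\log(1/\delta)/d}\big)\), i.e. \(\Vert X'\Vert^2\) concentrates in a shell of thickness \(O(\frac{1}{\sqrt{d}})\) around \(1\).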
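The \(\frac{1}{\sqrt d}\) scaling of the shell can be checked by simulation. Here \(\Vert X\Vert^2\) is chi-square with \(d\) degrees of freedom, whose variance is \(2d\), so \(\Vert X'\Vert^2=\Vert X\Vert^2/d\) has mean 1 and standard deviation exactly \(\sqrt{2/d}\). A plain-Python sketch (sample sizes and names are illustrative):

```python
import math
import random
import statistics

def normalized_sq_norms(d, samples, seed=0):
    """Draw X ~ N(0, I_d) `samples` times and return ||X||^2 / d, i.e. ||X'||^2 for X' = X / sqrt(d)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)) / d for _ in range(samples)]

for d in (50, 200, 800):
    vals = normalized_sq_norms(d, samples=400)
    # Empirical mean and std of ||X'||^2, next to the exact std sqrt(2 / d).
    print(f"d={d}: mean {statistics.mean(vals):.4f}  std {statistics.stdev(vals):.4f}  "
          f"sqrt(2/d) {math.sqrt(2 / d):.4f}")
```

Quadrupling \(d\) should roughly halve the empirical standard deviation.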
Thm 2.9 (Gaussian Annulus Theorem). For \(X\sim N(0,I_d)\) and any \(\beta\le\sqrt{d}\), with probability \(\ge 1-3e^{-c\beta^2}\) we have
\[\sqrt{d}-\beta\le\Vert X\Vert\le\sqrt{d}+\beta,\]
where \(c>0\) is a fixed constant.
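A Monte Carlo sketch of the annulus theorem. The constant \(c\) is not specified in these notes, so the code only shows that the fraction of samples outside the annulus falls off rapidly in \(\beta\), rather than testing \(3e^{-c\beta^2}\) itself. For large \(d\), \(\Vert X\Vert-\sqrt d\) is approximately \(N(0,\frac12)\), which is why the annulus has constant width. Names and parameters are illustrative.

```python
import math
import random

def annulus_miss_rate(d, beta, samples=500, seed=1):
    """Fraction of samples X ~ N(0, I_d) with | ||X|| - sqrt(d) | > beta."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(samples):
        norm = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        if abs(norm - math.sqrt(d)) > beta:
            misses += 1
    return misses / samples

for beta in (0.5, 1.0, 2.0, 3.0):
    print(f"beta={beta}: miss rate {annulus_miss_rate(200, beta):.4f}")
```

Reusing the same seed for every \(\beta\) means each call sees the same samples, so the miss rates are guaranteed to be non-increasing in \(\beta\).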
Random Projection and Johnson-Lindenstrauss Lemma
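Projection \(f:\mathbb{R}^d\rightarrow\mathbb{R}^k\). Pick \(\bm{u_1},...,\bm{u_k}\) i.i.d. from \(N(0,I_d)\). For any vector \(\bm{v}\), define
\[f(\bm{v})=(\bm{u_1}\cdot\bm{v},\bm{u_2}\cdot\bm{v},...,\bm{u_k}\cdot\bm{v})\in\mathbb{R}^k.\]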
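A minimal plain-Python sketch of \(f\), assuming nothing beyond its definition: the vectors \(\bm{u_1},...,\bm{u_k}\) are drawn once and then reused for every input, which makes \(f\) linear. Since each coordinate \(\bm{u_i}\cdot\bm{v}\sim N(0,\Vert\bm{v}\Vert^2)\), the ratio \(\Vert f(\bm{v})\Vert/(\sqrt{k}\Vert\bm{v}\Vert)\) should be close to 1. The function names are hypothetical.

```python
import math
import random

def make_random_projection(d, k, seed=0):
    """Return f: R^d -> R^k with f(v) = (u_1 . v, ..., u_k . v), where the u_i
    are drawn i.i.d. from N(0, I_d) once and then reused for every v."""
    rng = random.Random(seed)
    us = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

    def f(v):
        return [sum(ui * vi for ui, vi in zip(u, v)) for u in us]

    return f

def norm(v):
    return math.sqrt(sum(x * x for x in v))

d, k = 300, 400
f = make_random_projection(d, k)
rng = random.Random(42)
for _ in range(3):
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    # Expected to be within 1 +/- eps with high probability.
    print(round(norm(f(v)) / (math.sqrt(k) * norm(v)), 4))
```

Drawing the \(\bm{u_i}\) once inside the closure, rather than on every call, matters: the guarantees concern one fixed random map applied to many vectors.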
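Thm 2.10 (The Random Projection Theorem). There exists a constant \(c>0\) such that for any \(\varepsilon\in(0,1)\) and any \(\bm{v}\in\mathbb{R}^d\),
\[\mathbb{P}\Big(\big|\Vert f(\bm{v})\Vert-\sqrt{k}\Vert\bm{v}\Vert\big|\ge\varepsilon\sqrt{k}\Vert\bm{v}\Vert\Big)\le 3e^{-ck\varepsilon^2}.\]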
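Proof: Since \(f\) is linear, it suffices to consider \(\Vert\bm{v}\Vert=1\). Then the coordinates \(\bm{u_i}\cdot\bm{v}\) are i.i.d. \(N(0,1)\), so \(f(\bm{v})\sim N(0,I_k)\). Applying Thm 2.9 in dimension \(k\) with \(\beta=\varepsilon\sqrt{k}\le\sqrt{k}\) immediately gives the bound.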
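Thm 2.11 (Johnson-Lindenstrauss Lemma). For any \(\varepsilon\in(0,1)\) and any \(n\) points \(\bm{v_1},...,\bm{v_n}\in\mathbb{R}^d\), let \(k\ge \frac{3}{c\varepsilon^2}\ln n\) with \(c\) as in Thm 2.10. With probability \(\ge 1-\frac{3}{2n}\), for all \(i, j\) we have
\[(1-\varepsilon)\sqrt{k}\Vert\bm{v_i}-\bm{v_j}\Vert\le\Vert f(\bm{v_i})-f(\bm{v_j})\Vert\le(1+\varepsilon)\sqrt{k}\Vert\bm{v_i}-\bm{v_j}\Vert.\]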
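The failure probability \(\frac{3}{2n}\) follows from Thm 2.10 and a union bound over pairs. A short derivation, assuming the same constant \(c\) as in Thm 2.10 and using linearity, \(f(\bm{v_i})-f(\bm{v_j})=f(\bm{v_i}-\bm{v_j})\), so that each pair is a single application of Thm 2.10:

```latex
\[
\begin{aligned}
\mathbb{P}\big(\exists\, i<j:\ \text{pair } (i,j) \text{ violates the bound}\big)
&\le \binom{n}{2}\cdot 3e^{-ck\varepsilon^2} \\
&\le \frac{n^2}{2}\cdot 3e^{-3\ln n}
&& \big(ck\varepsilon^2\ge 3\ln n\big) \\
&= \frac{3}{2n}.
\end{aligned}
\]
```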