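Gaussian Distribution

Univariate Normal Distribution

Common English names: univariate Gaussian, normal distribution, Gaussian distribution, etc.

Definition

If a random variable \(X\) has the probability density function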

\[{\displaystyle f(x;\mu, \sigma ^ 2)={1 \over \sigma {\sqrt {2\pi }}}\,e^{-{(x-\mu )^{2} \over 2\sigma ^{2}}}} \]

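then \(X\) is said to follow a normal distribution, written \({\displaystyle X\sim N(\mu ,\sigma ^{2})}\), where \(\mu\) and \(\sigma^{2}\) are called the mean and the variance, respectively.
When \(\mu = 0\) and \(\sigma ^ 2 = 1\), the distribution is called the standard normal distribution.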
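As a quick sanity check on the density above, here is a minimal Python sketch that evaluates it and integrates it numerically. The function names `normal_pdf` and `integrate_pdf`, and the parameter values, are illustrative choices rather than anything from the original notes.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) at x; sigma2 is the variance and must be > 0."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    sigma = math.sqrt(sigma2)
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / (sigma * math.sqrt(2.0 * math.pi))

def integrate_pdf(mu, sigma2, half_width=10.0, steps=20000):
    """Midpoint-rule integral of the density over mu +/- half_width standard deviations."""
    sigma = math.sqrt(sigma2)
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma2) for i in range(steps)) * h

peak = normal_pdf(0.0)           # standard normal at its mode: 1 / sqrt(2 * pi)
total = integrate_pdf(1.5, 4.0)  # total probability mass, should be very close to 1
```

The integral coming out at approximately 1 for an arbitrary mean and variance is what the normalizing factor \(1/(\sigma\sqrt{2\pi})\) is there to guarantee.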

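Basic properties

Any normal distribution can be obtained from the standard normal distribution by a linear transformation: if \(Z\sim N(0,1)\), then \(X=\mu +\sigma Z\sim N(\mu ,\sigma ^{2})\).
If \({\displaystyle X\sim N(\mu _{X},\sigma _{X}^{2})}\) and \({\displaystyle Y\sim N(\mu _{Y},\sigma _{Y}^{2})}\) are two independent normal random variables, then:
- \({\displaystyle U=X+Y\sim N(\mu _{X}+\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2})}\)
- \({\displaystyle V=X-Y\sim N(\mu _{X}-\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2})}\)
- \(U\) and \(V\) are also independent of each other, provided \(X\) and \(Y\) have equal variances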
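The properties above can be checked empirically. The following is a rough simulation sketch, not a proof: the parameter values, the sample size, and the seed are all assumptions made for illustration. It builds \(X\) and \(Y\) from standard normal draws via a linear transformation, then estimates the moments of \(U\) and \(V\).

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

mu_x, sigma_x = 1.0, 2.0   # illustrative parameters, not from the text
mu_y, sigma_y = -3.0, 2.0  # same standard deviation as X
n = 100_000

# X = mu + sigma * Z with Z ~ N(0, 1): linear transformation of a standard normal
xs = [mu_x + sigma_x * rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [mu_y + sigma_y * rng.gauss(0.0, 1.0) for _ in range(n)]

us = [x + y for x, y in zip(xs, ys)]
vs = [x - y for x, y in zip(xs, ys)]

def mean_and_variance(values):
    """Sample mean and (population-style) sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

mean_u, var_u = mean_and_variance(us)  # expect about -2 and 8
mean_v, var_v = mean_and_variance(vs)  # expect about 4 and 8

# Sample covariance of U and V; the theoretical value is sigma_x^2 - sigma_y^2 = 0 here
cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n
```

A covariance near zero only shows that \(U\) and \(V\) are uncorrelated. Because \((U, V)\) is jointly normal, being uncorrelated is equivalent to being independent, which is why the equal-variance condition is enough.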

Multivariate Normal Distribution

English name: multivariate normal distribution

Definition

If the random vector \({\displaystyle \mathbf{X}=[X_{1},\dots ,X_{k}]^{T}}\) has the probability density function

\[f(x ; \mu, \Sigma)=\frac{1}{(2 \pi)^{k / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right) \]

then \(\mathbf{X}\) is said to follow a multivariate normal distribution, written \(\displaystyle \mathbf {X} \ \sim \ {\mathcal {N}}({\boldsymbol {\mu }},\,{\boldsymbol {\Sigma }})\), where the covariance matrix \({\boldsymbol {\Sigma }}\) is required to be positive definite and \(x, \mu \in \mathbb{R}^k\).

Here \({\displaystyle {\boldsymbol {\mu }}=\operatorname {E} [\mathbf {X} ]=[\operatorname {E} [X_{1}],\operatorname {E} [X_{2}],\ldots ,\operatorname {E} [X_{k}]]^{\rm {T}}}\) is called the mean vector.
\({\displaystyle {\boldsymbol {\Sigma }}:=\operatorname {E} [(\mathbf {X} -{\boldsymbol {\mu }})(\mathbf {X} -{\boldsymbol {\mu }})^{\rm {T}}]=[\operatorname {Cov} [X_{i},X_{j}];1\leq i,j\leq k]}\) is called the covariance matrix, with entries \({\displaystyle \Sigma _{i,j}:=\operatorname {E} [(X_{i}-\mu _{i})(X_{j}-\mu _{j})]=\operatorname {Cov} [X_{i},X_{j}]}\).
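To make the density formula concrete, here is a small Python sketch restricted to the two-dimensional case, where the determinant and inverse of the covariance matrix have simple closed forms. The function name `mvn_pdf_2d` and the test points are illustrative assumptions, and a general implementation would use a linear algebra library instead.

```python
import math

def mvn_pdf_2d(x, mu, sigma):
    """Density of a 2-dimensional N(mu, sigma) at point x.

    x and mu are length-2 sequences; sigma is a 2x2 nested list that must be
    symmetric positive definite.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # For a symmetric 2x2 matrix, positive definite means a > 0 and det > 0
    if a <= 0 or det <= 0 or abs(b - c) > 1e-12:
        raise ValueError("sigma must be symmetric positive definite")
    # Closed-form inverse of a 2x2 matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu)
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    k = 2
    norm = (2.0 * math.pi) ** (k / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

# Standard bivariate normal at the origin: 1 / (2 * pi)
at_origin = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# Diagonal covariance: the density factors into two univariate densities
diag_case = mvn_pdf_2d([1.0, 2.0], [0.0, 1.0], [[4.0, 0.0], [0.0, 9.0]])
```

With a diagonal covariance matrix the components are independent, so the joint density equals the product of the two univariate normal densities.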

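Additional definitions

Standard normal random vector

A real random vector \({\displaystyle \mathbf {X} =(X_{1},\ldots ,X_{k})^{\mathrm {T} }}\) is called a standard normal random vector if its components are mutually independent and each follows the standard normal distribution, that is, \({\displaystyle X_{n}\sim \ {\mathcal {N}}(0,1)},\ n=1,\ldots,k\).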

Centered normal random vector

A real random vector \({\displaystyle \mathbf {X} =(X_{1},\ldots ,X_{k})^{\mathrm {T} }}\) is called a centered normal random vector if there exists a deterministic \({\displaystyle k\times \ell }\) matrix \({\displaystyle {\boldsymbol {A}}}\) such that \({\displaystyle {\boldsymbol {A}}\mathbf {Z} }\) has the same distribution as \({\displaystyle \mathbf {X} }\), where \({\displaystyle \mathbf {Z} }\) is a standard normal random vector with \({\displaystyle \ell }\) components.

Normal random vector

The following theorem holds:

\[{\displaystyle \mathbf {X} \ \sim \ {\mathcal {N}}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\quad \iff \quad {\text{there exist }}{\boldsymbol {\mu }}\in \mathbb {R} ^{k},{\boldsymbol {A}}\in \mathbb {R} ^{k\times \ell }{\text{ such that }}\mathbf {X} ={\boldsymbol {A}}\mathbf {Z} +{\boldsymbol {\mu }}{\text{ for }}Z_{n}\sim \ {\mathcal {N}}(0,1),\ n=1,\ldots ,\ell ,{\text{ i.i.d.}}} \]

In that case the covariance matrix is \({\displaystyle {\boldsymbol {\Sigma }}={\boldsymbol {A}}{\boldsymbol {A}}^{\mathrm {T} }}\).
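The representation \(\mathbf{X} = \boldsymbol{A}\mathbf{Z} + \boldsymbol{\mu}\) is also a practical sampling recipe. The sketch below assumes one particular choice of \(\boldsymbol{A}\), the lower-triangular Cholesky factor of \(\boldsymbol{\Sigma}\) (so \(\ell = k\)); the theorem allows any \(\boldsymbol{A}\) with \(\boldsymbol{A}\boldsymbol{A}^{\mathrm{T}} = \boldsymbol{\Sigma}\). The helper names, the example \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), and the seed are illustrative.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular A with A A^T = sigma, for a symmetric positive definite sigma."""
    k = len(sigma)
    a = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(a[i][m] * a[j][m] for m in range(j))
            if i == j:
                a[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                a[i][j] = (sigma[i][j] - s) / a[j][j]
    return a

def sample_mvn(mu, a, rng):
    """One draw of X = A Z + mu, where Z has i.i.d. N(0, 1) components."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    return [mu[i] + sum(a[i][j] * z[j] for j in range(len(z))) for i in range(len(mu))]

rng = random.Random(0)            # fixed seed for repeatability
mu = [1.0, -2.0]                  # example mean vector (assumed)
sigma = [[4.0, 1.2], [1.2, 1.0]]  # example covariance matrix (assumed)
a = cholesky(sigma)
draws = [sample_mvn(mu, a, rng) for _ in range(100_000)]

# Empirical mean vector and covariance matrix of the draws
n = len(draws)
mean = [sum(d[i] for d in draws) / n for i in range(2)]
cov = [[sum((d[i] - mean[i]) * (d[j] - mean[j]) for d in draws) / n
        for j in range(2)] for i in range(2)]
```

The empirical `mean` and `cov` should land close to `mu` and `sigma`, consistent with \(\boldsymbol{\Sigma} = \boldsymbol{A}\boldsymbol{A}^{\mathrm{T}}\).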

Properties

Property 1:
Theorem 1. Let \(X \sim \mathcal{N}(\mu, \Sigma)\) for some \(\mu \in \mathbf{R}^{n}\) and \(\Sigma \in \mathbf{S}_{++}^{n}\) (the set of symmetric positive definite \(n \times n\) matrices). Then there exists a matrix \(B \in \mathbf{R}^{n \times n}\) such that if we define \(Z=B^{-1}(X-\mu)\), then \(Z \sim \mathcal{N}(0, I)\).
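A short sketch of why such a \(B\) exists, under the assumption that we pick \(B\) as any invertible factor of \(\Sigma\) (for example its Cholesky factor, which exists because \(\Sigma\) is positive definite):

\[\Sigma = B B^{T}, \qquad Z = B^{-1}(X-\mu) \]

\[\operatorname{E}[Z] = B^{-1}(\operatorname{E}[X]-\mu) = 0, \qquad \operatorname{Cov}[Z] = B^{-1}\,\Sigma\,B^{-T} = B^{-1} B B^{T} B^{-T} = I \]

Since \(Z\) is an affine transformation of a normal random vector, it is itself normal, so \(Z \sim \mathcal{N}(0, I)\). This is the reverse direction of writing \(X = BZ + \mu\).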

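Property 2:
The Kullback-Leibler divergence between two \(k\)-dimensional normal distributions \({\mathcal {N}}_{0}={\mathcal {N}}({\boldsymbol {\mu }}_{0},{\boldsymbol {\Sigma }}_{0})\) and \({\mathcal {N}}_{1}={\mathcal {N}}({\boldsymbol {\mu }}_{1},{\boldsymbol {\Sigma }}_{1})\) with nonsingular covariance matrices is
\({\displaystyle D_{\text{KL}}({\mathcal {N}}_{0}\|{\mathcal {N}}_{1})={1 \over 2}\left\{\operatorname {tr} \left({\boldsymbol {\Sigma }}_{1}^{-1}{\boldsymbol {\Sigma }}_{0}\right)+\left({\boldsymbol {\mu }}_{1}-{\boldsymbol {\mu }}_{0}\right)^{\rm {T}}{\boldsymbol {\Sigma }}_{1}^{-1}({\boldsymbol {\mu }}_{1}-{\boldsymbol {\mu }}_{0})-k+\ln {|{\boldsymbol {\Sigma }}_{1}| \over |{\boldsymbol {\Sigma }}_{0}|}\right\}}\)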
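The formula can be evaluated term by term. Below is a minimal Python sketch for the special case \(k = 2\), where the inverse and determinant are available in closed form. The helper names `det2`, `inv2`, and `kl_mvn_2d` and the example inputs are illustrative assumptions; the result is in nats because the natural logarithm is used.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as a nested list."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a nonsingular 2x2 matrix given as a nested list."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def kl_mvn_2d(mu0, sigma0, mu1, sigma1):
    """D_KL(N(mu0, sigma0) || N(mu1, sigma1)) for k = 2, in nats."""
    k = 2
    inv1 = inv2(sigma1)
    # tr(sigma1^{-1} sigma0)
    trace = sum(inv1[i][j] * sigma0[j][i] for i in range(k) for j in range(k))
    diff = [mu1[i] - mu0[i] for i in range(k)]
    # (mu1 - mu0)^T sigma1^{-1} (mu1 - mu0)
    quad = sum(diff[i] * inv1[i][j] * diff[j] for i in range(k) for j in range(k))
    log_det_ratio = math.log(det2(sigma1) / det2(sigma0))
    return 0.5 * (trace + quad - k + log_det_ratio)

identity = [[1.0, 0.0], [0.0, 1.0]]
same = kl_mvn_2d([0.0, 0.0], identity, [0.0, 0.0], identity)  # identical distributions
shifted = kl_mvn_2d([0.0, 0.0], identity, [1.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

Two identical distributions give a divergence of zero, and swapping the arguments generally changes the value, since the KL divergence is not symmetric.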
