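In the previous section we defined a multivariate normal random vector in four ways; in this section we turn to the independence and conditional distributions of random vectors.
- Partition the \(p\)-dimensional random vector \(X\sim N_p(\mu,\Sigma)\) as follows: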
\[X=
\left[
\begin{array}{c}
X^{(1)}_r\\
X^{(2)}_{p-r}
\end{array}
\right],
\mu=
\left[
\begin{array}{c}
\mu^{(1)}_r\\
\mu^{(2)}_{p-r}
\end{array}
\right],
\Sigma=
\left[
\begin{array}{c|c}
\Sigma_{11} &\Sigma_{12}\\ \hline
\Sigma_{21} &\Sigma_{22}
\end{array}
\right]>0,\quad(\Sigma_{11}\text{ is an }r\times r\text{ matrix})
\]
I. Independence
Let the \(p\)-dimensional random vector \(X\sim N_p(\mu,\Sigma)\) be partitioned as
\[X=
\left[
\begin{array}{c}
X^{(1)}\\
X^{(2)}
\end{array}
\right]\sim N_p
\left(
\left[
\begin{array}{c}
\mu^{(1)}\\
\mu^{(2)}
\end{array}
\right],
\left[
\begin{array}{cc}
\Sigma_{11} &\Sigma_{12}\\
\Sigma_{21} &\Sigma_{22}
\end{array}
\right]
\right)
\]
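Then
\[X^{(1)}\text{ and }X^{(2)}\text{ are independent}\iff\Sigma_{12}=O
\]
- In words: if a normally distributed random vector is split into two parts, the two sub-vectors are independent if and only if their cross-covariance matrix is \(O\). Equivalently, for jointly normal sub-vectors, being uncorrelated and being independent are the same thing.
(Proof)
Necessity is immediate, since independent sub-vectors have zero cross-covariance. Conversely, suppose \(\Sigma_{12}=O\). Then \(\Sigma\) is block diagonal, so \(|\Sigma|=|\Sigma_{11}|\,|\Sigma_{22}|\) and \(\Sigma^{-1}\) is block diagonal with blocks \(\Sigma_{11}^{-1}\) and \(\Sigma_{22}^{-1}\). The joint density of \(X\) is therefore: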
\[\begin{align}
f(x^{(1)},x^{(2)})=&
\frac1{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac12(x-\mu)'
\left[
\begin{array}{cc}
\Sigma_{11}&O\\
O&\Sigma_{22}
\end{array}
\right]^{-1}
(x-\mu)
\right)\\
=&
\frac1{(2\pi)^{r/2}|\Sigma_{11}|^{1/2}}\exp\left(-\frac12(x^{(1)}-\mu^{(1)})'
\Sigma_{11}^{-1}
(x^{(1)}-\mu^{(1)})
\right)\\
&\cdot
\frac1{(2\pi)^{(p-r)/2}|\Sigma_{22}|^{1/2}}\exp\left(-\frac12(x^{(2)}-\mu^{(2)})'
\Sigma_{22}^{-1}
(x^{(2)}-\mu^{(2)})
\right)\\
=&f_1(x^{(1)})\cdot f_2(x^{(2)})
\end{align}
\]
Hence \(X^{(1)}\) and \(X^{(2)}\) are independent.
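The factorization used in the proof can be checked numerically. The sketch below assumes NumPy is available; `mvn_pdf` is a hypothetical helper written directly from the density formula, and the values of `mu`, `sigma`, `r`, and `x` are illustrative choices with \(\Sigma_{12}=O\), not taken from the text.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written directly from the formula."""
    x = np.atleast_1d(x).astype(float)
    mu = np.atleast_1d(mu).astype(float)
    sigma = np.atleast_2d(sigma).astype(float)
    d = x - mu
    k = mu.size
    quad = d @ np.linalg.solve(sigma, d)           # (x-mu)' Sigma^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm)

# Illustrative values (assumptions): p = 3, r = 1, and Sigma_12 = O.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 1.5, 0.4],
                  [0.0, 0.4, 1.0]])
r = 1
x = np.array([0.3, -1.0, 1.2])

joint = mvn_pdf(x, mu, sigma)
product = (mvn_pdf(x[:r], mu[:r], sigma[:r, :r])
           * mvn_pdf(x[r:], mu[r:], sigma[r:, r:]))
print(joint, product)  # the two values should agree up to rounding
```

Changing any entry of the off-diagonal block to a nonzero value (while keeping `sigma` positive definite) should make the two numbers differ.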
(Corollaries)
- Let \(r_i\geq1\ (i=1,\dots,k)\) with \(r_1+r_2+\dots+r_k=p\), and suppose
\[X=
\left[
\begin{array}{c}
X^{(1)}\\
\vdots\\
X^{(k)}
\end{array}
\right]\sim
N_p
\left(
\left[
\begin{array}{c}
\mu^{(1)}\\
\vdots\\
\mu^{(k)}
\end{array}
\right],
\left[
\begin{array}{ccc}
\Sigma_{11} &\cdots &\Sigma_{1k}\\
\vdots&&\vdots\\
\Sigma_{k1} &\cdots &\Sigma_{kk}
\end{array}
\right]_{p\times p}
\right)
\]
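Then \(X^{(1)},X^{(2)},\dots,X^{(k)}\) are mutually independent \(\iff\) \(\Sigma_{ij}=O\) for all \(i\neq j\).
- Let \(X=(X_1,\dots,X_p)'\sim N_p(\mu,\Sigma)\). If \(\Sigma\) is a diagonal matrix, then \(X_1,\dots,X_p\) are mutually independent.
II. Conditional distributions
For a bivariate normal distribution, the definition of a conditional distribution tells us that, given \(X_2=x_2\), the conditional density of \(X_1\) is: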
\[f_1(x_1|x_2)=\frac{f(x_1,x_2)}{f_2(x_2)}
\]
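We do not yet have a closed form for \(f(x_1|x_2)\), but we can start from the joint density of the bivariate normal distribution:
\[\begin{aligned}
f(x_1,x_2)&=(*)\\
&=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)+\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}
\end{aligned}
\]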
A simple rearrangement: inside the exponent, add and subtract \(\rho^2\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\), which gives:
\[\begin{aligned}
(*)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{1}{2(1-\rho^2)}\Bigl[&\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)\\
&+(1-\rho^2)\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2+\rho^2\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\Bigr]\Bigr\}
\end{aligned}
\]
Since \(\exp(a+b)=\exp(a)\exp(b)\), we can factor out the term \(\exp\left[-\frac1{2(1-\rho^2)}(1-\rho^2)\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\):
\[\begin{aligned}
(*)=&\frac{1}{\sqrt{2\pi}\sigma_2}\exp\left\{-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right\}\\
&\cdot\frac{1}{\sqrt{2\pi}\sigma_1\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)+\rho^2\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}
\end{aligned}
\]
The first factor is exactly the univariate density \(f_2(x_2)\) of \(X_2\sim N(\mu_2,\sigma_2^2)\), and the bracket in the second factor is a perfect square, so:
\[(*)=f_2(x_2)\cdot\frac{1}{\sqrt{2\pi}\sigma_1\sqrt{1-\rho^2}}\cdot\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)-\rho\left(\frac{x_2-\mu_2}{\sigma_2}\right)\right]^2\right\}
\]
Since \(k^2\left(a+\frac{b}{k}\right)^2=(ka+b)^2\), applying this with \(k=\sigma_1\) pulls the factor \(1/\sigma_1^2\) out of the square:
\[(*)=f_2(x_2)\cdot\frac{1}{\sqrt{2\pi}\sigma_1\sqrt{1-\rho^2}}\cdot\exp\left\{-\frac{1}{2(1-\rho^2)\sigma_1^2}\left[x_1-\mu_1-\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)\right]^2\right\}
\]
We have thus factored the bivariate normal joint density according to the multiplication rule:
\[f(x_1,x_2)=f_2(x_2)\cdot f(x_1|x_2)
\]
where \(f(x_1|x_2)\) is the conditional density of \(X_1\) given \(X_2=x_2\):
\[f(x_1|x_2)=\frac{1}{\sqrt{2\pi}\sigma_1\sqrt{1-\rho^2}}\cdot
\exp\left\{
-\frac{1}{2(1-\rho^2)\sigma_1^2}\left[x_1-\left(\mu_1
+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)\right)\right]^2
\right\}
\]
It follows that \((X_1|X_2=x_2)\) is normally distributed:
\[(X_1|X_2=x_2)\sim N_1\left(\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),\ \sigma_1^2(1-\rho^2)\right)
\]
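As a sanity check on the derivation, the ratio \(f(x_1,x_2)/f_2(x_2)\) can be compared with the normal density whose mean is \(\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)\) and whose variance is \(\sigma_1^2(1-\rho^2)\). The sketch below assumes NumPy; the helper names `normal_pdf` and `bivariate_pdf` and all parameter values are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal joint density, written from the formula (*)."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# Illustrative parameters (assumptions, not from the text).
mu1, mu2, s1, s2, rho = 1.0, -1.0, 2.0, 0.5, 0.6
x1, x2 = 0.7, -0.8

# Left side: conditional density from its definition f(x1, x2) / f2(x2).
ratio = bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho) / normal_pdf(x2, mu2, s2 ** 2)

# Right side: the closed form derived above.
cond_mean = mu1 + rho * s1 / s2 * (x2 - mu2)
cond_var = s1 ** 2 * (1 - rho ** 2)
closed_form = normal_pdf(x1, cond_mean, cond_var)

print(ratio, closed_form)  # the two values should agree up to rounding
```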
Generalizing to the multivariate case:
Let
\[X=
\left[
\begin{array}{c}
X^{(1)}_r\\
X^{(2)}_{p-r}
\end{array}
\right]\sim N_p(\mu,\Sigma),(\Sigma>0)
\]
Then, given \(X^{(2)}=x^{(2)}\), the conditional distribution of \(X^{(1)}\) is:
\[(X^{(1)}|X^{(2)})\sim N_r(\mu_{1.2},\Sigma_{11.2})
\]
where
\[\begin{aligned}
\mu_{1.2}&=\mu^{(1)}+\Sigma_{12}\Sigma_{22}^{-1}(x^{(2)}-\mu^{(2)})\\
\Sigma_{11.2}&=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
\end{aligned}
\]
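The two formulas translate almost line by line into code. The following is a minimal sketch assuming NumPy; `conditional_mvn` is a hypothetical helper name, and the example values are chosen to match a bivariate case with \(\sigma_1=2\), \(\sigma_2=0.5\), \(\rho=0.6\), so that the result can be compared with the bivariate formula.

```python
import numpy as np

def conditional_mvn(mu, sigma, r, x2):
    """Mean and covariance of X1 | X2 = x2, where X1 is the first r components."""
    mu1, mu2 = mu[:r], mu[r:]
    s11, s12 = sigma[:r, :r], sigma[:r, r:]
    s21, s22 = sigma[r:, :r], sigma[r:, r:]
    # Solve linear systems with Sigma_22 instead of forming its inverse.
    cond_mean = mu1 + s12 @ np.linalg.solve(s22, x2 - mu2)
    cond_cov = s11 - s12 @ np.linalg.solve(s22, s21)
    return cond_mean, cond_cov

# Illustrative bivariate case (assumption): Cov(X1, X2) = rho * s1 * s2 = 0.6.
mu = np.array([1.0, -1.0])
sigma = np.array([[4.0, 0.6],
                  [0.6, 0.25]])
cond_mean, cond_cov = conditional_mvn(mu, sigma, r=1, x2=np.array([-0.8]))
print(cond_mean, cond_cov)
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice for numerical stability; the formulas themselves are unchanged.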
The proof follows. It is also quite instructive for solving exercises, and a textbook exercise is given later:
(Lemma: block inversion formula for \(\Sigma\))
\[\left[
\begin{array}{c|c}
\Sigma_{11}&\Sigma_{12}\\\hline
\Sigma_{21}&\Sigma_{22}
\end{array}
\right]^{-1}
=\Sigma^{-1}=\left[
\begin{array}{c|c}
\Sigma_{11.2}^{-1}&-\Sigma_{11.2}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\\\hline
-\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11.2}^{-1}&\Sigma_{22}^{-1}+\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11.2}^{-1}\Sigma_{12}\Sigma_{22}^{-1}
\end{array}
\right]
\]
where \(\Sigma_{11.2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\).
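The lemma is easy to verify numerically for a randomly generated positive definite \(\Sigma\). The sketch below assumes NumPy; the dimensions `p`, `r` and the random seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 5, 2

# Build a positive definite matrix (assumption: A A' + p I is used only as a test case).
a = rng.standard_normal((p, p))
sigma = a @ a.T + p * np.eye(p)

s11, s12 = sigma[:r, :r], sigma[:r, r:]
s21, s22 = sigma[r:, :r], sigma[r:, r:]

s22_inv = np.linalg.inv(s22)
schur = s11 - s12 @ s22_inv @ s21          # Sigma_{11.2}
schur_inv = np.linalg.inv(schur)

# Assemble the right-hand side of the lemma block by block.
block_inv = np.block([
    [schur_inv, -schur_inv @ s12 @ s22_inv],
    [-s22_inv @ s21 @ schur_inv,
     s22_inv + s22_inv @ s21 @ schur_inv @ s12 @ s22_inv],
])

max_err = float(np.max(np.abs(block_inv - np.linalg.inv(sigma))))
print(max_err)  # should be at the level of floating-point rounding
```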
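(Proof)
To find the distribution of \((X^{(1)}|X^{(2)})\), it suffices to construct its density function. By the definition of a conditional distribution:
\[f(x^{(1)},x^{(2)})=f_1(x^{(1)}|x^{(2)})f_2(x^{(2)})
\]
In the same spirit as the bivariate derivation, we construct a nonsingular linear transformation: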
\[\begin{align}
Z=\left[\begin{array}{c}Z^{(1)}\\Z^{(2)}\end{array}\right]=&\left[\begin{array}{c}X^{(1)}-\Sigma_{12}\Sigma_{22}^{-1}X^{(2)}\\X^{(2)}\end{array}\right]\\
=&\left[\begin{array}{c|c}I_r&-\Sigma_{12}\Sigma_{22}^{-1}\\\hline O&I_{p-r}\end{array}\right]\left[\begin{array}{c}X^{(1)}\\X^{(2)}\end{array}\right]\\
=&BX
\end{align}
\]
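It follows that \(Z\sim N_p(B\mu,B\Sigma B')\), where \(B\mu\) has blocks \(\mu^{(1)}-\Sigma_{12}\Sigma_{22}^{-1}\mu^{(2)}\) and \(\mu^{(2)}\), and: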
\[\begin{align}
B\Sigma B'=&\left[\begin{array}{c|c}I_r&-\Sigma_{12}\Sigma_{22}^{-1}\\\hline O&I_{p-r}\end{array}\right]\left[
\begin{array}{c|c}
\Sigma_{11}&\Sigma_{12}\\\hline
\Sigma_{21}&\Sigma_{22}
\end{array}
\right]\left[\begin{array}{c|c}I_r&O\\\hline -\Sigma_{22}^{-1}\Sigma_{21}&I_{p-r}\end{array}\right]\\
=&\left[\begin{array}{c|c}\Sigma_{11.2}&O\\\hline O&\Sigma_{22}\end{array}\right]
\end{align}
\]
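The block-diagonal form of \(B\Sigma B'\) can be confirmed numerically. This is a sketch assuming NumPy, with a randomly generated positive definite `sigma` as an illustrative test case; note that the transpose of \(B\) has \(-\Sigma_{22}^{-1}\Sigma_{21}\) in its lower-left block.

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 4, 2

# Positive definite test matrix (an illustrative assumption).
a = rng.standard_normal((p, p))
sigma = a @ a.T + p * np.eye(p)

s11, s12 = sigma[:r, :r], sigma[:r, r:]
s21, s22 = sigma[r:, :r], sigma[r:, r:]

coef = s12 @ np.linalg.inv(s22)            # Sigma_12 Sigma_22^{-1}
b = np.block([
    [np.eye(r), -coef],
    [np.zeros((p - r, r)), np.eye(p - r)],
])

transformed = b @ sigma @ b.T              # B Sigma B'
schur = s11 - coef @ s21                   # Sigma_{11.2}

print(np.round(transformed, 10))
print(np.linalg.det(b))                    # B is block triangular, so det(B) = 1
```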
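Since \(Z\) is normal and its covariance matrix is block diagonal, the independence criterion above shows that \(Z^{(1)}\) and \(Z^{(2)}\) are independent. We can therefore write the joint density \(g(z^{(1)},z^{(2)})\) of \(Z\) as a product, noting also that \(Z^{(2)}=X^{(2)}\):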
\[g(z^{(1)},z^{(2)})=g_1(z^{(1)})g_2(z^{(2)})=g_1(z^{(1)})f_2(z^{(2)})
\]
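Moreover, since \(Z=BX\), the change-of-variables formula lets us express the density \(f(x)\) of \(X\) in terms of \(g(z)\). Here the Jacobian factor is \(J(z\to x)=|\det B|=1\), because \(B\) is block triangular with identity blocks on its diagonal: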
\[\begin{align}
f(x^{(1)},x^{(2)})=&g(Bx)\cdot J(z\to x)\\
=&g_1(x^{(1)}-\Sigma_{12}\Sigma_{22}^{-1}x^{(2)})f_2(x^{(2)})
\end{align}
\]
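To summarize:
- We constructed a nonsingular linear transformation and showed that \(Z\) is a normal random vector whose blocks \(Z^{(1)}\) and \(Z^{(2)}=X^{(2)}\) are independent;
- Using the same linear transformation together with the Jacobian, we related the density of \(X\) to the density of \(Z\).
Then, by the definition of a conditional distribution, the density of \((X^{(1)}|X^{(2)})\) is: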
\[\begin{align}
f_1(x^{(1)}|x^{(2)})=&\frac{f(x^{(1)},x^{(2)})}{f_2(x^{(2)})}=g_1(x^{(1)}-\Sigma_{12}\Sigma_{22}^{-1}x^{(2)})\\
=&\frac{1}{(2\pi)^{r/2}|\Sigma_{11.2}|^{1/2}}\exp\left[-\frac12(x^{(1)}-\mu_{1.2})'\Sigma_{11.2}^{-1}(x^{(1)}-\mu_{1.2})\right]
\end{align}
\]
This is exactly the density of a normal distribution, that is:
\[(X^{(1)}|X^{(2)})\sim N_r(\mu_{1.2},\Sigma_{11.2})
\]
Important corollaries:
- \(X^{(1)}-\Sigma_{12}\Sigma_{22}^{-1}X^{(2)}\) is independent of \(X^{(2)}\);
- \(X^{(2)}-\Sigma_{21}\Sigma_{11}^{-1}X^{(1)}\) is independent of \(X^{(1)}\);
- \((X^{(2)}|X^{(1)})\sim N_{p-r}(\mu_{2.1},\Sigma_{22.1})\), with
\[\mu_{2.1}=\mu^{(2)}+\Sigma_{21}\Sigma_{11}^{-1}(x^{(1)}-\mu^{(1)})\\
\Sigma_{22.1}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
\]
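A simulation gives a rough empirical check of the second corollary: the sample cross-covariance between \(X^{(2)}-\Sigma_{21}\Sigma_{11}^{-1}X^{(1)}\) and \(X^{(1)}\) should be close to zero, and the sample covariance of the residual should be close to \(\Sigma_{22.1}\). The sketch assumes NumPy; the parameters, sample size, and seed are illustrative. It only checks zero correlation, which for jointly normal vectors is equivalent to independence by Section I.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (assumptions): p = 3, r = 1.
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
r = 1
n = 200_000

x = rng.multivariate_normal(mu, sigma, size=n)
x1, x2 = x[:, :r], x[:, r:]

# Residual X2 - Sigma_21 Sigma_11^{-1} X1, computed for every sample row.
coef = sigma[r:, :r] @ np.linalg.inv(sigma[:r, :r])    # shape (p-r, r)
resid = x2 - x1 @ coef.T

# Sample cross-covariance between the residual and X1.
cross_cov = (resid - resid.mean(axis=0)).T @ (x1 - x1.mean(axis=0)) / (n - 1)
max_abs_cross_cov = float(np.max(np.abs(cross_cov)))

# Theoretical Sigma_{22.1} and its sample counterpart.
schur_22_1 = sigma[r:, r:] - coef @ sigma[:r, r:]
resid_cov = np.cov(resid, rowvar=False)

print(max_abs_cross_cov)
print(resid_cov)
print(schur_22_1)
```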