Mathematical Statistics Review

Conventions: $\newcommand{\dif}{\mathop{}\\!\mathrm{d}}$
"/" joins two names for the same concept
"," separates the items of a list
"?" marks a TODO
"( )" encloses a remark

chap 1

Concepts

Probability space

A probability space is a triple $(\Omega, \mathcal{F}, P)$ where $\Omega$ is a set of "outcomes," $\mathcal{F}$ is a set of "events," and $P\colon \mathcal{F} \to [0,1] $ is a function that assigns probabilities to events. We assume that $\mathcal{F}$ is a $\sigma$-field (or $\sigma$-algebra), i.e., a (nonempty) collection of subsets of $\Omega$ that satisfy

(i) if $A\in\mathcal{F}$ then $A^c \in\mathcal{F}$, and
(ii) if $A_i\in\mathcal{F}$ is a countable sequence of sets then $\bigcup_iA_i\in\mathcal{F}$.

Here and in what follows, countable means finite or countably infinite. Since $\bigcap_iA_i = (\bigcup_iA_i^c)^c$, it follows that a $\sigma$-field is closed under countable intersections.
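The closure properties of a $\sigma$-field can be checked by brute force on a small finite example. The sketch below is only an illustration: the set `omega`, the partition `blocks`, and the non-example `G` are hypothetical choices, not taken from the text.

```python
from itertools import combinations

omega = frozenset({1, 2, 3, 4})

# Hypothetical sigma-field: all unions of blocks of the partition {1,2}, {3}, {4}.
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
F = {
    frozenset().union(*combo)
    for r in range(len(blocks) + 1)
    for combo in combinations(blocks, r)
}

# Property (i): closed under complements.
closed_complement = all(omega - A in F for A in F)
# Property (ii): closed under unions (finite unions suffice on a finite space).
closed_union = all(A | B in F for A in F for B in F)
# Consequence: closed under intersections, by De Morgan's law.
closed_intersection = all(A & B in F for A in F for B in F)

# A collection that is NOT a sigma-field: the complement of {1} is missing.
G = {frozenset(), frozenset({1}), omega}
G_closed_complement = all(omega - A in G for A in G)

print(len(F), closed_complement, closed_union, closed_intersection)
print(G_closed_complement)
```

On a finite $\Omega$ every countable union reduces to a finite one, which is why checking pairs of sets is enough here.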

Without $P$, $(\Omega,\mathcal{F})$ is called a measurable space, i.e., it is a space on which we can put a measure. A measure is a nonnegative countably additive set function; that is, a function $\mu\colon\mathcal{F}\to\mathbb{R}$ with

(i) $\mu(A)\ge\mu(\emptyset) = 0$ for all $A\in\mathcal{F}$, and
(ii) if $A_i\in\mathcal{F}$ is a countable sequence of disjoint sets, then
\[
\mu(\bigcup_iA_i) = \sum_i\mu(A_i)
\]
If $\mu(\Omega) = 1$, we call $\mu$ a probability measure. In this book, probability measures are usually denoted by $P$.

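Random experiment / random phenomenon
Sample space, sample point
Random event, abbr. event
Events $A$ and $B$ are mutually independent
$n$ events: mutual independence, pairwise independence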
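Pairwise independence does not imply mutual independence. A minimal sketch of a standard counterexample, assuming two fair coin tosses; the events `A`, `B`, `C` are illustrative choices and do not come from the notes:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses, each outcome has probability 1/4.
omega = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[0] == w[1]}  # both tosses agree

# Pairwise independence: P(E and F) = P(E) P(F) for every pair.
pairwise = all(
    prob(E & F) == prob(E) * prob(F)
    for E, F in [(A, B), (A, C), (B, C)]
)
# Mutual independence additionally requires the product rule for the triple.
mutual = pairwise and prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, mutual)
```

Here every pair satisfies the product rule, but $P(A\cap B\cap C) = 1/4$ while $P(A)P(B)P(C) = 1/8$.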

?Random variable, abbr. RV: $X$, $Y$, ...

A real valued function $X$ defined on $\Omega$ is said to be a random variable if for every Borel set $B\subset\mathbb R$ we have $X^{-1}(B) = \\{\omega\colon X(\omega)\in B\\}\in\mathcal{F}$. When we need to emphasize the $\sigma$-field, we will say that $X$ is $\mathcal{F}$-measurable or write $X\in\mathcal{F}$. (A Borel set is an element of a Borel sigma-algebra.)

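I do not fully understand this definition yet, because I am not sure what a Borel set really is. My undergraduate probability and statistics textbook defines a random variable as follows:

Let $\Omega$ be a sample space. If every $\omega\in\Omega$ is assigned a real number $X(\omega)$, then $X(\omega)$ is called a random variable, written $X$ for short.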

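Distribution function / DF: $F(x) := P(X\le x),\quad x\in\mathbb R$
Probability density function, abbr. density function / PDF (in what follows, for a continuous RV the word "distribution" usually refers to the probability density function)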

Normal distribution: $X\sim N(\mu,\sigma^2)$, $f(x) = \dfrac{1}{\sqrt{2\pi}\sigma} e^{-\dfrac{(x-\mu)^2}{2\sigma^2}}$, $x\in\mathbb R$, $\mu\in\mathbb R$, $\sigma > 0$

Gaussian integral

Consider the improper integral

\[A = \int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}\sigma} e^{-\dfrac{(x-\mu)^2}{2\sigma^2}} \dif x \]

Substituting $ t = \frac{x - \mu} {\sqrt2\sigma}$ gives

\[A = \frac1{\sqrt\pi} \boxed{\color{blue} {\int_{-\infty}^{\infty} e^{-t^2} \dif t} } \]

$\int_{-\infty}^{\infty} e^{-t^2} \dif t$ is called the Gaussian integral, also known as the probability integral. We need to prove

\[I = \int_{-\infty}^{\infty} e^{-t^2} \dif t = \sqrt{\pi} \]

To prove this, consider

\[I^2 = \int_{-\infty}^{\infty} e^{-t^2} \dif t \int_{-\infty}^{\infty} e^{-u^2} \dif u = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(t^2+u^2)}\dif t\dif u \]

Switching to polar coordinates $ t = r\cos\theta, u = r\sin\theta$, this becomes

\[I^2 = \int_0^{2\pi}\dif\theta\int_0^\infty e^{-r^2}r\dif r = \pi, \]

Since $I > 0$, it follows that $I = \sqrt{\pi}$ and hence $A = 1$, which is what we wanted to prove.
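As a numerical sanity check of the value $\sqrt{\pi}$, the integral can be approximated with a simple midpoint rule. This is only a sketch: the helper `midpoint_integral`, the truncation to $[-10, 10]$, and the step count are assumptions made for illustration.

```python
import math

def midpoint_integral(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Truncate the real line to [-10, 10]; the neglected tails are below e^{-100}.
approx = midpoint_integral(lambda t: math.exp(-t * t), -10.0, 10.0, 20_000)

print(approx, math.sqrt(math.pi))
```

The two printed numbers should agree to many decimal places, because the integrand is smooth and decays very quickly.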

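Two-dimensional random variable $(X,Y)$ (can be thought of as a pair of random variables)
Joint distribution function of $(X,Y)$: $F(x,y) := P(X\le x, Y\le y)$
Marginal distribution functions: $F_X(x)$, $F_Y(y)$
Distribution of a function of a random variable: $X$ has PDF $f(x)$ and $Y=g(X)$; find the PDF of $Y$.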

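Numerical characteristics of random variables

Mathematical expectation, abbr. expectation: $E(X)$

The expectation of a sum equals the sum of the expectations, whether or not the variables are independent.

$k$-th raw moment: $E(X^k)$
$k$-th central moment: $E\\{[X-E(X)]^k \\}$
Variance: $\mathrm{Var}(X) := E\\{[X-E(X)]^2 \\} =\boxed{\color{blue}{ E(X^2) - [E(X)]^2 }}$ (the second central moment)
Standard deviation: $\sigma_X = \sqrt{\mathrm{Var}(X)}$
Covariance of $X$ and $Y$: $\mathrm{Cov}(X,Y) := E\\{[X-E(X)] [Y-E(Y)] \\} = \boxed{E(XY) - E(X)E(Y)}$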

\begin{align*}
\mathrm{Var}(X+Y)& = E[(X+Y)^2] - [E(X+Y)]^2 \\
&= E(X^2) - [E(X)]^2 + E(Y^2) - [E(Y)]^2 + 2[E(XY)- E(X)E(Y)] \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X,Y)
\end{align*}
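The identity $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X,Y)$ also holds exactly for the empirical distribution of any paired data set, which gives a quick way to check it. A minimal sketch, assuming simulated data in which $Y$ depends on $X$; the distributions and the seed are arbitrary illustrative choices:

```python
import random

random.seed(0)
n = 10_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
# Make Y depend on X so that the covariance term is clearly nonzero.
ys = [0.5 * x + random.uniform(-1.0, 1.0) for x in xs]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance under the empirical distribution (divides by n, not n - 1)."""
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

def var(v):
    return cov(v, v)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)

print(lhs, rhs, cov(xs, ys))
```

The two sides agree up to floating-point rounding, even though $X$ and $Y$ are not independent here.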

Linear correlation coefficient of $X$ and $Y$: $\rho_{XY} = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}$
Standardized random variable of $X$: $X^\* := \dfrac{X-E(X)} {\sqrt{\mathrm{Var}(X)}}$
Covariance matrix of $n$ random variables: $\Sigma := (\sigma_{ij})\_{n\times n}$, $\sigma_{ij} = \mathrm{Cov}(X_i,X_j)$

Poisson distribution: $P(X = k) = \dfrac{\lambda^k}{k!}e^{-\lambda}$, $k = 0, 1, 2, \dots$, written $X\sim P(\lambda)$, $\lambda > 0$.
$E(X) = e^{-\lambda}\sum_{k\ge 0} k\dfrac{\lambda^k}{k!} = e^{-\lambda}\sum_{k\ge 1} k\dfrac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k\ge 1} \dfrac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{k\ge 0} \dfrac{\lambda^{k}}{k!} = \lambda$

$ E(X^2) = e^{-\lambda} \sum_{k\ge 0} k^2 \dfrac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k\ge 1} k \dfrac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{k\ge 0} (k+1) \dfrac{\lambda^{k}}{k!} = \lambda(\lambda + 1)$

Hence $ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda $
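The Poisson moments can be checked by summing the series directly. A minimal sketch, assuming an arbitrary rate `lam = 3.5` and a truncation at 100 terms (both are illustrative choices; the neglected tail is negligible for this rate):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ P(lam)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam = 3.5        # arbitrary rate chosen for the check
ks = range(100)  # truncated support

mean = sum(k * poisson_pmf(k, lam) for k in ks)
second_moment = sum(k * k * poisson_pmf(k, lam) for k in ks)
variance = second_moment - mean ** 2

print(mean, second_moment, variance)
```

The printed values should match $\lambda$, $\lambda(\lambda+1)$ and $\lambda$ up to rounding.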

chap 2

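Population, individual, numerical index of an individual (individuals turn up at random $\implies$ the numerical index of an individual is a random variable, denoted $X$)
Sampling
Sample, sample size $n$ (a sample is a plural concept: the individuals drawn are obtained at random, and their numerical indices are $n$ random variables $X_1, \dots, X_n$)
Sample size $n$, sample observations $x_1, \dots, x_n$.
Simple random sampling, simple random sample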

Statistic: a function $T = T(X_1, \dots, X_n)$ that contains no unknown parameters.
Sample mean: $\overline{X} = \frac1n \sum_{1\le i\le n} X_i$
Sample variance: $S^2 = \frac{1}{n-1}\sum_{1\le i\le n} \left(X_i - \overline{X}\right)^2$, sample standard deviation $S$
Sample $k$-th raw moment: $A_k = \frac{1}{n}\sum_{1\le i\le n} X_i^k$
Largest order statistic: $X_{(n)} = \max\\{X_1, \dots, X_n\\}$
Smallest order statistic: $X_{(1)} = \min\\{X_1, \dots, X_n\\}$
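The statistics listed above can be computed from a set of observations with Python's standard library. This is a sketch on a made-up data set; the numbers in `sample` are hypothetical observations, not data from the notes.

```python
import statistics

sample = [2.1, 3.4, 1.9, 5.0, 4.2]  # hypothetical observations x_1, ..., x_n
n = len(sample)

x_bar = statistics.mean(sample)     # sample mean
s2 = statistics.variance(sample)    # sample variance, divides by n - 1
s = statistics.stdev(sample)        # sample standard deviation
a2 = sum(x ** 2 for x in sample) / n  # sample second raw moment A_2
x_max = max(sample)                 # largest order statistic X_(n)
x_min = min(sample)                 # smallest order statistic X_(1)

# The same variance written out from the definition, for comparison.
s2_by_hand = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

print(x_bar, s2, s, a2, x_max, x_min)
```

Note that `statistics.variance` uses the $n-1$ denominator, matching the definition of $S^2$, whereas `statistics.pvariance` would divide by $n$.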

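Sampling distribution: the distribution of a statistic (a statistic is itself a random variable)
$\chi^2$ distribution: $\chi^2 = X_1^2 + \dots + X_n^2$, where $X_1, \dots, X_n$ are independent and $X_i\sim N(0,1)$

Derivation of the $\chi^2$ distribution

Let RV $X\sim \chi^2(n)$. Consider the DF of $X$: for $x > 0$,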

\[P(X \le x) = \frac1{\left(\sqrt{2\pi}\right)^n} \int_{V} e^{-\frac12(\sum_i x_i^2)} \dif x_1 \dif x_2 \dots \dif x_n \]

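where the integration region $V$ is the $n$-dimensional ball $\sum_i x_i^2 \le x$.

Because both the region and the integrand are spherically symmetric, in $n$-dimensional spherical coordinates the integral becomes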

\[P(X \le x) = c_n \int_{0}^{\sqrt{x}} e^{-\frac{r^2}{2}} r^{n-1}\dif r \]

where $c_n$ is a constant that depends only on $n$.

Letting $x\to \infty$ in the integral above gives

\[1 = c_n \boxed{\color{blue}{\int_{0}^{\infty} e^{-\frac{r^2}{2}} r^{n-1}\dif r}} \]

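Improper integrals of the form $ \int_{0}^{\infty} e^{-\frac{r^2}{2}} r^{n-1}\dif r $ have no single elementary closed form covering all $n$, so a special function is introduced to express this family of integrals.

$\Gamma$ function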

\[\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \dif t , \quad x > 0 \]

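Note: the improper integral above does not converge for every $x\in\mathbb{R}$; for example, it diverges at $x = 0$, so $\Gamma(0)$ is undefined.

Substituting $u = r^2/2$, we get $r = (2u)^{\frac12}$, $\dif r = (2u)^{-\frac12}\dif u$. Hence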

\[\int_{0}^{\infty} e^{-\frac{r^2}{2}} r^{n-1}\dif r = \int_{0}^{\infty} e^{-u} (2u)^{n/2-1} \dif u = 2^{n/2-1} \Gamma(\frac n2) \]

Therefore

\[
c_n = \frac1{2^{n/2-1} \Gamma(\frac n2)}
\]
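The reduction of the radial integral to $2^{n/2-1}\Gamma(n/2)$ can be checked numerically for a few values of $n$. A minimal sketch, assuming a midpoint rule on the truncated interval $[0, 20]$; the helper name, the truncation point, and the chosen values of $n$ are illustrative assumptions:

```python
import math

def midpoint_integral(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

results = {}
for n in (1, 2, 3, 5, 8):
    # Left-hand side: the radial integral, truncated at r = 20.
    lhs = midpoint_integral(
        lambda r: math.exp(-r * r / 2) * r ** (n - 1), 0.0, 20.0, 100_000
    )
    # Right-hand side: the closed form in terms of the Gamma function.
    rhs = 2 ** (n / 2 - 1) * math.gamma(n / 2)
    results[n] = (lhs, rhs)
    print(n, lhs, rhs)
```

For each $n$ the two columns should agree closely, and the reciprocal of the right-hand side is the constant $c_n$.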

Hence the PDF of $\chi^2(n)$ is

\[ f(x) = \frac{\dif P(X\le x)}{\dif x} = c_n e^{-\frac{x}{2}} x^{\frac{n-1}{2}} \frac{\dif\sqrt{x}}{\dif x} = \frac{c_n}{2} e^{-\frac{x}{2}} x^{\frac{n}{2}-1} = \frac1{2^{n/2} \Gamma(\frac n2)} e^{-\frac{x}{2}} x^{\frac{n}{2}-1}, \quad x > 0 \]
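The derived density can be sanity-checked in two ways: it should integrate to 1, and the probability it assigns to an interval should match a simulation of $X_1^2 + \dots + X_n^2$. The sketch below assumes $n = 4$, a cutoff of 3 for the comparison, and a fixed seed; all three are arbitrary illustrative choices.

```python
import math
import random

def chi2_pdf(x, n):
    """Density of chi^2(n) for x > 0, using the formula derived in the text."""
    return math.exp(-x / 2) * x ** (n / 2 - 1) / (2 ** (n / 2) * math.gamma(n / 2))

def midpoint_integral(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

n = 4
# Total mass and mean, with the half-line truncated at x = 100.
total = midpoint_integral(lambda x: chi2_pdf(x, n), 0.0, 100.0, 100_000)
mean = midpoint_integral(lambda x: x * chi2_pdf(x, n), 0.0, 100.0, 100_000)

# P(X <= 3) from the density versus a Monte Carlo estimate.
from_pdf = midpoint_integral(lambda x: chi2_pdf(x, n), 0.0, 3.0, 10_000)
random.seed(1)
trials = 20_000
hits = sum(
    sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) <= 3.0
    for _ in range(trials)
)
empirical = hits / trials

print(total, mean, from_pdf, empirical)
```

The mean coming out close to $n$ is consistent with $E(X_i^2) = 1$ for each standard normal term.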

Let RV $X$ have PDF $f(x)$ and $Y$ have PDF $g(y)$. Find the PDF $h(z)$ of $Z=X+Y$, and consider whether each of the following solutions is correct.
(1) $ h(z) = \int_{-\infty}^\infty f(x)g(z-x)\dif x$
(2) $ h(z) = \int_{-\infty}^\infty f(x)g_{Y\mid X}(z-x\mid x)\dif x$
where $g_{Y\mid X}(y\mid x) = \dfrac{f(x,y)}{f_X(x)}$ is the conditional density of $Y$ given $X=x$, evaluated at $y = z-x$.
Note: here the joint density $f(x,y)$ cannot be computed from $f(x)$ and $g(y)$ alone; it has to be given separately.
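Formula (1) is the convolution formula, which is valid when $X$ and $Y$ are independent; without independence, the joint density is needed as in (2). A small experiment makes the difference visible. The sketch assumes $X, Y \sim U(0,1)$ and compares the independent case with the fully dependent case $Y = X$; the distributions, window width, and seed are illustrative assumptions.

```python
import random

def uniform_pdf(t):
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def convolve_at(f, g, z, a=0.0, b=1.0, steps=1000):
    """Midpoint-rule value of the integral of f(x) g(z - x) over [a, b]."""
    h = (b - a) / steps
    xs = (a + (i + 0.5) * h for i in range(steps))
    return h * sum(f(x) * g(z - x) for x in xs)

def density_estimate(samples, lo, hi):
    """Fraction of samples in [lo, hi] divided by the window width."""
    inside = sum(lo <= s <= hi for s in samples)
    return inside / (len(samples) * (hi - lo))

random.seed(2)
trials = 50_000
independent = [random.random() + random.random() for _ in range(trials)]
dependent = [2.0 * random.random() for _ in range(trials)]  # Y = X, so Z = 2X

formula_1 = convolve_at(uniform_pdf, uniform_pdf, 1.0)      # formula (1) at z = 1
est_independent = density_estimate(independent, 0.95, 1.05)
est_dependent = density_estimate(dependent, 0.95, 1.05)

print(formula_1, est_independent, est_dependent)
```

Near $z = 1$ the convolution value matches the simulated density in the independent case, while for $Y = X$ the sum $Z = 2X$ is uniform on $(0, 2)$ with density $1/2$, which formula (1) does not reproduce.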

posted @ 2018-04-12 22:04 Pat
