Sampling Surveys: Proofs and Exercises


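Proofs

Proof 1

Prove that for the simple random sampling estimator \(\bar{y}\), \({E}(\bar{y})=\bar{Y}\) and \({V}(\bar{y})=\dfrac{1-f}{n}S^2\), where \(f=n/N\).

Let \(a_i\) be the indicator of the event that \(Y_i\) is included in the sample. Then \(a_i\) is a random variable and, for \(i\ne j\),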

\[{E}(a_i)=f,\quad {V}(a_i)={E}(a_i^2)-[{E}(a_i)]^2=f(1-f),\\ {E}(a_ia_j)=\frac{n(n-1)}{N(N-1)},\\ \mathrm{cov}(a_i,a_j)={E}(a_ia_j)-{E}(a_i){E}(a_j)=\frac{-f(1-f)}{(N-1)}. \]

Moreover, \(\bar{y}\) can be rewritten as

\[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i=\frac{1}{n}\sum_{i=1}^{N}a_iY_i. \]

Hence, for the expectation,

\[\begin{aligned} E(\bar{y})&=\frac{1}{n}E\left(\sum_{i=1}^{N}a_iY_i \right)\\ &=\frac{1}{n}\sum_{i=1}^{N}E(a_i)Y_i\\ &=\frac{f}{n}\sum_{i=1}^{N}Y_i\\ &=\bar{Y}; \end{aligned} \]

For the variance,

\[\begin{aligned} V(\bar{y})&=\frac{1}{n^2}V\left(\sum_{i=1}^{N}a_iY_i \right)\\ &=\frac{1}{n^2}\left[\sum_{i=1}^{N}Y_i^2V(a_i)+2\sum_{i<j}^{N}Y_iY_j\mathrm{cov}(a_i,a_j) \right]\\ &=\frac{1}{n^2}\left[f(1-f)\sum_{i=1}^{N}Y_i^2-2\frac{f(1-f)}{N-1}\sum_{i<j}^{N}Y_iY_j \right]\\ &=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\sum_{i<j}^{N}2Y_iY_j \right]\\ &=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2+\frac{1}{N-1}\sum_{i=1}^{N}Y_i^2 \right] \\ &=\frac{f(1-f)}{n^2}\left[\frac{N}{N-1}\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2 \right]\\ &=\frac{f(1-f)}{n^2}\frac{N}{N-1}\left[\sum_{i=1}^{N}Y_i^2-N\bar{Y}^2 \right]\\ &=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2\\ &=\frac{1-f}{n}S^2. \end{aligned} \]
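As a sanity check on Proof 1, the following sketch enumerates every simple random sample of size \(n=2\) from a small hypothetical population. The values in `Y` are illustrative and do not come from the text. It then compares the exact mean and variance of \(\bar{y}\) with \(\bar{Y}\) and \(\frac{1-f}{n}S^2\). Exact `Fraction` arithmetic is an implementation choice, so the check is an equality rather than a floating-point tolerance.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values, chosen only for illustration.
Y = [3, 7, 8, 12, 15]
N, n = len(Y), 2
f = Fraction(n, N)

# Population mean and population variance S^2 (divisor N - 1).
Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)

# Under SRS without replacement every n-subset is equally likely.
samples = list(combinations(Y, n))
means = [Fraction(sum(s), n) for s in samples]
E_ybar = sum(means) / len(samples)
V_ybar = sum((m - E_ybar) ** 2 for m in means) / len(samples)

print("E(y_bar) == Y_bar:", E_ybar == Y_bar)
print("V(y_bar) == (1-f)/n * S^2:", V_ybar == (1 - f) / n * S2)
```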

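Proof 2

Prove that the sample variance is an unbiased estimator of the population variance, i.e. \(E(s^2)=S^2\), and that the sample covariance is an unbiased estimator of the population covariance, i.e. \(E(s_{yx})=S_{yx}\).

Using the notation of Proof 1 and \(\sum_{i=1}^{n}(y_i-\bar{y})^2=\sum_{i=1}^{n}y_i^2-n\bar{y}^2\), we have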

\[\begin{aligned} E(s^2)&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 \right]\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_i^2\right)-\frac{n}{n-1}E(\bar{y}^2)\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{N}a_iY_i^2 \right)-\frac{n}{n-1}\left[\frac{1-f}{n}S^2+\bar{Y}^2 \right]\\ &=\frac{f}{n-1}\sum_{i=1}^{N}Y_i^2-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\ &=\frac{f}{n-1}\left[(N-1)S^2+N\bar{Y}^2 \right]-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\ &=S^2\left[\frac{f(N-1)-(1-f)}{n-1} \right]+\bar{Y}^2\left(\frac{fN-n}{n-1} \right)\\ &=S^2. \end{aligned} \]

To prove the second claim, we first compute \(\mathrm{cov}(\bar{y},\bar{x})\). Introduce \(U=Y+X\) and define \(u_i\), \(\bar{u}\) and \(S_u^2\) analogously. Since \((U_i-\bar{U})^2-(Y_i-\bar{Y})^2-(X_i-\bar{X})^2=2(Y_i-\bar{Y})(X_i-\bar{X})\), Proof 1 gives

\[V(\bar u)=V(\bar y)+V(\bar x)+2\mathrm{cov}(\bar y, \bar x),\\ \begin{aligned} \mathrm{cov}(\bar y,\bar x)&=\frac{1}{2}[V(\bar u)-V(\bar y)-V(\bar x)]\\ &=\frac{1}{2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[(U_i-\bar{U})^2-(Y_i-\bar{Y})^2-(X_i-\bar{X})^2 \right]\\ &=\frac{1-f}{2n}\frac{1}{N-1}\cdot \sum_{i=1}^{N}2(Y_i-\bar{Y})(X_i-\bar{X})\\ &=\frac{1-f}{n}S_{yx}. \end{aligned} \]

It follows that

\[\begin{aligned} E(s_{yx})&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) \right]\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_ix_i \right)-\frac{n}{n-1}E(\bar{y}\bar{x})\\ &=\frac{f}{n-1}\sum_{i=1}^{N}Y_iX_i-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\ &=\frac{f}{n-1}\left[(N-1)S_{yx}+N\bar{Y}\bar{X} \right]-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\ &=S_{yx}\left[\frac{f(N-1)-(1-f)}{n-1}\right]+\bar{Y}\bar{X}\left(\frac{fN-n}{n-1}\right)\\ &=S_{yx}. \end{aligned} \]
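The same enumeration idea can check Proof 2. This sketch uses a made-up population of \((Y_i,X_i)\) pairs and averages \(s^2\) and \(s_{yx}\) over all \(\binom{N}{n}\) equally likely samples. The helper `cov` is a hypothetical name that applies divisor \(k-1\) to both population and sample quantities.

```python
from fractions import Fraction
from itertools import combinations

def cov(u, v):
    """Covariance with divisor len - 1 (the same formula defines S_yx and s_yx)."""
    k = len(u)
    u_bar, v_bar = Fraction(sum(u), k), Fraction(sum(v), k)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (k - 1)

# Hypothetical population of (Y_i, X_i) pairs, for illustration only.
pop = [(4, 2), (6, 3), (8, 3), (5, 2), (4, 1), (9, 5)]
n = 3
S2_y = cov([y for y, _ in pop], [y for y, _ in pop])
S_yx = cov([y for y, _ in pop], [x for _, x in pop])

# Average s^2 and s_yx over all equally likely samples of size n.
samples = list(combinations(pop, n))
E_s2 = sum(cov([y for y, _ in s], [y for y, _ in s]) for s in samples) / len(samples)
E_syx = sum(cov([y for y, _ in s], [x for _, x in s]) for s in samples) / len(samples)

print(E_s2 == S2_y, E_syx == S_yx)
```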

Proof 3

Prove that the variance of the ratio estimator \(r=\bar{y}/\bar{x}\) is approximately

\[V(r)\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2=\frac{1}{\bar{X}^2}\frac{1-f}{n}(S^2-2RS_{yx}+R^2S_x^2). \]

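Define \(G=Y-RX\), where \(R=\bar{Y}/\bar{X}\), and define \(g_i\), \(\bar{g}\) and \(\bar{G}\) analogously. Then \(\bar{G}=\bar{Y}-R\bar{X}=0\), so \(E(\bar{g})=0\), and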

\[\begin{aligned} V(r)&\approx E(r-R)^2\\ &=E\left(\frac{\bar{y}-R\bar{x}}{\bar{x}} \right)^2\\ &\approx\frac{1}{\bar{X}^2}E(\bar{y}-R\bar{x})^2\\ &=\frac{1}{\bar{X}^2}E(\bar g^2)=\frac{1}{\bar{X}^2}V(\bar g)\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(G_i-\bar{G})^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2. \end{aligned} \]

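For the second equality, use \(Y_i-RX_i=(Y_i-\bar{Y})-R(X_i-\bar{X})\), which holds because \(\bar{Y}=R\bar{X}\):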

\[\begin{aligned} V(r)&\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}[(Y_i-\bar{Y})-R(X_i-\bar{X})]^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\left[S^2-2RS_{yx}+R^2S_{x}^2 \right]. \end{aligned} \]

Proof 4

Prove that the combined ratio estimator \(\bar{y}_{RC}=\dfrac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}\) satisfies \(E(\bar{y}_{RC})\approx \bar{Y}\) and \(\displaystyle{V(\bar{y}_{RC})\approx\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2) }\).

Since \(E(\bar{x}_{st})=\bar{X}\) and \(\bar{x}_{st}\approx \bar{X}\) in large samples, we have

\[E(\bar{y}_{RC})=\bar{X}E\left(\frac{\bar{y}_{st}}{\bar{x}_{st}} \right)\approx E(\bar{y}_{st})=\bar{Y}. \]

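Apply the transformation \(G=Y-RX\) and define \(G_{hi}\), \(\bar{G}_h\) and \(\bar{g}_{st}\) analogously. Then \(\bar{G}_h=\bar{Y}_h-R\bar{X}_h\) and \(\bar{g}_{st}=\bar{y}_{st}-R\bar{x}_{st}\), so \(E(\bar{g}_{st})=\bar{Y}-R\bar{X}=0\). Moreover, \(\bar{y}_{RC}-\bar{Y}=\dfrac{\bar{X}}{\bar{x}_{st}}(\bar{y}_{st}-R\bar{x}_{st})\approx\bar{g}_{st}\). Hence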

\[\begin{aligned} V(\bar{y}_{RC})&\approx E(\bar{y}_{RC}-\bar{Y})^2\\ &\approx E(\bar{y}_{st}-R\bar{x}_{st})^2\\ &=V(\bar{g}_{st})\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_{gh}^2\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}\left[\frac{1}{N_h-1}\sum_{i=1}^{N_h}(G_{hi}-\bar{G}_h)^2 \right]\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2). \end{aligned} \]

Proof 5

Prove that the optimal allocation in stratified sampling is

\[n_h\propto\frac{W_hS_h}{\sqrt{c_h}}. \]

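where \(c_h\) is the cost of surveying one sampled unit in stratum \(h\).

The variance is \(V(\bar{y}_{st})=\displaystyle\sum_{h=1}^{L}\frac{W_h^2S_h^2}{n_h}-\frac{1}{N}\sum_{h=1}^{L}W_hS_h^2\), and its second term does not depend on the \(n_h\). Minimizing the variance for a fixed variable cost \(\sum_{h}c_hn_h\), or the cost for a fixed variance, therefore amounts to minimizing the product

\[z=\left(\sum_{h=1}^{L}n_hc_h \right)\left(\sum_{h=1}^{L}\frac{W_h^2S_h^2}{n_h} \right). \]

By the Cauchy-Schwarz inequality,

\[z\ge \left(\sum_{h=1}^{L}\sqrt{c_h}\,W_hS_h \right)^2, \]

with equality if and only if, for every stratum,

\[\frac{n_h^2c_h}{W_h^2S_h^2}=K \]

for some constant \(K\), that is,

\[n_h\propto \frac{W_hS_h}{\sqrt{c_h}}. \]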
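A numerical sketch of Proof 5 follows. The stratum weights, standard deviations, unit costs and budget are all hypothetical. The sketch scales \(n_h\propto W_hS_h/\sqrt{c_h}\) to the budget and checks that this allocation attains the Cauchy-Schwarz bound \(\left(\sum_h\sqrt{c_h}W_hS_h\right)^2/C\) for \(\sum_h W_h^2S_h^2/n_h\). It also compares the allocation against random allocations of the same cost. The \(n_h\) are treated as continuous, so rounding to integers is ignored.

```python
import math
import random

# Hypothetical strata: weights W_h, standard deviations S_h, unit costs c_h.
W = [0.5, 0.3, 0.2]
S = [10.0, 20.0, 40.0]
c = [1.0, 4.0, 9.0]
budget = 300.0  # assumed variable cost sum(c_h * n_h)

def var_part(n):
    """Part of V(y_st) that depends on the n_h: sum W_h^2 S_h^2 / n_h."""
    return sum(w * w * s * s / nh for w, s, nh in zip(W, S, n))

def spend(n):
    return sum(ch * nh for ch, nh in zip(c, n))

# Allocation n_h proportional to W_h S_h / sqrt(c_h), scaled to use the whole budget.
raw = [w * s / math.sqrt(ch) for w, s, ch in zip(W, S, c)]
scale = budget / spend(raw)
n_opt = [r * scale for r in raw]

# Cauchy-Schwarz lower bound for the variance part at this budget.
bound = sum(math.sqrt(ch) * w * s for w, s, ch in zip(W, S, c)) ** 2 / budget

# Random allocations with the same cost should never do better.
random.seed(1)
best_random = math.inf
for _ in range(10_000):
    u = [random.random() + 1e-6 for _ in W]
    k = budget / spend(u)
    best_random = min(best_random, var_part([ui * k for ui in u]))

print(var_part(n_opt), bound, best_random)
```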

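Proof 6

Prove that the design effect of cluster sampling is approximately as follows, assuming equal cluster size \(M\) and clusters selected by simple random sampling: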

\[deff=\frac{V(\bar{\bar{y}})}{V_{srs}(\bar{\bar{y}})}\approx 1+(M-1)\rho_{c}. \]

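where \(\rho_c\) is the intraclass correlation coefficient and \(S^2=\dfrac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2\) is the population variance of the \(NM\) units: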

\[\rho_c=\frac{2\sum\limits_{i=1}^{N}\sum\limits_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})}{(M-1)(NM-1)S^2}. \]

Assume that \(N\) and \(M\) are both large, so that \(N-1\approx N\) and \(NM-1\approx NM\). With \(f=n/N\), we have

\[\begin{aligned} V(\bar{\bar{y}})&=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\ &=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{1}{M}\sum_{j=1}^{M}Y_{ij}-\bar{\bar{Y}} \right)^2\\ &=\frac{1}{M^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}}) \right]^2\\ &=\frac{1-f}{nM}\frac{1}{M(N-1)}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2+2\sum_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}}) \right]\\ &=\frac{1-f}{nM}\frac{1}{M(N-1)}\left[(NM-1)S^2+(M-1)(NM-1)S^2\rho_c \right]\\ &=\frac{1-f}{nM}\frac{(NM-1)S^2}{M(N-1)}[1+(M-1)\rho_c]\\ &\approx \frac{1-f}{nM}S^2[1+(M-1)\rho_c]. \end{aligned} \]

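A simple random sample of \(nM\) units has the same sampling fraction \(nM/(NM)=f\). Hence \(V_{srs}(\bar{\bar{y}})=\dfrac{1-f}{nM}S^2\), and therefore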

\[deff\approx [1+(M-1)\rho_c]. \]

Proof 7

For two-stage sampling,

\[E(\hat\theta)=E_1E_2(\hat\theta),\\ V(\hat\theta)=V_1[E_2(\hat\theta)]+E_1[V_2(\hat\theta)]. \]

The formula for the mean is the law of total expectation. Writing \(\theta=E(\hat\theta)\), the variance is

\[\begin{aligned} V(\hat\theta)&=E(\hat\theta-\theta)^2\\ &=E_1E_2(\hat\theta-\theta)^2\\ &=E_1[E_2(\hat\theta^2)-2\theta E_2(\hat\theta)+\theta^2]\\ &=E_1[V_2(\hat\theta)+[E_2(\hat\theta)]^2]-\theta^2\\ &=E_1V_2(\hat\theta)+E_1[E_2(\hat\theta)]^2-[E_1E_2(\hat\theta)]^2\\ &=E_1V_2(\hat\theta)+V_1E_2(\hat\theta). \end{aligned} \]

Proof 8

For two-stage sampling, prove that

\[V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2. \]

By Proof 7, \(V(\bar{\bar{y}})=V_1E_2(\bar{\bar{y}})+E_1V_2(\bar{\bar{y}})\), and we compute the two terms separately. For the first term, since \(E_2(\bar{y}_i)=\bar{Y}_i\),

\[\begin{aligned} V_1E_2(\bar{\bar{y}})&=V_1E_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\ &=V_1\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)\\ &=\frac{1-f_1}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\ &=\frac{1-f_1}{n}S_1^2, \end{aligned} \]

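For the second term, note that the second-stage samples are drawn independently in the selected clusters, so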

\[\begin{aligned} E_1V_2(\bar{\bar{y}})&=E_1V_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\ &=E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\frac{1-f_2}{m}\frac{1}{M-1}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2 \right]\\ &=\frac{1}{n}E_1\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2 \right]\\ &=\frac{1-f_2}{nm}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}\left(\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}S_2^2. \end{aligned} \]

This proves the formula.

Proof 9

For two-stage sampling,

\[E(s_1^2)=S_1^2+\frac{1-f_2}{m}S_2^2,\\ E(s_2^2)=S_2^2. \]

\(s_1^2\),有

\[\begin{aligned} E_2[(n-1)s_1^2]&=E_2\left[\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2 \right]\\ &=\sum_{i=1}^{n}E_2(\bar{y}_i^2)-nE_2(\bar{\bar{y}}^2)\\ &=\sum_{i=1}^{n}\{[E_2(\bar{y}_i)]^2+V_2(\bar{y}_i)\}-n\left\{[E_2(\bar{\bar{y}})]^2+V_2(\bar{\bar{y}}) \right\}\\ &=\sum_{i=1}^{n}\bar{Y}_i^2+\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2-n\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)^2-\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2, \end{aligned} \]

Introduce \(\bar{Y}_n=\displaystyle{\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i}\), the average of the \(\bar{Y}_i\) over the selected clusters. Applying Proof 2 to the first-stage simple random sample of cluster means, we obtain

\[\begin{aligned} E_2[(n-1)s_1^2]&=\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_{n})^2+\frac{(n-1)(1-f_2)}{nm}\sum_{i=1}^{n}S_{2i}^2,\\ E(s_1^2)&=E_1E_2(s_1^2)\\ &=E_1\left[\frac{1}{n-1}\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_n)^2+\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2 \right]\\ &=\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2+\frac{1-f_2}{m}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=S_1^2+\frac{1-f_2}{m}\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\ &=S_1^2+\frac{1-f_2}{m}S_{2}^2. \end{aligned} \]

For \(s_2^2=\dfrac{1}{n}\sum_{i=1}^{n}s_{2i}^2\),

\[\begin{aligned} E_2(s_2^2)&=E_2\left(\frac{1}{n}\sum_{i=1}^{n}s_{2i}^2 \right)\\ &=\frac{1}{n}\sum_{i=1}^{n}E_2(s_{2i}^2)\\ &=\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2,\\ E(s_2^2)&=E_1E_2(s_2^2)\\ &=E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\ &=S_{2}^2. \end{aligned} \]

This completes the proof.

Proof 10

Prove that an unbiased estimator of \(V(\hat{Y}_{HH})\) is

\[v(\hat{Y}_{HH})=\frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

\(t_i\)\(Y_i\)的入样次数,则\(\displaystyle{\sum_{i=1}^{N}t_i=n}\),诸\(t_i\)服从多项分布\(B(n;Z_1,Z_2,\cdots,Z_N)\),故

\[E(t_i)=nZ_i,\quad V(t_i)=nZ_i(1-Z_i),\quad \mathrm{cov}(t_i,t_j)=-nZ_iZ_j. \]

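Recall that \(V(\hat{Y}_{HH})=\dfrac{1}{n}\displaystyle{\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2}\). In the sums from \(1\) to \(n\) below, \(Y_i\) and \(Z_i\) refer to the unit selected on the \(i\)-th draw. Hence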

\[\begin{aligned} E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-\hat{Y}_{HH} \right)^2\right]&=E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i} \right)^2-n\hat{Y}_{HH}^2 \right]\\ &=E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-Y \right)^2-n(\hat{Y}_{HH}-Y)^2 \right]\\ &=E\left[\sum_{i=1}^{N}t_i\left(\frac{Y_i}{Z_i}-Y \right)^2 \right]-nE(\hat{Y}_{HH}-Y)^2\\ &=\sum_{i=1}^{N}nZ_i\left(\frac{Y_i}{Z_i}-Y \right)^2-nV(\hat{Y}_{HH})\\ &=(n^2-n)V(\hat{Y}_{HH}). \end{aligned} \]

This proves the claim.
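Proof 10 can also be checked by brute force. The sketch below uses a hypothetical population with assumed draw probabilities \(Z_i\). It enumerates all \(N^n\) ordered with-replacement draws and weights each by its probability \(\prod Z_i\). It then verifies that \(\hat{Y}_{HH}\) is unbiased and that the expectation of \(v(\hat{Y}_{HH})\) equals \(V(\hat{Y}_{HH})\).

```python
from fractions import Fraction
from itertools import product

# Hypothetical population values and draw probabilities Z_i (summing to 1).
Y = [10, 25, 40, 60]
Z = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = 3
Y_total = sum(Y)

# True variance of the Hansen-Hurwitz estimator.
V_true = sum(z * (y / z - Y_total) ** 2 for y, z in zip(Y, Z)) / n

E_hh = Fraction(0)   # expectation of Y_HH
E_v = Fraction(0)    # expectation of the variance estimator v(Y_HH)
for draws in product(range(len(Y)), repeat=n):   # ordered draws with replacement
    prob = Fraction(1)
    for i in draws:
        prob *= Z[i]
    ratios = [Y[i] / Z[i] for i in draws]         # int / Fraction -> Fraction
    Y_hh = sum(ratios) / n
    v = sum((r - Y_hh) ** 2 for r in ratios) / (n * (n - 1))
    E_hh += prob * Y_hh
    E_v += prob * v

print(E_hh == Y_total, E_v == V_true)
```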

Proof 11

Prove that when the sample size \(n\) is fixed, the variance of the Horvitz-Thompson estimator satisfies

\[V(\hat{Y}_{HT})=\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2+2\sum_{i<j}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}Y_iY_j=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2. \]

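For a fixed-size design, \(\sum_{j=1}^{N}\pi_j=n\) and \(\sum_{j\ne i}\pi_{ij}=(n-1)\pi_i\). Hence, for any given \(i\),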

\[\sum_{j\ne i}^{N}(\pi_{ij}-\pi_i\pi_j)=\sum_{j\ne i}^{N}\pi_{ij}-\pi_i\sum_{j\ne i}^{N}\pi_j=(n-1)\pi_i-\pi_i(n-\pi_i)=-\pi_i(1-\pi_i), \]

Therefore

\[\begin{aligned} \sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2&=\sum_{i=1}^{N}\frac{\pi_i(1-\pi_i)Y_i^2}{\pi_i^2}\\ &=\sum_{i=1}^{N}\sum_{j\ne i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2} \right)\\ &=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2} \right). \end{aligned} \]

Adding the second term, rewritten as \(-2\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\dfrac{Y_iY_j}{\pi_i\pi_j}\), gives

\[\begin{aligned} V(\hat{Y}_{HT})&=\sum_{i<j}^{N}\left[(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2}-2\frac{Y_iY_j}{\pi_i\pi_j} \right) \right]\\ &=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2. \end{aligned} \]

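Proof 12

Prove that Brewer's method for samples of size \(n=2\) is \(\mathrm{\pi PS}\). The method is:

  1. draw the first unit with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\);
  2. draw the second unit from the remaining units with probability proportional to \(Z_j\), i.e. with probability \(\dfrac{Z_j}{1-Z_i}\) if unit \(i\) was drawn first.

We show that \(\pi_i=2Z_i\) and \(\pi_{ij}=\dfrac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\left(1+\sum\limits_{k=1}^{N}\dfrac{Z_k}{1-2Z_k}\right)}\). Let \(D\) be the normalizing constant of the first-draw probabilities: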

\[\begin{aligned} D&=\sum_{i=1}^{N}\frac{Z_i(1-Z_i)}{1-2Z_i}\\ &=\sum_{i=1}^{N}\left(\frac{Z_i(1-Z_i)}{1-2Z_i}-\frac{1}{2}Z_i\right)+\frac{1}{2}\\ &=\sum_{i=1}^{N}\frac{Z_i}{2(1-2Z_i)}+\frac{1}{2}\\ &=\frac{1}{2}\left(\sum_{i=1}^{N}\frac{Z_i}{1-2Z_i}+1\right), \end{aligned} \]

\[\begin{aligned} \pi_i&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}+\sum_{j\ne i}^{N}\frac{Z_iZ_j}{D(1-2Z_j)}\\ &=\frac{Z_i}{D}\left[ 1+\frac{Z_i}{1-2Z_i}+\sum_{j\ne i}^{N}\frac{Z_j}{(1-2Z_j)}\right]\\ &=\frac{Z_i}{D}(2D)\\ &=2Z_i. \end{aligned} \]

\[\begin{aligned} \pi_{ij}&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}\cdot \frac{Z_j}{1-Z_i}+\frac{Z_j(1-Z_j)}{D(1-2Z_j)}\cdot\frac{Z_i}{1-Z_j}\\ &=\frac{Z_iZ_j(1-2Z_j)+Z_iZ_j(1-2Z_i)}{D(1-2Z_i)(1-2Z_j)}\\ &=\frac{2Z_iZ_j(1-Z_i-Z_j)}{D(1-2Z_i)(1-2Z_j)}\\ &=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k} \right)}}. \end{aligned} \]

This completes the proof.
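The following sketch enumerates Brewer's two draws for a hypothetical population. The \(Z_i\) and \(Y_i\) are made up, with every \(Z_i<1/2\). In exact arithmetic, it checks three things: \(\pi_i=2Z_i\); \(\pi_{ij}\) matches the closed form above; and the exact variance of the Horvitz-Thompson estimator equals the \(\sum_{i<j}(\pi_i\pi_j-\pi_{ij})(Y_i/\pi_i-Y_j/\pi_j)^2\) form from Proof 11.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical size measures Z_i (each < 1/2, summing to 1) and study values Y_i.
Z = [Fraction(k, 20) for k in (2, 3, 4, 5, 6)]
Y = [11, 14, 22, 23, 31]
N = len(Z)

# Step 1 weights, proportional to Z_i(1 - Z_i) / (1 - 2 Z_i); D normalizes them.
w = [z * (1 - z) / (1 - 2 * z) for z in Z]
D = sum(w)

# Probability of each unordered sample {i, j}: either order of the two draws.
pair_prob = {
    (i, j): w[i] / D * Z[j] / (1 - Z[i]) + w[j] / D * Z[i] / (1 - Z[j])
    for i, j in combinations(range(N), 2)
}
pi = [sum(p for pair, p in pair_prob.items() if k in pair) for k in range(N)]

def pi_ij_formula(i, j):
    denom = (1 - 2 * Z[i]) * (1 - 2 * Z[j]) * (1 + sum(z / (1 - 2 * z) for z in Z))
    return 4 * Z[i] * Z[j] * (1 - Z[i] - Z[j]) / denom

# Exact variance of the HT estimator versus the form proved in Proof 11.
Y_total = sum(Y)
ht = {pair: sum(Y[k] / pi[k] for k in pair) for pair in pair_prob}
V_exact = sum(p * (ht[pair] - Y_total) ** 2 for pair, p in pair_prob.items())
V_syg = sum((pi[i] * pi[j] - p) * (Y[i] / pi[i] - Y[j] / pi[j]) ** 2
            for (i, j), p in pair_prob.items())

print(all(pi[k] == 2 * Z[k] for k in range(N)))
print(all(pair_prob[i, j] == pi_ij_formula(i, j) for i, j in pair_prob))
print(V_exact == V_syg)
```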

Proof 13

Prove that the variance of systematic sampling is

\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2, \]

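where the \(N=nk\) units are arranged as \(k\) possible systematic samples \(r=1,\dots,k\) of \(n\) units each. \(Y_{rj}\) is the \(j\)-th unit of sample \(r\), \(\bar{Y}_r\) is the mean of sample \(r\), and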

\[S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\ S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_{r})^2. \]

\(S^2\)进行分解,有

\[\begin{aligned} (N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}({Y}_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}\sum_{j=1}^{n}(\bar{Y}_r-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+n\sum_{r=1}^{k}(\bar{Y}_{r}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+N\left[\frac{1}{k}\sum_{r=1}^{k}(\bar{Y}_r-\bar{Y})^2 \right]\\ &=k(n-1)S_{wsy}^2+NV(\bar{y}_{sy}), \end{aligned} \]

hence

\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]
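Here is a quick exact check of Proof 13 on a hypothetical ordered population with \(k=3\) and \(n=4\). The sketch computes \(V(\bar{y}_{sy})\) directly from the \(k\) equally likely systematic samples and compares it with \(\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2\).

```python
from fractions import Fraction

# Hypothetical population listed in order; N = k * n.
Y = [3, 8, 5, 9, 4, 7, 12, 6, 10, 2, 11, 13]
k, n = 3, 4
N = k * n
Y_bar = Fraction(sum(Y), N)

# Systematic sample r (0-based start) contains units r, r + k, r + 2k, ...
samples = [[Y[r + j * k] for j in range(n)] for r in range(k)]
means = [Fraction(sum(s), n) for s in samples]

# Each start is chosen with probability 1/k.
V_direct = sum((m - Y_bar) ** 2 for m in means) / k

S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)
S2_wsy = sum(sum((y - m) ** 2 for y in s) / (n - 1) for s, m in zip(samples, means)) / k
V_formula = Fraction(N - 1, N) * S2 - Fraction(k * (n - 1), N) * S2_wsy
print(V_direct, V_formula, V_direct == V_formula)
```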

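Proof 14

For double sampling for stratification,

\[E(\bar{y}_{stD})=\bar{Y},\\ V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right). \]

For the mean, note that \(\displaystyle{\sum_{h=1}^{L}w_h'\bar{y}_h'=\bar{y}'}\) is the mean of the first-phase sample, which is a simple random sample from the population with sampling fraction \(f_1=\dfrac{n'}{N}\). The second-phase sample in each stratum is a simple random subsample of the first-phase units in that stratum, so \(E_2(\bar{y}_h)=\bar{y}_h'\). Hence

\[\begin{aligned} E(\bar{y}_{stD})&=E_1E_2\left(\sum_{h=1}^{L}w_h'\bar{y}_h \right)\\ &=E_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\ &=E_1(\bar{y}')\\ &=\bar{Y}. \end{aligned} \]

For the variance, \(V(\bar{y}_{stD})=V_1E_2(\bar{y}_{stD})+E_1V_2(\bar{y}_{stD})\). We compute the two terms separately, noting that \(n_h=n_h'f_{hD}\), \(n_h'=w_h'n'\), and \(s_h'^2\) denotes the first-phase sample variance in stratum \(h\):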

\[\begin{aligned} V_1E_2(\bar{y}_{stD})&=V_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\ &=V_1(\bar{y}')\\ &=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2;\\ E_1V_2(\bar{y}_{stD})&=E_1\left[\sum_{h=1}^{L}w_h'^2s_h'^2\left(\frac{1}{n_h}-\frac{1}{n_h'} \right) \right]\\ &=E_1\left[\sum_{h=1}^{L}\frac{w_h's_h'^2}{n'}\left(\frac{1}{f_{hD}}-1 \right) \right]\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1(w_h's_h'^2)\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1[E_1(w_h's_h'^2|w_h')]\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)S_h^2E_1(w_h')\\ &=\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right). \end{aligned} \]

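Here we used the law of total expectation, conditioning on \(w_h'\). Given \(w_h'\), the first-phase units in stratum \(h\) form a simple random sample from that stratum, so \(E_1(s_h'^2\mid w_h')=S_h^2\). Adding the two terms gives the result.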

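Proof 15

Consider double sampling for stratification with expected cost \(C_{T}^*=c_1n'+\displaystyle{\sum_{h=1}^{L}c_{2h}n_h}\), where \(c_1\) is the cost per first-phase unit and \(c_{2h}\) is the cost per second-phase unit in stratum \(h\). Prove that the optimal allocation is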

\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}},\\ n'=\frac{C_{T}^*}{c_1+\displaystyle{\sum_{h=1}^{L}c_{2h}W_hf_{hD}}}. \]

The variance is

\[V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)=\frac{S^2}{n'}+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}-\frac{S^2}{N}, \]

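Since \(E(n_h)=n'W_hf_{hD}\), the expected cost is \(C_{T}^*=n'\left(c_1+\sum_{h=1}^{L}c_{2h}W_hf_{hD}\right)\). Minimizing \(V\) for a fixed cost is therefore equivalent to minimizing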

\[C_{T}^*\left(V+\frac{S^2}{N} \right)=\left(c_1+\sum_{h=1}^{L}c_{2h}f_{hD}W_h \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right], \]

By the Cauchy-Schwarz inequality,

\[C_{T}^{*}\left(V+\frac{S^2}{N} \right)\ge \left[\sqrt{c_1\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}+\sum_{h=1}^{L}\sqrt{c_{2h}}W_hS_h \right]^2, \]

with equality if and only if

\[\frac{c_{2h}f_{hD}W_h}{W_hS_h^2/f_{hD}}=\frac{c_1}{\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}, \]

that is,

\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]

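Substituting these \(f_{hD}\) into \(C_{T}^*=n'\left(c_1+\sum_{h=1}^{L}c_{2h}W_hf_{hD}\right)\) and solving for \(n'\) gives the second formula.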

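Exercises

1. Simple random sampling

Consider the following sample, where \(Y\) is the study variable and \(X\) is the auxiliary variable.

\[\begin{array}{c|ccccc} \hline Y & 4 & 6 & 8 & 5 & 4 \\ X & 2 & 3 & 3 & 2 & 1 \\ \hline \end{array} \]

It is known that \(N=50\), \(n=5\) and \(\bar{X}=2\). Find:

  1. the simple estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
  2. the ratio estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
  3. the regression estimate of \(\bar{Y}\) and its \(95\%\) confidence interval.

Solution:

  1. For the simple estimator,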

    \[\bar{y}=5.4,\quad s^2=2.8 \\ v(\bar{y})=\frac{1-f}{n}s^2=0.504 \]

    Computing \(\bar{y}\pm u_{\alpha/2}\sqrt{v(\bar{y})}\) with \(u_{\alpha/2}=1.96\) gives the confidence interval

    \[[4.0085,6.7915]. \]

  2. For the ratio estimator, first compute

    \[\bar{x}=2.2,\quad s_x^2=0.7,\quad s_y^2=2.8,\quad s_{xy}=1.15. \]

    Hence

    \[r = \frac{\bar{y}}{\bar{x}}=2.4545,\\ \bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X}=4.9091,\\ v(\bar{y}_{R})=\frac{1-f}{n}(s^2-2rs_{yx}+r^2s_x^2)=0.2469, \]

    Computing \(\bar{y}_{R}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{R})}\) gives the confidence interval

    \[[3.9351,5.8831]. \]

  3. For the regression estimator, first compute the regression coefficient:

    \[b=\frac{s_{yx}}{s_{x}^2}=1.6429,\\ \bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x})=5.0714. \]

    To estimate its variance, compute the sample correlation coefficient:

    \[\hat\rho=\frac{s_{yx}}{s_ys_x}=0.8214,\\ v(\bar{y}_{lr})=\frac{1-f}{n}s_y^2(1-\hat\rho^2)=0.1639. \]

    Computing \(\bar{y}_{lr}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{lr})}\) gives the confidence interval

    \[[4.2779,5.8650]. \]
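The three estimates in Exercise 1 can be reproduced with a short script. This is a sketch that assumes \(u_{\alpha/2}=1.96\); the helper names `cov` and `ci` are illustrative.

```python
import math

# Sample from Exercise 1.
y = [4, 6, 8, 5, 4]
x = [2, 3, 3, 2, 1]
N, n, X_bar = 50, 5, 2
u = 1.96            # normal quantile u_{alpha/2} for a 95% interval
f = n / N

def cov(a, b):
    """Sample covariance with divisor n - 1."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b)) / (len(a) - 1)

y_bar, x_bar = sum(y) / n, sum(x) / n
s2_y, s2_x, s_yx = cov(y, y), cov(x, x), cov(y, x)

def ci(est, var):
    half = u * math.sqrt(var)
    return est - half, est + half

# 1. Simple estimator.
v_simple = (1 - f) / n * s2_y
# 2. Ratio estimator.
r = y_bar / x_bar
y_R = r * X_bar
v_R = (1 - f) / n * (s2_y - 2 * r * s_yx + r ** 2 * s2_x)
# 3. Regression estimator with estimated slope b.
b = s_yx / s2_x
y_lr = y_bar + b * (X_bar - x_bar)
rho = s_yx / math.sqrt(s2_y * s2_x)
v_lr = (1 - f) / n * s2_y * (1 - rho ** 2)

for name, est, var in [("simple", y_bar, v_simple), ("ratio", y_R, v_R), ("regression", y_lr, v_lr)]:
    lo, hi = ci(est, var)
    print(f"{name:<10} estimate={est:.4f} v={var:.4f} CI=[{lo:.4f}, {hi:.4f}]")
```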

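2. Ratio estimation in stratified random sampling

A population has two strata with \(N_1=15\), \(N_2=10\), \(\bar X_1=20\) and \(\bar X_2=50\). A sample of \(3\) units is drawn from each stratum, with the following results:

\[\begin{array}{c|ccc} \hline Y_1 & 30 & 35 & 40 \\ X_1 & 18 & 18 & 25 \\ \hline Y_2 & 75 & 82 & 85 \\ X_2 & 55 & 40 & 60 \\ \hline \end{array} \]

  1. Give the separate ratio estimate of \(\bar{Y}\) and estimate its variance.
  2. Give the combined ratio estimate of \(\bar{Y}\) and estimate its variance.

Solution:

  1. Here \(W_1=0.6\), \(W_2=0.4\), \(f_1=0.2\) and \(f_2=0.3\). For the separate ratio estimator,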

    \[\bar{r}_1=1.7213,\\ \bar{r}_2=1.5613,\\ \bar{y}_{RS}=W_1\bar{X}_1\bar{r}_1+W_2\bar{X}_2\bar{r}_2=51.8815. \]

    For its variance,

    \[v(\bar{y}_{RS})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2\bar{r}_hs_{yxh}+\bar{r}_h^2s_{xh}^2)=12.0071. \]

  2. For the combined ratio estimator,

    \[\bar{y}_{st}=\sum_{h=1}^{2}W_h\bar{y}_h=53.2667,\\ \bar{x}_{st}=\sum_{h=1}^{2}W_h\bar{x}_h=32.8667, \]

    \[r=\frac{\bar{y}_{st}}{\bar{x}_{st}}=1.6207,\quad \bar{X}=32,\\ \bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}=51.8621,\\ v(\bar{y}_{RC})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2)=12.5786. \]
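The following sketch reproduces Exercise 2, assuming \(W_h=N_h/N\) and \(f_h=n_h/N_h\) as in the solution. `resid_var` is a hypothetical helper for \(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2\).

```python
def mean(u):
    return sum(u) / len(u)

def cov(u, v):
    """Sample covariance with divisor n - 1."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)

def resid_var(y, x, r):
    """s_y^2 - 2 r s_yx + r^2 s_x^2 for one stratum."""
    return cov(y, y) - 2 * r * cov(y, x) + r ** 2 * cov(x, x)

# (N_h, population mean X_bar_h, sampled y, sampled x) for each stratum.
strata = [
    (15, 20, [30, 35, 40], [18, 18, 25]),
    (10, 50, [75, 82, 85], [55, 40, 60]),
]
N = sum(Nh for Nh, _, _, _ in strata)

# Separate ratio estimator: one ratio r_h per stratum.
y_RS = v_RS = 0.0
for Nh, Xh, y, x in strata:
    W, nh = Nh / N, len(y)
    r_h = mean(y) / mean(x)
    y_RS += W * r_h * Xh
    v_RS += W ** 2 * (1 - nh / Nh) / nh * resid_var(y, x, r_h)

# Combined ratio estimator: a single ratio of stratified means.
y_st = sum(Nh / N * mean(y) for Nh, _, y, _ in strata)
x_st = sum(Nh / N * mean(x) for Nh, _, _, x in strata)
X_bar = sum(Nh / N * Xh for Nh, Xh, _, _ in strata)
r = y_st / x_st
y_RC = r * X_bar
v_RC = sum((Nh / N) ** 2 * (1 - len(y) / Nh) / len(y) * resid_var(y, x, r)
           for Nh, _, y, x in strata)
print(round(y_RS, 4), round(v_RS, 4), round(y_RC, 4), round(v_RC, 4))
```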

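3. Sample allocation in stratified random sampling

A proportion is estimated in a population with two strata, where \(N_1=10\), \(N_2=20\) and \(n_1=n_2=5\). The sample proportions are \(p_1=0.4\) and \(p_2=0.2\).

  1. Estimate \(P\) by stratified random sampling and give the standard error of \(p_{st}\).
  2. Compute the ratio of the two stratum sample sizes under Neyman allocation, and under optimal allocation when \(c_2=4c_1\).

Solution:

  1. The estimate of \(P\) is

    \[p_{st}=\frac{1}{3}p_1+\frac{2}{3}p_2=0.266667. \]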

    For the variance estimate, the sample variance of a 0-1 variable in stratum \(h\) is

    \[s_h^2=\frac{n_hp_h(1-p_h)}{n_h-1}, \]

    so

    \[s_1^2=1.25\times 0.4\times 0.6=0.3,\\ s_2^2=1.25\times 0.2\times 0.8=0.2,\\ v(p_{st})=\frac{1}{9}\frac{1-0.5}{5}0.3+\frac{4}{9}\frac{1-0.25}{5}0.2=0.016667,\\ \sigma(p_{st})=0.1291. \]

  2. Using \(s_h\) as an estimate of \(S_h\), Neyman allocation \(n_h\propto W_hS_h\) gives

    \[\frac{n_1}{n_2}=\frac{1/3\times \sqrt{0.3}}{2/3\times\sqrt{0.2}}=0.6124. \]

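    For optimal allocation with unequal costs, \(n_h\propto W_hS_h/\sqrt{c_h}\), so

    \[\frac{n_1}{n_2}=\frac{W_1S_1/\sqrt{c_1}}{W_2S_2/\sqrt{c_2}}=\frac{1/3\times \sqrt{0.3}\times \sqrt{4}}{2/3\times \sqrt{0.2}}=1.2247. \]

    Units in stratum 2 cost more, so stratum 2 receives relatively fewer units than under Neyman allocation.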
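Here is a sketch of the Exercise 3 computations. It uses the sample \(s_h\) in place of the unknown \(S_h\) and assumes unit costs \(c_1=1\) and \(c_2=4\); only their ratio matters.

```python
import math

N = [10, 20]          # stratum sizes
n = [5, 5]            # stratum sample sizes
p = [0.4, 0.2]        # sample proportions
W = [Nh / sum(N) for Nh in N]

p_st = sum(w * ph for w, ph in zip(W, p))
# Sample variance of a 0-1 variable: n p (1 - p) / (n - 1).
s2 = [nh * ph * (1 - ph) / (nh - 1) for nh, ph in zip(n, p)]
v = sum(w ** 2 * (1 - nh / Nh) / nh * s for w, nh, Nh, s in zip(W, n, N, s2))
se = math.sqrt(v)

# Allocation ratios n_1 / n_2, with s_h standing in for the unknown S_h.
S = [math.sqrt(s) for s in s2]
neyman_ratio = (W[0] * S[0]) / (W[1] * S[1])
c = [1.0, 4.0]        # only c_2 / c_1 = 4 matters; the unit is arbitrary
optimal_ratio = (W[0] * S[0] / math.sqrt(c[0])) / (W[1] * S[1] / math.sqrt(c[1]))
print(round(p_st, 6), round(se, 4), round(neyman_ratio, 4), round(optimal_ratio, 4))
```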

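4. Equal-probability cluster sampling

A population consists of \(10\) clusters of equal size \(M=10\). A simple random sample of \(4\) clusters is drawn. The cluster totals \(y_i\) and the unit values \(y_{ij}\) are:

\[\begin{array}{c|c|c} \hline i & y_i & y_{ij}\\ \hline 1 & 19 & 1,2,1,3,3,2,1,4,1,1 \\ 2 & 20 & 1,3,2,2,3,1,4,1,1,2 \\ 3 & 16 & 2,1,1,1,1,3,2,1,3,1 \\ 4 & 20 & 1,1,3,2,1,5,1,2,3,1 \\ \hline \end{array} \]

  1. Estimate \(\bar{\bar{Y}}\) and give its standard error.
  2. Estimate the design effect.

Solution:

  1. The cluster means are \(\bar{y}_1=1.9\), \(\bar{y}_{2}=2\), \(\bar{y}_3=1.6\) and \(\bar{y}_4=2\). Since the clusters form a simple random sample of \(n=4\) out of \(N=10\), we have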

    \[\bar{\bar{y}}=\frac{1}{4}\sum_{i=1}^{4}\bar{y}_i=1.875, \]

    \[v(\bar{\bar{y}})=\frac{1-0.4}{4}\frac{1}{3}\sum_{i=1}^{4}(\bar{y}_i-\bar{\bar{y}})^2=0.005375,\\ \sigma(\bar{\bar{y}})=0.07331. \]

  2. Here

    \[s_{b}^2=\frac{1}{n-1}\sum_{i=1}^{4}M(\bar y_i-\bar{\bar y})^2=0.358333,\\ s_w^2=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2=1.202778, \]

    so

    \[\hat \rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2}=-0.0755,\\ deff\approx 1+(M-1)\hat\rho_c=0.3204. \]
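The cluster-sampling quantities in Exercise 4 can be recomputed as follows. This sketch takes \(f=n/N\) and uses the \(s_b^2\), \(s_w^2\) and \(\hat\rho_c\) formulas listed under "Other formulas" at the end of this post.

```python
import math

# Unit values y_ij of the n = 4 sampled clusters (M = 10 units each).
clusters = [
    [1, 2, 1, 3, 3, 2, 1, 4, 1, 1],
    [1, 3, 2, 2, 3, 1, 4, 1, 1, 2],
    [2, 1, 1, 1, 1, 3, 2, 1, 3, 1],
    [1, 1, 3, 2, 1, 5, 1, 2, 3, 1],
]
N, M = 10, 10
n = len(clusters)
f = n / N

means = [sum(c) / M for c in clusters]
y_bb = sum(means) / n
between_ss = sum((m - y_bb) ** 2 for m in means)

# Variance of the mean of cluster means (clusters drawn by SRS).
v = (1 - f) / n * between_ss / (n - 1)
se = math.sqrt(v)

# Between- and within-cluster variances, intraclass correlation, design effect.
s2_b = M * between_ss / (n - 1)
s2_w = sum(sum((y - m) ** 2 for y in c) for c, m in zip(clusters, means)) / (n * (M - 1))
rho_c = (s2_b - s2_w) / (s2_b + (M - 1) * s2_w)
deff = 1 + (M - 1) * rho_c
print(y_bb, round(se, 5), round(rho_c, 4), round(deff, 4))
```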

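5. Two-stage sampling

A population consists of \(N=10\) clusters of equal size, each containing \(M=50\) units. A sample of \(3\) clusters is drawn, and \(5\) units are sampled from each selected cluster, with the following results:

\[\begin{array}{c|ccccc} \hline 1 & 20 & 25 & 20 & 25 & 20 \\ 2 & 18 & 20 & 22 & 25 & 20 \\ 3 & 25 & 28 & 18 & 15 & 21 \\ \hline \end{array} \]

  1. Estimate \(\bar{\bar{Y}}\), estimate the variance of the estimator, and give a \(95\%\) confidence interval.
  2. Suppose selecting a cluster costs \(c_1\) and surveying one unit costs \(c_2\), with other symbols as defined in the textbook. Derive the optimal \(m\).

Solution:

  1. First compute the following quantities:

    \[\bar{y}_1=22,\quad s_{21}^2=7.5;\\ \bar{y}_2=21,\quad s_{22}^2=7;\\ \bar{y}_3=21.4,\quad s_{23}^2=27.3. \]

    Hence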

    \[\bar{\bar{y}}=\frac{1}{3}\sum_{i=1}^{3}\bar{y}_i=21.4667,\\ s_{1}^2=\frac{1}{2}\sum_{i=1}^{3}(\bar{y}_i-\bar{\bar{y}})^2=0.253333,\\ s_2^2=\frac{1}{3}\sum_{i=1}^{3}s_{2i}^2=13.9333. \]

    The variance estimate is

    \[v(\bar{\bar{y}})=\frac{1-0.3}{3}s_1^2+\frac{0.3(1-0.1)}{15}s_2^2=0.3099, \]

    so the \(95\%\) confidence interval is

    \[[20.3755,22.5578]. \]

  2. The variance of the two-stage estimator is

    \[V=\frac{1}{n}S_1^2-\frac{1}{N}S_1^2+\frac{1}{nm}S_2^2-\frac{1}{n}\frac{S^2_2}{M}, \]

    With total cost \(C=c_1n+c_2nm\), we therefore minimize

    \[(c_1n+c_2nm)\left(\frac{S_1^2-S_2^2/M}{n}+\frac{S_2^2}{nm} \right)=(c_1+c_2m)\left(S_1^2-\frac{S_2^2}{M}+\frac{S_2^2}{m} \right). \]

    By the Cauchy-Schwarz inequality, the minimum is attained when

    \[\frac{c_1}{S_1^2-S_2^2/M}=\frac{c_2m^2}{S_2^2},\\ m_{opt}=\sqrt{\frac{c_1S_2^2}{c_2\left(S_1^2-\dfrac{S_2^2}{M}\right)}}. \]

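    In practice \(S_1^2\) and \(S_2^2\) are unknown. By Proof 9, they can be estimated by

    \[\hat{S}_1^2=s_1^2-\frac{1-f_2}{m}s_2^2,\\ \hat{S}_2^2=s_2^2. \]

    Note: with the data of this exercise, \(\hat{S}_1^2-\hat{S}_2^2/M<0\), so the formula does not give a real \(m_{opt}\). Do not substitute these numbers.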
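Here is a sketch for Exercise 5. Part 1 follows the formulas above with \(u_{\alpha/2}=1.96\). For part 2, `m_opt` is a hypothetical helper that returns `None` when \(S_1^2-S_2^2/M\le0\). The costs `c1=10.0` and `c2=1.0` are assumed values, used only to show that the plug-in estimates from this exercise give no valid \(m_{opt}\).

```python
import math

# Second-stage samples (m = 5 units) from the n = 3 selected clusters.
rows = [
    [20, 25, 20, 25, 20],
    [18, 20, 22, 25, 20],
    [25, 28, 18, 15, 21],
]
N, M = 10, 50
n, m = len(rows), len(rows[0])
f1, f2 = n / N, m / M

def var(u):
    """Sample variance with divisor len - 1."""
    u_bar = sum(u) / len(u)
    return sum((a - u_bar) ** 2 for a in u) / (len(u) - 1)

means = [sum(r) / m for r in rows]
y_bb = sum(means) / n
s1_sq = var(means)
s2_sq = sum(var(r) for r in rows) / n
v = (1 - f1) / n * s1_sq + f1 * (1 - f2) / (n * m) * s2_sq
half = 1.96 * math.sqrt(v)
ci = (y_bb - half, y_bb + half)

def m_opt(c1, c2, S1_sq, S2_sq, M):
    """Hypothetical helper: optimal m, or None when S_1^2 - S_2^2/M <= 0."""
    denom = S1_sq - S2_sq / M
    if denom <= 0:
        return None
    return math.sqrt(c1 * S2_sq / (c2 * denom))

# Plug-in estimate of S_1^2 from Proof 9; here it makes the denominator negative.
S1_hat = s1_sq - (1 - f2) / m * s2_sq
print(round(y_bb, 4), round(v, 4), [round(b, 4) for b in ci])
print(m_opt(c1=10.0, c2=1.0, S1_sq=S1_hat, S2_sq=s2_sq, M=M))
```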

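6. \(\mathrm{PPS}\) sampling

Sampling with replacement and unequal probabilities is carried out on a population of \(N=10\) units. Here \(t_i\) is the number of times unit \(i\) was drawn, and \(?\) marks unobserved values:

\[\begin{array}{c|cccccccccc} \hline i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline Z_i & 0.2 & 0.2 & 0.1 & 0.05 & 0.05 & 0.05 & 0.05 & 0.1 & 0.1 & 0.1 \\ t_i & 2 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ Y_i & 35 & ? & ? & 40 & ? & ? & ? & 20 & 40 & ? \\ \hline \end{array} \]

Estimate the population mean and give the corresponding variance.

The Hansen-Hurwitz estimator uses the \(n=5\) draws (unit 1 twice, and units 4, 8 and 9 once each):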

\[\hat{Y}_{HH}=\frac{1}{5}\sum_{i=1}^{5}\frac{y_i}{Z_i}=350,\\ \bar{y}_{HH}=\frac{\hat{Y}_{HH}}{N}=35. \]

Its variance is estimated by

\[v(\hat{Y}_{HH})=\frac{1}{5\times 4}\sum_{i=1}^{5}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2=14437.5,\\ v(\bar{y}_{HH})=\frac{v(\hat{Y}_{HH})}{N^2}=144.375. \]
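Below is the Hansen-Hurwitz computation in Exercise 6 as a sketch. Observed \(Y_i\) are stored in a dictionary keyed by 0-based unit index, which is an arbitrary choice. Each draw contributes one ratio \(y_i/Z_i\).

```python
Z = [0.2, 0.2, 0.1, 0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
t = [2, 0, 0, 1, 0, 0, 0, 1, 1, 0]          # times each unit was drawn
Y = {0: 35, 3: 40, 7: 20, 8: 40}            # observed values, keyed by 0-based unit index
N = len(Z)

# One ratio y/Z per draw; a unit drawn twice contributes twice.
ratios = [Y[i] / Z[i] for i in range(N) for _ in range(t[i])]
n = len(ratios)
Y_HH = sum(ratios) / n
v_total = sum((r - Y_HH) ** 2 for r in ratios) / (n * (n - 1))

# Mean per unit and its variance.
print(Y_HH / N, v_total / N ** 2)
```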

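7. Two-stage sampling with replacement and unequal probabilities

A population consists of \(N=10\) clusters, each containing \(M=10\) units. Two-stage sampling with replacement and unequal probabilities is carried out. In the first stage, cluster \(Y_1\) is drawn twice and clusters \(Y_2\) and \(Y_3\) once each, with selection probabilities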

\[Z_1=0.5,\quad Z_2=Z_3=0.1. \]

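For each first-stage draw, an independent simple random sample of \(m=4\) units is then taken: twice from \(Y_1\), and once each from \(Y_2\) and \(Y_3\). The results are:

\[\begin{array}{c|cccc} \hline Y_1^{(1)} & 3 & 5 & 8 & 10 \\ Y_1^{(2)} & 3 & 7 & 7 & 9 \\ Y_2 & 6 & 9 & 10 & 12 \\ Y_3 & 10 & 15 & 18 & 20 \\ \hline \end{array} \]

Estimate \(\bar{\bar{Y}}\) and the variance of the estimator.

Estimate the total of the cluster selected on each draw by \(\hat{Y}_i=M\bar{y}_i\). This gives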

\[\hat{Y}_1=65,\quad \hat{Y}_2=65,\quad \hat{Y}_3=92.5,\quad \hat{Y}_4=157.5, \]

so

\[\hat{Y}_{HH}=\frac{1}{4}\sum_{i=1}^{4}\frac{\hat{Y}_i}{Z_i}=690,\quad \bar{y}_{HH}=6.9;\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{4}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2=122137.5,\quad v(\bar{y}_{HH})=12.21375. \]
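Here is a sketch of Exercise 7. Each first-stage draw contributes \(\hat{Y}_i/Z_i\) with \(\hat{Y}_i=M\bar{y}_i\), and the per-unit mean estimate divides the total estimate by \(NM\).

```python
N, M = 10, 10
# One entry per first-stage draw: (Z of the selected cluster, second-stage SRS of m units).
draws = [
    (0.5, [3, 5, 8, 10]),     # cluster 1, first draw
    (0.5, [3, 7, 7, 9]),      # cluster 1, second draw
    (0.1, [6, 9, 10, 12]),    # cluster 2
    (0.1, [10, 15, 18, 20]),  # cluster 3
]
n = len(draws)

# Estimated cluster total M * ybar_i, divided by its selection probability.
ratios = [M * (sum(s) / len(s)) / z for z, s in draws]
Y_HH = sum(ratios) / n
v_HH = sum((r - Y_HH) ** 2 for r in ratios) / (n * (n - 1))

# Per-unit mean: divide by the N * M population units.
mean_est = Y_HH / (N * M)
v_mean = v_HH / (N * M) ** 2
print(Y_HH, v_HH, mean_est, v_mean)
```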

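8. \(\mathrm{\pi PS}\) sampling

A population has \(N=8\) units, and Brewer's method is used to draw a sample of two units: \(y_1=12\) and \(y_2=20\), with \(Z_1=0.2\) and \(Z_2=0.1\).

  1. Briefly describe Brewer's method and the condition under which it can be used.
  2. Construct the Horvitz-Thompson estimator of the population total.
  3. Suppose instead that the two units were drawn by the Yates-Grundy draw-by-draw method, and a third draw gave \(y_3=15\) with \(Z_3=0.05\). Construct Raj's estimator of the population total and estimate its variance.

Solution:

  1. In Brewer's method, the first unit is drawn with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\); call the selected unit \(j\). The second unit is then drawn from the remaining units with probability proportional to \(Z_i\), i.e. with probability \(\dfrac{Z_i}{1-Z_j}\).

    The method requires \(1-2Z_i>0\), i.e. \(Z_i<1/2\) for every \(i\).

  2. Since \(\pi_i=2Z_i\), the estimate of the population total is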

    \[\hat{Y}_{HT}=\frac{1}{2}\left(\frac{y_1}{Z_1}+\frac{y_2}{Z_2} \right)=130. \]

  3. Compute

    \[t_1=\frac{y_1}{Z_1}=60,\\ t_2=y_1+\frac{y_2}{Z_2}(1-Z_1)=172,\\ t_3=y_1+y_2+\frac{y_3}{Z_3}(1-Z_1-Z_2)=242. \]

    Hence

    \[\hat{Y}_{Raj}=\frac{1}{3}\sum_{i=1}^{3}t_i=158,\\ v(\hat{Y}_{Raj})=\frac{1}{3\times 2}\sum_{i=1}^{3}(t_i-\hat{Y}_{Raj})^2=2809.333. \]
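The following sketch reproduces parts 2 and 3 of Exercise 8. It assumes Brewer's \(\pi_i=2Z_i\) for the Horvitz-Thompson estimate and Raj's \(t_i\) as defined above for the draw-by-draw sample.

```python
# Exercise 8 data: values y and size measures Z in draw order.
y = [12, 20, 15]
Z = [0.2, 0.1, 0.05]

# Part 2: Horvitz-Thompson with Brewer's pi_i = 2 Z_i (first two units only).
Y_HT = sum(yi / (2 * zi) for yi, zi in zip(y[:2], Z[:2]))

# Part 3: Raj's t_i = (sum of earlier y) + y_i / Z_i * (1 - sum of earlier Z).
t = []
cum_y = cum_z = 0.0
for yi, zi in zip(y, Z):
    t.append(cum_y + yi / zi * (1 - cum_z))
    cum_y += yi
    cum_z += zi

n = len(t)
Y_raj = sum(t) / n
v_raj = sum((ti - Y_raj) ** 2 for ti in t) / (n * (n - 1))
print(Y_HT, [round(ti, 6) for ti in t], round(Y_raj, 6), round(v_raj, 3))
```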

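9. Systematic sampling

A population has \(N=30\) units, and a systematic sample of \(10\) units is drawn, so the sampling interval is \(k=3\).

  1. If the sample contains \(Y_{16}\), list the whole sample.
  2. Under what conditions is systematic sampling more precise than simple random sampling?

Solution:

  1. Since \(16 \bmod 3=1\), the random start is \(Y_1\), and the sample is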

    \[Y_1,Y_4,Y_7,Y_{10},Y_{13},\\ Y_{16},Y_{19},Y_{22},Y_{25},Y_{28}. \]

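  2. Arrange the population as \(k\) possible systematic samples of \(n\) units each, and let

    \[S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\ S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2. \]

    Then

    \[\begin{aligned} (N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}n(\bar{Y}_{r}-\bar{Y})^2\\ &=k(n-1)S_{wsy}^2+NV(\bar{y}_{sy}), \end{aligned} \]

    hence

    \[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]

    For simple random sampling of the same size,

    \[V(\bar{y}_{srs})=\frac{1-f}{n}S^2=\frac{k-1}{N}S^2, \]

    and taking the difference,

    \[V(\bar{y}_{srs})-V(\bar{y}_{sy})=\frac{(k-N)S^2+k(n-1)S_{wsy}^2}{N}=\frac{k(n-1)(S_{wsy}^2-S^2)}{N}. \]

    Hence systematic sampling is more precise than simple random sampling exactly when \(S_{wsy}^2>S^2\), i.e. when units within a systematic sample are more heterogeneous than the population as a whole.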

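10. Double sampling for stratification

A population of \(1000000\) units can be divided into \(2\) strata whose sizes are unknown. A first-phase sample of \(n'=10000\) units is drawn, of which \(n_1'=2000\) fall in stratum 1 and \(n_2'=8000\) in stratum 2. Then \(n_1=n_2=5\) units are subsampled from the two strata for the detailed survey, giving \(\bar{y}_1=200\), \(\bar{y}_2=80\), \(s_1^2=4500\) and \(s_2^2=200\).

  1. Estimate the population mean \(\bar{Y}\) and its variance; the sampling fractions may be ignored.
  2. Find the optimal subsampling rates \(f_{hD}\).

Solution:

  1. With \(w_1'=0.2\) and \(w_2'=0.8\), the estimate is

    \[\bar{y}_{stD}=w_1'\bar{y}_1+w_2'\bar{y}_2=104. \]

    For its variance,

    \[v(\bar{y}_{stD})=\sum_{h=1}^{L}\frac{w_h'^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2=61.8304. \]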

  2. Since

    \[\begin{aligned} V(\bar{y}_{stD})&=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)\\ &=\frac{1}{n'}\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\frac{S^2}{N}. \end{aligned} \]

    \(C_{T}^*=\displaystyle{c_1n'+n'\sum_{h=1}^{L}c_{2h}W_hf_{hD}}\),所以对下式进行最小优化:

    \[\left(c_1+\sum_{h=1}^{L}c_{2h}W_hf_{hD} \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right], \]

    By the Cauchy-Schwarz inequality, the minimum is attained when

    \[\frac{c_1}{S^2-\displaystyle{\sum_{h=1}^{L}W_hS_h^2}}=\frac{c_{2h}f_{hD}^2}{S_h^2},\\ f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]
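Here is a sketch of part 1 of Exercise 10. It uses the large-\(N\) estimator \(v(\bar{y}_{stD})=\sum_h w_h'^2s_h^2/n_h+\frac{1}{n'}\sum_h w_h'(\bar{y}_h-\bar{y}_{stD})^2\), with squared weights in the first term to match the \(E_1V_2\) term of Proof 14. Part 2 has no numerical inputs, so it is not coded.

```python
n_prime = 10_000
n_first = [2000, 8000]      # first-phase counts n_h'
n_sub = [5, 5]              # second-phase sample sizes n_h
y_bar = [200, 80]
s2 = [4500, 200]

w = [nh / n_prime for nh in n_first]      # estimated stratum weights w_h'
y_stD = sum(wh * yh for wh, yh in zip(w, y_bar))

# Large-N approximation (sampling fractions ignored).
within = sum(wh ** 2 * s / nh for wh, s, nh in zip(w, s2, n_sub))
between = sum(wh * (yh - y_stD) ** 2 for wh, yh in zip(w, y_bar)) / n_prime
v = within + between
print(y_stD, round(v, 4))
```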

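11. Ratio estimation in double sampling

A population has a very large \(N\) and little is known about it. A first-phase sample of \(n'=10000\) units is drawn to measure the auxiliary variable \(X\), giving \(\bar{x}'=50\). A second-phase sample of \(10\) units then gives \(\bar{y}=80\), \(\bar{x}=40\), \(s_x^2=1600\), \(s_{yx}=2400\) and \(s_{y}^2=8000\). Find the double-sampling ratio estimate \(\bar{y}_{RD}\) and estimate its variance.

The double-sampling ratio estimate is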

\[\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}'=100. \]

Here \(\hat{R}=r=\bar{y}/\bar{x}=2\), so the variance estimate is

\[v(\bar{y}_{RD})=\frac{1}{n}s_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(\hat{R}^2s_{x}^2-2\hat{R}s_{yx})=480.32. \]
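Below is the Exercise 11 arithmetic as a short sketch, with \(\hat{R}=\bar{y}/\bar{x}\) and the variance estimator listed in the summary.

```python
n_prime, n = 10_000, 10
x_bar_prime = 50                      # first-phase mean of X
y_bar, x_bar = 80, 40                 # second-phase means
s2_x, s_yx, s2_y = 1600, 2400, 8000   # second-phase (co)variances

r = y_bar / x_bar
y_RD = r * x_bar_prime
v_RD = s2_y / n + (1 / n - 1 / n_prime) * (r ** 2 * s2_x - 2 * r * s_yx)
print(y_RD, round(v_RD, 2))
```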

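12. Capture-recapture sampling

To estimate the number of fish in a lake, \(1000\) fish are caught, marked and returned to the lake. Later \(150\) fish are caught, of which \(10\) are marked. Use the Chapman estimator to estimate the total number of fish, estimate its variance, and give a \(95\%\) confidence interval.

We compute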

\[\tilde{N}=\frac{1001\times 151}{11}-1=13740,\\ v(\tilde{N})=\frac{1001\times 151\times 990\times 140}{11^2\times 12}=14428050. \]

Hence the \(95\%\) confidence interval \(\tilde{N}\pm1.96\sqrt{v(\tilde{N})}\) is

\[[6295,21185]. \]
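Here is a sketch of the Chapman estimate in Exercise 12 and its normal-approximation interval \(\tilde{N}\pm1.96\sqrt{v(\tilde{N})}\); the variable names are illustrative.

```python
import math

n1, n2, m = 1000, 150, 10   # marked, second catch, marked in second catch
N_tilde = (n1 + 1) * (n2 + 1) / (m + 1) - 1
v_N = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
half = 1.96 * math.sqrt(v_N)
ci = (N_tilde - half, N_tilde + half)
print(round(N_tilde), round(v_N), [round(b) for b in ci])
```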

Summary

Sampling methods

  1. Simple random sampling: simple estimator.

    \[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i,\\ V(\bar{y})=\frac{1-f}{n}S^2,\\ v(\bar{y})=\frac{1-f}{n}s^2. \]

  2. Simple random sampling: ratio estimator.

    \[\bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X},\quad r=\frac{\bar{y}}{\bar{x}}, \\ V(\bar{y}_{R})\approx \frac{1-f}{n}(S^2-2RS_{yx}+R^2S_x^2),\\ v(\bar{y}_{R})=\frac{1-f}{n}(s_y^2-2rs_{yx}+r^2s_{x}^2). \]

  3. Simple random sampling: regression estimator with known coefficient \(\beta_0\).

    \[\bar{y}_{lr}=\bar{y}+\beta_0(\bar{X}-\bar{x}),\\ V(\bar{y}_{lr})\approx \frac{1-f}{n}(S^2-2\beta_0S_{yx}+\beta_0^2S_{x}^2),\\ v(\bar{y}_{lr})=\frac{1-f}{n}(s_y^2-2\beta_0s_{yx}+\beta_0^2s_{x}^2). \]

  4. Simple random sampling: regression estimator with estimated coefficient.

    \[b=\frac{s_{yx}}{s_{x}^2},\\ \bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x}),\\ V(\bar{y}_{lr})\approx \frac{1-f}{n}S^2(1-\rho^2),\\ v(\bar{y}_{lr})\approx \frac{1-f}{n}s_y^2(1-\hat\rho^2). \]

  5. Stratified random sampling: simple estimator.

    \[\bar{y}_{st}=\sum_{h=1}^{L}W_h\bar{y}_{h},\\ V(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_h^2,\\ v(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}s_h^2. \]

  6. Stratified random sampling: separate ratio estimator.

    \[\bar{y}_{RS}=\sum_{h=1}^{L}W_h\frac{\bar{y}_h}{\bar{x}_h}\bar{X}_h,\quad r_h=\frac{\bar{y}_h}{\bar{x}_h},\\ V(\bar{y}_{RS})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2R_hS_{yxh}+R_h^2S_{xh}^2),\\ v(\bar{y}_{RS})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2r_hs_{yxh}+r_h^2s_{xh}^2). \]

  7. Stratified random sampling: combined ratio estimator.

    \[\bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X},\quad r=\frac{\bar{y}_{st}}{\bar{x}_{st}},\\ V(\bar{y}_{RC})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2),\\ v(\bar{y}_{RC})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2). \]

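  8. Equal-probability cluster sampling with equal cluster sizes.

    \[\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\ V(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2,\\ v(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2. \]

  9. Equal-probability two-stage sampling with equal cluster sizes.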

    \[\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\ V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2,\\ v(\bar{\bar{y}})=\frac{1-f_1}{n}s_1^2+\frac{f_1(1-f_2)}{nm}s_2^2. \]

  10. Hansen-Hurwitz estimator for sampling with replacement and unequal probabilities.

    \[\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i},\\ V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2,\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

  11. Hansen-Hurwitz estimator for two-stage sampling with replacement and unequal probabilities.

    \[\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Y}_i}{Z_i},\\ V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2+\frac{1}{n}\sum_{i=1}^{N}\frac{V_2(\hat{Y}_i)}{Z_i},\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

  12. Horvitz-Thompson estimator for strict \(\mathrm{\pi PS}\) sampling without replacement, \(n\) fixed.

    \[\hat{Y}_{HT}=\sum_{i=1}^{n}\frac{y_i}{\pi_i},\\ V(\hat{Y}_{HT})=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2,\\ v_{YGS}=\sum_{i<j}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}\left(\frac{y_i}{\pi_i}-\frac{y_j}{\pi_j} \right)^2. \]

  13. Raj estimator for Yates-Grundy draw-by-draw sampling (not strictly \(\mathrm{\pi PS}\)), \(n\) not fixed.

    \[t_i=\sum_{j=1}^{i-1}y_j+\frac{y_i}{Z_i}\left(1-\sum_{j=1}^{i-1}Z_j \right),\\ \hat{Y}_{Raj}=\frac{1}{n}\sum_{i=1}^{n}t_i,\\ v(\hat{Y}_{Raj})=\frac{1}{n(n-1)}\sum_{i=1}^{n}(t_i-\hat{Y}_{Raj})^2. \]

  14. Double sampling for stratification.

    \[\bar{y}_{stD}=\sum_{h=1}^{L}w_h'\bar{y}_h,\\ V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right),\\ v(\bar{y}_{stD})\approx \sum_{h=1}^{L}\frac{w_h'^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2. \]

  15. Ratio estimator in double sampling.

    \[\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}',\quad r=\frac{\bar{y}}{\bar{x}},\\ V(\bar{y}_{RD})\approx \left(\frac{1}{n'}-\frac{1}{N} \right)S_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(S_y^2-2RS_{yx}+R^2S_x^2),\\ v(\bar{y}_{RD})=\frac{1}{n}s_{y}^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(r^2s_{x}^2-2rs_{yx}). \]

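  16. Linear systematic sampling with equal probabilities.

    \[\bar{y}_{sy}=\frac{1}{n}\sum_{j=1}^{n}y_{rj}\ \text{(for the selected start } r\text{)},\\ V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]

  17. Capture-recapture sampling (Chapman estimator).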

    \[\tilde{N}=\frac{(n_1+1)(n_2+1)}{m+1}-1,\\ v(\tilde{N})=\frac{(n_1+1)(n_2+1)(n_1-m)(n_2-m)}{(m+1)^2(m+2)}. \]

Other formulas

  1. Optimal allocation and Neyman allocation in stratified sampling:

    \[n_h\propto\frac{W_hS_h}{\sqrt{c_h}};\qquad c_h\equiv c\ \Longrightarrow\ n_h\propto W_hS_h. \]

  2. The three variances in cluster sampling and their estimators:

    \[S^2=\frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2,\\ S_b^2=\frac{1}{N-1}\sum_{i=1}^{N}M(\bar{Y}_i-\bar{\bar{Y}})^2,\\ S_{w}^2=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2,\\ s_b^2=\frac{1}{n-1}\sum_{i=1}^{n}M(\bar{y}_i-\bar{\bar{y}})^2,\\ s_w^2=\frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2. \]

  3. Estimated intraclass correlation coefficient and design effect in cluster sampling:

    \[\hat\rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2},\\ deff\approx 1+(M-1)\hat\rho_c. \]

  4. Brewer's method: first-draw probabilities and inclusion probabilities:

    \[Z_i^*\propto\frac{Z_i(1-Z_i)}{1-2Z_i},\\ \pi_i=2Z_i,\\ \pi_{ij}=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k}\right)}}. \]

  5. Midzuno's method: probability of selecting the first unit:

    \[Z_i^*=\frac{n(N-1)Z_i}{N-n}-\frac{n-1}{N-n}. \]

  6. Optimal subsampling rates in double sampling for stratification:

    \[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]

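  7. Optimal ratio \(f=n/n'\) for the double-sampling ratio estimator, where \(c_1\) is the cost per first-phase unit and \(c_2\) is the cost per second-phase unit: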

    \[f=\sqrt{\frac{c_1(S_y^2+R^2S_x^2-2RS_{yx})}{c_2(2RS_{yx}-R^2S_x^2)}}. \]

Posted @ 2021-07-01 01:07 by 江景景景页