[Survey Sampling] Three-Stage Equal-Probability Sampling with Equal-Sized Units
Three-stage sampling
Basic formulas
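Let \(\mathbb{E}_3,\mathbb{D}_3\) denote the expectation and variance over the third-stage sampling with the primary and secondary units held fixed; let \(\mathbb{E}_2,\mathbb{D}_2\) denote the expectation and variance over the second stage with the primary units held fixed; and let \(\mathbb{E}_1,\mathbb{D}_1\) denote the expectation and variance over the selection of primary units. For any estimator \(\hat\theta\), clearly
\[\mathbb{E}(\hat\theta)=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta). \]
Hence
\[\begin{aligned} \mathbb{D}(\hat\theta)&=\mathbb{E}(\hat\theta^2)-[\mathbb{E}(\hat\theta)]^2\\ &=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta^2)-[\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\\ &=\mathbb{E}_1\left\{\mathbb{E}_2\mathbb{E}_3(\hat\theta^2)-[\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\right\}+\mathbb{E}_1\left\{[\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\right\}-[\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\\ &=\mathbb{E}_1\left\{\mathbb{E}_2\mathbb{E}_3(\hat\theta^2)-[\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\right\}+\mathbb{D}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta). \end{aligned} \]
Note that \(\mathbb{E}_2\mathbb{E}_3(\hat\theta^2)-[\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2\) is exactly the variance of the two-stage sampling formed by the last two stages, with the primary units held fixed. Applying the two-stage variance decomposition to it gives
\[\mathbb{E}_2\mathbb{E}_3(\hat\theta^2)-[\mathbb{E}_2\mathbb{E}_3(\hat\theta)]^2=\mathbb{E}_2\mathbb{D}_3(\hat\theta)+\mathbb{D}_2\mathbb{E}_3(\hat\theta). \]
Therefore
\[\mathbb{D}(\hat\theta)=\mathbb{E}_1\mathbb{E}_2\mathbb{D}_3(\hat\theta)+\mathbb{E}_1\mathbb{D}_2\mathbb{E}_3(\hat\theta)+\mathbb{D}_1\mathbb{E}_2\mathbb{E}_3(\hat\theta). \]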
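Equal-probability three-stage sampling
Consider the case where every primary unit contains the same number of secondary units and every secondary unit contains the same number of tertiary units. In the first stage, \(n\) primary units are drawn by simple random sampling from the population of \(N\) primary units; in the second stage, \(m\) secondary units are drawn by simple random sampling from the \(M\) secondary units of each selected primary unit; in the third stage, \(k\) tertiary units are drawn by simple random sampling from the \(K\) tertiary units of each selected secondary unit.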
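The estimator of the population mean is
\[\bar{\bar{\bar y}}=\frac{1}{nmk}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{u=1}^{k}y_{iju}=\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\bar y_{ij}, \]
where \(y_{iju}\) is the value of the \(u\)-th sampled tertiary unit in the \(j\)-th sampled secondary unit of the \(i\)-th sampled primary unit, and \(\displaystyle{\bar y_{ij}=\frac{1}{k}\sum_{u=1}^{k}y_{iju}}\).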
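The sampling scheme and the estimator above can be sketched in Python. This is a minimal illustration, assuming the population is stored as a nested list `pop[i][j][u]` of size \(N\times M\times K\); the function names `three_stage_sample` and `estimate_mean` and the toy population are hypothetical and do not come from the original text.

```python
import random

def three_stage_sample(pop, n, m, k, rng=None):
    """Draw a three-stage simple random sample (without replacement at
    every stage) from pop[i][j][u], an N x M x K nested list (assumed layout)."""
    rng = rng or random.Random()
    sample = []
    for primary in rng.sample(pop, n):            # stage 1: n of the N primary units
        secondaries = rng.sample(primary, m)      # stage 2: m of the M secondary units
        # stage 3: k of the K tertiary units in each selected secondary unit
        sample.append([rng.sample(sec, k) for sec in secondaries])
    return sample

def estimate_mean(sample):
    """Plain mean of all n*m*k sampled values (the estimator of the population mean)."""
    values = [y for prim in sample for sec in prim for y in sec]
    return sum(values) / len(values)

# Hypothetical toy population: N=4, M=3, K=5, value = 100*i + 10*j + u
pop = [[[100 * i + 10 * j + u for u in range(5)] for j in range(3)] for i in range(4)]
sample = three_stage_sample(pop, n=2, m=2, k=3, rng=random.Random(0))
print(estimate_mean(sample))
```

Passing a seeded `random.Random` keeps the draw reproducible; omitting `rng` gives a fresh random sample each time.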
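Proof that \(\bar{\bar{\bar y}}\) is unbiased
It can be shown that \(\bar{\bar{\bar y}}\) is an unbiased estimator of \(\bar{\bar{\bar Y}}\), that is,
\[\mathbb{E}(\bar{\bar{\bar y}})=\bar{\bar{\bar Y}}. \]
Here \(\mathbb{E}(\bar{\bar {\bar y}})=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\bar{\bar {\bar y}})\), and all three stages are simple random sampling, so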
\[\begin{aligned} \mathbb{E}(\bar{\bar {\bar y}})&=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(\bar{\bar {\bar y}})\\ &=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3\left(\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}{\bar y}_{ij} \right)\\ &=\mathbb{E}_1\mathbb{E}_2\left(\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\bar{ Y}_{ij} \right)\\ &=\frac{1}{nm}\mathbb{E}_1\left[\sum_{i=1}^{n}\mathbb{E}_2\left(\sum_{j=1}^{m}\bar{ Y}_{ij} \right)\right] \end{aligned} \]
The third equality uses \(\mathbb{E}_3(\bar y_{ij})=\bar Y_{ij}\), since the third stage is a simple random sample within each secondary unit. Next, \(\displaystyle{\sum_{j=1}^{m}\bar{Y}_{ij}}\) is the sample total of the second-stage simple random sample (with the population means of the secondary units regarded as the sampling units), so \(\displaystyle{\mathbb{E}_2\left(\frac{1}{m}\sum_{j=1}^{m}\bar{Y}_{ij} \right)=\bar {\bar Y}_{i}}\). Likewise, \(\displaystyle{\sum_{i=1}^{n}\bar{\bar Y}_{i}}\) is the sample total of the first-stage simple random sample (with the population means of the primary units regarded as the sampling units), so \(\displaystyle{\mathbb{E}_1\left(\frac{1}{n}\sum_{i=1}^{n}\bar{\bar Y}_i \right)}=\bar{\bar{\bar{Y}}}\). Therefore
\[\mathbb{E}(\bar{\bar {\bar y}})=\bar{\bar{\bar Y}}. \]
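The unbiasedness can be checked by brute force on a tiny population. The sketch below enumerates every possible three-stage sample (all equally likely because the units have equal sizes) and averages the estimates with exact rational arithmetic. The helper name `all_estimates` and the toy population are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

def all_estimates(pop, n, m, k):
    """Return the estimate of the mean for every possible three-stage sample.
    pop[i][j][u] is an N x M x K nested list (assumed layout)."""
    estimates = []
    for primaries in combinations(pop, n):                      # stage 1 choices
        sec_options = [list(combinations(prim, m)) for prim in primaries]
        for secs in product(*sec_options):                      # stage 2 choices
            chosen = [sec for group in secs for sec in group]   # the n*m secondary units
            ter_options = [list(combinations(sec, k)) for sec in chosen]
            for ters in product(*ter_options):                  # stage 3 choices
                total = sum(sum(t) for t in ters)
                estimates.append(Fraction(total, n * m * k))
    return estimates

# Hypothetical toy population with N=3, M=2, K=2
pop = [[[1, 2], [3, 5]], [[4, 4], [6, 9]], [[7, 8], [10, 13]]]
est = all_estimates(pop, n=2, m=1, k=1)
expected = sum(est) / len(est)                                  # exact E of the estimator
pop_mean = Fraction(sum(y for prim in pop for sec in prim for y in sec), 12)
print(expected == pop_mean)
```

Exhaustive enumeration is only feasible for very small populations, but it verifies the identity exactly rather than up to simulation error.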
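Computing the variance of \(\bar{\bar{\bar y}}\)
Next we compute its variance. First some notation: let \(Y_{iju}\) be the value of the \(u\)-th tertiary unit in the \(j\)-th secondary unit of the \(i\)-th primary unit, and write
\[\bar Y_{ij}=\frac{1}{K}\sum_{u=1}^{K}Y_{iju},\quad \bar{\bar Y}_i=\frac{1}{M}\sum_{j=1}^{M}\bar Y_{ij},\quad \bar{\bar{\bar Y}}=\frac{1}{N}\sum_{i=1}^{N}\bar{\bar Y}_i, \]
with the sample analogues
\[\bar y_{ij}=\frac{1}{k}\sum_{u=1}^{k}y_{iju},\quad \bar{\bar y}_i=\frac{1}{m}\sum_{j=1}^{m}\bar y_{ij},\quad \bar{\bar{\bar y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{\bar y}_i. \]
Denote the sampling fractions of the three stages by
\[f_1=\frac{n}{N},\quad f_2=\frac{m}{M},\quad f_3=\frac{k}{K}. \]
Then
\[\mathbb{D}(\bar{\bar{\bar y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2+\frac{1-f_3}{nmk}S_3^2, \]
where \(S_1^2\), \(S_2^2\), \(S_3^2\) are the variance between primary units, between secondary units within primary units, and between tertiary units within secondary units, as defined in the derivation that follows. Indeed, we have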
\[\mathbb{D}(\bar{\bar{\bar y}})=\mathbb{E}_1\mathbb{E}_2\mathbb{D}_3(\bar{\bar{\bar y}})+\mathbb{E}_1\mathbb{D}_2\mathbb{E}_3(\bar{\bar{\bar y}})+\mathbb{D}_1\mathbb{E}_2\mathbb{E}_3(\bar{\bar{\bar y}}). \]
We compute the three terms one by one.
The first term relies on the following result for simple random sampling within a secondary unit:
\[\mathbb{D}_3(\bar y_{ij})=\frac{1-f_3}{k}\frac{1}{K-1}\sum_{u=1}^{K}(Y_{iju}-\bar Y_{ij})^2. \]
Since \(\displaystyle{\frac{1}{K-1}\sum_{u=1}^{K}(Y_{iju}-\bar Y_{ij})^2}=S_{3ij}^2\) and \(\displaystyle{S_3^2=\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}S_{3ij}^2}\), we obtain
\[\begin{aligned} \mathbb{E}_1\mathbb{E}_2\mathbb{D}_3(\bar{\bar{\bar y}})&=\mathbb{E}_1\mathbb{E}_2\mathbb{D}_3\left(\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\bar{ y}_{ij} \right)\\ &=\frac{1}{n^2m^2}\mathbb{E}_1\mathbb{E}_2\sum_{i=1}^{n}\sum_{j=1}^{m}\mathbb{D}_3(\bar{ y}_{ij})\\ &=\frac{1-f_3}{nmk}\mathbb{E}_1\mathbb{E}_2\left(\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}S_{3ij}^2 \right)\\ &=\frac{1-f_3}{nmk}\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}S_{3ij}^2\\ &=\frac{1-f_3}{nmk}S_3^2. \end{aligned} \]
The second term relies on the following result:
\[\mathbb{D}_2\left(\frac{1}{m}\sum_{j=1}^{m}\bar { Y}_{ij} \right)=\frac{1-f_2}{m}\frac{1}{M-1}\sum_{j=1}^{M}(\bar{ Y}_{ij}-\bar{\bar Y}_i)^2. \]
Since \(\displaystyle{\frac{1}{M-1}\sum_{j=1}^{M}(\bar{Y}_{ij}-\bar{\bar Y}_i)^2=S_{2i}^2}\), and writing \(\displaystyle{S_2^2=\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2}\), the second term is
\[\begin{aligned} \mathbb{E}_1\mathbb{D}_2\mathbb{E}_3(\bar{\bar{\bar y}})&=\mathbb{E}_1\mathbb{D}_2\left(\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\bar{ Y}_{ij} \right)\\ &=\frac{1}{n^2}\mathbb{E}_1\left[\sum_{i=1}^{n}\mathbb{D}_2\left(\frac{1}{m}\sum_{j=1}^{m}\bar{ Y}_{ij} \right)\right]\\ &=\frac{1}{n^2}\mathbb{E}_1\left(\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}\mathbb{E}_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\ &=\frac{1-f_2}{nm}S_{2}^2. \end{aligned} \]
Writing \(\displaystyle{S_1^2=\frac{1}{N-1}\sum_{i=1}^{N}(\bar{\bar Y}_i-\bar{\bar{\bar Y}})^2}\), the third term is
\[\begin{aligned} \mathbb{D}_1\mathbb{E}_2\mathbb{E}_3(\bar{\bar{\bar y}})&=\mathbb{D}_1\left(\frac{1}{n}\sum_{i=1}^{n}\bar{\bar Y}_i \right)\\ &=\frac{1-f_1}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{\bar Y}_i-\bar{\bar{\bar Y}})^2\\ &=\frac{1-f_1}{n}S_1^2. \end{aligned} \]
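Adding the three terms gives \(\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2+\frac{1-f_3}{nmk}S_3^2\). As a sanity check, the sketch below compares this formula with the exact variance obtained by enumerating all samples of a tiny population. The function names and the toy data are hypothetical; exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(xs):
    return sum(xs) / Fraction(len(xs))

def var_div_n_minus_1(xs):
    """Variance with divisor len(xs) - 1, as in S_1^2, S_{2i}^2, S_{3ij}^2."""
    c = mean(xs)
    return sum((x - c) ** 2 for x in xs) / Fraction(len(xs) - 1)

def formula_variance(pop, n, m, k):
    """Variance of the estimator from the three-term formula."""
    N, M, K = len(pop), len(pop[0]), len(pop[0][0])
    Y_ij = [[mean(sec) for sec in prim] for prim in pop]    # secondary-unit means
    Y_i = [mean(row) for row in Y_ij]                       # primary-unit means
    S1 = var_div_n_minus_1(Y_i)
    S2 = mean([var_div_n_minus_1(row) for row in Y_ij])
    S3 = mean([var_div_n_minus_1(sec) for prim in pop for sec in prim])
    f1, f2, f3 = Fraction(n, N), Fraction(m, M), Fraction(k, K)
    return (1 - f1) / n * S1 + (1 - f2) / (n * m) * S2 + (1 - f3) / (n * m * k) * S3

def exact_variance(pop, n, m, k):
    """Variance of the estimator over all (equally likely) three-stage samples."""
    est = []
    for prims in combinations(pop, n):
        for secs in product(*[list(combinations(p, m)) for p in prims]):
            chosen = [s for group in secs for s in group]
            for ters in product(*[list(combinations(s, k)) for s in chosen]):
                est.append(Fraction(sum(sum(t) for t in ters), n * m * k))
    mu = mean(est)
    return mean([(e - mu) ** 2 for e in est])

# Hypothetical toy population with N=3, M=2, K=2
pop = [[[1, 2], [3, 5]], [[4, 4], [6, 9]], [[7, 8], [10, 13]]]
print(formula_variance(pop, 2, 1, 1) == exact_variance(pop, 2, 1, 1))
```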
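Finding an unbiased estimator of \(\mathbb{D}(\bar{\bar{\bar y}})\)
First introduce the following notation:
\[\begin{aligned} s_1^2&=\frac{1}{n-1}\sum_{i=1}^{n}(\bar{\bar y}_i-\bar{\bar{\bar y}})^2,\\ s_2^2&=\frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(\bar y_{ij}-\bar{\bar y}_i)^2,\\ s_3^2&=\frac{1}{nm(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{u=1}^{k}(y_{iju}-\bar y_{ij})^2. \end{aligned} \]
An unbiased estimator of \(\mathbb{D}(\bar{\bar{\bar y}})\) is then
\[v(\bar{\bar{\bar y}})=\frac{1-f_1}{n}s_1^2+\frac{f_1(1-f_2)}{nm}s_2^2+\frac{f_1f_2(1-f_3)}{nmk}s_3^2. \]
To prove this we need \(\mathbb{E}(s_1^2),\mathbb{E}(s_2^2),\mathbb{E}(s_3^2)\), which we compute starting from the last one.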
\[\begin{aligned} \mathbb{E}(s_3^2)&=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(s_3^2)\\ &=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3\left(\frac{1}{nm(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{u=1}^{k}(y_{iju}-\bar{y}_{ij})^2 \right)\\ &=\mathbb{E}_1\mathbb{E}_2\left[\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} \mathbb{E}_3\left(\frac{1}{k-1}\sum_{u=1}^{k}(y_{iju}-\bar{y}_{ij})^2 \right) \right]\\ &=\mathbb{E}_1\mathbb{E}_2\left[\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{1}{K-1}\sum_{u=1}^{K}(Y_{iju}-\bar{Y}_{ij})^2 \right]\\ &=\mathbb{E}_1\left[\frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_2\left(\frac{1}{m}\sum_{j=1}^{m}S_{3ij}^2 \right) \right]\\ &=\mathbb{E}_1\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{M}\sum_{j=1}^{M}S_{3ij}^2 \right]\\ &=\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}S_{3ij}^2\\ &=S_3^2. \end{aligned} \]
Next we compute \(\mathbb{E}(s_2^2)\). It is convenient to work with \(\mathbb{E}[(m-1)s_2^2]\), starting from \(\mathbb{E}_3[(m-1)s_2^2]\):
\[\begin{aligned} \mathbb{E}_3[(m-1)s_2^2]&=\mathbb{E}_3\left[\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m}(\bar y_{ij}-\bar{\bar y}_i)^2 \right]\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{m}\mathbb{E}_3(\bar y_{ij}^2)-m\mathbb{E}_3(\bar{\bar y}_i^2) \right]\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{m}\{[\mathbb{E}_3(\bar y_{ij})]^2+\mathbb{D}_3(\bar{y}_{ij})\}-m\{[\mathbb{E}_3(\bar{\bar y}_i)]^2+\mathbb{D}_3(\bar{\bar y}_i) \} \right]\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{m}\left(\bar Y_{ij}^2+\frac{1-f_3}{k}S_{3ij}^2 \right)-m\left(\frac{1}{m}\sum_{j=1}^{m}\bar Y_{ij} \right)^2-\frac{1-f_3}{mk}\sum_{j=1}^{m}S_{3ij}^2 \right]. \end{aligned} \]
Writing \(\displaystyle{\bar{\bar Y}_{m}=\frac{1}{m}\sum_{j=1}^{m}\bar Y_{ij}}\) for the mean over the sampled secondary units of the \(i\)-th primary unit (so \(\bar{\bar Y}_{m}\) depends on \(i\)), we get
\[\begin{aligned} \mathbb{E}_3[(m-1)s_2^2]&=\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{m}(\bar Y_{ij}-\bar{\bar Y}_{m})^2+\frac{(1-f_3)(m-1)}{mk}\sum_{j=1}^{m}S_{3ij}^2 \right],\\ \mathbb{E}_2\mathbb{E}_3(s_2^2)&=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_2\left[\frac{1}{m-1}\sum_{j=1}^{m}(\bar Y_{ij}-\bar{\bar Y}_{m})^2+\frac{1-f_3}{mk}\sum_{j=1}^{m}S_{3ij}^2 \right]\\ &=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{M-1}\sum_{j=1}^{M}(\bar Y_{ij}-\bar{\bar Y}_i)^2+\frac{1-f_3}{nMk}\sum_{i=1}^{n}\sum_{j=1}^{M}S_{3ij}^2\\ &=\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2+\frac{1-f_3}{nMk}\sum_{i=1}^{n}\sum_{j=1}^{M}S_{3ij}^2, \\ \mathbb{E}(s_2^2)&=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(s_2^2)\\ &=\mathbb{E}_1\left[\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2+\frac{1-f_3}{nMk}\sum_{i=1}^{n}\sum_{j=1}^{M}S_{3ij}^2 \right]\\ &=\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2+\frac{1-f_3}{k}\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}S_{3ij}^2\\ &=S_{2}^2+\frac{1-f_3}{k}S_{3}^2. \end{aligned} \]
The term \(s_1^2\) is handled in the same way. Write \(\displaystyle{S_{3i}^2=\frac{1}{M}\sum_{j=1}^{M}S_{3ij}^2}\) and \(\displaystyle{\bar{\bar{\bar Y}}_n=\frac{1}{n}\sum_{i=1}^{n}\bar{\bar Y}_i}\); then
\[\begin{aligned} &\quad \mathbb{E}_2\mathbb{E}_3[(n-1)s_1^2]\\ &=\mathbb{E}_2\mathbb{E}_3\left[\sum_{i=1}^{n}(\bar{\bar y}_{i}-\bar{\bar{\bar y}})^2\right]\\ &=\sum_{i=1}^{n}\mathbb{E}_2\mathbb{E}_3(\bar{\bar y}_i^2)-n\mathbb{E}_2\mathbb{E}_3(\bar{\bar{\bar y}}^2)\\ &=\sum_{i=1}^{n}\{[\mathbb{E}_2\mathbb{E}_3(\bar{\bar y}_i)]^2+\mathbb{E}_2\mathbb{D}_3(\bar{\bar y}_i)+\mathbb{D}_2\mathbb{E}_3(\bar{\bar y}_i)\}\\ &\quad -n\{[\mathbb{E}_2\mathbb{E}_3(\bar{\bar{\bar y}})]^2 +\mathbb{E}_2\mathbb{D}_3(\bar{\bar{\bar y}})+\mathbb{D}_2\mathbb{E}_3(\bar{\bar{\bar y}}) \}\\ &=\sum_{i=1}^{n}\bar{\bar Y}_i^2+\sum_{i=1}^{n}\left[\frac{1-f_2}{m}S_{2i}^2+\frac{1-f_3}{mk}S_{3i}^2 \right]-n\left(\frac{1}{n}\sum_{i=1}^{n}\bar{\bar Y}_i \right)^2\\ &\quad -\frac{1}{n}\sum_{i=1}^{n}\left[\frac{1-f_2}{m}S_{2i}^2+\frac{1-f_3}{mk}S_{3i}^2 \right]\\ &=\sum_{i=1}^{n}(\bar{\bar Y}_i-\bar{\bar{\bar{Y}}}_n)^2+\sum_{i=1}^{n}\left[\frac{(1-f_2)(n-1)}{nm}S_{2i}^2+\frac{(1-f_3)(n-1)}{nmk}S_{3i}^2 \right],\\ &\quad \mathbb{E}(s_1^2)\\ &=\mathbb{E}_1\mathbb{E}_2\mathbb{E}_3(s_1^2)\\ &=\mathbb{E}_1\left\{\frac{1}{n-1}\sum_{i=1}^{n}(\bar{\bar Y}_i-\bar{\bar{\bar Y}}_n)^2+\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2+\frac{1-f_3}{nmk}\sum_{i=1}^{n}S_{3i}^2 \right\}\\ &=\frac{1}{N-1}\sum_{i=1}^{N}(\bar{\bar Y}_i-\bar{\bar{\bar Y}})^2+\frac{1-f_2}{m}S_{2}^2+\frac{1-f_3}{mk}S_{3}^2\\ &=S_1^2+\frac{1-f_2}{m}S_2^2+\frac{1-f_3}{mk}S_3^2. \end{aligned} \]
Here, with the primary units held fixed, the variance over the last two stages is the sum \(\mathbb{E}_2\mathbb{D}_3+\mathbb{D}_2\mathbb{E}_3\). Finally, substituting these results gives
\[\begin{aligned} \mathbb{E}[v(\bar{\bar{\bar y}})]&=\frac{1-f_1}{n}\mathbb{E}(s_1^2)+\frac{f_1(1-f_2)}{nm}\mathbb{E}(s_2^2)+\frac{f_1f_2(1-f_3)}{nmk}\mathbb{E}(s_3^2)\\ &=\frac{1-f_1}{n}\left(S_1^2+\frac{1-f_2}{m}S_2^2+\frac{1-f_3}{mk}S_3^2 \right)\\ &\quad +\frac{f_1(1-f_2)}{nm}\left(S_2^2+\frac{1-f_3}{k}S_3^2 \right)\\ &\qquad +\frac{f_1f_2(1-f_3)}{nmk}S_3^2\\ &=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2+\frac{1-f_3}{nmk}S_3^2. \end{aligned} \]
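In practice the estimator \(v(\bar{\bar{\bar y}})=\frac{1-f_1}{n}s_1^2+\frac{f_1(1-f_2)}{nm}s_2^2+\frac{f_1f_2(1-f_3)}{nmk}s_3^2\) is computed from one observed sample. The following is a minimal sketch, assuming the sample is stored as a nested list `sample[i][j][u]` of size \(n\times m\times k\) with \(n,m,k\ge 2\); the function name `variance_estimate` and the example data are hypothetical.

```python
from fractions import Fraction

def mean(xs):
    return sum(xs) / Fraction(len(xs))

def sample_var(xs):
    """Sample variance with divisor len(xs) - 1."""
    c = mean(xs)
    return sum((x - c) ** 2 for x in xs) / Fraction(len(xs) - 1)

def variance_estimate(sample, N, M, K):
    """Estimate the variance of the three-stage mean from sample[i][j][u]."""
    n, m, k = len(sample), len(sample[0]), len(sample[0][0])
    y_ij = [[mean(sec) for sec in prim] for prim in sample]   # secondary-unit sample means
    y_i = [mean(row) for row in y_ij]                         # primary-unit sample means
    s1_sq = sample_var(y_i)                                   # between primary units
    s2_sq = mean([sample_var(row) for row in y_ij])           # between secondary units
    s3_sq = mean([sample_var(sec) for prim in sample for sec in prim])  # between tertiary units
    f1, f2, f3 = Fraction(n, N), Fraction(m, M), Fraction(k, K)
    return ((1 - f1) / n * s1_sq
            + f1 * (1 - f2) / (n * m) * s2_sq
            + f1 * f2 * (1 - f3) / (n * m * k) * s3_sq)

# Hypothetical sample with n = m = k = 2 drawn from N = M = K = 4
sample = [[[1, 3], [5, 7]], [[2, 4], [6, 8]]]
print(variance_estimate(sample, N=4, M=4, K=4))   # 21/32
```

For this sample \(s_1^2=1/2\), \(s_2^2=8\), \(s_3^2=2\) and every sampling fraction equals \(1/2\), so \(v=1/8+1/2+1/32=21/32\).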
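Three-stage sampling design
Assume a linear cost function for three-stage sampling: \(C_{T}=c_0+c_1n+c_2nm+c_3nmk\). The optimization problem is to minimize the variance \(\mathbb{D}(\bar{\bar{\bar y}})\) for a given \(C_{T}\), or to minimize \(C_{T}\) for a given \(\mathbb{D}(\bar{\bar{\bar y}})\). The optimal \(k,m\) are
\[m_{\text{opt}}=\sqrt{\frac{S_2^2-\frac{S_3^2}{K}}{S_1^2-\frac{S_2^2}{M}}}\sqrt{\frac{c_1}{c_2}},\quad k_{\text{opt}}=\sqrt{\frac{S_3^2}{S_2^2-\frac{S_3^2}{K}}}\sqrt{\frac{c_2}{c_3}}. \]
Once the optimal \(k,m\) are determined, if the total cost \(C_{T}\) is given, then
\[n=\frac{C_{T}-c_0}{c_1+c_2m+c_3mk}. \]
If instead the estimator variance \(\mathbb{D}(\bar{\bar{\bar y}})=V\) is given, then
\[n=\frac{\left(S_1^2-\frac{S_2^2}{M} \right)+\frac{1}{m}\left(S_2^2-\frac{S_3^2}{K} \right)+\frac{S_3^2}{mk}}{V+\frac{S_1^2}{N}}. \]
To derive these results, rewrite the estimator variance as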
\[\begin{aligned} \mathbb{D}(\bar{\bar{\bar y}})&=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2+\frac{1-f_3}{nmk}S_3^2\\ &=\left(\frac{1}{n}-\frac{1}{N} \right)S_1^2+\frac{1}{n}\left(\frac{1}{m}-\frac{1}{M} \right)S_2^2+\frac{1}{nm}\left(\frac{1}{k}-\frac{1}{K} \right)S_3^2\\ &=\frac{1}{n}\left(S_1^2-\frac{S_2^2}{M} \right)+\frac{1}{nm}\left(S_2^2-\frac{S_3^2}{K} \right)+\frac{S_3^2}{nmk}-\frac{S_1^2}{N}. \end{aligned} \]
Both versions of the problem reduce to minimizing the product \(\displaystyle{\left(V+\frac{S_1^2}{N} \right)(C_{T}-c_0)}\), in which \(n\) cancels. For brevity, let
\[S_{u}=S_1^2-\frac{S_2^2}{M},\quad S_{v}=S_2^2-\frac{S_3^2}{K}. \]
Then
\[{\left(V+\frac{S_1^2}{N} \right)(C_{T}-c_0)}=\left(S_{u}+\frac{1}{m}S_{v}+\frac{S_3^2}{mk} \right)(c_1+c_2m+c_3mk). \]
Assuming \(S_u>0\) and \(S_v>0\), the Cauchy inequality gives
\[\left(S_{u}+\frac{1}{m}S_{v}+\frac{S_3^2}{mk} \right)(c_1+c_2m+c_3mk) \ge \left(\sqrt{S_{u}c_1}+\sqrt{S_{v}c_2}+\sqrt{S_3^2c_3} \right)^{2}, \]
with equality if and only if
\[\frac{S_{u}}{c_1}=\frac{S_{v}}{c_2m^2}=\frac{S_3^2}{c_3m^2k^2}. \]
Solving these equations yields
\[\begin{aligned} m_{\text{opt}}&=\sqrt{\frac{S_{v}c_1}{S_{u}c_2}}=\sqrt{\frac{S_2^2-\frac{S_3^2}{K}}{S_1^2-\frac{S_2^2}{M}}}\sqrt{\frac{c_1}{c_2}},\\ k_{\text{opt}}&=\sqrt{\frac{S_3^2c_2}{S_vc_3}}=\sqrt{\frac{S_3^2}{S_2^2-\frac{S_3^2}{K}}}\sqrt{\frac{c_2}{c_3}}. \end{aligned} \]
Finally, substituting \(m_{\text{opt}}\) and \(k_{\text{opt}}\) back into the cost constraint or the variance constraint gives the optimal value of \(n\).
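The allocation formulas can be evaluated numerically as follows. This is a sketch under stated assumptions: the function names, argument names, and the numbers in the example are hypothetical, and the result is the continuous optimum, so in practice \(m\), \(k\), \(n\) would still need to be rounded to integers and capped at \(M\), \(K\), \(N\).

```python
import math

def optimal_m_k(S1sq, S2sq, S3sq, M, K, c1, c2, c3):
    """Continuous optimum of m and k from the Cauchy-equality conditions."""
    Su = S1sq - S2sq / M
    Sv = S2sq - S3sq / K
    if Su <= 0 or Sv <= 0:
        raise ValueError("the closed form assumes S_u > 0 and S_v > 0")
    m_opt = math.sqrt(Sv * c1 / (Su * c2))
    k_opt = math.sqrt(S3sq * c2 / (Sv * c3))
    return m_opt, k_opt

def n_for_budget(total_cost, c0, c1, c2, c3, m, k):
    """n when the total cost C_T is fixed."""
    return (total_cost - c0) / (c1 + c2 * m + c3 * m * k)

def n_for_variance(V, S1sq, S2sq, S3sq, N, M, K, m, k):
    """n when the target variance V is fixed."""
    Su = S1sq - S2sq / M
    Sv = S2sq - S3sq / K
    return (Su + Sv / m + S3sq / (m * k)) / (V + S1sq / N)

# Hypothetical inputs: variances, unit counts and unit costs
m_opt, k_opt = optimal_m_k(S1sq=2, S2sq=8, S3sq=16, M=8, K=4, c1=16, c2=4, c3=1)
print(m_opt, k_opt)                                            # 4.0 4.0
print(n_for_budget(1000, 40, 16, 4, 1, m_opt, k_opt))          # 20.0
print(n_for_variance(0.13, 2, 8, 16, 100, 8, 4, m_opt, k_opt))
```

With these inputs \(S_u=1\) and \(S_v=4\), so \(m_{\text{opt}}=k_{\text{opt}}=4\); a budget of 1000 with \(c_0=40\) then affords \(n=960/48=20\) primary units, which matches the \(n\) required for a target variance of about 0.13 when \(N=100\).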