YUANya

导航

 

7.3 The Sampling Distribution of the Sample Mean

population:1000;Scale are normally distributed with mean 100 and standard deviation 16

sample:4;可以得到样本均值的分布图如下:

与通过公式计算得到的mean 和 标准差一致:μx¯ = μ = 100 and σx¯ = σ/√n = 16/√4 = 8;

 由图可知The histogram is shaped roughly like a normal curve (with parameters 100 and 8)

所以:

由此得到:

即在大数据量的情况下,虽然变量可能不是正态分布的,但是该变量的mean值一定是正态分布的,也就是中心极限定理:

Usually, however, a sample size of 30 or more (n ≥ 30) is large enough

example:

统计每户房子占有人数:可知该变量属于右偏分布:

household size is far from being normally distributed; it is right skewed. Nonetheless, according to the central limit theorem, the sampling distribution of the sample mean can be approximated by a normal distribution when the sample size is relatively large. Use simulation to make that fact plausible for a sample size of 30

可以计算得到该样本mean的均值和方差:

 

 

We simulated 1000 samples of 30 households each, determined the sample mean of each of the 1000 samples, and obtained a histogram (Output 7.2) of the 1000 sample means.

 从1000个样本中抽出30个样本,计算这三十个样本的均值,得到上图(即样本均值分布图,验证了中心极限定理,即该分布也是正态分布的)

变量分布/变量mean 分布(在n逐渐变大的趋势下)/

 

可见,SE也在逐渐变小

所以,取样越大,数据越集中在均值附近,相应的SE越小。

 

posted on 2019-07-26 14:56  YUANya  阅读(921)  评论(0编辑  收藏  举报