Applied Nonparametric Statistics-lec10

Ref: https://onlinecourses.science.psu.edu/stat464/print/book/export/html/14


Estimating the CDF

The Empirical CDF

Plotting the empirical CDF:

x = c(4, 0, 3, 2, 2)
plot.ecdf(x)
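
As a rough check on what the plot shows, the empirical CDF at a point t is the fraction of observations less than or equal to t. The sketch below evaluates it both with ecdf() and by hand; the names Fn, t0, by_function and by_hand are chosen here for illustration only.

```r
x <- c(4, 0, 3, 2, 2)

# ecdf() returns a step function that can be evaluated at any point
Fn <- ecdf(x)

t0 <- 2
by_function <- Fn(t0)      # value of the empirical CDF at t0
by_hand <- mean(x <= t0)   # proportion of observations <= t0

print(c(by_function = by_function, by_hand = by_hand))
```

Three of the five observations (0, 2, 2) are at or below 2, so both values are 3/5.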

Kolmogorov-Smirnov test

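The two-sample Kolmogorov-Smirnov test checks whether two independent samples come from the same continuous distribution. ks.test also has a one-sample form, which compares a single sample against a reference distribution; the session below uses that form.

A small p-value is evidence that the distributions differ, but a large p-value does not show that they are the same.

With small samples the test has little power, so the p-value can be large even when the distributions differ.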
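
For the two-sample case, a minimal sketch might look like the following. The samples a and b are simulated here purely for illustration (they are not from the course notes), and the seed is arbitrary. The last lines recompute D by hand as the largest vertical gap between the two empirical CDFs.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
a <- rnorm(30, mean = 0, sd = 1)  # hypothetical sample 1
b <- rnorm(30, mean = 1, sd = 1)  # hypothetical sample 2, shifted by 1

res <- ks.test(a, b)              # two-sample Kolmogorov-Smirnov test
print(res)

# D is the largest vertical gap between the two empirical CDFs;
# the gap can only change at an observed value, so check those points
pooled <- sort(c(a, b))
D_by_hand <- max(abs(ecdf(a)(pooled) - ecdf(b)(pooled)))
print(D_by_hand)
```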

> x = c(4, 0, 3, 2, 2)
> plot.ecdf(x)
> plot(ecdf(x))
> ecdf(x)
Empirical CDF 
Call: ecdf(x)
 x[1:4] =      0,      2,      3,      4
> ks.test(x, y="pnorm", mean(x), sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = 0.24637, p-value = 0.9219
alternative hypothesis: two-sided

Warning message:
In ks.test(x, y = "pnorm", mean(x), sd(x)) :
  ties should not be present for the Kolmogorov-Smirnov test
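
The D statistic reported above can be reproduced by hand as a sanity check. This is a sketch of the usual one-sample formula: the largest gap between the empirical CDF and the reference CDF, measured just before and just after each sorted observation. The variable names are illustrative. Note also that the R documentation for ks.test says the reference parameters should be specified in advance rather than estimated from the data, so the p-value in this session is best read as approximate.

```r
x <- c(4, 0, 3, 2, 2)
n <- length(x)

# Reference CDF (normal with the sample mean and sd) at the sorted data
Fx <- pnorm(sort(x), mean = mean(x), sd = sd(x))

# Gap just after each step of the empirical CDF, and just before it
gap_after  <- (1:n) / n - Fx
gap_before <- Fx - (0:(n - 1)) / n

D <- max(c(gap_after, gap_before))
print(round(D, 5))
```

The result should agree with the D value printed by ks.test in the session.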

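P.S.

R has four functions for the normal distribution: dnorm is the PDF, pnorm is the CDF, qnorm is the inverse CDF (the quantile function),

and rnorm generates random numbers.

Given a probability p, qnorm returns the value at which the CDF equals p. For the standard normal distribution, that value is the Z-score.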

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
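
A short sketch of how the four functions relate, using the standard normal defaults. The specific inputs (1.96, 0.975, 1.3) and the seed are picked here for illustration.

```r
p <- pnorm(1.96)                 # P(Z <= 1.96) for a standard normal
z <- qnorm(0.975)                # Z-score with 97.5% of the mass below it
round_trip <- qnorm(pnorm(1.3))  # qnorm undoes pnorm
peak <- dnorm(0)                 # density at the mean, 1 / sqrt(2 * pi)

set.seed(42)                     # arbitrary seed, only for reproducibility
draws <- rnorm(5)                # five random standard-normal values

print(c(p = p, z = z, round_trip = round_trip, peak = peak))
print(draws)
```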

Density Estimation  

> x
[1] 4 0 3 2 2
> density(x)

Call:
        density.default(x = x)

Data: x (5 obs.);       Bandwidth 'bw' = 0.4868

       x                 y           
 Min.   :-1.4604   Min.   :0.001837  
 1st Qu.: 0.2698   1st Qu.:0.059033  
 Median : 2.0000   Median :0.141129  
 Mean   : 2.0000   Mean   :0.144277  
 3rd Qu.: 3.7302   3rd Qu.:0.205314  
 Max.   : 5.4604   Max.   :0.351014  
> plot(density(x))

 

Passing a bandwidth argument (bw) to density(x) changes the shape of the plot, as shown in the figure above.
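
To see the effect of the bandwidth numerically, here is a minimal sketch comparing the default with a narrower and a wider choice. The values bw = 0.2 and bw = 1 are arbitrary picks for illustration, not taken from the course notes.

```r
x <- c(4, 0, 3, 2, 2)

d_default <- density(x)            # bandwidth chosen by bw.nrd0(x)
d_narrow  <- density(x, bw = 0.2)  # small bandwidth: spiky estimate
d_wide    <- density(x, bw = 1)    # large bandwidth: smooth estimate

print(c(default = d_default$bw, narrow = d_narrow$bw, wide = d_wide$bw))

# The peak of the estimate gets lower as the bandwidth grows
print(c(narrow = max(d_narrow$y), wide = max(d_wide$y)))
```

Calling plot(d_narrow) followed by lines(d_wide, lty = 2) overlays the two curves for a visual comparison.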

posted @ 2017-06-23 11:31  陆离可