[Original] Statistical Modeling and R Software, Chapter 5: Hypothesis Testing
2015-12-25 08:26 Digging4. Summary: this article was published by digging4 at http://www.cnblogs.com/digging4/p/5054603.html
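5.1 The mean platelet count of healthy adult men is \(225\times 10^9/L\). The platelet counts (unit: \(10^9/L\)) of 20 male paint workers were measured: 220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231, 256, 183, 190, 158, 224, 175. Do the platelet counts of paint workers differ from those of healthy adult men?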
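# t.test(x, y = NULL, ...) performs t tests and the corresponding interval estimates. x and y are data vectors:
# if y is NULL, a one-sample test of the mean of a single normal population is performed; otherwise a two-sample test of means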
x <- c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231,
256, 183, 190, 158, 224, 175)
t.test(x, alternative = "two.sided", mu = 225)
##
## One Sample t-test
##
## data: x
## t = -3.478, df = 19, p-value = 0.002516
## alternative hypothesis: true mean is not equal to 225
## 95 percent confidence interval:
## 172.4 211.9
## sample estimates:
## mean of x
## 192.2
# p-value = 0.002516 < 0.05, so H0 is rejected: the mean platelet count of paint workers differs from 225.
# The 95% confidence interval is [172.4, 211.9] and the estimated mean is 192.2
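As a cross-check, the sketch below recomputes by hand the one-sample t statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) with \(n - 1\) degrees of freedom, its two-sided p-value and the 95% confidence interval, using only base R (`pt`, `qt`). The variable names (`se`, `t_stat`, `ci`) are illustrative; the values should agree with the `t.test` output above.

```r
# Platelet counts of 20 male paint workers (Exercise 5.1)
x <- c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231,
       256, 183, 190, 158, 224, 175)
mu0 <- 225                                   # hypothesized mean
n <- length(x)
se <- sd(x) / sqrt(n)                        # estimated standard error of the mean
t_stat <- (mean(x) - mu0) / se               # one-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se  # 95% confidence interval
res <- t.test(x, mu = mu0)                   # built-in version for comparison
print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```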
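5.2 The lifetime of a certain type of light bulb is normally distributed. Ten bulbs were randomly sampled from one week's production, and their lifetimes (unit: hours) were: 1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948. Find the probability that a bulb produced that week lasts more than 1000 hours.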
# alternative = "greater" specifies the alternative hypothesis H1: mu > mu0
x <- c(1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948)
t.test(x, alternative = "greater", mu = 1000)
##
## One Sample t-test
##
## data: x
## t = -0.0697, df = 9, p-value = 0.527
## alternative hypothesis: true mean is greater than 1000
## 95 percent confidence interval:
## 920.8 Inf
## sample estimates:
## mean of x
## 997.1
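# The 95% CI is [920.8, Inf) and the estimated mean is 997.1; p-value = 0.527 > 0.05, so we cannot
# conclude that the mean lifetime exceeds 1000 hours.
# To estimate P(X > 1000), plug the sample mean and standard deviation into the normal distribution:
# pnorm(1000, mean(x), sd(x)) is P(X < 1000), the integral from -Inf to 1000 (its last two arguments are
# the mean and standard deviation), so 1 - pnorm(1000, mean(x), sd(x)) estimates P(X > 1000), about 0.49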
1 - pnorm(1000, mean(x), sd(x))
## [1] 0.4912
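5.3 To study the effects of iron supplementation and dietary therapy on nutritional iron-deficiency anemia, 16 patients were matched into 8 pairs by age, weight, disease duration and severity. Within each pair, one patient received dietary therapy and the other iron supplementation. Hemoglobin was measured after 3 months, as shown in Table 5.19. Is there a difference in hemoglobin between patients under the two treatments?
Table 5.19: Hemoglobin values (\(g/L\)) of patients after iron and dietary treatment
Iron treatment group     113  120  138  120  100  118  138  123
Dietary treatment group  138  116  125  136  110  132  130  110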
# Two populations with unknown variances; test whether their means are equal.
# paired = TRUE because the data come in matched pairs (var.equal plays no role in a paired test)
x <- c(113, 120, 138, 120, 100, 118, 138, 123)
y <- c(138, 116, 125, 136, 110, 132, 130, 110)
t.test(x, y, alternative = "two.sided", paired = TRUE)
##
## Paired t-test
##
## data: x and y
## t = -0.6513, df = 7, p-value = 0.5357
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -15.629 8.879
## sample estimates:
## mean of the differences
## -3.375
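# The 95% CI for the mean difference is [-15.629, 8.879], which contains 0 (p-value = 0.5357 > 0.05),
# so there is no evidence that the two treatments lead to different hemoglobin levels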
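A paired t test is simply a one-sample t test on the within-pair differences. This minimal sketch, using the Exercise 5.3 data, shows that `t.test(x, y, paired = TRUE)` and `t.test(x - y, mu = 0)` give the same statistic and p-value; the names `d` and `on_diff` are illustrative.

```r
x <- c(113, 120, 138, 120, 100, 118, 138, 123)  # iron treatment group
y <- c(138, 116, 125, 136, 110, 132, 130, 110)  # dietary treatment group
d <- x - y                                      # within-pair differences
paired <- t.test(x, y, paired = TRUE)
on_diff <- t.test(d, mu = 0)                    # one-sample t test on the differences
print(c(mean_diff = mean(d),
        t_paired = unname(paired$statistic),
        t_diff = unname(on_diff$statistic)))
```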
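5.4 To study the efficacy of a domestically produced Class-4 new drug, acarbose capsules, a hospital ran a concurrent randomized controlled trial on 40 patients with type 2 diabetes. The patients were randomly divided into two equal groups: a test group (acarbose capsules) and a control group (Glucobay capsules). Fasting blood glucose was measured before the trial and after 8 weeks, and the decrease was computed, as shown in Table 5.20. Can we conclude that the domestic acarbose capsules and Glucobay capsules differ in their effect on fasting blood glucose?
Table 5.20: Decrease in fasting blood glucose (\(mmol/L\)) in the test and control groups
Test group (n1 = 20): -0.70, -5.60, 2.00, 2.80, 0.70, 3.50, 4.00, 5.80, 7.10, -0.50, 2.50, -1.60, 1.70, 3.00, 0.40, 4.50, 4.60, 2.50, 6.00, -1.40
Control group (n2 = 20): 6.50, 5.00, 5.20, 0.80, 0.20, 0.60, 3.40, 6.60, -1.10, 6.00, 3.80, 2.00, 1.60, 2.00, 2.20, 1.20, 3.10, 1.70, -2.00, -1.00
(1) Test whether the test-group and control-group data come from normal distributions, using the W test for normality (see Chapter 3), the Kolmogorov-Smirnov test and the Pearson goodness-of-fit \(\chi^2\) test;
(2) Use t tests to check whether the two group means differ, with the equal-variance model, the unequal-variance model and the paired t-test model;
(3) Test whether the variances of the test and control groups are equal.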
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
4.5, 4.6, 2.5, 6, -1.4)
y <- c(6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 1.2,
3.1, 1.7, -2, -1)
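# (1)a: Shapiro-Wilk W test for normality. The p-values are 0.7527 and 0.5816, both > 0.05, so both samples
# can be regarded as coming from normal distributions. This test is recommended by the Chinese national
# standard GB/T 4881-2001 "Statistical interpretation of data: Normality tests"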
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.9699, p-value = 0.7527
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.9619, p-value = 0.5816
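# (1)b: Kolmogorov-Smirnov test. Both p-values are far below 0.05, but ks.test(x, "pnorm") compares the data
# with the standard normal N(0, 1), so it only shows that x and y do not follow N(0, 1), not that they are non-normal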
ks.test(x, "pnorm")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.6054, p-value = 8.578e-07
## alternative hypothesis: two-sided
ks.test(y, "pnorm")
##
## One-sample Kolmogorov-Smirnov test
##
## data: y
## D = 0.5952, p-value = 1.402e-06
## alternative hypothesis: two-sided
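To check normality with the Kolmogorov-Smirnov test, the reference distribution should be a normal with the sample's own mean and standard deviation; `ks.test` passes extra arguments on to `pnorm`. This is a sketch under that assumption: because the parameters are estimated from the same data, the resulting p-values are only approximate (they are too large, i.e. the test is conservative; the Lilliefors correction addresses this). R may also warn about ties, since x contains 2.5 twice and y contains 2 twice.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)                 # test group
y <- c(6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 1.2,
       3.1, 1.7, -2, -1)                       # control group
# Extra arguments after "pnorm" are passed to pnorm, so each sample is compared
# with a normal distribution using its own sample mean and standard deviation.
ks_x <- ks.test(x, "pnorm", mean(x), sd(x))
ks_y <- ks.test(y, "pnorm", mean(y), sd(y))
print(c(p_x = ks_x$p.value, p_y = ks_y$p.value))
```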
# (1)c: Pearson chi-square goodness-of-fit test
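Part (1)c can be carried out by grouping each sample into bins and comparing observed with expected counts. The sketch below is one possible setup, not the textbook's: it assumes k = 4 equiprobable bins whose cut points are quantiles of the fitted normal \(N(\bar{x}, s^2)\), so each expected count is 20/4 = 5. The helper name `pearson_normal_gof` is hypothetical. Since two parameters are estimated, the degrees of freedom should drop from k - 1 to k - 3; `chisq.test` cannot know this, so the adjusted p-value is computed separately with `pchisq`.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)                 # test group
y <- c(6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 1.2,
       3.1, 1.7, -2, -1)                       # control group
# Hypothetical helper: Pearson chi-square goodness-of-fit test for normality
# using k equiprobable bins under the fitted normal N(mean(v), sd(v)^2).
pearson_normal_gof <- function(v, k = 4) {
  cuts <- qnorm(seq_len(k - 1) / k, mean = mean(v), sd = sd(v))    # interior cut points
  counts <- as.vector(table(cut(v, breaks = c(-Inf, cuts, Inf))))  # observed count per bin
  res <- chisq.test(counts, p = rep(1 / k, k))   # expected count per bin: length(v) / k
  stat <- unname(res$statistic)
  # Two parameters (mean, sd) were estimated, so use k - 1 - 2 degrees of freedom.
  p_adj <- pchisq(stat, df = k - 3, lower.tail = FALSE)
  list(counts = counts, statistic = stat, p_value = res$p.value, p_value_adj = p_adj)
}
gof_x <- pearson_normal_gof(x)
gof_y <- pearson_normal_gof(y)
str(gof_x)
str(gof_y)
```

With only 20 observations per group this test has little power, so a large p-value is weak evidence of normality; the Shapiro-Wilk test above is more informative here.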
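# (2) t.test() for two-sample tests of means. All three p-values (0.7166, 0.7167 and 0.7526) exceed 0.05,
# so no difference between the two drugs is detected. The two groups are independent random samples,
# so the paired model is computed only because the exercise asks for it
t.test(x, y, var.equal = TRUE)  # equal-variance model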
##
## Two Sample t-test
##
## data: x and y
## t = -0.3657, df = 38, p-value = 0.7166
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.124 1.474
## sample estimates:
## mean of x mean of y
## 2.065 2.390
t.test(x, y, var.equal = FALSE)  # unequal-variance (Welch) model
##
## Welch Two Sample t-test
##
## data: x and y
## t = -0.3657, df = 36.73, p-value = 0.7167
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.126 1.476
## sample estimates:
## mean of x mean of y
## 2.065 2.390
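The two models differ only in how the standard error and the degrees of freedom are computed. The sketch below reproduces both by hand: the pooled variance \(s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2]/(n_1+n_2-2)\) for the equal-variance model, and the Welch-Satterthwaite degrees of freedom for the unequal-variance model. Variable names are illustrative.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)
y <- c(6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 1.2,
       3.1, 1.7, -2, -1)
n1 <- length(x); n2 <- length(y)
v1 <- var(x); v2 <- var(y)
diff_means <- mean(x) - mean(y)
# Equal-variance model: pooled variance, df = n1 + n2 - 2
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled <- diff_means / sqrt(sp2 * (1 / n1 + 1 / n2))
# Unequal-variance (Welch) model: separate variances, Welch-Satterthwaite df
se2 <- v1 / n1 + v2 / n2
t_welch <- diff_means / sqrt(se2)
df_welch <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
print(c(t_pooled = t_pooled, t_welch = t_welch, df_welch = df_welch))
```

With equal group sizes (20 and 20) the two t statistics coincide, which is why both outputs show t = -0.3657; only the degrees of freedom (38 versus 36.73), and hence the p-values, differ slightly.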
t.test(x, y, paired = TRUE)  # paired t-test model
##
## Paired t-test
##
## data: x and y
## t = -0.3199, df = 19, p-value = 0.7526
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.452 1.802
## sample estimates:
## mean of the differences
## -0.325
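# (3) var.test() performs the F test for the ratio of two variances and gives an interval estimate.
# p-value = 0.4203 > 0.05 and the 95% CI [0.5763, 3.6786] contains 1, so equal variances are not rejected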
var.test(x, y)
##
## F test to compare two variances
##
## data: x and y
## F = 1.456, num df = 19, denom df = 19, p-value = 0.4203
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5763 3.6786
## sample estimates:
## ratio of variances
## 1.456
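`var.test` is the F test \(F = s_1^2/s_2^2\) with \((n_1-1, n_2-1)\) degrees of freedom. A minimal by-hand sketch of the statistic, the two-sided p-value and the 95% interval \([F/F_{0.975}, F/F_{0.025}]\) using `pf` and `qf`; the variable names are illustrative.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)
y <- c(6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 1.2,
       3.1, 1.7, -2, -1)
df1 <- length(x) - 1; df2 <- length(y) - 1
f_stat <- var(x) / var(y)                       # F = s1^2 / s2^2
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)        # two-sided p-value
ci <- f_stat / qf(c(0.975, 0.025), df1, df2)    # 95% CI for sigma1^2 / sigma2^2
res <- var.test(x, y)
print(c(F = f_stat, p = p_value, lower = ci[1], upper = ci[2]))
```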
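5.5 To study the effect of a new drug on antithrombin activity, 12 patients were randomly assigned to the new-drug group and 10 to the control group, and their antithrombin activity (unit: \(mm^3\)) was measured. New-drug group: 126, 125, 136, 128, 123, 138, 142, 116, 110, 108, 115, 140; control group: 162, 172, 177, 170, 175, 152, 157, 159, 160, 162. Analyze whether antithrombin activity differs between the new-drug and control groups (\(\alpha=0.05\)).
(1) Test whether both samples follow a normal distribution;
(2) Test whether the two sample variances are equal;
(3) Choose the most appropriate test to check whether antithrombin activity differs between the new-drug and control groups.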
# (1) The Shapiro-Wilk p-values are 0.4934 and 0.5313, both > 0.05, so both samples can be regarded as normal
x <- c(126, 125, 136, 128, 123, 138, 142, 116, 110, 108, 115, 140)
y <- c(162, 172, 177, 170, 175, 152, 157, 159, 160, 162)
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.9396, p-value = 0.4934
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.938, p-value = 0.5313
# (2) p-value = 0.32 > 0.05 and the 95% CI for the variance ratio, [0.5022, 7.0489], contains 1, so the two variances can be regarded as equal
var.test(x, y)
##
## F test to compare two variances
##
## data: x and y
## F = 1.965, num df = 11, denom df = 9, p-value = 0.32
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5022 7.0489
## sample estimates:
## ratio of variances
## 1.965
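# (3) Since both samples are normal with equal variances, use the pooled-variance t test. The 95% CI
# for the mean difference, [-48.25, -29.78], excludes 0 (p-value = 2.524e-08 < 0.05), so antithrombin
# activity differs significantly between the two groups
t.test(x, y, var.equal = TRUE)  # equal-variance model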
##
## Two Sample t-test
##
## data: x and y
## t = -8.815, df = 20, p-value = 2.524e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -48.25 -29.78
## sample estimates:
## mean of x mean of y
## 125.6 164.6
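5.6 A survey reported that elderly people make up \(14.7\%\) of a city's population. To check whether the survey is reliable, the city's gerontology association randomly sampled 400 residents and found 57 elderly people among them. Do the results support the claim that the elderly proportion in the city is \(14.7\%\) (\(\alpha=0.05\))?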
# Under H0 the ratio of elderly to non-elderly is 0.147 : (1 - 0.147); the sample contains 57 elderly
# and 400 - 57 non-elderly residents, so use a chi-square goodness-of-fit test. The expected counts are
# 58.8 and 341.2, X-squared is about 0.065 (df = 1) and the p-value is about 0.80 > 0.05, so H0 is not
# rejected: the data support the claim that the elderly proportion is 14.7%
chisq.test(c(57, 400 - 57), p = c(0.147, 1 - 0.147))
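The goodness-of-fit statistic for Exercise 5.6 can also be computed directly as \(\chi^2 = \sum (O_i - E_i)^2/E_i\). This sketch uses the sample size of 400 stated in the problem (57 elderly, 343 others) and compares the hand calculation with `chisq.test`; the variable names are illustrative.

```r
observed <- c(57, 400 - 57)          # elderly and non-elderly among 400 sampled residents
p0 <- c(0.147, 1 - 0.147)            # proportions under H0
expected <- sum(observed) * p0       # expected counts: 58.8 and 341.2
x2 <- sum((observed - expected)^2 / expected)   # Pearson chi-square statistic
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)
res <- chisq.test(observed, p = p0)  # built-in version for comparison
print(c(X2 = x2, p = p_value))
```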
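5.7 In a sex-control experiment, a certain treatment produced 328 chicks, of which 150 were male and 178 female. Does the treatment increase the proportion of female chicks? (The sex ratio should be 1:1.)
# p-value = 0.1221 > 0.05, so H0 is not rejected: the data are consistent with a 1:1 sex ratio,
# and there is no significant evidence that the treatment increases the proportion of females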
chisq.test(c(150, 178), p = c(0.5, 0.5))
##
## Chi-squared test for given probabilities
##
## data: c(150, 178)
## X-squared = 2.39, df = 1, p-value = 0.1221
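The question asks whether the treatment increases the proportion of females, which is a one-sided alternative, while the chi-square test above is two-sided. One way to address this directly (our choice, not the original solution) is an exact one-sided binomial test of H0: p = 0.5 against H1: p > 0.5, where p is the probability that a chick is female.

```r
# 178 of 328 chicks were female; H0: p = 0.5 vs H1: p > 0.5
res <- binom.test(178, 328, p = 0.5, alternative = "greater")
print(res$estimate)   # observed proportion of females, 178 / 328
print(res$p.value)    # one-sided exact p-value
```

The one-sided p-value is smaller than the two-sided 0.1221 but still above 0.05, so the conclusion does not change.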
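5.8 Mendel crossed peas differing in two pairs of traits: plants with yellow round seeds were crossed with plants with green wrinkled seeds. By the law of independent assortment, the theoretical ratio in the second generation is yellow round : yellow wrinkled : green round : green wrinkled = 9/16 : 3/16 : 3/16 : 1/16. The observed counts were 315 yellow round, 101 yellow wrinkled, 108 green round and 32 green wrinkled seeds, 556 in total. Are the results consistent with the law of independent assortment?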
# The expected counts for 556 seeds are 312.75, 104.25, 104.25 and 34.75. X-squared is about 0.47
# (df = 3) and the p-value is about 0.925 > 0.05, so H0 is not rejected: the results are consistent
# with the law of independent assortment
chisq.test(c(315, 101, 108, 32), p = c(9, 3, 3, 1)/16)
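A by-hand version of the Mendel calculation, using the counts 315, 101, 108 and 32 (total 556), makes the expected counts and each class's contribution to \(\chi^2\) visible. The element names and variable names are illustrative.

```r
observed <- c(yellow_round = 315, yellow_wrinkled = 101,
              green_round = 108, green_wrinkled = 32)
p0 <- c(9, 3, 3, 1) / 16                        # 9:3:3:1 ratio
expected <- sum(observed) * p0                  # 312.75, 104.25, 104.25, 34.75
contrib <- (observed - expected)^2 / expected   # contribution of each class
x2 <- sum(contrib)
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)
print(round(contrib, 4))
print(c(X2 = x2, p = p_value))
```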
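5.9 The number X of customers entering a store per minute was recorded over 200 randomly chosen minutes. Number of customers: 0, 1, 2, 3, 4, 5; corresponding frequencies: 92, 68, 28, 11, 1, 0. Can X be regarded as Poisson distributed (\(\alpha=0.1\))?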
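x <- 0:5                          # number of customers per minute
freq <- c(92, 68, 28, 11, 1, 0)   # observed frequencies over 200 minutes
# The frequencies of the last groups are small (below 5), which makes the chi-square approximation
# unreliable, so the groups X = 3, 4, 5 are merged into one group: 11 + 1 + 0 = 12
# Theoretical Poisson probabilities, using the sample mean mean(rep(x, freq)) as the estimate of lambda
q <- ppois(x, mean(rep(x, freq)))
# p-value = 0.8227 > 0.1, so the Poisson model is not rejected. chisq.test uses df = 3 here; with one
# estimated parameter the appropriate df is 2, which gives a p-value of about 0.63 and the same conclusion
chisq.test(c(92, 68, 28, 12), p = c(q[1], q[2] - q[1], q[3] - q[2], 1 - q[3]))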
##
## Chi-squared test for given probabilities
##
## data: c(92, 68, 28, 12)
## X-squared = 0.9113, df = 3, p-value = 0.8227
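Because \(\lambda\) was estimated from the data, the reference distribution should have one fewer degree of freedom than the df = 3 that `chisq.test` reports. The sketch below recomputes the expected counts (all at least 5 after merging) and the adjusted p-value with `pchisq`; for df = 2 the upper tail probability equals \(e^{-x/2}\). Variable names are illustrative.

```r
x <- 0:5
freq <- c(92, 68, 28, 11, 1, 0)
lambda <- mean(rep(x, freq))                      # sample mean, 161 / 200 = 0.805
q <- ppois(0:2, lambda)                           # P(X <= 0), P(X <= 1), P(X <= 2)
p <- c(q[1], q[2] - q[1], q[3] - q[2], 1 - q[3])  # probabilities of 0, 1, 2 and >= 3
observed <- c(92, 68, 28, 12)
expected <- sum(observed) * p
res <- chisq.test(observed, p = p)
# One parameter (lambda) was estimated, so df = 4 - 1 - 1 = 2 instead of 3
stat <- unname(res$statistic)
p_adj <- pchisq(stat, df = length(observed) - 1 - 1, lower.tail = FALSE)
print(round(expected, 1))
print(c(X2 = stat, p_df3 = res$p.value, p_df2 = p_adj))
```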
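5.10 Two samples were observed: 2.36, 3.14, 7.52, 3.48, 2.76, 5.43, 6.54, 7.41; and 4.38, 4.25, 6.53, 3.28, 7.21, 6.55. Do the two samples come from the same population (\(\alpha=0.05\))?
# Two-sample Kolmogorov-Smirnov test; H0: the two samples come from the same distribution.
# p-value = 0.6374 > 0.05, so H0 is not rejected: the two samples can be regarded as coming from the same population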
x <- c(2.36, 3.14, 7.52, 3.48, 2.76, 5.43, 6.54, 7.41)
y <- c(4.38, 4.25, 6.53, 3.28, 7.21, 6.55)
ks.test(x, y)
##
## Two-sample Kolmogorov-Smirnov test
##
## data: x and y
## D = 0.375, p-value = 0.6374
## alternative hypothesis: two-sided
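The two-sample Kolmogorov-Smirnov statistic is \(D = \max_t |F_1(t) - F_2(t)|\), the largest gap between the two empirical distribution functions. Because both are step functions, it is enough to evaluate them at the pooled observations; this sketch does so with `ecdf` and should reproduce D = 0.375.

```r
x <- c(2.36, 3.14, 7.52, 3.48, 2.76, 5.43, 6.54, 7.41)
y <- c(4.38, 4.25, 6.53, 3.28, 7.21, 6.55)
z <- sort(c(x, y))                          # pooled observations
D <- max(abs(ecdf(x)(z) - ecdf(y)(z)))      # largest gap between the two ECDFs
res <- ks.test(x, y)
print(D)
```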
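5.11 To study whether using an electronic fetal monitor during labor affects the cesarean section rate, a retrospective survey of 5824 deliveries by multiparous women was carried out; the results are shown in Table 5.21. Analyze the data.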
Cesarean section   Electronic fetal monitor    Total
                   Used        Not used
Yes                358         229             587
No                 2492        2745            5237
Total              2850        2974            5824
# H0 of chisq.test: the two variables are independent. p-value = 7.263e-10 < 0.05, so H0 is rejected:
# use of the electronic fetal monitor is associated with the cesarean section rate
x <- c(358, 2492, 229, 2745)
dim(x) <- c(2, 2)
chisq.test(x, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 37.95, df = 1, p-value = 7.263e-10
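Under independence the expected count in cell (i, j) is \(E_{ij} = R_i C_j / N\), and the Pearson statistic is \(\sum (O_{ij} - E_{ij})^2/E_{ij}\) with \((r-1)(c-1)\) degrees of freedom. A minimal by-hand sketch for the 2 × 2 table of Exercise 5.11; the dimnames are illustrative labels.

```r
x <- matrix(c(358, 2492, 229, 2745), nrow = 2,
            dimnames = list(cesarean = c("yes", "no"),
                            monitor = c("used", "not used")))
expected <- outer(rowSums(x), colSums(x)) / sum(x)  # E_ij = R_i * C_j / N
x2 <- sum((x - expected)^2 / expected)             # Pearson statistic, no continuity correction
dof <- (nrow(x) - 1) * (ncol(x) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)
res <- chisq.test(x, correct = FALSE)
print(round(expected, 1))
print(c(X2 = x2, df = dof, p = p_value))
```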
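5.12 300 first-year male high school students were examined on two attributes: B, the 1500 m run time, and C, the average daily exercise time. This gives the 4 × 3 contingency table shown in Table 5.22. Test at \(\alpha=0.05\) whether B and C are independent.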
Table 5.22: Physical exercise survey of 300 high school students
1500 m run time   Exercise over 2 h   1~2 h   Under 1 h   Total
5'01"-5'30"       45                  12      10          67
5'31"-6'00"       46                  20      28          94
6'00"-6'30"       28                  23      30          81
6'31"-7'00"       11                  12      35          58
Total             130                 67      103         300
# H0 of chisq.test: the two variables are independent. Only the 4 x 3 body of the table is used; the row
# totals (67, 94, 81, 58) must not be included as an extra column. Then df = (4 - 1)(3 - 1) = 6,
# X-squared is about 40.40 and the p-value is about 3.8e-07 < 0.05, so H0 is rejected: B and C are not independent
x <- c(45, 46, 28, 11, 12, 20, 23, 12, 10, 28, 30, 35)
dim(x) <- c(4, 3)
chisq.test(x, correct = FALSE)
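5.13 To compare whether two processes affect product quality, their products were inspected by sampling; the results are shown in Table 5.23. Analyze the data.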
Table 5.23: Sampling inspection of product quality under the two processes
             Qualified   Unqualified   Total
Process 1    3           4             7
Process 2    6           4             10
Total        9           8             17
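# Some expected cell counts are below 5, so Fisher's exact test is used; H0: the process and product quality are independent.
# p-value = 0.6372 > 0.05, so H0 is not rejected: there is no evidence that the process affects product quality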
x <- c(3, 6, 4, 4)
dim(x) <- c(2, 2)
fisher.test(x)
##
## Fisher's Exact Test for Count Data
##
## data: x
## p-value = 0.6372
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.04624 5.13272
## sample estimates:
## odds ratio
## 0.5213
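The usual rule of thumb concerns expected, not observed, counts: the chi-square approximation becomes unreliable when expected counts fall below 5. A quick check for Table 5.23, computing \(E_{ij} = R_i C_j/N\) with `outer`, shows that three of the four expected counts are below 5, which supports using `fisher.test` here. The dimnames are illustrative labels.

```r
x <- matrix(c(3, 6, 4, 4), nrow = 2,
            dimnames = list(process = c("1", "2"),
                            quality = c("qualified", "unqualified")))
expected <- outer(rowSums(x), colSums(x)) / sum(x)  # E_ij = R_i * C_j / N
print(round(expected, 2))
print(sum(expected < 5))   # number of cells with an expected count below 5
```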
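5.14 The radionuclide method and the contrast method were both used to assess cardiac wall systolic motion in 147 patients with coronary heart disease; the agreement between them is shown in Table 5.24. Analyze whether the two methods give the same results.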
Table 5.24: Agreement of the two methods in assessing ventricular wall systolic motion
                  Radionuclide method
Contrast method   Normal   Weakened   Abnormal   Total
Normal            58       2          3          63
Weakened          1        42         7          50
Abnormal          8        9          17         34
Total             67       53         27         147
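# Several observed cell counts are below 5, so Fisher's exact test is used; H0: the two methods' classifications are independent.
# p-value < 2.2e-16 < 0.05, so H0 is rejected: the results of the two methods are strongly associated (most
# patients lie on the diagonal). Rejecting independence does not by itself show that the two methods give different results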
x <- c(58, 1, 8, 2, 42, 9, 3, 7, 17)
dim(x) <- c(3, 3)
fisher.test(x)
##
## Fisher's Exact Test for Count Data
##
## data: x
## p-value < 2.2e-16
## alternative hypothesis: two.sided
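A test of independence answers whether the two methods' classifications are associated, not whether they give the same results. For a square table of paired classifications, one possible check (our suggestion, not part of the original solution) is Bowker's test of symmetry, which R's `mcnemar.test` performs for k × k tables with k > 2. The sketch also computes the statistic \(\sum_{i<j} (n_{ij} - n_{ji})^2/(n_{ij} + n_{ji})\) by hand; the dimnames are illustrative.

```r
x <- matrix(c(58, 1, 8, 2, 42, 9, 3, 7, 17), nrow = 3,
            dimnames = list(contrast = c("normal", "weakened", "abnormal"),
                            radionuclide = c("normal", "weakened", "abnormal")))
res <- mcnemar.test(x)           # Bowker's symmetry test for a 3 x 3 table
# By hand: compare each off-diagonal cell n_ij with its mirror n_ji
below <- x[lower.tri(x)]         # n_21, n_31, n_32
above <- t(x)[lower.tri(x)]      # n_12, n_13, n_23
bowker <- sum((below - above)^2 / (below + above))
print(c(by_hand = bowker, statistic = unname(res$statistic), p = res$p.value))
```

Here the statistic is about 2.86 on 3 degrees of freedom, far from significant at 0.05, so there is no evidence that the disagreements between the methods lean systematically in one direction.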
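5.15 In a fish pond, past experience shows that the median fish length is 14.6 cm. To re-estimate it, 10 fish were taken at random from the pond, with lengths: 13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43. Using these as a sample, determine whether the fish lengths in the pond lie above or below this median. (1) Use the sign test; (2) use the Wilcoxon signed rank test.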
# Sign test via binom.test: sum(x > 14.6) is the number of observations above 14.6;
# alternative = "less" means H0: M >= 14.6 vs H1: M < 14.6, where M is the population median.
# p-value = 0.01074 < 0.05, so H0 is rejected: the median is below 14.6
x <- c(13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43)
binom.test(sum(x > 14.6), length(x), alternative = "less")
##
## Exact binomial test
##
## data: sum(x > 14.6) and length(x)
## number of successes = 1, number of trials = 10, p-value = 0.01074
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
## 0.0000 0.3942
## sample estimates:
## probability of success
## 0.1
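Under H0 the number of observations above 14.6 follows Binomial(n, 0.5), so the one-sided sign-test p-value is just a binomial tail probability. This sketch computes it with `pbinom`; with 1 of 10 observations above 14.6 it equals 11/1024, about 0.01074. The names `m0`, `k` and `n` are illustrative.

```r
x <- c(13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43)
m0 <- 14.6
n <- sum(x != m0)              # observations equal to m0 would be dropped
k <- sum(x > m0)               # number of observations above m0
p_value <- pbinom(k, n, 0.5)   # P(K <= k) under H0, one-sided (alternative "less")
print(c(k = k, n = n, p = p_value))
```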
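# The sign test uses only the signs of the differences and ignores their magnitudes; the Wilcoxon
# signed rank test makes up for this. H0: M >= mu vs H1: M < mu, where M is the population median.
# exact controls whether an exact p-value is computed (relevant for small samples); the data contain
# tied absolute differences, so the normal approximation is used (exact = FALSE).
# p-value = 0.01087 < 0.05, so H0 is rejected: the median is below 14.6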
wilcox.test(x, mu = 14.6, alternative = "less", exact = FALSE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 4.5, p-value = 0.01087
## alternative hypothesis: true location is less than 14.6
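The statistic V reported by `wilcox.test` is the sum of the ranks of \(|x_i - 14.6|\) over the observations above 14.6. In this by-hand sketch, the two absolute differences of 0.83 (from 13.77 and 15.43) are tied and share the average rank 4.5; since 15.43 is the only value above 14.6, V = 4.5. Variable names are illustrative.

```r
x <- c(13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43)
d <- x - 14.6
d <- d[d != 0]                # zero differences are discarded
r <- rank(abs(d))             # ties receive average ranks
V <- sum(r[d > 0])            # sum of the ranks of the positive differences
res <- wilcox.test(x, mu = 14.6, alternative = "less", exact = FALSE)
print(c(V = V, reported = unname(res$statistic)))
```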
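5.16 Two different methods were used to measure the active ingredient of the same Chinese herbal medicine, with 20 repetitions; the results are shown in Table 5.25.
Table 5.25: Results of the two measurement methods
Method A: 48.0, 33.0, 37.5, 48.0, 42.5, 40.0, 42.0, 36.0, 11.3, 22.0, 36.0, 27.3, 14.2, 32.1, 52.0, 38.0, 17.3, 20.0, 21.0, 46.1
Method B: 37.0, 41.0, 23.4, 17.0, 31.5, 40.0, 31.0, 36.0, 5.7, 11.5, 21.0, 6.1, 26.5, 21.3, 44.5, 28.0, 22.6, 20.0, 11.0, 22.3
(1) Use the sign test to check whether the two methods differ significantly;
(2) Use the Wilcoxon signed rank test to check whether the two methods differ significantly;
(3) Use the Wilcoxon rank sum test to check whether the two methods differ significantly;
(4) Test the data for normality and homogeneity of variance; can a t test be applied to these data? If so, carry it out;
(5) Compare the tests above and explain which one works best.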
x <- c(48, 33, 37.5, 48, 42.5, 40, 42, 36, 11.3, 22, 36, 27.3, 14.2, 32.1, 52,
38, 17.3, 20, 21, 46.1)
y <- c(37, 41, 23.4, 17, 31.5, 40, 31, 36, 5.7, 11.5, 21, 6.1, 26.5, 21.3, 44.5,
28, 22.6, 20, 11, 22.3)
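# p-value = 0.1153 > 0.05, so H0 cannot be rejected: the sign test finds no significant difference. The 95% CI
# for P(x > y), [0.4572, 0.8811], contains 0.5, which is consistent with x > y and x < y being equally likely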
binom.test(sum(x > y), length(x))
##
## Exact binomial test
##
## data: sum(x > y) and length(x)
## number of successes = 14, number of trials = 20, p-value = 0.1153
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4572 0.8811
## sample estimates:
## probability of success
## 0.7
# Wilcoxon signed rank test: p-value = 0.005191 < 0.05, so H0 is rejected: the two methods differ significantly
wilcox.test(x, y, paired = TRUE, exact = FALSE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x and y
## V = 136, p-value = 0.005191
## alternative hypothesis: true location shift is not equal to 0
# Wilcoxon rank sum test: p-value = 0.04524 < 0.05, so H0 is rejected: the two methods differ significantly
wilcox.test(x, y, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: x and y
## W = 274.5, p-value = 0.04524
## alternative hypothesis: true location shift is not equal to 0
# p-value = 0.3773 > 0.05, so x can be regarded as coming from a normal population
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.9507, p-value = 0.3773
# p-value = 0.6848 > 0.05, so y can be regarded as coming from a normal population
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.9667, p-value = 0.6848
# p-value = 0.7772 > 0.05 and the 95% CI [0.4515, 2.8818] contains 1, so the variances can be regarded as equal
var.test(x, y)
##
## F test to compare two variances
##
## data: x and y
## F = 1.141, num df = 19, denom df = 19, p-value = 0.7772
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4515 2.8818
## sample estimates:
## ratio of variances
## 1.141
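# Both samples appear normal with homogeneous variances, so a t test can be used. Note that t.test(x, y)
# defaults to the Welch test (var.equal = FALSE). p-value = 0.03085 < 0.05, so H0 is rejected: the two methods differ significantly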
t.test(x, y)
##
## Welch Two Sample t-test
##
## data: x and y
## t = 2.243, df = 37.84, p-value = 0.03085
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8115 15.8785
## sample estimates:
## mean of x mean of y
## 33.22 24.87
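Two variations that follow the reasoning above more closely, offered as a sketch rather than the original solution: since the F test did not reject equal variances, `var.equal = TRUE` gives the pooled-variance test; and since the signed rank test already treats the 20 repetitions as pairs, a paired t test is its parametric counterpart. No output is reproduced here.

```r
x <- c(48, 33, 37.5, 48, 42.5, 40, 42, 36, 11.3, 22, 36, 27.3, 14.2, 32.1, 52,
       38, 17.3, 20, 21, 46.1)                 # method A
y <- c(37, 41, 23.4, 17, 31.5, 40, 31, 36, 5.7, 11.5, 21, 6.1, 26.5, 21.3, 44.5,
       28, 22.6, 20, 11, 22.3)                 # method B
pooled <- t.test(x, y, var.equal = TRUE)       # equal-variance (pooled) model
paired <- t.test(x, y, paired = TRUE)          # treats the i-th A and B results as a pair
print(c(t_pooled = unname(pooled$statistic), p_pooled = pooled$p.value,
        t_paired = unname(paired$statistic), p_paired = paired$p.value))
```

With equal sample sizes the pooled and Welch t statistics are identical (t = 2.243); only the degrees of freedom differ.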
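# Summary: the sign test, which uses only the signs of the differences, is the only test here that fails to detect
# a significant difference, so it is the least effective; the Wilcoxon signed rank test, which also uses the
# magnitudes of the paired differences, gives the smallest p-value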
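5.17 To study the relationship between weekly study time and average grade rank of university students, data on 10 students were collected. Study time: 24, 17, 20, 41, 52, 23, 46, 18, 15, 29. Grade rank: 8, 1, 4, 7, 9, 5, 10, 3, 2, 6, where rank 10 is the best and 1 the worst. Use rank correlation tests (the Spearman and Kendall tests) to analyze whether grade rank is related to study time.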
x <- c(24, 17, 20, 41, 52, 23, 46, 18, 15, 29)
y <- c(8, 1, 4, 7, 9, 5, 10, 3, 2, 6)
# p-value < 2.2e-16 < 0.05, so H0 is rejected: the two variables are correlated; rho = 0.9394 indicates a strong positive correlation
cor.test(x, y, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: x and y
## S = 10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9394
# p-value = 0.0003577 < 0.05, so H0 is rejected: the two variables are correlated; tau = 0.8222 indicates a strong positive correlation
cor.test(x, y, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: x and y
## T = 41, p-value = 0.0003577
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.8222
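Both statistics can be reproduced by hand. With no ties, Spearman's \(\rho = 1 - 6S/(n(n^2-1))\), where \(S = \sum d_i^2\) is the sum of squared rank differences (the S = 10 in the output), and Kendall's \(\tau = (C - D)/\binom{n}{2}\), where C and D count concordant and discordant pairs; the T = 41 reported by R is C. Variable names are illustrative.

```r
x <- c(24, 17, 20, 41, 52, 23, 46, 18, 15, 29)   # weekly study time
y <- c(8, 1, 4, 7, 9, 5, 10, 3, 2, 6)             # grade rank
n <- length(x)
# Spearman: S is the sum of squared rank differences (no ties here)
S <- sum((rank(x) - rank(y))^2)
rho <- 1 - 6 * S / (n * (n^2 - 1))
# Kendall: classify every pair (i, j) as concordant or discordant
idx <- combn(n, 2)
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
C <- sum(s > 0)
D <- sum(s < 0)
tau <- (C - D) / choose(n, 2)
print(c(S = S, rho = rho, C = C, D = D, tau = tau))
```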
# Both tests lead to the same conclusion
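5.18 To compare the efficacy of a new therapy for a disease, 40 patients were randomly divided into two groups of 20; one group received the new therapy and the other the standard therapy. After a period of treatment, each patient's outcome was carefully assessed and graded into five levels: poor, fairly poor, fair, fairly good and good. The numbers of patients at each level are shown in Table 5.26. Can we conclude that the new therapy is significantly better than the standard therapy (\(\alpha=0.05\))?
Table 5.26: Outcomes under the two therapies
Grade              Poor   Fairly poor   Fair   Fairly good   Good
New therapy        0      1             9      7             3
Standard therapy   2      2             11     4             1
# Code the grades from poor to good as 1 to 5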
x <- rep(1:5, c(0, 1, 9, 7, 3))
y <- rep(1:5, c(2, 2, 11, 4, 1))
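# Two-sided test: p-value = 0.05509 > 0.05, so H0 (equal efficacy) cannot be rejected. The question asks whether
# the new therapy is better, which is one-sided; with alternative = "greater" the p-value is half of this,
# about 0.0275 < 0.05, which does indicate that the new therapy is significantly better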
wilcox.test(x, y, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: x and y
## W = 266, p-value = 0.05509
## alternative hypothesis: true location shift is not equal to 0
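Because the question asks whether the new therapy is better, a one-sided alternative fits it more closely. A minimal sketch with `alternative = "greater"` (H1: grades under the new therapy tend to be higher); the variable names are illustrative.

```r
x <- rep(1:5, c(0, 1, 9, 7, 3))    # new therapy, grades coded 1 (poor) to 5 (good)
y <- rep(1:5, c(2, 2, 11, 4, 1))   # standard therapy
two_sided <- wilcox.test(x, y, exact = FALSE)
greater <- wilcox.test(x, y, alternative = "greater", exact = FALSE)  # H1: x tends to be larger
print(c(W = unname(greater$statistic), p_two_sided = two_sided$p.value,
        p_greater = greater$p.value))
```

Since W = 266 exceeds its null mean \(n_1 n_2/2 = 200\), the one-sided p-value is exactly half the two-sided one.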