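[R Statistics] Principal Component Analysis 2: Principal Component Regression
Exercise:
A survey was made of the sales volume Y of a certain consumer good in a region. Y is related to four variables: x1, residents' disposable income; x2, the average price index of this consumer good; x3, the total stock of this consumer good held in society; and x4, the average price index of other consumer goods. The historical data are shown in the table below. Use principal component regression to build a regression equation of the sales volume Y on the four variables x1, x2, x3 and x4.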
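Principal component regression works in three steps:

1. Replace the correlated predictors with their leading principal components.
2. Fit ordinary least squares on those component scores.
3. Map the fitted equation back to the original variables.

The algebra below sketches this under two assumptions. First, the components come from standardized predictors, as with princomp(..., cor = TRUE). Second, a_{jk} is the loading of variable j on component k. The mean and scale of x_j are \bar{x}_j and s_j; princomp computes s_j with divisor n. m is the number of retained components.

$$
z_k = \sum_{j=1}^{p} a_{jk}\,\frac{x_j - \bar{x}_j}{s_j}, \qquad k = 1, \dots, m
$$

$$
\hat{y} = \hat\gamma_0 + \sum_{k=1}^{m} \hat\gamma_k z_k = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j x_j
$$

$$
\hat\beta_j = \frac{1}{s_j} \sum_{k=1}^{m} \hat\gamma_k a_{jk}, \qquad \hat\beta_0 = \hat\gamma_0 - \sum_{j=1}^{p} \hat\beta_j \bar{x}_j
$$

With m = p this reproduces ordinary least squares. With m < p it drops the low-variance directions in which collinear predictors make the estimates unstable.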
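Data file data.txt (tab-separated; the first column is the observation number, which has no entry in the header line):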
       x1   x2    x3   x4     y
1    82.9   92  17.1   94   8.4
2    88.0   93  21.3   96   9.6
3    99.9   96  25.1   97  10.4
4   105.3   94  29.0   97  11.4
5   117.7  100  34.0  100  12.2
6   131.0  101  40.0  101  14.2
7   148.2  105  44.0  104  15.8
8   161.8  112  49.0  109  17.9
9   174.2  112  51.0  111  19.6
10  184.7  112  53.0  111  20.8
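As a quick check before any modeling, the same values can be entered directly as a data frame, so the example runs without data.txt. The code then inspects how strongly the predictors are correlated. This is a minimal sketch: the data frame reuses the name conomy from the script. The variance inflation factors use the standard identity VIF_j = (R^{-1})_{jj}, where R is the predictor correlation matrix.

```r
# Same values as data.txt, entered inline so the example is self-contained
conomy <- data.frame(
  x1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  x2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  x3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  x4 = c(94, 96, 97, 97, 100, 101, 104, 109, 111, 111),
  y  = c(8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.6, 20.8)
)

# Pairwise correlations of the predictors; values near 1 signal multicollinearity
R <- cor(conomy[, c("x1", "x2", "x3", "x4")])
print(round(R, 3))

# Variance inflation factors: diagonal of the inverse correlation matrix
vif <- diag(solve(R))
print(round(vif, 1))
```

Every pairwise correlation here is close to 1. That is the situation in which ordinary least-squares coefficients become unstable and principal component regression is worth trying.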
Script:
# This script uses x1, x2 and x3 only; x4, listed in the exercise, is not included.
# The header line has one field fewer than the data lines, so read.table uses the
# first column (observation number) as row names. The default whitespace
# separator accepts both tabs and spaces.
conomy <- read.table("data.txt", header = TRUE)

#### Ordinary linear regression
lm.sol <- lm(y ~ x1 + x2 + x3, data = conomy)
summary(lm.sol)
# Call:
# lm(formula = y ~ x1 + x2 + x3, data = conomy)
# Residuals:
#      Min       1Q   Median       3Q      Max
# -0.44365 -0.20719  0.04925  0.18879  0.47673
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.23574    5.39534   0.044  0.96657
# x1           0.14167    0.02587   5.477  0.00155 **
# x2          -0.02763    0.07265  -0.380  0.71685
# x3          -0.04743    0.05903  -0.803  0.45235
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 0.349 on 6 degrees of freedom
# Multiple R-squared: 0.9957,  Adjusted R-squared: 0.9935
# F-statistic: 462.5 on 3 and 6 DF,  p-value: 1.744e-07
# R-squared is 0.9957, yet only x1 is significant: a typical symptom of
# multicollinearity among x1, x2 and x3.

#### Principal component analysis on the correlation matrix
conomy.pr <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)
summary(conomy.pr, loadings = TRUE)
# Importance of components:
#                          Comp.1     Comp.2      Comp.3
# Standard deviation     1.720206 0.17628306 0.099081994
# Proportion of Variance 0.986369 0.01035857 0.003272414
# Cumulative Proportion  0.986369 0.99672759 1.000000000
# Loadings:
#    Comp.1 Comp.2 Comp.3
# x1  0.579  0.180  0.795
# x2  0.576 -0.781 -0.243
# x3  0.577  0.598 -0.556

#### Compute the sample principal component scores and regress y on them
# The first two components account for 99.67% of the variance.
pre <- predict(conomy.pr)
conomy$z1 <- pre[, 1]
conomy$z2 <- pre[, 2]
lm.sol <- lm(y ~ z1 + z2, data = conomy)
summary(lm.sol)
# Call:
# lm(formula = y ~ z1 + z2, data = conomy)
# Residuals:
#      Min       1Q   Median       3Q      Max
# -0.79867 -0.45194  0.06536  0.36712  0.83831
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  14.0300     0.1897  73.972 2.17e-11 ***
# z1            2.3763     0.1103  21.552 1.17e-07 ***
# z2            0.6977     1.0759   0.648    0.537
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 0.5998 on 7 degrees of freedom
# Multiple R-squared: 0.9852,  Adjusted R-squared: 0.9809
# F-statistic: 232.5 on 2 and 7 DF,  p-value: 3.975e-07

#### Transform back to obtain the equation in the original variables
beta <- coef(lm.sol)            # intercept and coefficients of z1, z2
A <- loadings(conomy.pr)        # loading matrix
x.bar <- conomy.pr$center       # variable means
x.sd <- conomy.pr$scale         # variable scales used by princomp (divisor n)
coef.x <- (beta[2] * A[, 1] + beta[3] * A[, 2]) / x.sd
beta0 <- beta[1] - sum(x.bar * coef.x)
c(beta0, coef.x)
# (Intercept)          x1          x2          x3
# -7.75109994  0.04347167  0.10678004  0.14573976

### Conclusion: y = -7.75109994 + 0.04347167 x1 + 0.10678004 x2 + 0.14573976 x3
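The script hard-codes three predictors and two components. A small generic helper makes it easy to redo the fit with all four variables that the exercise lists. pcr_fit is a hypothetical name, not from the textbook. It is a sketch that repeats the script's steps:

- princomp on the correlation matrix;
- lm on the first k scores;
- back-transformation with the loadings, center and scale.

The choice k = 2 for the four-variable fit is an assumption. Check the cumulative proportion of variance first.

```r
# Hypothetical helper (not from the textbook): principal component regression
# on standardized predictors, returning coefficients on the original scale.
pcr_fit <- function(data, response, predictors, k) {
  pr <- princomp(data[, predictors], cor = TRUE)       # PCA on correlation matrix
  scores <- predict(pr)[, seq_len(k), drop = FALSE]    # first k component scores
  gamma <- coef(lm(data[[response]] ~ scores))         # regress y on the scores
  A <- unclass(loadings(pr))[, seq_len(k), drop = FALSE]  # p x k loadings
  # beta_j = sum_k gamma_k * a_jk / s_j;  beta_0 = gamma_0 - sum_j beta_j * xbar_j
  beta <- as.vector(A %*% gamma[-1]) / pr$scale
  names(beta) <- predictors
  beta0 <- unname(gamma[1] - sum(beta * pr$center))
  c("(Intercept)" = beta0, beta)
}

conomy <- data.frame(
  x1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  x2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  x3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  x4 = c(94, 96, 97, 97, 100, 101, 104, 109, 111, 111),
  y  = c(8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.6, 20.8)
)

# Same computation as the script: x1, x2, x3 and the first two components
print(pcr_fit(conomy, "y", c("x1", "x2", "x3"), k = 2))

# All four variables from the exercise; inspect the variance proportions first
summary(princomp(conomy[, c("x1", "x2", "x3", "x4")], cor = TRUE))
print(pcr_fit(conomy, "y", c("x1", "x2", "x3", "x4"), k = 2))
```

Setting k equal to the number of predictors reproduces the ordinary least-squares coefficients, since no component is discarded. This makes a convenient sanity check for the back-transformation.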
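The source code and the exercise in this post come from the textbook 《统计建模与R软件》 ("Statistical Modeling and R Software", ISBN 9787302143666) by 薛毅 (Xue Yi).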