Drawing correlation plots with the ellipse package
Reference: 《数据挖掘:R语言实战》 (Data Mining: R in Action), pp. 65-68
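install.packages("rattle")  # provides the example dataset
install.packages("ellipse") # provides plotcorr(), the function that draws the correlation plot
rm(list = ls())             # clear the workspace
library("ellipse") # load the packages
library("rattle")
data(weather) # load the dataset
head(weather) # preview the first rows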
## Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine
## 1 2007-11-01 Canberra 8.0 24.3 0.0 3.4 6.3
## 2 2007-11-02 Canberra 14.0 26.9 3.6 4.4 9.7
## 3 2007-11-03 Canberra 13.7 23.4 3.6 5.8 3.3
## 4 2007-11-04 Canberra 13.3 15.5 39.8 7.2 9.1
## 5 2007-11-05 Canberra 7.6 16.1 2.8 5.6 10.6
## 6 2007-11-06 Canberra 6.2 16.9 0.0 5.8 8.2
## WindGustDir WindGustSpeed WindDir9am WindDir3pm WindSpeed9am
## 1 NW 30 SW NW 6
## 2 ENE 39 E W 4
## 3 NW 85 N NNE 6
## 4 NW 54 WNW W 30
## 5 SSE 50 SSE ESE 20
## 6 SE 44 SE E 20
## WindSpeed3pm Humidity9am Humidity3pm Pressure9am Pressure3pm Cloud9am
## 1 20 68 29 1019.7 1015.0 7
## 2 17 80 36 1012.4 1008.4 5
## 3 6 82 69 1009.5 1007.2 8
## 4 24 62 56 1005.5 1007.0 2
## 5 28 68 49 1018.3 1018.5 7
## 6 24 70 57 1023.8 1021.7 7
## Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
## 1 7 14.4 23.6 No 3.6 Yes
## 2 3 17.5 25.7 Yes 3.6 Yes
## 3 7 15.4 20.2 Yes 39.8 Yes
## 4 7 13.5 14.1 Yes 2.8 Yes
## 5 7 11.1 15.4 Yes 0.0 No
## 6 5 10.9 14.8 No 0.2 No
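The next step picks the numeric columns by position. As a hedged alternative, the sketch below selects numeric columns by type instead. It uses a small made-up data frame (`toy` and its values are hypothetical, not the weather data); applied to `weather`, the same approach would return every numeric column, not only columns 12 to 21.

```r
# Toy data frame with hypothetical values (not the weather dataset)
toy <- data.frame(
  Location  = c("A", "B", "C"),
  WindSpeed = c(6, 4, 30),
  Humidity  = c(68, 80, 62),
  RainToday = factor(c("No", "Yes", "Yes"))
)

# TRUE for each column that stores numeric values
is_num <- vapply(toy, is.numeric, logical(1))

# Keep only the numeric columns; drop = FALSE keeps a data frame
toy_numeric <- toy[, is_num, drop = FALSE]
names(toy_numeric)
```

Selecting by type avoids hard-coded indices, which silently break if the column order changes.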
test_data <- weather[, 12:21] # columns 12 to 21 are numeric
cor_matrix <- cor(test_data, use = "pairwise.complete.obs") # pairwise correlations, skipping NAs per pair
cor_matrix # print the result
## WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
## WindSpeed9am 1.00000000 0.47296617 -0.2706229 0.14665712 -0.35633183
## WindSpeed3pm 0.47296617 1.00000000 -0.2660925 -0.02636775 -0.35980011
## Humidity9am -0.27062286 -0.26609247 1.0000000 0.54671844 0.13572697
## Humidity3pm 0.14665712 -0.02636775 0.5467184 1.00000000 -0.08794614
## Pressure9am -0.35633183 -0.35980011 0.1357270 -0.08794614 1.00000000
## Pressure3pm -0.24795238 -0.33732535 0.1344205 -0.01005189 0.96789496
## Cloud9am 0.10184246 -0.02642642 0.3928416 0.55163264 -0.15755279
## Cloud3pm -0.02247149 0.00720724 0.2719381 0.51010790 -0.14100043
## Temp9am 0.06407405 -0.01776636 -0.4365506 -0.25568147 -0.46041819
## Temp3pm -0.23518635 -0.18756965 -0.3551186 -0.58167615 -0.25367375
## Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm
## WindSpeed9am -0.24795238 0.10184246 -0.02247149 0.06407405 -0.2351864
## WindSpeed3pm -0.33732535 -0.02642642 0.00720724 -0.01776636 -0.1875697
## Humidity9am 0.13442050 0.39284158 0.27193809 -0.43655057 -0.3551186
## Humidity3pm -0.01005189 0.55163264 0.51010790 -0.25568147 -0.5816761
## Pressure9am 0.96789496 -0.15755279 -0.14100043 -0.46041819 -0.2536738
## Pressure3pm 1.00000000 -0.12894408 -0.14383718 -0.49263629 -0.3454853
## Cloud9am -0.12894408 1.00000000 0.52521793 0.02104135 -0.2023440
## Cloud3pm -0.14383718 0.52521793 1.00000000 0.04094519 -0.1728142
## Temp9am -0.49263629 0.02104135 0.04094519 1.00000000 0.8444058
## Temp3pm -0.34548531 -0.20234405 -0.17281423 0.84440581 1.0000000
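The `use` argument matters when columns contain missing values. The following is a minimal sketch on made-up vectors (`x`, `y`, `z` are hypothetical) of how pairwise deletion differs from the default behaviour of `cor()`.

```r
# Hypothetical vectors with one missing value each in x and y
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)
z <- c(5, 3, 4, 1, 2)
toy <- data.frame(x, y, z)

# Default (use = "everything"): a pair involving an NA yields NA
cor_default <- cor(toy)

# Pairwise deletion: each coefficient uses the rows where both
# variables of that pair are observed
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

cor_default["x", "y"]
cor_pairwise["x", "y"]
```

Because each pair can be computed from a different set of rows, the coefficients in a pairwise matrix are not all based on the same sample size.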
col <- 1:10 # fill colors for the ellipses
plotcorr(cor_matrix, col = col, type = "lower", diag = FALSE)
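The stronger the correlation, the narrower the ellipse. An ellipse leaning to the left (\) indicates a negative correlation and one leaning to the right (/) a positive correlation; for example, Temp3pm and Temp9am are positively correlated (about 0.84 in the matrix above).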
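The link between the correlation and the ellipse shape can be sketched numerically. This assumes each ellipse is a contour of a bivariate normal distribution with unit variances and correlation r, in which case the axes lie along the two diagonals with lengths proportional to sqrt(1 + r) and sqrt(1 - r). The helper names `axis_ratio` and `lean` are made up for illustration and are not part of the ellipse package.

```r
# Ratio of minor to major axis for correlation r,
# assuming both variables have unit variance
axis_ratio <- function(r) sqrt((1 - abs(r)) / (1 + abs(r)))

# Direction of the major axis: "/" for positive r, "\" for negative r,
# "o" for r == 0 (the ellipse is then a circle)
lean <- function(r) ifelse(r > 0, "/", ifelse(r < 0, "\\", "o"))

# A few coefficients, including values taken from the matrix above
r <- c(0, 0.5, 0.84, -0.58, 0.97)
data.frame(r = r, ratio = round(axis_ratio(r), 2), lean = lean(r))
```

A ratio of 1 is a circle, and the ratio shrinks toward 0 as the absolute correlation approaches 1, which is why strongly correlated pairs appear as thin ellipses.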
# numbers = TRUE, diag = TRUE: show coefficients instead of ellipses and include the diagonal
plotcorr(cor_matrix, numbers = TRUE, type = "lower", diag = TRUE)
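- col: fill color(s) for the ellipses
- type: which part of the matrix to draw: the upper triangle, the lower triangle, or the whole matrix ("upper", "lower", "full")
- diag: logical; whether to draw the main diagonal
- numbers: logical; whether to show the correlation coefficients instead of ellipses. The values are multiplied by 10 and rounded to the nearest integer.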
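To make the `numbers` scaling concrete, the sketch below applies the rule described above (multiply by 10, then round) to three coefficients copied from the correlation matrix. The vector name `cor_values` and its labels are only for illustration.

```r
# Coefficients copied from the correlation matrix above
cor_values <- c(
  Temp9am_Temp3pm         = 0.8444058,
  Pressure9am_Pressure3pm = 0.96789496,
  Humidity3pm_Temp3pm     = -0.58167615
)

# Multiply by 10 and round to the nearest integer
round(10 * cor_values)
```

A displayed 10 therefore does not have to mean a perfect correlation; it only means the coefficient rounds to 1.0 at one decimal place.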