Cluster Analysis in R

Data format: time (in hours) / data for each province

Data input:

locdata <- read.csv("./data.csv")

At this point locdata is a data.frame.

It needs to be converted to a matrix:

locdata_m <- as.matrix(locdata)

Drop unwanted data (here, the first row):

locdata_mt <- locdata_m[2:nrow(locdata_m), ]
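The preparation steps above can be sketched on a small made-up data frame standing in for data.csv. The column names (hour, ProvA, ProvB, ProvC) and all values are hypothetical; the real file may look different, and whether the first row should be dropped depends on its contents.

```r
# Toy stand-in for read.csv("./data.csv"); all names and values are hypothetical.
locdata <- data.frame(
  hour  = 0:4,
  ProvA = c(10, 12, 15, 11, 9),
  ProvB = c(11, 13, 14, 12, 10),
  ProvC = c(50, 48, 55, 60, 52)
)

# as.matrix() yields a numeric matrix only if every column is numeric.
locdata_m <- as.matrix(locdata)
stopifnot(is.numeric(locdata_m))

# Drop the first row and keep all columns.
locdata_mt <- locdata_m[2:nrow(locdata_m), ]
dim(locdata_mt)  # 4 rows, 4 columns
```

Note that dist() measures distances between rows, so with hours in rows the clustering groups hours. If the goal is to group provinces, one option is to transpose the matrix with t() first.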

Next, run the clustering.

There are two steps:

1. Choose the distance measure

dist(x, method = "euclidean",diag = FALSE, upper = FALSE, p = 2)

R computes distances with dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2), where x is a sample matrix or data frame. The distances are computed between the rows of x.

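method selects which distance to compute. Possible values:
euclidean    Euclidean distance: the square root of the sum of squared differences
maximum      Chebyshev distance (largest coordinate difference)
manhattan    absolute-value (city-block) distance
canberra     Lance (Canberra) distance
minkowski    Minkowski distance; p must be specified
binary       distance for qualitative (binary) variables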
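To make the differences between these measures concrete, here is a minimal sketch on two made-up points, (0, 0) and (3, 4). The object names are illustrative only.

```r
# Two points in the plane: (0, 0) and (3, 4).
x <- rbind(a = c(0, 0), b = c(3, 4))

d_euc <- dist(x, method = "euclidean")         # sqrt(3^2 + 4^2) = 5
d_max <- dist(x, method = "maximum")           # max(3, 4)       = 4
d_man <- dist(x, method = "manhattan")         # 3 + 4           = 7
d_min <- dist(x, method = "minkowski", p = 3)  # (3^3 + 4^3)^(1/3), about 4.498

print(c(euclidean = as.numeric(d_euc), maximum = as.numeric(d_max),
        manhattan = as.numeric(d_man), minkowski = as.numeric(d_min)))
```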

(Before computing distances, scale(x, center = TRUE, scale = TRUE) can be used to center and standardize the data.)

2. Choose the clustering method
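As a rough illustration of why standardizing can matter, the sketch below scales two made-up variables that live on very different scales. The column names and values are hypothetical.

```r
# Two variables on very different scales (made-up values).
x <- cbind(small = c(1, 2, 3, 4), large = c(1000, 3000, 2000, 4000))

xs <- scale(x, center = TRUE, scale = TRUE)

# After scaling, each column has mean 0 and standard deviation 1.
print(round(colMeans(xs), 10))
print(round(apply(xs, 2, sd), 10))

# Without scaling, the "large" column dominates the Euclidean distances.
d_raw    <- dist(x)
d_scaled <- dist(xs)
```

Whether to scale is a modelling choice: it gives every variable equal weight, which may or may not be what the analysis needs.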

hclust(d, method = "complete", members=NULL)

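method selects how clusters are merged. Possible values:
single      single linkage (shortest distance)
complete    complete linkage (longest distance)
median      median method
mcquitty    McQuitty's similarity method
average     group average method
centroid    centroid method
ward.D      Ward's sum-of-squared-deviations method (named "ward" before R 3.1.0; ward.D2 is also available)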
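The effect of the merge method shows up in the merge heights. A minimal sketch on three made-up points on a line (0, 1 and 3; the row names are illustrative):

```r
# Three points on a line: 0, 1 and 3 (made-up values).
x <- matrix(c(0, 1, 3), ncol = 1, dimnames = list(c("p0", "p1", "p3"), NULL))
d <- dist(x)

# p0 and p1 merge first (distance 1) under every method; the height of the
# second merge depends on how the distance from {p0, p1} to p3 is defined.
h_single   <- hclust(d, method = "single")$height    # 1 2    (nearest member: |1 - 3|)
h_complete <- hclust(d, method = "complete")$height  # 1 3    (farthest member: |0 - 3|)
h_average  <- hclust(d, method = "average")$height   # 1 2.5  (mean of 2 and 3)

print(rbind(single = h_single, complete = h_complete, average = h_average))
```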

# Substitute any method from the lists above; the defaults are shown here.
d <- dist(locdata_mt, method = "euclidean")

hcl <- hclust(d, method = "complete")

Finally, plot the dendrogram with the leaf labels aligned on a common baseline (hang = -1):

plot(hcl, hang = -1)
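Putting the two steps together, here is a small end-to-end sketch on made-up data. The matrix m and its row names are hypothetical, and cutree() is shown as one possible way to turn the tree into group labels; it is not part of the steps above.

```r
# Made-up data: rows r1-r3 lie close together, rows r4-r6 lie far from them.
m <- rbind(r1 = c(1, 1), r2 = c(1, 2),  r3 = c(2, 1),
           r4 = c(9, 9), r5 = c(9, 10), r6 = c(10, 9))

d   <- dist(m, method = "euclidean")
hcl <- hclust(d, method = "complete")

# Draw to a null device so the sketch also runs without a display;
# drop the pdf()/dev.off() lines in an interactive session.
pdf(NULL)
plot(hcl, hang = -1)
invisible(dev.off())

# cutree() cuts the tree into k groups and returns one label per row.
groups <- cutree(hcl, k = 2)
print(groups)
```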

Visualizing the degree of overlap as a heatmap:

heatmap(as.matrix(d))
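One reading of the step above is to draw a heatmap of the pairwise distance matrix. The sketch below assumes that reading and uses made-up data; m and its row names are hypothetical.

```r
# Made-up data: two tight pairs of rows.
m <- rbind(r1 = c(1, 1), r2 = c(1, 2), r3 = c(9, 9), r4 = c(9, 10))

# as.matrix() turns the dist object into a full symmetric matrix
# with zeros on the diagonal.
dm <- as.matrix(dist(m))
stopifnot(isSymmetric(dm), all(diag(dm) == 0))

# Draw to a null device so the sketch also runs without a display;
# drop the pdf()/dev.off() lines in an interactive session.
pdf(NULL)
heatmap(dm, symm = TRUE)  # symm = TRUE treats the input as a symmetric matrix
invisible(dev.off())
```

Small values in the heatmap mark rows that are close to each other; heatmap() also reorders rows and columns using its own hierarchical clustering.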

posted on 2015-07-31 12:56 闪电战