Machine Learning Notes (Washington University) - Clustering Specialization - Week Six

1. Hierarchical clustering

  • Avoids choosing the number of clusters beforehand
  • Dendrograms help visualize different clustering granularities (no need to rerun the algorithm)
  • Most algorithms allow the user to choose any distance metric (k-means restricted us to Euclidean distance)
  • Can often find more complex cluster shapes than k-means or Gaussian mixture models

Divisive (top-down):

Start with all the data in one big cluster and recursively split it (e.g., recursive k-means). Decisions to make:

  • which algorithm to use for the recursive splits
  • how many clusters per split
  • when to split vs. stop: e.g., a maximum cluster size, a maximum cluster radius, or a specified number of clusters
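The recursive k-means idea above can be sketched as follows, assuming scikit-learn is available; the `max_size` stopping rule and the function name are illustrative, not from the course:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(points, max_size):
    """Recursively 2-means-split points until each cluster is small enough."""
    if len(points) <= max_size:
        return [points]                       # stop: cluster is small enough
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    clusters = []
    for k in (0, 1):                          # recurse on each half of the split
        clusters.extend(divisive_kmeans(points[labels == k], max_size))
    return clusters

# Two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])
clusters = divisive_kmeans(data, max_size=25)
print([len(c) for c in clusters])
```

Any of the other stopping rules (maximum radius, target number of clusters) would replace the `len(points) <= max_size` check.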

 

Agglomerative (bottom-up):

Start with each data point as its own cluster, then repeatedly merge the closest clusters until all points are in one big cluster (e.g., single linkage).

Single linkage:

  • initialize each point to be its own cluster
  • define the distance between two clusters to be the minimum distance between any point in cluster one and any point in cluster two
  • merge the two closest clusters
  • repeat the merge step until all points are in one cluster
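The steps above can be run with SciPy's implementation of single linkage; each row of the returned matrix records one merge: the two cluster indices, the minimum inter-cluster distance at which they merged, and the size of the merged cluster. The example points are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two tight pairs of points, far apart from each other
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# One row per merge: [cluster_i, cluster_j, distance, merged_size]
Z = linkage(points, method='single')
print(Z)
```

Here the two pairs merge first (distance 1.0 each), and the final merge happens at the minimum distance between the two pairs, sqrt(41) ≈ 6.40.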

 

Dendrogram

The x-axis shows the data points (carefully ordered).

The y-axis shows the distance between pairs of clusters.

Each point's path up the tree shows every cluster that point belongs to and the order in which those clusters merge.
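Cutting the dendrogram at different heights yields different clustering granularities from a single run, with no need to rerun the algorithm. A sketch using SciPy's `fcluster`, where the threshold values and example points are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Z = linkage(points, method='single')

# Cut high on the dendrogram: fewer, coarser clusters
coarse = fcluster(Z, t=3.0, criterion='distance')
# Cut low on the dendrogram: more, finer clusters
fine = fcluster(Z, t=0.5, criterion='distance')
print(coarse, fine)
```

With these points, cutting at 3.0 gives two clusters (each tight pair), while cutting at 0.5 leaves every point in its own cluster.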

 

Posted 2017-06-02 23:12 by ClimberClimb