[Repost] Rand Index and Adjusted Rand Index (Clustering Evaluation Metrics)

Original article:

https://blog.csdn.net/sinat_30203515/article/details/82634778

--------------------------------------------------------------------------------------------------------

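The adjusted Rand index (ARI) is used to evaluate the performance of clustering models, but it requires the ground-truth labels (true_label). Before formally introducing the adjusted Rand index, we first introduce its predecessor, the Rand index.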

Rand index

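Given a set S of n data points, let U = {u_1, ..., u_R} be the ground-truth partition of S into classes and V = {v_1, ..., v_C} the partition produced by the clustering algorithm. Among all C(n, 2) = n(n-1)/2 pairs of points, let a be the number of pairs in the same class in U and the same cluster in V, b the number in the same class in U but different clusters in V, c the number in different classes in U but the same cluster in V, and d the number in different classes in U and different clusters in V. The Rand index is then:

$$RI = \frac{a + d}{a + b + c + d} = \frac{a + d}{\binom{n}{2}}$$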

The Rand index lies in [0, 1] and equals 1 when the clustering result matches the ground truth perfectly.
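The pair-counting definition can be checked directly with a small brute-force sketch. It is an illustrative Python implementation written for this note, and the function name `rand_index` is hypothetical, not a library API. It enumerates all n(n-1)/2 pairs, so it only suits small inputs. A pair counts as an agreement when it is either together in both labelings or apart in both.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand index (a + d) / C(n, 2) by enumerating every pair of points.

    Hypothetical helper for illustration; O(n^2), so only for small inputs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    agree = 0  # pairs counted in a (together in both) or d (apart in both)
    total = 0  # number of pairs, C(n, 2)
    for i, j in combinations(range(len(labels_true)), 2):
        same_u = labels_true[i] == labels_true[j]
        same_v = labels_pred[i] == labels_pred[j]
        if same_u == same_v:
            agree += 1
        total += 1
    if total == 0:
        return 1.0  # fewer than two points: assumed convention, nothing can disagree
    return agree / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same grouping, labels renamed
print(rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 5/6: only the pair at indices 2, 3 disagrees
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 1/3, although V splits every class of U
```

The score depends only on which points are grouped together, not on the label values. The last example also shows that a clustering that breaks up every true class still receives a clearly positive score.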

Adjusted Rand index

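The problem with the Rand index is that, for two random partitions, its value is not a constant close to 0. Hubert and Arabie proposed the adjusted Rand index in 1985. It takes the (generalized) hypergeometric distribution as its model of randomness. Under this model, the partitions U and V are drawn at random, subject to the constraint that the number of points in each class and in each cluster is fixed.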
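The following hedged simulation sketch makes the problem concrete. It draws two independent uniform random labelings with k clusters and computes the Rand index from contingency counts instead of enumerating pairs. The counts used are: pairs together in both labelings (a), together only in U (b), together only in V (c), and apart in both (d). `rand_index` is a hypothetical helper written for this example, and the seed and sample size are arbitrary choices. For independent uniform labels, each pair agrees with probability (1/k)^2 + (1 - 1/k)^2, which is the expected Rand index. That value climbs toward 1 as k grows instead of staying near 0.

```python
import random
from collections import Counter
from math import comb


def rand_index(labels_true, labels_pred):
    """Rand index from contingency counts (hypothetical helper, O(n))."""
    total = comb(len(labels_true), 2)  # C(n, 2) pairs
    if total == 0:
        return 1.0  # assumed convention for fewer than two points
    # a: pairs together in both partitions = sum of C(n_ij, 2) over the contingency table
    a = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_u = sum(comb(c, 2) for c in Counter(labels_true).values())  # a + b
    same_v = sum(comb(c, 2) for c in Counter(labels_pred).values())  # a + c
    d = total - same_u - same_v + a  # pairs apart in both partitions
    return (a + d) / total


rng = random.Random(0)  # arbitrary fixed seed for reproducibility
n_samples = 1000
for k in (2, 5, 10, 50):
    # two independent uniform random labelings with k clusters each
    u = [rng.randrange(k) for _ in range(n_samples)]
    v = [rng.randrange(k) for _ in range(n_samples)]
    expected = (1 / k) ** 2 + (1 - 1 / k) ** 2  # expected RI for independent uniform labels
    print(k, round(rand_index(u, v), 3), round(expected, 3))
```

Even though the two labelings carry no information about each other, the printed Rand index is around 0.5 for k = 2 and above 0.9 for k = 50.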

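Let n be the number of data points and let n_{ij} = |u_i ∩ v_j| be the number of points that belong to class u_i in U and to cluster v_j in V (the contingency table of the two partitions). Let n_{i·} = Σ_j n_{ij} and n_{·j} = Σ_i n_{ij} be its row and column sums. The adjusted Rand index is:

$$ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{n_{i\cdot}}{2}\sum_j\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{n_{i\cdot}}{2}+\sum_j\binom{n_{\cdot j}}{2}\right]-\left[\sum_i\binom{n_{i\cdot}}{2}\sum_j\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}}$$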
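Below is a minimal Python sketch of the Hubert-Arabie ARI, computed from the contingency table n_ij. Here n_ij is the number of points in class i of U and cluster j of V. `adjusted_rand_index` is a hypothetical helper name, not a library function. To avoid floating-point round-off, the numerator and denominator are both multiplied by 2·C(n, 2), so the arithmetic stays in integers until the final division. Returning 1.0 when the denominator is zero is an assumed convention for the degenerate case.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie ARI from the contingency table (hypothetical helper).

    index = sum_ij C(n_ij, 2), rows = sum_i C(n_i., 2), cols = sum_j C(n_.j, 2).
    ARI = (index - rows*cols/T) / ((rows + cols)/2 - rows*cols/T), T = C(n, 2);
    numerator and denominator are multiplied by 2T to stay in integers.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total = comb(len(labels_true), 2)
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    num = 2 * (index * total - rows * cols)
    den = total * (rows + cols) - 2 * rows * cols
    if den == 0:
        # Only when both labelings are the same trivial partition (one cluster,
        # or all singletons) or n < 2; assumed convention: perfect agreement.
        return 1.0
    return num / den


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 4/7, about 0.571
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
print(adjusted_rand_index([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.0
```

The third input scores -0.5, even though its raw Rand index is still 1/3 (2 of its 6 pairs agree). For cross-checking, scikit-learn's `sklearn.metrics.adjusted_rand_score` should give the same values on these inputs.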

Advantages:
- Random (uniform) label assignments have an ARI score close to 0.0 for any value of n_clusters and n_samples (which is not the case for the raw Rand index or, for instance, the V-measure).

- Bounded range [-0.5, 1]: negative values indicate agreement worse than chance, values near 0 indicate independent labelings, similar clusterings have a positive ARI, and 1.0 is the perfect match score.

- No assumption is made on the cluster structure: ARI can be used to compare clustering algorithms such as k-means, which assumes isotropic blob shapes, with the results of spectral clustering algorithms, which can find clusters with “folded” shapes.
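The first advantage can be checked empirically. The sketch below scores pairs of independent uniform random labelings for several sample sizes and cluster counts, and prints the raw Rand index next to the ARI for contrast. It rests on stated assumptions: `pair_counts` and `rand_and_adjusted` are helpers written for this note and are not scikit-learn's `rand_score` / `adjusted_rand_score`. The seed is arbitrary, and returning 1.0 in degenerate cases is an assumed convention. The ARI column should stay close to 0 while the RI column varies with k.

```python
import random
from collections import Counter
from math import comb


def pair_counts(labels_true, labels_pred):
    """Return (index, rows, cols, total) pair counts from the contingency table."""
    total = comb(len(labels_true), 2)
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return index, rows, cols, total


def rand_and_adjusted(labels_true, labels_pred):
    """Return (RI, ARI); both default to 1.0 in degenerate cases (assumed convention)."""
    index, rows, cols, total = pair_counts(labels_true, labels_pred)
    # RI = (a + d) / T with a = index and d = T - rows - cols + index
    ri = 1.0 if total == 0 else (total + 2 * index - rows - cols) / total
    # ARI with numerator and denominator scaled by 2T to keep integer arithmetic
    den = total * (rows + cols) - 2 * rows * cols
    ari = 1.0 if den == 0 else 2 * (index * total - rows * cols) / den
    return ri, ari


rng = random.Random(42)  # arbitrary fixed seed
for n_samples in (100, 1000):
    for k in (2, 10, 50):
        u = [rng.randrange(k) for _ in range(n_samples)]
        v = [rng.randrange(k) for _ in range(n_samples)]
        ri, ari = rand_and_adjusted(u, v)
        print(f"n={n_samples:5d} k={k:3d}  RI={ri:.3f}  ARI={ari:+.3f}")
```

The small residual fluctuations of ARI around 0 shrink as n_samples grows, because the adjustment subtracts the expected index under the fixed-marginals random model.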

Disadvantages:

Contrary to inertia, ARI requires knowledge of the ground-truth classes, which are almost never available in practice or require manual assignment by human annotators (as in the supervised learning setting).

However, ARI can also be useful in a purely unsupervised setting as a building block for a Consensus Index that can be used for clustering model selection.

References:

http://faculty.washington.edu/kayee/pca/supp.pdf

http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-index

--------------------------------------------------------------------------------------------------------

posted on 2019-05-17 10:30  Angry_Panda
