K-means algorithm----PRML读书笔记

    The K-means algorithm is based on the use of squared Euclidean distance as the measure of  dissimilarity between a data point and a prototype vector. Our goal is to partition the data set into some number K of clusters, where we shall suppose for the moment that the value of K is given. We can then define an objective function, sometimes called a distortion measure, given by J=ΣnΣkrnk||xnk||2,where n=1,...N, k=1,...,K, N is observations of a random D-dimensional Euclidean variable x, K is number of clusters. J represents the sum of the squares of the distances of each data point to its assigned vector μk. We can think of the μk as representing the centres of the clusters. Our goal is to find values for the {rnk} and the {μk} so as to minimize J. First we choose some initial values for the μk. Then in the first phase we minimize J with respect to the rnk, keeping the μk fixed. In the second phase we minimize J with respect to μk, keeping rnk fixed. This two-stage optimization is then repeated until convergence. We simply assign the nth data point to the closest cluster centre, this can be expressed as rnk=1,if k=argminj||xnj||2, otherwise rnk=0. The objective function J is a quadratic function of μk, and it can be minimized by setting its derivative with respect to μk to zero giving 2Σnrnk(xnk)=0. μk=(Σnrnkxn)/(Σnrnk), this result has a simple  interpretation, namely set μk equal to the mean of all of the data points xn assigned to cluster k. For this reason, the procedure is known as the K-means algorithm.

posted @   东宫得臣  阅读(173)  评论(0编辑  收藏  举报
编辑推荐:
· AI与.NET技术实操系列:基于图像分类模型对图像进行分类
· go语言实现终端里的倒计时
· 如何编写易于单元测试的代码
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
阅读排行:
· 分享一个免费、快速、无限量使用的满血 DeepSeek R1 模型,支持深度思考和联网搜索!
· 基于 Docker 搭建 FRP 内网穿透开源项目(很简单哒)
· 25岁的心里话
· ollama系列01:轻松3步本地部署deepseek,普通电脑可用
· 按钮权限的设计及实现
点击右上角即可分享
微信分享提示