[Machine Learning] K-Means
K-Means
Finding the closest centroid
Formula
\[c^{(i)} := j \quad \mathrm{that \; minimizes} \quad ||x^{(i)} - \mu_j||^2
\]
Here, \(||X||\) denotes the Euclidean norm of a vector \(X\), computed as
\[||X|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
\]
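As a quick sanity check of the norm formula, the sketch below computes the Euclidean norm by hand and compares it with `np.linalg.norm`. The values are toy data chosen for illustration and are not part of the original exercise. It also checks that minimizing the squared distance selects the same centroid as minimizing the distance itself, since squaring preserves the ordering of non-negative numbers.

```python
import numpy as np

# Toy vector chosen purely for illustration
x = np.array([3.0, 4.0])
manual_norm = np.sqrt(np.sum(x ** 2))   # sqrt(x_1^2 + ... + x_n^2)
library_norm = np.linalg.norm(x)        # NumPy's default for vectors is the same 2-norm

# One example and two hypothetical centroids
point = np.array([1.0, 1.0])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
dists = np.linalg.norm(point - centroids, axis=1)   # distance to each centroid
closest_by_dist = int(np.argmin(dists))
closest_by_sq_dist = int(np.argmin(dists ** 2))
```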
Code
import numpy as np

# UNQ_C1
# GRADED FUNCTION: find_closest_centroids
def find_closest_centroids(X, centroids):
    """
    Computes the centroid memberships for every example

    Args:
        X (ndarray): (m, n) Input values
        centroids (ndarray): (K, n) centroids

    Returns:
        idx (array_like): (m,) closest centroids
    """
    # Set K
    K = centroids.shape[0]

    # You need to return the following variables correctly
    idx = np.zeros(X.shape[0], dtype=int)

    ### START CODE HERE ###
    for i in range(len(idx)):
        distance = []
        for j in range(K):
            # Euclidean norm of the difference, i.e. the distance from example i to centroid j
            norm_ij = np.linalg.norm(X[i] - centroids[j])
            distance.append(norm_ij)
        # Index of the nearest centroid for example i
        idx[i] = np.argmin(distance)
    ### END CODE HERE ###

    return idx
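The assignment step can also be written without explicit Python loops. The following is a minimal sketch using NumPy broadcasting. The function name `find_closest_centroids_vectorized` and the toy data are hypothetical and not part of the original exercise. It uses squared distances, which should give the same assignments as the norm-based loop.

```python
import numpy as np

def find_closest_centroids_vectorized(X, centroids):
    """Hypothetical vectorized variant of the assignment step."""
    # diffs[i, j, :] = X[i] - centroids[j], shape (m, K, n) via broadcasting
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distances, shape (m, K)
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Index of the nearest centroid for each example, shape (m,)
    return np.argmin(sq_dists, axis=1)

# Toy data for illustration only
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])
idx = find_closest_centroids_vectorized(X, centroids)
```

The intermediate array has shape (m, K, n), so this trades memory for speed. For very large datasets the looped version, or a chunked variant, may be the safer choice.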
Computing the centroid means
Formula
\[\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}
\]
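As a small worked example (the indices and values here are made up for illustration): suppose exactly two examples, \(x^{(3)} = (1, 2)\) and \(x^{(5)} = (3, 4)\), are assigned to centroid \(k\). Then \(C_k = \{3, 5\}\), \(|C_k| = 2\), and
\[\mu_k = \frac{1}{2} \left( x^{(3)} + x^{(5)} \right) = \frac{1}{2} \left( (1, 2) + (3, 4) \right) = (2, 3)
\]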
Code
import numpy as np

# UNQ_C2
# GRADED FUNCTION: compute_centroids
def compute_centroids(X, idx, K):
    """
    Returns the new centroids by computing the means of the
    data points assigned to each centroid.

    Args:
        X (ndarray): (m, n) Data points
        idx (ndarray): (m,) Array containing index of closest centroid for each
                       example in X. Concretely, idx[i] contains the index of
                       the centroid closest to example i
        K (int): number of centroids

    Returns:
        centroids (ndarray): (K, n) New centroids computed
    """
    # Useful variables
    m, n = X.shape

    # You need to return the following variables correctly
    centroids = np.zeros((K, n))

    ### START CODE HERE ###
    for k in range(K):
        # Select the rows of X assigned to centroid k
        points = X[idx == k]
        # Column-wise mean; note this yields NaN if no points are assigned to centroid k
        centroids[k] = np.mean(points, axis=0)
    ### END CODE HERE ###

    return centroids
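The two steps above are typically alternated until the assignments stop changing. The sketch below shows one way to combine them. The function name `run_kmeans_sketch`, the `max_iters` parameter, the choice to keep the old centroid when a cluster is empty, and the toy data are all assumptions made for illustration and do not come from the original exercise.

```python
import numpy as np

def run_kmeans_sketch(X, initial_centroids, max_iters=10):
    """Hypothetical driver: alternate the assignment and update steps."""
    centroids = initial_centroids.astype(float).copy()
    K = centroids.shape[0]
    idx = np.zeros(X.shape[0], dtype=int)
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        sq_dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        idx = np.argmin(sq_dists, axis=1)
        # Update step: mean of the points assigned to each centroid
        for k in range(K):
            points = X[idx == k]
            if len(points) > 0:  # assumption: keep the old centroid if the cluster is empty
                centroids[k] = points.mean(axis=0)
    return centroids, idx

# Toy data for illustration only: two well-separated groups
X = np.array([[1.0, 1.0], [1.0, 2.0], [9.0, 9.0], [9.0, 10.0]])
initial_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
centroids, idx = run_kmeans_sketch(X, initial_centroids)
```

A fixed iteration count keeps the sketch short. Stopping early once `idx` no longer changes between iterations would be a natural refinement.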