[Machine Learning] Collaborative Filtering
Collaborative Filtering Recommender Systems
Solving the similarity problem
Concepts
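Accuracy: \(\text{accuracy} = \frac{\text{correctly predicted samples}}{\text{all samples}}\)
Precision: \(\text{precision} = \frac{\text{correctly predicted positives}}{\text{all predicted positives}}\) [no false alarms]
Recall: \(\text{recall} = \frac{\text{correctly predicted positives}}{\text{all actual positives}}\) [no missed positives]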
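As a quick illustration of the three metrics, the sketch below counts outcomes for a small set of binary labels. The function name `binary_metrics` and the sample labels are made up for this example; it assumes labels are 0/1 and that at least one positive is both predicted and present.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted positives
    accuracy = np.mean(y_pred == y_true)        # correct predictions / all samples
    precision = tp / np.sum(y_pred == 1)        # correct positives / predicted positives
    recall = tp / np.sum(y_true == 1)           # correct positives / actual positives
    return float(accuracy), float(precision), float(recall)

# Made-up labels: 4 actual positives, 3 predicted positives, 2 of them correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
acc, prec, rec = binary_metrics(y_true, y_pred)
```

Here precision and recall differ because the classifier raises one false alarm (hurting precision) and misses two positives (hurting recall).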
Similarity
Cosine similarity
\[Cosine = \frac{\sum^{n}_{i=1}A_i \times B_i}{\sqrt{\sum^{n}_{i=1}(A_i)^2} \times \sqrt{\sum ^{n}_{i=1}(B_i)^2}}
\]
import numpy as np

def compute_cos(a, b):
    # Dot product divided by the product of the two L2 norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
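To get a feel for the range of values, here is a small self-contained check on made-up vectors. The helper `cosine_similarity` is a stand-in that mirrors `compute_cos` and assumes neither vector is all zeros.

```python
import numpy as np

def cosine_similarity(a, b):
    # Same computation as compute_cos; assumes non-zero vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
same_direction = cosine_similarity(a, 2 * a)   # scaling does not change the angle
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
opposite = cosine_similarity(a, -a)            # vectors pointing in opposite directions
```

The three cases land at the ends and the middle of the range: 1 for the same direction, 0 for orthogonal vectors, and -1 for opposite directions.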
Pearson correlation coefficient
Subtract each vector's own mean, then compute the cosine similarity of the centered vectors.
\[sim(i, j) = \frac{\sum_{p\in P}(R_{i, p} - \overline{R_i})(R_{j, p} - \overline{R_j})}{\sqrt{\sum_{p\in P}(R_{i, p} - \overline{R_i})^2} \sqrt{\sum_{p\in P}(R_{j, p} - \overline{R_j})^2}}
\]
def compute_sim(a, b):
    a = a - np.mean(a)  # center each vector on its own mean
    b = b - np.mean(b)
    sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return sim
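One way to sanity-check a Pearson implementation is to compare it against NumPy's `np.corrcoef`, and to confirm that adding a constant to every rating of one user leaves the result unchanged (which does not hold for plain cosine similarity). The function `pearson_sim` and the rating vectors below are illustrative assumptions; the sketch assumes both vectors cover the same items and neither is constant.

```python
import numpy as np

def pearson_sim(a, b):
    # Center each vector on its own mean, then take the cosine of the result
    a = a - np.mean(a)
    b = b - np.mean(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up ratings from two users on the same four items
u = np.array([5.0, 3.0, 4.0, 1.0])
v = np.array([4.0, 2.0, 5.0, 1.0])

sim = pearson_sim(u, v)
reference = np.corrcoef(u, v)[0, 1]   # NumPy's Pearson correlation for comparison
shifted = pearson_sim(u + 2.0, v)     # a user who rates everything 2 points higher
```

The shift invariance is the practical reason to prefer Pearson when users differ in how generously they rate overall.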
Cost Function
Formula
\[J = \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\text{regularization}
\]
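The regularization term is left unexpanded in the formula. One common choice, and the one that matches the implementation in the code section once its final division by 2 is taken into account, is an L2 penalty on both the user parameters and the item features. The feature index \(k\) and feature count \(n\) are notation introduced here for illustration:

\[\text{regularization} = \frac{\lambda}{2}\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2 + \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}^{(i)}_k)^2
\]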
Code
# GRADED FUNCTION: cofi_cost_func
# UNQ_C1
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    ### START CODE HERE ###
    for j in range(nu):
        w = W[j, :]
        b_j = b[0, j]
        for i in range(nm):
            x = X[i, :]
            y = Y[i, j]
            r = R[i, j]
            # Squared error counts only where a rating exists (r == 1)
            J += r * np.square(np.dot(w, x) + b_j - y)
    # L2 regularization on user parameters and item features
    J += lambda_ * (np.sum(np.square(W)) + np.sum(np.square(X)))
    J /= 2
    ### END CODE HERE ###
    return J
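The double loop is easy to read but slow for large rating matrices. A vectorized NumPy version, sketched below under the same shape conventions as `cofi_cost_func`, computes every prediction with one matrix product. The names `cofi_cost_vectorized` and `cofi_cost_loop` and the random test data are assumptions made for this example.

```python
import numpy as np

def cofi_cost_vectorized(X, W, b, Y, R, lambda_):
    # X: (num_movies, num_features), W: (num_users, num_features), b: (1, num_users)
    # Y, R: (num_movies, num_users)
    err = (X @ W.T + b - Y) * R   # prediction error, zeroed where no rating exists
    J = 0.5 * np.sum(err ** 2)
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    return J

def cofi_cost_loop(X, W, b, Y, R, lambda_):
    # Loop reference mirroring cofi_cost_func, used only to cross-check the result
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            J += R[i, j] * (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    J += lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))
    return J / 2

# Random made-up data with small dimensions
rng = np.random.default_rng(0)
num_movies, num_users, num_features = 5, 4, 3
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = rng.normal(size=(1, num_users))
Y = rng.integers(0, 6, size=(num_movies, num_users)).astype(float)
R = rng.integers(0, 2, size=(num_movies, num_users))

J_vec = cofi_cost_vectorized(X, W, b, Y, R, 1.5)
J_loop = cofi_cost_loop(X, W, b, Y, R, 1.5)
```

Multiplying the error by `R` before squaring works because `R` holds only 0 and 1, so masking and squaring commute. The two functions should agree up to floating-point rounding.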