PCA Algorithm Implementation (Updated)
1. Transductive PCA
Basic steps:
- Center the sample data (this step matters, especially for deriving the formulas);
- Compute the sample covariance matrix;
- Eigendecompose the covariance matrix and project onto the eigenvectors of the top k eigenvalues:
The model underlying PCA's optimization objective is:
X = D + N, i.e. a low-rank matrix D plus i.i.d. Gaussian noise N.
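Under that model, maximizing the Gaussian likelihood of N is equivalent to a rank-constrained least-squares problem (a sketch of the step left implicit above; k is the target dimension):

```latex
\min_{\operatorname{rank}(D)\le k}\ \|X - D\|_F^2
```

By the Eckart–Young theorem the minimizer is the rank-k truncated SVD of the centered X, which is exactly the projection onto the top-k eigenvectors of the covariance matrix from the steps above.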
import numpy as np
from numpy import linalg


def pca(X, d):
    """Transductive PCA
    input:
        X: sample matrix, one sample per row
        d: target dimension
    output:
        reduced vector samples, plus an orthogonality check
    """
    # 1. zero-equalization: center each feature column
    mean_x = np.mean(X, axis=0)
    new_X = X - mean_x  # zero-mean data
    # 2. Cov(X): covariance matrix of the features
    cov_X = np.dot(new_X.T, new_X) / (X.shape[0] - 1)
    # 3. eig_value & eig_vectors (eigh, since cov_X is symmetric)
    e_val, e_vecs = linalg.eigh(cov_X)
    eval_idx = np.argsort(e_val)[::-1]
    # 4. eigenvectors of the d largest eigenvalues
    sorted_vec = e_vecs[:, eval_idx]
    d_vec = sorted_vec[:, :d]  # each eigenvector is a principal direction
    # 5. project the centered samples onto the principal directions
    reduced_X = np.dot(new_X, d_vec)
    ortho = np.dot(d_vec.T, d_vec)  # should be (close to) the identity

    return reduced_X, ortho


# -----------------------Test Part-------------------------
if __name__ == '__main__':
    a = np.array([[1, 2, 3],
                  [2, 5, 6],
                  [3, 6, 9],
                  [4, 8, 7]])
    reduced, ortho = pca(a, 2)
    print(reduced)
    print(ortho)
Notes:
# multiplying two 1-D arrays
Reshape the vectors into n*1 and 1*n form:
a = a.reshape(size, 1)
b = a.reshape(1, size)
np.dot(a, b)
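A quick self-contained check of this reshape trick; np.outer, used here only for comparison, is my addition and not part of the original code:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
size = a.shape[0]

col = a.reshape(size, 1)   # n x 1 column
row = a.reshape(1, size)   # 1 x n row
prod = np.dot(col, row)    # n x n outer product

# np.outer computes the same thing directly
assert np.allclose(prod, np.outer(a, a))
print(prod)
```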
2. RPCA
# code not written yet
The model underlying RPCA's optimization objective is:
D = L + S, i.e. a low-rank matrix L plus a sparse matrix S of gross (spiky) noise.
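Since the RPCA code is still to be written, here is a minimal sketch of one standard solver for this decomposition, Principal Component Pursuit via ADMM (the names rpca/svt/shrink and the step-size heuristic are my assumptions, not the author's code):

```python
import numpy as np


def shrink(M, tau):
    # element-wise soft-thresholding: proximal operator of the l1 norm
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def svt(M, tau):
    # singular value thresholding: proximal operator of the nuclear norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(D, max_iter=500, tol=1e-7):
    """Principal Component Pursuit: min ||L||_* + lam*||S||_1  s.t.  D = L + S."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))         # standard PCP weight
    mu = m * n / (4.0 * np.abs(D).sum())   # common step-size heuristic
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                   # dual variable
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(D - L + Y / mu, lam / mu)  # sparse update
        Y = Y + mu * (D - L - S)              # dual ascent on the constraint
        if np.linalg.norm(D - L - S) / norm_D < tol:
            break
    return L, S
```

The shrink/svt pair handle the l1 and nuclear-norm terms respectively, and lam = 1/sqrt(max(m, n)) is the weighting suggested in the PCP literature (Candès et al., "Robust Principal Component Analysis?").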