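Machine Learning Regression Algorithms: Linear Regression
1. Concept
Linear regression is one of the simpler regression algorithms. Like logistic regression, it is a supervised learning algorithm. Unlike logistic regression, it does not pass its output through the Sigmoid function, so it predicts a continuous value rather than a probability.
Linear regression fits a straight line that summarizes the overall trend of the data points. In ordinary least squares, this is the line that minimizes the sum of squared vertical distances from the points to the line. Once the line is fitted, it can be used to estimate the values of new points, which is how the model makes predictions.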
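The fit-then-predict idea can be shown in a few lines. This is a minimal sketch on hypothetical synthetic data scattered around y = 2x + 1. It uses NumPy's np.polyfit with degree 1, which performs an ordinary least-squares line fit. It is not the method used in the implementation section.

```python
import numpy as np

# Hypothetical noisy points scattered around the line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.3, size=x.size)

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted line to predict values at x positions outside the data.
x_new = np.array([12.0, 15.0])
y_pred = slope * x_new + intercept
print(slope, intercept, y_pred)
```

The recovered slope and intercept land close to 2 and 1. The predictions at x = 12 and x = 15 extend the fitted trend beyond the observed range.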
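2. Computation
The computation mirrors logistic regression with the Sigmoid removed. The model output is the linear combination itself (for one feature, ŷ = w·x + b) rather than Sigmoid(w·x + b). The parameters w and b are learned by gradient descent on the squared error between predictions and labels.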
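Written out, the computation looks like this. The derivation assumes a squared-error loss with a factor of 1/2, which gives the same update as the hand-written gradient routine in the code below. It also assumes an augmented design matrix X whose second column is all ones, so that θ = (w, b) holds both the slope and the intercept.

```latex
\begin{aligned}
\hat{\mathbf{y}} &= X\theta, \qquad
X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix}, \quad
\theta = \begin{bmatrix} w \\ b \end{bmatrix} \\
J(\theta) &= \tfrac{1}{2}\,\lVert X\theta - \mathbf{y} \rVert^2 \\
\nabla_\theta J &= X^\top (X\theta - \mathbf{y}) \\
\theta &\leftarrow \theta - \alpha\, X^\top (X\theta - \mathbf{y})
\end{aligned}
```

For comparison, logistic regression with cross-entropy loss has the gradient X^T(σ(Xθ) − y). Removing σ gives exactly the update above, and that is the sense in which the two algorithms are computed alike.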
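3. Implementation
The code below implements linear regression twice: once with scikit-learn's LinearRegression, and once with a hand-written batch gradient descent. It then plots both fitted lines over the sample points.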
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model


def create_sample():
    """Generate 100 two-dimensional points whose spread grows with the index."""
    np.random.seed(5)  # fixed seed so the random numbers are the same on every run
    n_dim = 2
    num = 100
    k = 1
    data_mat = 1 * np.random.randn(1, n_dim)
    for i in range(num - 1):
        k += 1
        b = k * np.random.randn(1, n_dim)  # each new point is scaled by a growing k
        data_mat = np.concatenate((data_mat, b))
    return {'data_mat': data_mat}


def augment(data_matrix):
    """Append a column of ones so the last weight acts as the intercept."""
    n = data_matrix.shape[0]
    return np.concatenate((data_matrix, np.ones((n, 1))), axis=1)


def grad_descent(x, y, alpha):
    """Batch gradient descent on the squared error; returns [slope, intercept]."""
    # Plain ndarrays with @ instead of np.mat, which was removed in NumPy 2.0
    data_matrix = augment(np.asarray(x, dtype=float).reshape(-1, 1))  # (m, 2): [x, 1]
    label_vec = np.asarray(y, dtype=float).reshape(-1, 1)             # (m, 1)
    weight = np.ones((data_matrix.shape[1], 1))
    while True:
        error = data_matrix @ weight - label_vec   # residuals Xw - y
        step = alpha * data_matrix.T @ error       # alpha * gradient of 1/2 * ||Xw - y||^2
        if np.abs(np.sum(step)) < 0.00001:         # stop once the summed update is tiny
            break
        weight = weight - step                     # move against the gradient (descent)
    return weight.flatten()


def plot_data(samples, color, plot_type='o'):
    plt.plot(samples[:, 0], samples[:, 1], plot_type, markerfacecolor=color, markersize=14)


def sk_linear_regression(x, y):
    """Fit with scikit-learn and return [slope, intercept]."""
    linear_regression = linear_model.LinearRegression()
    linear_regression.fit(x, y)
    # coef_ has shape (1, 1) and intercept_ has shape (1,); flatten both before joining,
    # because np.asarray on this ragged tuple raises a ValueError in recent NumPy versions
    return np.concatenate((linear_regression.coef_.ravel(), linear_regression.intercept_.ravel()))


def main():
    data = create_sample()
    weight_sk = sk_linear_regression(data['data_mat'][:, 0:1], data['data_mat'][:, 1:2])
    print(weight_sk)
    weight = grad_descent(data['data_mat'][:, 0], data['data_mat'][:, 1], 0.000001)
    print(weight)
    plot_data(data['data_mat'], 'red')
    lx = [-200, 200]
    ly = [-200 * weight[0] + weight[1], 200 * weight[0] + weight[1]]
    ly_sk = [-200 * weight_sk[0] + weight_sk[1], 200 * weight_sk[0] + weight_sk[1]]
    plt.plot(lx, ly, label='gradient descent')
    plt.plot(lx, ly_sk, label='sklearn')
    plt.legend()
    plt.show()


if __name__ == '__main__':
    main()
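Results (each array is [slope, intercept]):
sklearn: [0.1165388985642626 3.958251157566739]
Hand-written gradient descent: [0.11655941 3.85822306]
The two slopes agree to about four significant figures, and the intercepts differ by about 0.1. When plotted, the two fitted lines essentially coincide.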
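One plausible reason the intercepts differ more than the slopes is a scale mismatch. The x values span hundreds of units while the intercept column is all ones, so with alpha = 0.000001 the intercept moves very slowly. The stopping rule (absolute value of the summed step) can then fire before the intercept settles. The sketch below is a hypothetical check, not part of the original. It uses synthetic data generated in the spirit of create_sample but with NumPy's Generator API, so its numbers will not match the results above. It standardizes x, stops on the norm of the step, and compares against the closed-form least-squares answer from np.linalg.lstsq. The helper names and the values of alpha, tol and max_iter are illustrative assumptions.

```python
import numpy as np


def lstsq_line(x, y):
    """Closed-form least-squares slope and intercept, used as the reference answer."""
    A = np.column_stack((x, np.ones_like(x)))
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept


def gd_line_standardized(x, y, alpha=0.1, tol=1e-10, max_iter=100_000):
    """Hypothetical helper: gradient descent on standardized x, mapped back to original units."""
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma                      # zero mean, unit variance
    A = np.column_stack((z, np.ones_like(z)))
    theta = np.zeros(2)                       # [w_z, b_z] in standardized units
    m = len(y)
    for _ in range(max_iter):
        step = alpha * A.T @ (A @ theta - y) / m  # step along the mean-squared-error gradient
        theta -= step
        if np.linalg.norm(step) < tol:        # norm, so signed components cannot cancel out
            break
    w_z, b_z = theta
    # y = w_z * (x - mu) / sigma + b_z  =>  slope = w_z / sigma, intercept = b_z - w_z * mu / sigma
    return w_z / sigma, b_z - w_z * mu / sigma


# Synthetic data whose spread grows with the index, loosely mimicking create_sample.
rng = np.random.default_rng(5)
scale = np.arange(1, 101)[:, None]
data = scale * rng.standard_normal((100, 2))
x, y = data[:, 0], data[:, 1]

ref = lstsq_line(x, y)
gd = gd_line_standardized(x, y)
print("closed form:     ", ref)
print("gradient descent:", gd)
```

After standardization, both parameters converge at the same rate, so gradient descent matches the closed-form slope and intercept closely.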