Python Machine Learning: Regression
-
Linear Regression
# -*- coding: utf-8 -*-
"""
Created on Wed Aug 30 19:55:37 2017

@author: Administrator
"""
'''
Background: besides the unit price, a house's sale price is closely tied
to its size. Given known sale prices and house sizes, we can fit a linear
regression and then predict the sale price of a house whose size is known
but whose price is not.
'''
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

# Read the dataset: each line of prices.txt is "area,price"
datasets_X = []
datasets_Y = []
fpath = 'F:\\RANJIEWEN\\MachineLearning\\Python机器学习实战_mooc\\data\\回归\\'
with open(fpath + 'prices.txt', 'r') as fr:
    for line in fr:
        items = line.strip().split(',')
        datasets_X.append(int(items[0]))
        datasets_Y.append(int(items[1]))

# scikit-learn expects X with shape (n_samples, n_features)
datasets_X = np.array(datasets_X).reshape([-1, 1])
datasets_Y = np.array(datasets_Y)

# Integer areas spanning the observed range, used to draw the fitted line
minX = datasets_X.min()
maxX = datasets_X.max()
X = np.arange(minX, maxX).reshape([-1, 1])

linear = linear_model.LinearRegression()
linear.fit(datasets_X, datasets_Y)

# Plot the samples (red) and the fitted line (blue)
plt.scatter(datasets_X, datasets_Y, color='red')
plt.plot(X, linear.predict(X), color='blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()
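To see roughly what `LinearRegression` computes for a single feature, here is a minimal NumPy-only sketch of the same least-squares fit. The `areas` and `prices` values are made up for illustration (they are not taken from prices.txt), and `predict_price` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical (area, price) pairs that lie exactly on price = 2 * area + 50.
areas = np.array([50.0, 80.0, 100.0, 120.0])
prices = 2.0 * areas + 50.0

# Design matrix with a column of ones so that an intercept is fitted too.
A = np.column_stack([areas, np.ones_like(areas)])

# Solve min ||A @ [slope, intercept] - prices||^2 in the least-squares sense.
(slope, intercept), *_ = np.linalg.lstsq(A, prices, rcond=None)


def predict_price(area):
    """Predict the price for a given area from the fitted line."""
    return slope * area + intercept


print(slope, intercept)
print(predict_price(90.0))
```

Because the toy data sits exactly on a line, the fit should recover a slope close to 2 and an intercept close to 50; with real, noisy prices the line only minimizes the squared error.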
-
Polynomial Regression
# -*- coding: utf-8 -*-
"""
Created on Wed Aug 30 20:24:09 2017

@author: Administrator
"""
'''
Earlier we fitted a linear regression on known sale prices and house sizes
and used it to predict the price of houses whose size is known but whose
price is not. In practice such a straight-line fit is often not good
enough, so here we run a polynomial regression on the same dataset.
'''
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Read the dataset: each line of prices.txt is "area,price"
datasets_X = []
datasets_Y = []
fpath = 'F:\\RANJIEWEN\\MachineLearning\\Python机器学习实战_mooc\\data\\回归\\'
with open(fpath + 'prices.txt', 'r') as fr:
    for line in fr:
        items = line.strip().split(',')
        datasets_X.append(int(items[0]))
        datasets_Y.append(int(items[1]))

# scikit-learn expects X with shape (n_samples, n_features)
datasets_X = np.array(datasets_X).reshape([-1, 1])
datasets_Y = np.array(datasets_Y)

# Integer areas spanning the observed range, used to draw the fitted curve
minX = datasets_X.min()
maxX = datasets_X.max()
X = np.arange(minX, maxX).reshape([-1, 1])

# Expand each area x into [1, x, x^2], then fit a linear model on those columns
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(datasets_X)
lin_reg_2 = linear_model.LinearRegression()
lin_reg_2.fit(X_poly, datasets_Y)

# Plot the samples (red) and the fitted curve (blue);
# reuse the already fitted transformer instead of refitting it on X
plt.scatter(datasets_X, datasets_Y, color='red')
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color='blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()
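The key step in the script above is the feature expansion done by `PolynomialFeatures`. A small sketch of what it produces for a single input column, using two made-up area values:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical samples with one feature each (illustrative values only).
x = np.array([[2.0], [3.0]])

poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)   # columns: bias (1), x, x^2
print(x_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]

# New data is expanded with transform(), reusing the fitted transformer.
x_new_poly = poly.transform(np.array([[4.0]]))
print(x_new_poly)
# [[ 1.  4. 16.]]
```

Polynomial regression is therefore still a linear model: it is linear in the expanded columns, which is why a plain `LinearRegression` can fit it.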
-
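Ridge Regression
- The fits above also overfit easily, which is what motivated ridge regression: it adds an L2 regularization term to the least-squares loss.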
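# -*- coding: utf-8 -*-
"""
Created on Wed Aug 30 20:33:00 2017

@author: Administrator
"""
'''
Data:
    Traffic-flow monitoring data for a road intersection, recording
    hourly vehicle counts over a full year.
Goal:
    Build polynomial features from the existing data and use a ridge
    regression model in place of plain linear regression to fit a
    polynomial regression of the traffic flow.
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
# sklearn.cross_validation was removed; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Raw string so the backslashes are not treated as escape sequences
fpath = r'F:\RANJIEWEN\MachineLearning\Python机器学习实战_mooc\data\回归\岭回归.csv'
data = pd.read_csv(fpath, encoding='gbk', parse_dates=[0], index_col=0)
# data.sort_index(0, ascending=True, inplace=True)

X = data.iloc[:, :4]   # first four columns after the index: features
y = data.iloc[:, 4]    # next column: target

poly = PolynomialFeatures(6)   # highest polynomial degree
X = poly.fit_transform(X)

# test_size is the fraction held out for testing; random_state is the random seed
train_set_X, test_set_X, train_set_y, test_set_y = \
    train_test_split(X, y, test_size=0.3, random_state=0)

clf = Ridge(alpha=1.0, fit_intercept=True)
clf.fit(train_set_X, train_set_y)
print(clf.score(test_set_X, test_set_y))   # R^2 on the test set

# Plot real vs. predicted values for samples 200 to 299
start = 200
end = 300
y_pre = clf.predict(X)
time = np.arange(start, end)
plt.plot(time, y.iloc[start:end], 'b', label='real')
plt.plot(time, y_pre[start:end], 'r', label='predict')
plt.legend(loc='upper left')
plt.show()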
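To make the effect of the L2 term concrete, here is a minimal NumPy sketch of the closed-form ridge solution on synthetic data. It is a simplification, not what the script above runs: the intercept is omitted, `ridge_fit` is a hypothetical helper, and the data and `true_w` are made up.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights without intercept: (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic data (illustrative only): 30 samples, 5 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # a larger alpha shrinks the weights

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weight vector has a smaller norm than the least-squares one. That shrinkage is what keeps high-degree polynomial features, such as the degree-6 expansion used above, from producing wildly large coefficients.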
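- Lasso regression adds an L1 regularization term instead, which yields sparse solutions (some coefficients become exactly zero).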
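A short sketch of the sparsity claim, comparing scikit-learn's `Lasso` and `Ridge` on synthetic data. The data, the choice of `alpha=0.5`, and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative only): only the first two of ten
# features actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero under each penalty.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print(n_zero_lasso, n_zero_ridge)
```

In a setup like this the L1 penalty should drive the coefficients of the irrelevant features to exactly zero, while the L2 penalty only makes them small. This is why lasso is often used as a rough form of feature selection.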
C/C++ Basic Syntax Study
STL
C++ primer