Improving linear regression: ridge regression
Ridge regression is linear regression with L2 regularization.
Ridge regression is still a form of linear regression; the only difference is that an L2 regularization penalty is added when the regression equation is fitted, which mitigates overfitting.
API
sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)
- Linear regression with L2 regularization
- alpha: regularization strength, also written as λ
    - typical λ ranges: 0~1 or 1~10
- solver: automatically chooses an optimization method based on the data
    - sag: when both the dataset and the feature count are large, this stochastic average gradient method is selected
- normalize: whether to normalize the data (note: this parameter was deprecated and later removed in newer scikit-learn releases)
    - normalize=False: instead, standardize the data with preprocessing.StandardScaler before calling fit
- Ridge.coef_: regression weights
- Ridge.intercept_: regression bias (intercept)
The Ridge estimator is equivalent to SGDRegressor(penalty='l2', loss="squared_loss"), except that SGDRegressor only implements plain stochastic gradient descent. Ridge is recommended because it implements SAG (stochastic average gradient).
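To see this equivalence in practice, the sketch below fits both estimators on synthetic data (the data and hyperparameter values are illustrative assumptions, not from the original text). SGDRegressor's default loss is already the squared error, so it is left at its default here to stay compatible across scikit-learn versions:

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

# Hypothetical toy data: y = 3*x1 - 2*x2 + noise
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(200)

# Ridge solves the L2-penalized least squares with a dedicated solver (SAG, Cholesky, ...)
ridge = Ridge(alpha=1.0).fit(X, y)

# SGDRegressor approaches a similar solution via plain stochastic gradient descent
sgd = SGDRegressor(penalty="l2", alpha=0.005, max_iter=1000, random_state=0).fit(X, y)

print("Ridge coef:", ridge.coef_)
print("SGD coef:  ", sgd.coef_)
```

Both coefficient vectors land near the true weights (3, -2); Ridge gets there deterministically, while SGD's result depends on the learning-rate schedule and iteration count.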
sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)
- Linear regression with L2 regularization plus built-in cross-validation
- coef_: regression coefficients
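A minimal sketch of RidgeCV's built-in cross-validation, on assumed synthetic data: you pass a list of candidate alphas, and the chosen value is exposed as `alpha_` after fitting:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Hypothetical toy data with known weights
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

# RidgeCV evaluates every candidate alpha by (generalized) cross-validation
# and refits with the best one; the winner is stored in .alpha_
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
print("chosen alpha:", model.alpha_)
print("coefficients:", model.coef_)
```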
How does changing the regularization strength affect the result?
- The larger the regularization strength, the smaller the weight coefficients become
- The smaller the regularization strength, the larger the weight coefficients become
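This shrinkage effect is easy to verify numerically. The sketch below (synthetic data is an illustrative assumption) fits Ridge with increasing alpha and tracks the L2 norm of the learned weights:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical toy data
rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(50)

# As alpha (the regularization strength) grows, the L2 norm of the
# weight vector shrinks toward zero
norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    w = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(w))

print(norms)  # each entry is smaller than the previous one
```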
Boston house price prediction
```python
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn; import joblib directly
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2; use an older version to run this example
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error


def linear3():
    """Predict Boston house prices with ridge regression."""
    # 1) Load the data
    boston = load_boston()
    print("Number of features:\n", boston.data.shape)

    # 2) Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(
        boston.data, boston.target, random_state=22)

    # 3) Standardize
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    estimator = Ridge(alpha=0.5, max_iter=10000)
    estimator.fit(x_train, y_train)

    # Save the model
    # joblib.dump(estimator, "my_ridge.pkl")
    # Load the model
    # estimator = joblib.load("my_ridge.pkl")

    # 5) Inspect the model
    print("Ridge regression - weight coefficients:\n", estimator.coef_)
    print("Ridge regression - intercept:\n", estimator.intercept_)

    # 6) Evaluate
    y_predict = estimator.predict(x_test)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Ridge regression - mean squared error:\n", error)
    return None


if __name__ == "__main__":
    # Example 3: ridge regression on Boston house prices
    linear3()
```
Result:

```
Ridge regression - weight coefficients:
 [-0.62710135  1.13221555 -0.07373898  0.74492864 -1.93983515  2.71141843
  -0.07982198 -3.27753496  2.44876703 -1.81107644 -1.74796456  0.88083243
  -3.91211699]
Ridge regression - intercept:
 22.62137203166228
Predicted house prices:
 [28.23082349 31.50636545 21.12739377 32.65793823 20.02076945 19.06632771
  21.106687   19.61624365 19.63161548 32.86596512 20.9946695  27.50329913
  15.55414648 19.79639417 36.88392371 18.80672342  9.38096    18.50907253
  30.67484295 24.30753141 19.0666843  34.09564382 29.80095002 17.51949727
  34.8916544  26.5394645  34.68264723 27.42856108 19.09405963 14.98997618
  30.8505874  15.81996969 37.18247113  7.85916465 16.25653448 17.15490009
   7.48867279 19.99147768 40.57329959 28.95128807 25.25723034 17.73738109
  38.75700749  6.87711291 21.78043375 25.27159224 20.45456114 20.48220948
  17.25258857 26.1375367   8.5448374  27.49204889 30.58183066 16.58438621
   9.37182303 35.52269097 32.24958654 21.87431027 17.60876103 22.08124517
  23.50114904 24.09591554 20.15605099 38.49857046 24.64026646 19.75933465
  13.91713858  6.78030217 42.04984214 21.92558236 16.8702938  22.59592875
  40.74980559 21.4284924  36.88064128 27.18855416 21.04326386 20.36536628
  25.36109432 22.27869444 31.14592486 20.39487869 23.99757481 31.54428168
  26.76210157 20.89486664 29.07215993 21.99603204 26.30599891 20.11183257
  25.47912071 24.0792631  19.89111149 16.56247916 15.22770226 18.38342191
  24.82070397 16.60156656 20.86675004 26.71162923 20.74443479 17.8825254
  24.28515984 23.37007961 21.58413976 36.79386382 15.88357121 21.47915185
  32.79931234 33.71603437 20.62134398 26.83678658 22.68850452 17.37312422
  21.67296898 21.67559608 27.66601539 25.0712154  23.73692967 14.64799906
  15.21577315  3.82030283 29.17847194 20.66853036 22.33184243 28.0180608
  28.56771983]
Ridge regression - mean squared error:
 20.644810227653515
```