Coursera Machine Learning Programming Assignments in Python (Andrew Ng): 1.2 Linear regression with multiple variables

1.2 Linear regression with multiple variables

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Data loading

data2 = pd.read_csv('ex1data2.txt', sep=',', header=None, names=['size', 'bedrooms', 'price'])

Data preprocessing (feature normalization)

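features = ['size', 'bedrooms']
# Keep the training mean/std: new inputs must be scaled the same way before predicting
mu = data2[features].mean()
sigma = data2[features].std()   # pandas uses ddof=1 (sample standard deviation)
# Assigning whole columns avoids pandas' incompatible-dtype warning when int columns receive floats
data2[features] = (data2[features] - mu) / sigma
data2.insert(0, 'ones', 1)      # intercept column
X = data2.values[:, :-1]        # columns: ones, size, bedrooms
y = data2.values[:, -1].reshape((-1, 1))   # price as an (m, 1) column vector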
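The normalization step can also be packaged as a reusable function that returns the mean and standard deviation it used, which is what you need later to scale new inputs. This is a minimal NumPy sketch; the function name `feature_normalize` and the toy rows are illustrative assumptions, not data from ex1data2.txt. It uses `ddof=1` to match pandas' `DataFrame.std()`, which the preprocessing above relies on.

```python
import numpy as np

def feature_normalize(features):
    """Z-score every column; return the normalized array plus the mean and std used.

    ddof=1 matches pandas' DataFrame.std() (sample standard deviation).
    """
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = features.std(axis=0, ddof=1)
    return (features - mu) / sigma, mu, sigma

# Hypothetical toy rows: columns are size (sq ft) and bedrooms
raw = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
X_norm, mu, sigma = feature_normalize(raw)

# Design matrix: a leading column of ones for the intercept, then the normalized features
X_design = np.column_stack([np.ones(len(X_norm)), X_norm])
print(X_design)
```

Each normalized column ends up with mean 0 and sample standard deviation 1, so one learning rate works reasonably well for both features.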

Define the hypothesis function

def h(X, theta):
    # Hypothesis for all m examples at once: (m, n) @ (n, 1) -> (m, 1)
    return np.dot(X, theta)
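Written out (standard notation, added here for reference; $x_0^{(i)} = 1$ is the ones column, $x_1$ and $x_2$ are the normalized size and bedrooms), the hypothesis for one example and for the whole design matrix is:

```latex
h_\theta\big(x^{(i)}\big) = \theta^{T} x^{(i)} = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)},
\qquad
h_\theta(X) = X\theta, \quad X \in \mathbb{R}^{m \times 3},\ \theta \in \mathbb{R}^{3 \times 1}
```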

Define the cost function

def computeCost(X, theta, y):
    # J(theta) = 1/(2m) * sum((h(x) - y)^2); np.mean already divides by m
    return 0.5 * np.mean(np.square(h(X, theta) - y))
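The same cost can be written as an inner product, $J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$, which is how it usually appears in the course's vectorized form. A small sketch checking that it agrees with the mean-of-squares version on hand-checkable toy data (the function name and the data are illustrative assumptions):

```python
import numpy as np

def compute_cost_vectorized(X, theta, y):
    """J(theta) = (X theta - y)^T (X theta - y) / (2m)."""
    m = len(y)
    residual = X @ theta - y                  # shape (m, 1)
    return (residual.T @ residual).item() / (2 * m)

# Hypothetical toy data where y = x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [2.0], [3.0]])

print(compute_cost_vectorized(X, np.zeros((2, 1)), y))           # (1 + 4 + 9) / 6, about 2.333
print(compute_cost_vectorized(X, np.array([[0.0], [1.0]]), y))   # perfect fit: 0.0
```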

Define the gradient descent function

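def gradientDescent(X, theta, y, iterations, alpha):
    # Batch gradient descent: returns the final theta and the cost before and after every iteration
    theta = theta.copy()  # work on a copy so the caller's theta is not modified in place
    Cost = []
    Cost.append(computeCost(X, theta, y))
    grad = np.zeros(len(theta))
    for _ in range(iterations):
        # Compute every partial derivative dJ/dtheta_j before updating any parameter
        for j in range(len(theta)):
            grad[j] = np.mean((h(X, theta) - y) * (X[:,j].reshape([len(X), 1])))
        # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
        for k in range(len(theta)):
            theta[k] = theta[k] - alpha * grad[k]
        Cost.append(computeCost(X, theta, y))
    return theta, Cost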
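The two inner loops compute $\partial J/\partial\theta_j = \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ for every $j$, which is exactly the matrix product $\frac{1}{m}X^T(X\theta - y)$. A sketch of a vectorized variant, tested on noise-free toy data with a known answer; the function name, the toy data and the hyperparameters are assumptions for illustration only:

```python
import numpy as np

def gradient_descent_vectorized(X, theta, y, iterations, alpha):
    """Batch gradient descent with the per-parameter loops replaced by one matrix product."""
    theta = np.array(theta, dtype=float)     # copy, so the caller's array is untouched
    m = len(y)
    costs = [0.5 * np.mean((X @ theta - y) ** 2)]
    for _ in range(iterations):
        grad = X.T @ (X @ theta - y) / m       # every partial derivative at once, shape (n, 1)
        theta = theta - alpha * grad           # simultaneous update of all theta_j
        costs.append(0.5 * np.mean((X @ theta - y) ** 2))
    return theta, costs

# Hypothetical toy data: y = 2 + 3x, with x already centered and scaled to [-1, 1]
x = np.linspace(-1, 1, 21).reshape(-1, 1)
X = np.hstack([np.ones_like(x), x])
y = 2 + 3 * x
theta0 = np.zeros((2, 1))
theta_fit, costs = gradient_descent_vectorized(X, theta0, y, iterations=500, alpha=0.5)
print(theta_fit.ravel())   # close to [2, 3]
```

Because the gradient is computed in full before theta changes, the update stays simultaneous, just as in the loop version.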

Parameter initialization and choosing the learning rate

iterations = 200
lr = [1, 0.3, 0.1, 0.03, 0.01]   # candidate learning rates
_, ax = plt.subplots(figsize=(10, 6))
for l in lr:
    theta = np.zeros((X.shape[1], 1))   # start every run from theta = 0
    _, Cost = gradientDescent(X, theta, y, iterations, l)
    ax.plot(Cost, label='lr=%.2f' % l)
ax.set_xlabel('iterations')
ax.set_ylabel('Cost')
ax.legend()
plt.show()
# Final run with lr = 0.3
theta = np.zeros((X.shape[1], 1))
theta_result, Cost_result = gradientDescent(X, theta, y, iterations, 0.3)
theta_result
array([[340412.65957447],
       [110631.05027879],
       [ -6649.47427076]])
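The original exercise also asks for the predicted price of a 1650 sq-ft, 3-bedroom house. Because theta was learned on normalized features, a new input must be scaled with the training mean and standard deviation of the raw size and bedrooms columns (assumed here to be kept as `mu` and `sigma` before normalization). A sketch with made-up statistics and parameters chosen so the arithmetic is easy to follow; substitute `theta_result` and the real statistics in practice:

```python
import numpy as np

def predict_price(size, bedrooms, theta, mu, sigma):
    """Predict a price from raw features using the training-set mean/std (mu, sigma)."""
    x = (np.array([size, bedrooms], dtype=float) - mu) / sigma   # same scaling as training
    x = np.concatenate([[1.0], x])                                # prepend the intercept term
    return float(x @ np.ravel(theta))

# Hypothetical values, NOT the results computed in this post
mu = np.array([2000.0, 3.0])
sigma = np.array([800.0, 1.0])
theta = np.array([[340000.0], [110000.0], [-6600.0]])

print(predict_price(2800, 3, theta, mu, sigma))   # 450000.0: 340000 + 110000 * 1 - 6600 * 0
print(predict_price(1650, 3, theta, mu, sigma))
```

Forgetting to normalize the query and plugging raw 1650 and 3 into theta directly is a common mistake; it produces a meaningless price.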

Normal equation

theta_ref = np.linalg.inv(X.T @ X) @ X.T @ y
theta_ref
array([[340412.65957447],
       [110631.05027885],
       [ -6649.47427082]])
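The post computes theta_ref with `np.linalg.inv`, which works here because $X^TX$ is small and well conditioned. Two common alternatives avoid forming the inverse explicitly: `np.linalg.solve` on the normal equations, and `np.linalg.lstsq` on X itself, which still returns a (minimum-norm) solution when $X^TX$ is singular. A sketch on toy data with a known answer (the function names and the data are illustrative assumptions):

```python
import numpy as np

def normal_equation_solve(X, y):
    """Solve (X^T X) theta = X^T y directly instead of computing the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def normal_equation_lstsq(X, y):
    """Least-squares solution; still defined when X^T X is singular."""
    theta, _residuals, _rank, _singular_values = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Hypothetical toy data: y = 1 + 2x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = 1 + 2 * X[:, [1]]
print(normal_equation_solve(X, y).ravel())   # close to [1, 2]
print(normal_equation_lstsq(X, y).ravel())   # close to [1, 2]
```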

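The gradient descent solution (lr = 0.3, 200 iterations) and the normal equation solution are very close: each parameter agrees to at least 10 significant digits.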
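To put a number on "very close", one option is the largest element-wise relative difference, or `np.allclose` with an explicit tolerance. This sketch uses the two result vectors printed above; the helper name `max_relative_gap` and the chosen tolerance are assumptions:

```python
import numpy as np

def max_relative_gap(theta_a, theta_b):
    """Largest element-wise |a - b| / |b| between two parameter vectors."""
    theta_a, theta_b = np.ravel(theta_a), np.ravel(theta_b)
    return float(np.max(np.abs(theta_a - theta_b) / np.abs(theta_b)))

# The two results printed in this post (gradient descent vs. normal equation)
theta_gd = np.array([340412.65957447, 110631.05027879, -6649.47427076])
theta_ne = np.array([340412.65957447, 110631.05027885, -6649.47427082])

print(max_relative_gap(theta_gd, theta_ne))                 # well below 1e-10
print(np.allclose(theta_gd, theta_ne, rtol=1e-9, atol=0))
```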

posted on 2019-03-23 22:33  MuenPaPa