Gradient Descent

An explanation of gradients

The update formula given earlier had an error; the corrected update rule is:

theta_j := theta_j + a * sum_i (y_i - h(x_i)) * x_ij

where h(x_i) is the model's prediction for sample i, and a is the learning rate, also called the step size.
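The update rule above can be sketched as a single gradient step in Python (a minimal illustration for the linear case y = theta0 + theta1*x; the name `grad_step` is introduced here, not part of the original post):

```python
def grad_step(theta0, theta1, xs, ys, alpha):
    # Accumulate the gradient of the squared-error objective
    # with respect to theta0 and theta1 over all samples.
    g0 = sum(y - (theta0 + theta1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - (theta0 + theta1 * x)) * x for x, y in zip(xs, ys))
    # Move each parameter along its gradient, scaled by the
    # learning rate (step size) alpha.
    return theta0 + alpha * g0, theta1 + alpha * g1
```

Calling this repeatedly, with a small enough alpha, is exactly what the full loop further below does.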

Worked example

Suppose we have the sample points (4, 20), (8, 50), (5, 30), (10, 70), (12, 60).

Find the regression function.

Solution:

Split the sample points into

x = [4, 8, 5, 10, 12]

y = [20, 50, 30, 70, 60]

Assume the regression function is linear: y = theta0 + theta1*x

With x and y known, we solve for theta0 and theta1. This gives the objective function below; we want the values of theta0 and theta1 that minimize it:

J(theta0, theta1) = (1/m) * sum_i (y_i - (theta0 + theta1*x_i))^2

i.e. the mean squared error over the m samples.
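To make the objective concrete, it can be evaluated for any candidate (theta0, theta1) with a small helper (the name `objective` is introduced here for illustration):

```python
def objective(theta0, theta1, xs, ys):
    # Mean squared error between predictions and observations.
    m = len(xs)
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys)) / m
```

For example, at theta0 = theta1 = 0 every prediction is 0, so the objective is just the mean of the squared y values.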

Using gradient descent:

Each iteration produces new values of theta0 and theta1.

Plug the new theta0, theta1 into the objective function to get a new objective value j1; compare it with j0, the value obtained from the previous theta0, theta1. When the difference between j1 and j0 falls below a threshold (a very small number), the current theta0, theta1 can be taken as the solution.

The code (Python):

# y = theta0 + theta1*x
X = [4, 8, 5, 10, 12]
y = [20, 50, 30, 70, 60]
theta0 = theta1 = 0
# learning rate (step size)
alpha = 0.00001
# iteration counter
cnt = 0
# previous objective value
error0 = 0
# threshold on the change in error, used to stop iterating
threshold = 0.0000001
while True:
    # dif[0] is the gradient for theta0, dif[1] for theta1
    dif = [0, 0]
    m = len(X)
    for i in range(m):
        dif[0] += y[i] - (theta0 + theta1*X[i])
        dif[1] += (y[i] - (theta0 + theta1*X[i])) * X[i]
    theta0 = theta0 + alpha*dif[0]
    theta1 = theta1 + alpha*dif[1]
    # compute the error (reset each iteration so it does not accumulate)
    error1 = 0
    for i in range(m):
        error1 += (y[i] - (theta0 + theta1*X[i]))**2
    error1 /= m
    if abs(error1 - error0) <= threshold:
        break
    error0 = error1
    cnt += 1
print(theta0, theta1, cnt)

def predicty(theta0, theta1, x_test):
    return theta0 + theta1*x_test

print(predicty(theta0, theta1, 15))
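The same loop can be written more compactly with NumPy, replacing the inner Python loops with vectorized array operations (a sketch; the name `fit_line` and the `max_iter` guard are additions, not part of the original post):

```python
import numpy as np

def fit_line(xs, ys, alpha=1e-5, threshold=1e-7, max_iter=1_000_000):
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    theta0 = theta1 = 0.0
    prev_err = 0.0
    for _ in range(max_iter):
        resid = y - (theta0 + theta1 * x)    # per-sample residuals
        theta0 += alpha * resid.sum()        # gradient step for the intercept
        theta1 += alpha * (resid * x).sum()  # gradient step for the slope
        err = np.mean((y - (theta0 + theta1 * x)) ** 2)
        if abs(err - prev_err) <= threshold:
            break
        prev_err = err
    return theta0, theta1
```

The `max_iter` cap is a safeguard the original loop lacks: a `while True` that only exits on the threshold test can spin indefinitely if alpha is too large and the error oscillates.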

Result:

Exercise: implement it in code

# y = theta0*x0 + theta1*x1 + theta2*x2
# X = [[1,0,3],[1,1,3],[1,2,3],[1,3,2],[1,4,4]]
X0 = [1, 1, 1, 1, 1]
X1 = [0, 1, 2, 3, 4]
X2 = [3, 3, 3, 2, 4]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]
theta0 = theta1 = theta2 = 0
# learning rate (step size)
alpha = 0.00001
# iteration counter
cnt = 0
# previous objective value
error0 = 0
# threshold on the change in error, used to stop iterating
threshold = 0.000000001
while True:
    # dif[j] is the gradient for theta j
    dif = [0, 0, 0]
    m = len(X0)
    for i in range(m):
        residual = y[i] - (theta0*X0[i] + theta1*X1[i] + theta2*X2[i])
        dif[0] += residual * X0[i]
        dif[1] += residual * X1[i]
        dif[2] += residual * X2[i]
    theta0 = theta0 + alpha*dif[0]
    theta1 = theta1 + alpha*dif[1]
    theta2 = theta2 + alpha*dif[2]
    # compute the error (reset each iteration so it does not accumulate)
    error1 = 0
    for i in range(m):
        error1 += (y[i] - (theta0*X0[i] + theta1*X1[i] + theta2*X2[i]))**2
    error1 /= m
    if abs(error1 - error0) <= threshold:
        break
    error0 = error1
    cnt += 1
print(theta0, theta1, theta2, cnt)

def predicty(theta0, theta1, theta2, x1_test, x2_test):
    return theta0 + theta1*x1_test + theta2*x2_test

print(predicty(theta0, theta1, theta2, 0, 3))
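As a sanity check on the exercise, the same least-squares problem can be solved in closed form with NumPy's `lstsq`, which minimizes the identical sum-of-squares objective directly (this check is an addition, not part of the original post):

```python
import numpy as np

# Design matrix: columns are x0 (all ones), x1, x2 from the exercise.
A = np.array([[1, 0, 3],
              [1, 1, 3],
              [1, 2, 3],
              [1, 3, 2],
              [1, 4, 4]], dtype=float)
b = np.array([95.364, 97.217205, 75.195834, 60.105519, 49.342380])

# Solve min ||A @ theta - b||^2 directly.
theta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(theta)  # [theta0, theta1, theta2]
```

Gradient descent, run long enough with a small enough step size, should approach this closed-form solution.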

Result:

posted @ 2021-04-12 17:38 北极星!