Linear Regression with machine learning methods

Using machine learning as the starting point, this article introduces three methods for finding the optimal k and b of a linear regression (y = k*x + b).


  • What is machine learning? You design methods that let a machine learn and improve by itself.
  • The data is supplied by ourselves, directly in the code.
  1. Self-sufficient data generation
  2. Random Chosen Method
  3. Supervised Direction Method
  4. Gradient Descent Method
  5. Conclusion

Life is simple


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random

#produce data
age_with_fares = pd.DataFrame({"Fare":[263.0, 247.5208, 146.5208, 153.4625, 135.6333, 247.5208, 164.8667, 134.5, 135.6333, 153.4625, 134.5, 263.0, 211.5, 263.0, 151.55, 153.4625, 227.525, 211.3375, 211.3375],
                          "Age":[23.0, 24.0, 58.0, 58.0, 35.0, 50.0, 31.0, 40.0, 36.0, 38.0, 41.0, 24.0, 27.0, 64.0, 25.0, 40.0, 38.0, 29.0, 43.0]})
sub_fare = age_with_fares['Fare']
sub_age = age_with_fares['Age']

#show our data


def func(age, k, b): return k*age+b
def loss(y,yhat): return np.mean(np.abs(y-yhat))
# Here the loss is the mean absolute error (L1); mean squared error (L2) and other losses are alternatives.
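As a rough illustration of how the absolute-error loss differs from the mean-squared-error alternative mentioned in the comment, the sketch below evaluates both on a tiny made-up example. The names `l1_loss` and `l2_loss` and the sample numbers are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def l1_loss(y, yhat):
    """Mean absolute error, the same formula as loss() above."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))

def l2_loss(y, yhat):
    """Mean squared error, the alternative mentioned in the comment."""
    return np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)

# Hypothetical targets and predictions; the last prediction is far off.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 260.0]

print(l1_loss(y_true, y_pred))  # absolute errors 10, 10, 60 averaged
print(l2_loss(y_true, y_pred))  # squared errors 100, 100, 3600 averaged
```

The single large error weighs much more heavily in the L2 value than in the L1 value, which is one reason the two losses can prefer different lines.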


min_error_rate = float('inf')

loop_times = 10000
losses = []

def step(): return random.random() * 2 - 1
# random.random() returns a number in [0, 1); *2 maps it to [0, 2); -1 maps it to [-1, 1).
# Random generation plus looping is what drives the learning.

while loop_times > 0:
    k_hat = random.random() * 20 - 10
    b_hat = random.random() * 20 - 10
    estimated_fares = func(sub_age, k_hat, b_hat)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)
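    if error_rate < min_error_rate:  # self-supervision: keep only parameters that lower the loss
        min_error_rate = error_rate
        best_k = k_hat
        best_b = b_hat
        losses.append(min_error_rate)  # record each improvement for the loss plot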

    loop_times -= 1

plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, best_k, best_b), c = 'r')


show the loss change

plt.plot(range(len(losses)), losses)



  • The loss sometimes drops quickly and sometimes slowly, but overall it decreases.
  • One shortcoming: the Random Chosen method is inefficient, because it has to call the random function a huge number of times.
  • Even after it finds a better pair of parameters, the next draw is independent of it and may well be worse.
  • The next part shows an improved method.
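To make the inefficiency concrete, here is a minimal sketch that counts how many random draws actually improve on the best loss so far. The function `random_search`, the fixed seed, and the toy data are assumptions for illustration; only the sampling range [-10, 10) and the absolute-error loss follow the code above.

```python
import random
import numpy as np

def random_search(x, y, n_trials=2000, seed=0):
    """Draw (k, b) uniformly from [-10, 10) and keep the best pair seen so far."""
    rng = random.Random(seed)
    best_loss, best_k, best_b = float("inf"), None, None
    improvements = 0
    for _ in range(n_trials):
        k = rng.random() * 20 - 10
        b = rng.random() * 20 - 10
        current = np.mean(np.abs(y - (k * x + b)))  # mean absolute error
        if current < best_loss:
            best_loss, best_k, best_b = current, k, b
            improvements += 1
    return best_k, best_b, best_loss, improvements

# Hypothetical toy data, roughly y = 3x + 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

k, b, best_loss, improvements = random_search(x, y)
print(f"best k={k:.2f}, b={b:.2f}, loss={best_loss:.3f}")
print(f"{improvements} of 2000 draws improved on the best so far")
```

Each draw ignores everything learned from earlier draws, so improvements tend to become rare as the best loss drops, and most of the remaining draws are simply discarded.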


change_directions = [
    (+1, -1),  # k increase, b decrease
    (+1, +1),
    (-1, -1),
    (-1, +1)
]
min_error_rate = float('inf')

loop_times = 10000
losses = []

best_direction = random.choice(change_directions)
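def step(): return random.random()*2-1
# random.random() returns a number in [0, 1); *2 maps it to [0, 2); -1 maps it to [-1, 1).
# change_directions already applies +1/-1 (the direction change), so *2-1 could be dropped,
# but keeping *2-1 allows more choices.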

k_hat = random.random() * 20 - 10
b_hat = random.random() * 20 - 10
best_k, best_b = k_hat, b_hat
while loop_times > 0:
    k_delta_direction, b_delta_direction = best_direction or random.choice(change_directions)
    k_delta = k_delta_direction * step()
    b_delta = b_delta_direction * step()

    new_k = best_k + k_delta
    new_b = best_b + b_delta

    estimated_fares = func(sub_age, new_k, new_b)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)

    if error_rate < min_error_rate:  # supervision: accept the change only if the loss improves
        min_error_rate = error_rate
        best_k, best_b = new_k, new_b
        losses.append(min_error_rate)  # record each improvement for the loss plot

        best_direction = (k_delta_direction, b_delta_direction)

        #print("loop == {}".format(loop_times))
        #print("f(age) = {} * age + {}, with error rate: {}".format(best_k, best_b, error_rate))
    else:
        # no improvement: switch to one of the other directions
        best_direction = random.choice(list(set(change_directions)-{(k_delta_direction, b_delta_direction)}))
    loop_times -= 1
print("f(age) = {} * age + {}, with error rate: {}".format(best_k, best_b, min_error_rate))
plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, best_k, best_b), c = 'r')


show the loss change

plt.plot(range(len(losses)), losses)



  • The Supervised Direction method (2nd method) is better than the Random Chosen method (1st method).
  • The 2nd method introduces a supervision mechanism, which changes the parameters k and b more efficiently.
  • But the 2nd method cannot refine the parameters at a finer scale, since its step size is random.
  • It also has no way to locate the extremum of the loss directly, so it cannot find the optimal parameters efficiently.


loop_times = 10000
losses = []
learning_rate = 1e-1

k_hat = random.random() * 20 - 10
b_hat = random.random() * 20 - 10

def derivate_k(y, yhat, x):
    # sign of each residual: derivative of |y - yhat| w.r.t. (y - yhat); 0 is treated as -1
    abs_values = [1 if (y_i - yhat_i) > 0 else -1 for y_i, yhat_i in zip(y, yhat)]

    # d(loss)/dk: yhat = k*x + b, so each term contributes sign * (-x)
    return np.mean([a * -x_i for a, x_i in zip(abs_values, x)])

def derivate_b(y, yhat):
    abs_values = [1 if (y_i - yhat_i) > 0 else -1 for y_i, yhat_i in zip(y, yhat)]
    # d(loss)/db: each term contributes sign * (-1)
    return np.mean([a * -1 for a in abs_values])

while loop_times > 0:

    # move against the gradient, scaled by the learning rate
    k_delta = -1 * learning_rate * derivate_k(sub_fare, func(sub_age, k_hat, b_hat), sub_age)
    b_delta = -1 * learning_rate * derivate_b(sub_fare, func(sub_age, k_hat, b_hat))

    k_hat += k_delta
    b_hat += b_delta

    estimated_fares = func(sub_age, k_hat, b_hat)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)
    losses.append(error_rate)  # record the loss at every step for the loss plot

    #print('loop == {}'.format(loop_times))
    #print('f(age) = {} * age + {}, with error rate: {}'.format(k_hat, b_hat, error_rate))

    loop_times -= 1

print('f(age) = {} * age + {}, with error rate: {}'.format(k_hat, b_hat, error_rate))
plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, k_hat, b_hat), c = 'r')


show the loss change

plt.plot(range(len(losses)), losses)



  • To fit a function to discrete data, we use the loss function to measure how good the fit is.
  • Minimizing the loss is then a problem of finding an extremum without constraints.
  • This leads to the gradient descent method: move the parameters step by step in the direction that reduces the loss.
  • The gradient points in the direction in which the directional derivative is largest, so moving against it reduces the loss fastest.
  • As the gradient approaches 0, the parameters approach an extremum of the loss and the fit gets better.
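For the absolute-error loss used in the code, the partial derivatives can be written out as follows. This is a sketch: the symbols n, x_i, y_i, ŷ_i and α (the learning rate) are notation introduced here, and the derivative of the absolute value at 0, which is not defined, is fixed by convention (the code's `derivate_k` and `derivate_b` use -1 there).

```latex
L(k, b) = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad \hat{y}_i = k x_i + b

\frac{\partial L}{\partial k} = -\frac{1}{n}\sum_{i=1}^{n} \operatorname{sign}(y_i - \hat{y}_i)\, x_i,
\qquad
\frac{\partial L}{\partial b} = -\frac{1}{n}\sum_{i=1}^{n} \operatorname{sign}(y_i - \hat{y}_i)

k \leftarrow k - \alpha \frac{\partial L}{\partial k},
\qquad
b \leftarrow b - \alpha \frac{\partial L}{\partial b}
```

The two derivatives correspond to what `derivate_k` and `derivate_b` return, and the last line is the update applied inside the loop.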


  • Machine learning is the process of making a machine learn and improve through methods that we design.
  • Random search alone is usually not efficient, but it becomes more efficient once a supervision mechanism is added.
  • Gradient descent is an efficient way to find the extremum of the loss and hence the optimal parameters.

Serious question for this article:
Why do you use machine learning methods instead of creating a y = k*x + b formula?

  • In some scenarios, a hand-crafted formula cannot capture reality, for example the irrational factors in economic models.
  • When we have enough valid data, we can fit a regression or classification model with machine learning methods.
  • We can also evaluate the model on test data, which helps when applying it in real life.
  • This is just an example, Okay.
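As a hedged sketch of what evaluating on test data could look like, the example below holds out part of a synthetic data set and measures the loss on it. The data, the 40/10 split, and the helper `mae` are assumptions for illustration, and `np.polyfit` (ordinary least squares) stands in for any of the three fitting methods above; it is not the method used in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 2x + 1 plus noise (not the article's data).
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=50)

# Hold out the last 10 points as test data; fit only on the first 40.
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]

# np.polyfit with degree 1 is a least-squares stand-in for any of the three
# fitting methods in the article; it returns [slope, intercept].
k, b = np.polyfit(x_train, y_train, 1)

def mae(y_true, y_est):
    """Mean absolute error, the same loss used throughout the article."""
    return np.mean(np.abs(y_true - y_est))

train_error = mae(y_train, k * x_train + b)
test_error = mae(y_test, k * x_test + b)
print(f"k={k:.2f}, b={b:.2f}")
print(f"train MAE={train_error:.3f}, test MAE={test_error:.3f}")
```

If the test error is close to the training error, that is some evidence the fitted line generalizes to data it has not seen.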

Reference for this article: Jupyter Notebook

