
Machine Learning Prep --- 1. Simple Linear Regression (a Least Squares Example)


 

I. Summary

One-sentence summary:

1. In this example, least squares is used to measure the loss: because w and b have closed-form formulas here, we first solve for them and then evaluate the loss that this particular pair of w and b produces.
2. In the TensorFlow 2 example, by contrast, w and b are found by repeated trial steps, and each step tries to make the same least-squares loss function smaller.
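To illustrate the second point, here is a minimal sketch of what such an iterative search looks like: plain gradient descent on the same mean-squared-error loss, written in NumPy. The toy data, learning rate, and iteration count are hypothetical, not from the original post.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b, lr = 0.0, 0.0, 0.02  # start from zero and step toward lower loss
for _ in range(5000):
    err = y - (w * x + b)       # residuals at the current w, b
    w += lr * np.mean(err * x)  # move w along the negative gradient
    b += lr * np.mean(err)      # move b along the negative gradient

print(w, b)  # converges close to w=2, b=1
```

Unlike the closed-form solution below, this only approaches the optimum gradually, which is why a short training run can look worse than the formula-based fit.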

 

 

1. Is the loss function just least squares?

Core line: total_cost += (y-w*x-b)**2
# The loss is a function of the coefficients w and b; the data's x and y are also passed in
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        # prediction: y1 = w*x + b
        # least squares accumulates (y - y1)^2
        total_cost += (y - w*x - b)**2
        print("i={}, x={}, y={}, y-w*x-b={}, total_cost={}".format(i, x, y, y - w*x - b, total_cost))
    return total_cost/M  # a single / gives a float; // is floor division (integer result), e.g. 3 // 4 == 0
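For reference, the loop above can be collapsed into one vectorized NumPy expression. This is a sketch assuming `points` has the same column layout as `data` (column 1 = x, column 2 = y); the tiny DataFrame used to check it is hypothetical.

```python
import numpy as np
import pandas as pd

def compute_cost_vec(w, b, points):
    # Same MSE as the loop version, computed with array operations
    x = points.iloc[:, 1].to_numpy()
    y = points.iloc[:, 2].to_numpy()
    return np.mean((y - w * x - b) ** 2)

# Points lying exactly on y = 3x + 2 should give zero cost for w=3, b=2
df = pd.DataFrame({"idx": [1, 2, 3], "x": [0.0, 1.0, 2.0], "y": [2.0, 5.0, 8.0]})
print(compute_cost_vec(3, 2, df))  # → 0.0
```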

 

 

2. Fitting just means solving for w and b, and both have closed-form formulas?

Core line: w = sum_yx/(sum_x2-M*(x_bar**2))
# Define the core fitting function
# i.e., solve for w and b
def fit1(points):
    M = len(points)
    x_bar = np.mean(points.iloc[:, 1])
    y_bar = np.mean(points.iloc[:, 2])
    # print("x_bar={}".format(x_bar))
    sum_yx = 0
    sum_x2 = 0
    sum_delta = 0
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_yx += y*x - x_bar*y_bar
        sum_x2 += x**2
    # compute w from the closed-form formula
    w = sum_yx/(sum_x2 - M*(x_bar**2))

    # b is the total residual divided by the sample count

    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_delta += (y - w*x)
    b = sum_delta / M
    return w, b
# 4. Test
points = data
w, b = fit1(points)
print("w is :", w)
print("b is :", b)
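The closed-form result can be cross-checked against `np.polyfit`, which fits the same degree-1 least-squares line. The sketch below re-states `fit1` in self-contained form and runs it on hypothetical toy data (the `idx`/`x`/`y` column names are made up for the check):

```python
import numpy as np
import pandas as pd

def fit1(points):
    # Closed-form least squares: w = (Σxy − M·x̄·ȳ) / (Σx² − M·x̄²)
    M = len(points)
    x_bar = np.mean(points.iloc[:, 1])
    y_bar = np.mean(points.iloc[:, 2])
    sum_yx = sum_x2 = 0
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_yx += y*x - x_bar*y_bar
        sum_x2 += x**2
    w = sum_yx / (sum_x2 - M*(x_bar**2))
    b = np.mean(points.iloc[:, 2] - w*points.iloc[:, 1])  # b = Σ(y − w·x) / M
    return w, b

df = pd.DataFrame({"idx": range(4), "x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 7.8]})
w, b = fit1(df)
w_ref, b_ref = np.polyfit(df["x"], df["y"], 1)  # reference slope and intercept
print(w, b)  # agrees with np.polyfit up to floating-point rounding
```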

 

 

 

II. Simple Linear Regression (a Least Squares Example)

Video location of the course this blog post accompanies:

 

# 0. Import dependencies
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
In [2]:
# 1. Load the data
data = pd.read_csv('../dataset/income.csv')
data
Out[2]:
   Unnamed: 0   Education     Income
0 1 10.000000 26.658839
1 2 10.401338 27.306435
2 3 10.842809 22.132410
3 4 11.244147 21.169841
4 5 11.645449 15.192634
5 6 12.086957 26.398951
6 7 12.048829 17.435307
7 8 12.889632 25.507885
8 9 13.290970 36.884595
9 10 13.732441 39.666109
10 11 14.133779 34.396281
11 12 14.635117 41.497994
12 13 14.978589 44.981575
13 14 15.377926 47.039595
14 15 15.779264 48.252578
15 16 16.220736 57.034251
16 17 16.622074 51.490919
17 18 17.023411 51.336621
18 19 17.464883 57.681998
19 20 17.866221 68.553714
20 21 18.267559 64.310925
21 22 18.709030 68.959009
22 23 19.110368 74.614639
23 24 19.511706 71.867195
24 25 19.913043 76.098135
25 26 20.354515 75.775216
26 27 20.755853 72.486055
27 28 21.167191 77.355021
28 29 21.598662 72.118790
29 30 22.000000 80.260571
In [3]:
print(data.iloc[0,0])
print(data.iloc[1,1])
print(data.iloc[2,2])
print(len(data))
1
10.401338
22.13241
30
In [4]:
%matplotlib inline
plt.scatter(data.Education,data.Income)
Out[4]:
<matplotlib.collections.PathCollection at 0x15282d68508>
In [5]:
# 2. Define the loss function
# The loss is a function of the coefficients w and b; the data's x and y are also passed in
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        # prediction: y1 = w*x + b
        # least squares accumulates (y - y1)^2
        total_cost += (y - w*x - b)**2
        print("i={}, x={}, y={}, y-w*x-b={}, total_cost={}".format(i, x, y, y - w*x - b, total_cost))
    return total_cost/M  # a single / gives a float; // is floor division (integer result), e.g. 3 // 4 == 0
In [23]:
# Test the loss function
compute_cost(1,1,data)
i=0, x=10.0, y=26.658839, y-w*x-b=15.658839, total_cost=245.19923882792102
i=1, x=10.401338, y=27.306434999999997, y-w*x-b=15.905096999999998, total_cost=498.17134940732996
i=2, x=10.842808999999999, y=22.13241, y-w*x-b=10.289601000000001, total_cost=604.047238146531
i=3, x=11.244147, y=21.169841, y-w*x-b=8.925694000000002, total_cost=683.715251528167
i=4, x=11.645449000000001, y=15.192634, y-w*x-b=2.547184999999999, total_cost=690.2034029523919
i=5, x=12.086957, y=26.398951, y-w*x-b=13.311994, total_cost=867.4125872084279
i=6, x=12.048829, y=17.435307, y-w*x-b=4.386478000000002, total_cost=886.6537764529119
i=7, x=12.889632, y=25.507885, y-w*x-b=11.618253000000001, total_cost=1021.637579224921
i=8, x=13.290970000000002, y=36.884595000000004, y-w*x-b=22.593625000000003, total_cost=1532.109469865546
i=9, x=13.732441, y=39.666109000000006, y-w*x-b=24.933668000000004, total_cost=2153.79726979977
i=10, x=14.133779, y=34.396281, y-w*x-b=19.262502, total_cost=2524.8412530997743
i=11, x=14.635117000000001, y=41.497994, y-w*x-b=25.862876999999997, total_cost=3193.729659816903
i=12, x=14.978589000000001, y=44.981575, y-w*x-b=29.002986, total_cost=4034.9028567330993
i=13, x=15.377926, y=47.039595, y-w*x-b=30.661668999999996, total_cost=4975.04080259866
i=14, x=15.779264000000001, y=48.252578, y-w*x-b=31.473314000000002, total_cost=5965.610296741256
i=15, x=16.220736, y=57.034251, y-w*x-b=39.813514999999995, total_cost=7550.726273396481
i=16, x=16.622073999999998, y=51.490919, y-w*x-b=33.868845, total_cost=8697.824935030505
i=17, x=17.023411, y=51.336621, y-w*x-b=33.31321, total_cost=9807.594895534605
i=18, x=17.464883, y=57.681998, y-w*x-b=39.217115, total_cost=11345.57700445783
i=19, x=17.866221, y=68.553714, y-w*x-b=49.687493, total_cost=13814.423965082879
i=20, x=18.267559, y=64.310925, y-w*x-b=45.043366, total_cost=15843.328785692835
i=21, x=18.70903, y=68.959009, y-w*x-b=49.249978999999996, total_cost=18268.889217193275
i=22, x=19.110367999999998, y=74.614639, y-w*x-b=54.504271, total_cost=21239.604774434716
i=23, x=19.511706, y=71.867195, y-w*x-b=51.35548899999999, total_cost=23876.991024863837
i=24, x=19.913043, y=76.098135, y-w*x-b=55.185092, total_cost=26922.3854039123
i=25, x=20.354515, y=75.775216, y-w*x-b=54.420701, total_cost=29883.998101243702
i=26, x=20.755853, y=72.48605500000001, y-w*x-b=50.730202000000006, total_cost=32457.551496204505
i=27, x=21.167191, y=77.355021, y-w*x-b=55.18782999999999, total_cost=35503.24807631341
i=28, x=21.598662, y=72.11879, y-w*x-b=49.520128, total_cost=37955.491153449795
i=29, x=22.0, y=80.260571, y-w*x-b=57.260571, total_cost=41234.26414469584
Out[23]:
1374.4754714898613

Use the second formula to compute b̂ (b-hat)
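The "second formula" b̂ = ȳ − ŵ·x̄ and the averaged-residual form used in the code, b = Σ(y − w·x)/M, are algebraically the same thing, since Σ(y − w·x)/M = ȳ − w·x̄. A quick numeric check on hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
w = 2.0  # any slope works; the identity holds for every w

b_residual = np.mean(y - w * x)           # b = Σ(y − w·x) / M
b_formula = np.mean(y) - w * np.mean(x)   # b̂ = ȳ − w·x̄
print(b_residual, b_formula)  # equal up to floating-point rounding
```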

In [6]:
# 3. Define the core fitting algorithm
# First, a mean function. Question: can the mean be computed directly with np.mean(data)?
# def average(data):
#     sum = 0
#     num = len(data)
#     for i in range(num):
#         sum += data[i]
#     return sum/num
# print(average(x))
# print(np.mean(x))
# The printed results are identical, so the two are interchangeable.

# Define the core fitting function
# i.e., solve for w and b
def fit(points):
    M = len(points)
    x_bar = np.mean(points.iloc[:, 1])
    # print("x_bar={}".format(x_bar))
    sum_yx = 0
    sum_x2 = 0
    sum_delta = 0
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_yx += y*(x - x_bar)
        # equivalent to sum_yx += y*x - x_bar*y_bar,
        # because Σᵢ x_bar*yᵢ = x_bar * Σᵢ yᵢ = M * x_bar * y_bar
        sum_x2 += x**2
    # compute w from the closed-form formula
    w = sum_yx/(sum_x2 - M*(x_bar**2))

    # b is the total residual divided by the sample count

    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_delta += (y - w*x)
    b = sum_delta / M
    return w, b
In [7]:
# Define the core fitting function
# i.e., solve for w and b
def fit1(points):
    M = len(points)
    x_bar = np.mean(points.iloc[:, 1])
    y_bar = np.mean(points.iloc[:, 2])
    # print("x_bar={}".format(x_bar))
    sum_yx = 0
    sum_x2 = 0
    sum_delta = 0
    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_yx += y*x - x_bar*y_bar
        sum_x2 += x**2
    # compute w from the closed-form formula
    w = sum_yx/(sum_x2 - M*(x_bar**2))

    # b is the total residual divided by the sample count

    for i in range(M):
        x = points.iloc[i, 1]
        y = points.iloc[i, 2]
        sum_delta += (y - w*x)
    b = sum_delta / M
    return w, b
# 4. Test
points = data
w, b = fit1(points)
print("w is :", w)
print("b is :", b)
w is : 5.564068237721681
b is : -39.14888093981615
In [8]:
# 4.测试
points = data
w,b =fit(points)
print ("w is :",w)
print ("b is :",b)
cost = compute_cost(w,b,points)
print("cost is :" ,cost)
w is : 5.564068237721692
b is : -39.14888093981633
i=0, x=10.0, y=26.658839, y-w*x-b=10.167037562599411, total_cost=103.36865279930737
i=1, x=10.401338, y=27.306434999999997, y-w*x-b=8.581561544208657, total_cost=177.01185133634823
i=2, x=10.842808999999999, y=22.13241, y-w*x-b=0.9511617752334374, total_cost=177.91656005901345
i=3, x=11.244147, y=21.169841, y-w*x-b=-2.244479243157322, total_cost=182.95424713197752
i=4, x=11.645449000000001, y=15.192634, y-w*x-b=-10.454557955091524, total_cost=292.25202916834496
i=5, x=12.086957, y=26.398951, y-w*x-b=-1.704821594591543, total_cost=295.1584458377306
i=6, x=12.048829, y=17.435307, y-w*x-b=-10.45631880082368, total_cost=404.49304870218936
i=7, x=12.889632, y=25.507885, y-w*x-b=-7.062026067304792, total_cost=454.3652608774818
i=8, x=13.290970000000002, y=36.884595000000004, y-w*x-b=2.081611914304453, total_cost=458.698369039256
i=9, x=13.732441, y=39.666109000000006, y-w*x-b=2.406751145329231, total_cost=464.4908201147996
i=10, x=14.133779, y=34.396281, y-w*x-b=-5.09614887306153, total_cost=490.4615534512059
i=11, x=14.635117000000001, y=41.497994, y-w*x-b=-0.783914715224455, total_cost=491.07607573195133
i=12, x=14.978589000000001, y=44.981575, y-w*x-b=0.7885646390288059, total_cost=491.69790992187797
i=13, x=15.377926, y=47.039595, y-w*x-b=0.6246463211817428, total_cost=492.08809294844383
i=14, x=15.779264000000001, y=48.252578, y-w*x-b=-0.39544269720901326, total_cost=492.24446787521975
i=15, x=16.220736, y=57.034251, y-w*x-b=5.929849969747529, total_cost=527.4075885389345
i=16, x=16.622073999999998, y=51.490919, y-w*x-b=-1.8465540486432133, total_cost=530.8173503934952
i=17, x=17.023411, y=51.336621, y-w*x-b=-4.233918502965736, total_cost=548.7434162832508
i=18, x=17.464883, y=57.681998, y-w*x-b=-0.3449218360092061, total_cost=548.8623873562068
i=19, x=17.866221, y=68.553714, y-w*x-b=8.29372214560005, total_cost=617.6482143846235
i=20, x=18.267559, y=64.310925, y-w*x-b=1.817861127209305, total_cost=620.9528334624422
i=21, x=18.70903, y=68.959009, y-w*x-b=4.009570358234065, total_cost=637.0294879200715
i=22, x=19.110367999999998, y=74.614639, y-w*x-b=7.432128339843324, total_cost=692.2660195799738
i=23, x=19.511706, y=71.867195, y-w*x-b=2.451612321452565, total_cost=698.2764225546719
i=24, x=19.913043, y=76.098135, y-w*x-b=4.449485867130072, total_cost=718.0743470364621
i=25, x=20.354515, y=75.775216, y-w*x-b=1.6701865340865893, total_cost=720.8638700951062
i=26, x=20.755853, y=72.48605500000001, y-w*x-b=-3.8520464843041466, total_cost=735.7021322123462
i=27, x=21.167191, y=77.355021, y-w*x-b=-1.2717931850721342, total_cost=737.3195901179421
i=28, x=21.598662, y=72.11879, y-w*x-b=-8.908758271670145, total_cost=816.6855640609933
i=29, x=22.0, y=80.260571, y-w*x-b=-3.0000492900608933, total_cost=825.6858598037882
cost is : 27.522861993459607
In [26]:
x = data.Education
y = data.Income
plt.scatter(x,y)
pred_y= w*x+b
plt.plot(x,pred_y,c='r')
Out[26]:
[<matplotlib.lines.Line2D at 0x1e2c64637c8>]

Why does linear regression done with TensorFlow 2 not fit as well as this example?

Because this example computes w and b directly from the closed-form formulas, while TensorFlow 2 searches for the values step by step.

posted @ 范仁义  Views(241)  Comments(0)