Linear Regression: Derivation and Matlab Implementation

1 Linear Regression

Linear regression is the simplest machine learning model: its form is simple, it is easy to implement, and it serves as the foundation of many other machine learning models. Given a dataset, the goal of linear regression is to find the linear function that best fits the data.

1.1 Simple Linear Regression

(1) Model assumption

\[y=wx+b \]

where \(x\) and \(y\) are scalars.
(2) Evaluation criterion: mean squared error loss

\[L = \frac{1}{2m}\sum_{i=1}^{m}(y_i - \hat{y_i})^2 \\ L(w,b) = \frac{1}{2m}\sum_{i=1}^{m}(y_i - (wx_i + b))^2 \\ \]

Taking partial derivatives:

\[\begin{aligned} \frac{\partial{L(w,b)}}{\partial{w}}&=\frac{1}{2m}\sum_{i=1}^{m}2\times(y_i - (wx_i + b))(-x_i) \\ &=\frac{1}{m}\sum_{i=1}^{m}((wx_i+b)x_i - y_ix_i) \end{aligned}\tag{1}\]

\[\begin{aligned} \frac{\partial{L(w,b)}}{\partial{b}}&=\frac{1}{2m}\sum_{i=1}^{m}2\times((wx_i + b) - y_i) \\ &=\frac{1}{m}\sum_{i=1}^{m}((wx_i+b) - y_i) \end{aligned}\tag{2}\]

Setting (1) and (2) equal to zero:

\[\begin{aligned} \frac{1}{m}\sum_{i=1}^{m}(wx_i + b - y_i)&=0 \\ \frac{1}{m}\sum_{i=1}^{m}wx_i + \frac{1}{m}\sum_{i=1}^{m}b - \frac{1}{m}\sum_{i=1}^{m}y_i&=0 \\ w\frac{1}{m}\sum_{i=1}^{m}x_i + b - \frac{1}{m}\sum_{i=1}^{m}y_i&=0 \\ b &= \frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i) = \overline{y} - w\overline{x} \end{aligned}\]

Substituting \(b = \overline{y} - w\overline{x}\) into (1):

\[\begin{aligned} \frac{1}{m}\sum_{i=1}^{m}((wx_i+b)x_i - y_ix_i)&=0 \\ \frac{1}{m}\sum_{i=1}^{m}((wx_i+\overline{y} - w\overline{x})x_i - y_ix_i)&=0 \\ \frac{1}{m}\sum_{i=1}^{m}(wx_i^2+\overline{y}x_i - w\overline{x}x_i - y_ix_i)&=0 \\ \frac{1}{m}\sum_{i=1}^{m}w(x_i-\overline{x})x_i - \frac{1}{m}\sum_{i=1}^{m}(y_i-\overline{y})x_i&=0 \\ \end{aligned}\]

\[\begin{aligned} w\sum_{i=1}^{m}(x_i-\overline{x})x_i &= \sum_{i=1}^{m}(y_i-\overline{y})x_i \\ w &= \frac{\sum_{i=1}^{m}(y_i-\overline{y})x_i}{\sum_{i=1}^{m}(x_i-\overline{x})x_i} \\ w &= \frac{\sum_{i=1}^{m}(y_ix_i - \overline{y}x_i)}{\sum_{i=1}^{m}x_i^{2} - m\overline{x}^2} \\ w &= \frac{\sum_{i=1}^{m}y_ix_i - \overline{y}\sum_{i=1}^{m}x_i}{\sum_{i=1}^{m}x_i^{2} - m\overline{x}^2} \\ w &= \frac{\sum_{i=1}^{m}y_ix_i - \overline{x}\sum_{i=1}^{m}y_i}{\sum_{i=1}^{m}x_i^{2} - m\overline{x}^2} \end{aligned}\]
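As a quick illustration, the closed-form expressions for \(w\) and \(b\) can be evaluated directly in Matlab. This is a minimal sketch on made-up placeholder data (the vectors x and y below are assumptions for illustration only, not the dataset used in Section 1.3):

% Minimal sketch: closed-form simple linear regression on placeholder data
x = (1:10)';                  % inputs, m-by-1 column vector (placeholder)
y = 3*x + 2 + randn(10,1);    % noisy targets (placeholder)
m = numel(x);
x_bar = mean(x);
w = (sum(y.*x) - x_bar*sum(y)) / (sum(x.^2) - m*x_bar^2);   % slope, from the last formula above
b = mean(y - w*x);                                          % intercept, b = (1/m)*sum(y_i - w*x_i)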

1.2 Multiple Linear Regression

(1) Model assumption

\[y = \vec{w}^T\vec{x} + b \tag{5} \]

where \(\vec{w} \in \mathbb{R}^n\) and \(b \in \mathbb{R}\) are the model parameters, also called the regression coefficients. For convenience, \(b\) is usually absorbed into the weight vector \(\vec{w}\) as an extra component, with a corresponding constant 1 appended to the input vector \(\vec{x}\) (in the matrix form below they appear as the last entries), so the model simplifies to:

\[y = \vec{w}^T\vec{x} \tag{6} \]

(2) Evaluation criterion

\[L(\vec{w}) = \frac{1}{2m}\sum_{i=1}^{m}(\vec{w}^T\vec{x_i}-y_i)^2 \]

Solving this component by component, as in simple linear regression, is tedious; switching to matrix notation makes the derivation much easier.

\[X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} & 1 \\ x_{21} & x_{22} & \cdots & x_{2n} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} & 1 \\ \end{bmatrix} = \begin{pmatrix} \vec{x_1}^T \\ \vec{x_2}^T \\ \vdots \\ \vec{x_m}^T \\ \end{pmatrix}\]

\[\vec{w}=\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ b \end{pmatrix}\]

\[\vec{y}=\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \\ \end{pmatrix}\]

\[L(\vec{w}) = \frac{1}{2m}(X\vec{w}-\vec{y})^T(X\vec{w}-\vec{y}) \]

Taking the matrix gradient:

\[\nabla{L(\vec{w})} = \frac{1}{m}X^T(X\vec{w} - \vec{y})=0 \\ X^TX\vec{w} = X^T\vec{y} \]

If \(X^TX\) is invertible, then

\[\vec{w} = (X^TX)^{-1}X^T\vec{y} \]

Note: for the rules of matrix differentiation, see Matrix Analysis and Applications (《矩阵分析与应用》) by Zhang Xianda.
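For completeness, here is a sketch of how the gradient above follows from expanding the quadratic form, using the standard identities \(\nabla_{\vec{w}}(\vec{a}^T\vec{w})=\vec{a}\) and \(\nabla_{\vec{w}}(\vec{w}^TA\vec{w})=2A\vec{w}\) for symmetric \(A\):

\[\begin{aligned} L(\vec{w}) &= \frac{1}{2m}\left(\vec{w}^TX^TX\vec{w} - 2\vec{y}^TX\vec{w} + \vec{y}^T\vec{y}\right) \\ \nabla{L(\vec{w})} &= \frac{1}{2m}\left(2X^TX\vec{w} - 2X^T\vec{y}\right) = \frac{1}{m}X^T(X\vec{w} - \vec{y}) \end{aligned}\]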
(3) Solving the model
In many cases a machine learning model cannot be solved directly through a closed-form equation, and we must resort to iterative optimization. Iterative optimization makes a small adjustment based on the current state and repeats this adjustment until the optimum is reached. Among iterative methods, the most widely used is gradient descent. Gradient descent can be described as follows:
(a) Compute the gradient of the loss function, \(\nabla{L(\vec{w})}\), at the current parameters \(\vec{w}\);
(b) Adjust \(\vec{w}\) along the negative gradient direction \(-\nabla{L(\vec{w})}\) with step size \(\eta\), expressed as:

\[\vec{w} = \vec{w} - \eta \nabla{L(\vec{w})} \]

(c) Repeat the above steps until the gradient is zero or the decrease in the loss falls below a threshold, at which point the algorithm is said to have converged.
To show that the solution found above is a global optimum, we need to show that \(L(\vec{w})\) is convex.
[Reference for the proof]
https://blog.csdn.net/AIHUBEI/article/details/104347195
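The core of that argument is short: the Hessian of the loss is

\[\nabla^2{L(\vec{w})} = \frac{1}{m}X^TX, \]

which is positive semidefinite because \(\vec{v}^TX^TX\vec{v} = \|X\vec{v}\|^2 \ge 0\) for any \(\vec{v}\). Hence \(L(\vec{w})\) is convex, and the stationary point found above is a global minimum.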

1.3 Hands-on Project

The dataset used in this article can be downloaded from: https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
Closed-form (normal equation) method:

clc;
clear;
close all;

%% Load the dataset
data = readtable("winequality-red.csv","VariableNamingRule","preserve");
data = table2array(data);


%% Split the dataset
num_data = size(data, 1);  % number of samples
idx = randperm(num_data);

rate = 0.7;   % train/test split ratio
num_train = ceil(num_data*rate);     % number of training samples

train_data = data(idx(1:num_train),:);  % training set
x_train = train_data(:, 1:end-1);
x_train = [x_train, ones(num_train,1)];   % append a column of ones for the bias term
y_train = train_data(:, end);

test_data = data(idx(num_train+1:end),:);  % test set
x_test = test_data(:, 1:end-1);
x_test = [x_test, ones(num_data-num_train, 1)];   % append a column of ones for the bias term
y_test = test_data(:, end);


%% Solve via the normal equation
w = (x_train'*x_train)^(-1)*x_train'*y_train;   % w = (X'X)^(-1) X'y
loss = 1/(2*(num_data-num_train))*sum((y_test - x_test*w).^2);   % test loss (mean squared error / 2)
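Note: computing the explicit inverse with ^(-1) mirrors the formula \(\vec{w}=(X^TX)^{-1}X^T\vec{y}\), but in practice Matlab's backslash operator is the numerically preferred way to solve the normal equations. An equivalent alternative to the line above would be:

w = (x_train'*x_train) \ (x_train'*y_train);   % solve X'X*w = X'*y without forming the inverse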

Gradient descent method:

clc;
clear;
close all;

%% Load the dataset
data = readtable("winequality-red.csv","VariableNamingRule","preserve");
data = table2array(data);


%% Split the dataset
num_data = size(data, 1);  % number of samples
idx = randperm(num_data);

rate = 0.7;   % train/test split ratio
num_train = ceil(num_data*rate);     % number of training samples

train_data = data(idx(1:num_train),:);  % training set
x_train = train_data(:, 1:end-1);
x_train = [x_train, ones(num_train,1)];   % append a column of ones for the bias term
y_train = train_data(:, end);

test_data = data(idx(num_train+1:end),:);  % test set
x_test = test_data(:, 1:end-1);
x_test = [x_test, ones(num_data-num_train, 1)];   % append a column of ones for the bias term
y_test = test_data(:, end);

max_iter = 3000;  % maximum number of iterations
w = zeros(size(x_train,2), 1);   % initialize weights (bias included) to zero
eta = 0.0001;   % learning rate
tol = 0.00001;  % threshold on the decrease in loss

%% Training
for iter=1:max_iter
    y_pred = x_train*w;                                     % predictions on the training set
    loss = 1/(2*num_train)*sum((y_train - y_pred).^2);      % current training loss
    if iter > 1
        loss_drop = loss_old - loss;                        % decrease in loss since the last iteration
        if loss_drop < tol
            break;                                          % converged: loss no longer decreasing enough
        end
    end
    loss_old = loss;
    w = w - eta*(1/num_train)*x_train'*(y_pred - y_train);  % gradient descent update
end

%% Testing
y_ = x_test*w;   % predictions on the test set
loss_ = 1/(2*(num_data-num_train))*sum((y_test - y_).^2);   % test loss
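As a quick follow-up (not part of the original script), the fit can also be summarized on the same scale as the targets by reporting the root-mean-square error on the test set:

rmse = sqrt(mean((y_test - x_test*w).^2));   % RMSE of the gradient-descent model on the test set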