Normal Distribution and Least Squares
Linear regression model
\(y=Ax+v\), where \(v\) is the noise
\(v=y-Ax\)
Linear measurements with additive IID noise
(IID stands for "independent and identically distributed".)
\(y_i=a_i^Tx+v_i,i=1,...,m\)
Let \(p_v\) denote the probability density function of each noise term \(v_i\).
Then the likelihood function is
\(p_{x}(y)=\prod_{i=1}^{m}p_v(y_i-a_i^Tx)\)
The log-likelihood function is \(L(x)=\log p_x(y)=\sum_{i=1}^{m}\log p_v(y_i-a_i^Tx)\)
The maximum likelihood estimate therefore solves \(\max_x \sum_{i=1}^{m}\log p_v(y_i-a_i^Tx)\)
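As a rough illustration of the log-likelihood above, the sketch below evaluates \(L(x)\) on simulated data. The helper name `log_likelihood`, the simulated `A` and `x_true`, and the Gaussian noise scale of 0.1 are assumptions made for the example only.

```python
import numpy as np
from scipy.stats import norm


def log_likelihood(x, A, y, log_pdf):
    """Return L(x) = sum_i log p_v(y_i - a_i^T x) for IID noise.

    log_pdf is a function returning the log-density of the noise (hypothetical helper).
    """
    residuals = y - A @ x
    return np.sum(log_pdf(residuals))


# Simulated measurements y = A x + v (all values here are illustrative assumptions)
rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + rng.normal(scale=0.1, size=m)


def gaussian_log_pdf(r):
    return norm.logpdf(r, loc=0.0, scale=0.1)


L_true = log_likelihood(x_true, A, y, gaussian_log_pdf)
L_zero = log_likelihood(np.zeros(n), A, y, gaussian_log_pdf)
print("L(x_true) =", L_true)
print("L(0)      =", L_zero)
```

The true parameter vector should score a much higher log-likelihood than an arbitrary guess such as the zero vector, which is what maximizing \(L(x)\) exploits.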
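Gaussian distribution
Now let the noise be normally distributed, \(v_i\sim N(\mu,\sigma^2)\), with density
\(P(v)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(v-\mu)^2}{2\sigma^2}}\)
With zero-mean noise (\(\mu=0\)) and \(v_i=y_i-a_i^Tx\), the log-likelihood function is \(L(x)=\sum_{i=1}^{m}\log P(v_i)\)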
# Plot the standard normal (Gaussian) PDF
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(1, 1)
# Evaluate the PDF between the 1st and 99th percentiles
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)
ax.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
ax.legend()
plt.show()
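Substituting the Gaussian density, \(L(x)=\sum_{i=1}^{m}\left(\log\frac{1}{\sqrt{2\pi\sigma^2}}-\frac{1}{2\sigma^2}(y_i-a_i^Tx)^2\right)\)
\(=-\frac{m}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i-a_i^Tx)^2\)
The first term does not depend on \(x\) and \(\sigma^2>0\), so \(\max_{x}L(x)\Leftrightarrow \min_x \frac{1}{2}\sum_{i=1}^{m}(y_i-a_i^Tx)^2\)
That is, maximum likelihood estimation under Gaussian noise is equivalent to the least-squares problem.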
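The equivalence can be checked numerically. The following is a minimal sketch, assuming simulated data and a known noise level `sigma`. It compares the least-squares solution from `numpy.linalg.lstsq` with a numerical maximizer of the Gaussian log-likelihood found by `scipy.optimize.minimize`. The data, seed, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data y = A x + v with zero-mean Gaussian noise (assumed values)
rng = np.random.default_rng(1)
m, n = 200, 3
A = rng.normal(size=(m, n))
x_true = np.array([2.0, -1.0, 0.5])
sigma = 0.3
y = A @ x_true + rng.normal(scale=sigma, size=m)

# Least-squares solution: minimizes sum_i (y_i - a_i^T x)^2
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)


def neg_log_likelihood(x):
    """Negative Gaussian log-likelihood; minimizing it maximizes L(x)."""
    return -np.sum(norm.logpdf(y - A @ x, loc=0.0, scale=sigma))


# Numerical maximum likelihood estimate, starting from the zero vector
res = minimize(neg_log_likelihood, x0=np.zeros(n), method="BFGS")
x_mle = res.x

print("least squares:", x_ls)
print("Gaussian MLE: ", x_mle)
```

Up to the optimizer's tolerance, the two estimates should coincide, and the value chosen for `sigma` should not change the maximizer, only the scale of the objective.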
Laplace distribution
The Laplace probability density function is
\(P(v)=\frac{1}{2\lambda}e^{-\frac{|v-\mu|}{\lambda}}\)
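To connect the formula with SciPy's parameterization, this small sketch evaluates the density directly and compares it with `scipy.stats.laplace.pdf`, where `loc` plays the role of \(\mu\) and `scale` the role of \(\lambda\). The function name `laplace_pdf` and the test values are assumptions for illustration.

```python
import numpy as np
from scipy.stats import laplace


def laplace_pdf(v, mu=0.0, lam=1.0):
    """Laplace density 1/(2*lam) * exp(-|v - mu| / lam), written out by hand."""
    return np.exp(-np.abs(v - mu) / lam) / (2.0 * lam)


v = np.linspace(-5.0, 5.0, 11)
mu, lam = 0.5, 2.0  # arbitrary illustrative parameters

manual = laplace_pdf(v, mu, lam)
reference = laplace.pdf(v, loc=mu, scale=lam)
print(np.allclose(manual, reference))
```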
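# Plot the standard Laplace PDF
from scipy.stats import laplace
import matplotlib.pyplot as plt
import numpy as np  # required by np.linspace below

fig, ax = plt.subplots(1, 1)
# Evaluate the PDF between the 1st and 99th percentiles
x = np.linspace(laplace.ppf(0.01), laplace.ppf(0.99), 100)
ax.plot(x, laplace.pdf(x), 'r-', lw=5, alpha=0.6, label='laplace pdf')
ax.legend()
plt.show()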
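With \(\mu=0\), the log-likelihood function is \(L(x)=\sum_{i=1}^{m}\log\left(\frac{1}{2\lambda}e^{-\frac{|y_i-a_i^Tx|}{\lambda}}\right)\)
\(=-m\log(2\lambda)-\frac{1}{\lambda}\sum_{i=1}^{m}|y_i-a_i^Tx|\)
\(\max_x L(x) \Leftrightarrow \min_x \sum_{i=1}^{m}|y_i-a_i^Tx|\)
This is an optimization problem in the L1 norm, whereas least squares is the corresponding problem in the L2 norm.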
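One common way to solve the L1 problem is to rewrite it as a linear program: introduce auxiliary variables \(t_i\ge |y_i-a_i^Tx|\) and minimize \(\sum_i t_i\). The sketch below does this with `scipy.optimize.linprog`, assuming SciPy 1.6 or later for the `"highs"` method. The helper name `l1_regression` and the simulated data are assumptions for illustration, and this is only one possible solution method.

```python
import numpy as np
from scipy.optimize import linprog


def l1_regression(A, y):
    """Minimize sum_i |y_i - a_i^T x| via an LP in the variables [x, t]."""
    m, n = A.shape
    # Objective: sum of the auxiliary variables t (x has zero cost)
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    # Constraints  A x - t <= y  and  -A x - t <= -y  encode |y - A x| <= t
    A_ub = np.block([[A, -I], [-A, -I]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * m  # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


# Simulated data with Laplace noise (illustrative values)
rng = np.random.default_rng(2)
m, n = 100, 2
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -3.0])
y = A @ x_true + rng.laplace(scale=0.5, size=m)

x_l1 = l1_regression(A, y)
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)


def l1_cost(x):
    return np.sum(np.abs(y - A @ x))


def l2_cost(x):
    return np.sum((y - A @ x) ** 2)


print("L1 estimate:", x_l1, "L1 cost:", l1_cost(x_l1))
print("L2 estimate:", x_l2, "L1 cost:", l1_cost(x_l2))
```

By construction, the L1 estimate attains an L1 cost no larger than that of the least-squares estimate, and the least-squares estimate attains the smaller L2 cost.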