Derivation of Logistic Regression
There are \(m\) samples \((x_{i},y_{i})\):
\[\{x_{1},x_{2},x_{3},\ldots,x_{m}\}
\]
\[\{y_{1},y_{2},y_{3},\ldots,y_{m}\}
\]
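Each \(x_{i}\) consists of \(n-1\) features with a constant 1 appended as the last component, so that it has \(n\) components like \(w\) and the inner product \(wx_{i}\) is defined:
\[x_{i}=\{x_{i1},x_{i2},x_{i3},\ldots,x_{i(n-1)},1\}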
\]
\[y_{i}\in\{0,1\}
\]
Here \(w\) is an \(n\)-dimensional weight vector:
\[w=\{w_{1},w_{2},w_{3},\ldots,w_{n}\}
\]
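The role of the appended 1 can be sketched in a few lines of NumPy. The array names `X_raw`, `X`, `w` and the toy numbers are assumptions made for illustration only. After a column of ones is appended, every row of `X` has the same length as `w`, and the last weight acts as the bias term in the inner product.

```python
import numpy as np

# Hypothetical toy data: m = 3 samples, each with n - 1 = 2 raw features.
X_raw = np.array([[0.5, 1.5],
                  [1.0, -2.0],
                  [3.0, 0.0]])

# Append a constant 1 to every sample so that each x_i has n = 3 components.
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# Weight vector with n components; the last one multiplies the appended 1.
w = np.array([2.0, -1.0, 0.5])

# One inner product w . x_i per sample.
z = X @ w
print(X.shape, z)
```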
Regression (hypothesis) function, the logistic sigmoid applied to \(wx_{i}\):
\[h_{w}(x_{i})=\frac{1}{1+e^{-wx_{i}}}
\]
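A minimal sketch of this function in NumPy follows. It assumes the standard logistic form \(1/(1+e^{-z})\) with \(z=wx_{i}\); the helper names `sigmoid` and `hypothesis` and the toy values are hypothetical. The `tanh` identity is one possible way to evaluate the sigmoid without overflow, not the only one.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), evaluated elementwise."""
    # Uses the identity 1 / (1 + exp(-z)) = 0.5 * (1 + tanh(z / 2)),
    # which avoids overflow in exp for large negative z.
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

def hypothesis(w, X):
    """h_w(x_i) for every row x_i of X (rows already end with the constant 1)."""
    return sigmoid(X @ w)

# Hypothetical toy data: 3 samples, 2 features plus the appended 1.
X = np.array([[0.5, 1.5, 1.0],
              [1.0, -2.0, 1.0],
              [3.0, 0.0, 1.0]])
w = np.array([2.0, -1.0, 0.5])

h = hypothesis(w, X)
print(h)
```

For the first toy sample the inner product is 0, so the predicted probability is exactly 0.5; every output lies between 0 and 1, which is what allows it to be read as a probability.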
Probability distribution:
\[P(y=1|x;w)=h_{w}(x)
\]
\[P(y=0|x;w)=1-h_{w}(x)
\]
\[P(y|x;w)=h_{w}(x)^{y}\cdot(1-h_{w}(x))^{1-y}
\]
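As a quick check that the compact form agrees with the two separate cases, substitute each possible value of \(y\):
\[y=1:\quad h_{w}(x)^{1}\cdot(1-h_{w}(x))^{0}=h_{w}(x)
\]
\[y=0:\quad h_{w}(x)^{0}\cdot(1-h_{w}(x))^{1}=1-h_{w}(x)
\]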
Likelihood function:
\[L(w)=\prod_{i=1}^{m}P(y_{i}|x_{i};w)
=\prod_{i=1}^{m}h_{w}(x_{i})^{y_{i}}\cdot(1-h_{w}(x_{i}))^{1-y_{i}}
\]
Taking the logarithm of both sides:
\[\ln L(w)=\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]
\]
\[w^{*}=\arg\max_{w}\ln L(w)
\]
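The relationship between the product form and the log form can be checked numerically. The sketch below assumes the standard sigmoid \(1/(1+e^{-z})\); the function names and the toy data are hypothetical and only serve to show that the logarithm of the product equals the sum of the per-sample log terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w, X, y):
    """L(w): product over samples of h^y * (1 - h)^(1 - y)."""
    h = sigmoid(X @ w)
    return np.prod(h ** y * (1.0 - h) ** (1.0 - y))

def log_likelihood(w, X, y):
    """ln L(w): sum over samples of y ln h + (1 - y) ln(1 - h)."""
    h = sigmoid(X @ w)
    return np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Hypothetical toy data: 4 samples, 2 features plus the appended 1.
X = np.array([[0.5, 1.5, 1.0],
              [1.0, -2.0, 1.0],
              [3.0, 0.0, 1.0],
              [-1.0, 0.5, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
w = np.array([0.3, -0.2, 0.1])

log_of_product = np.log(likelihood(w, X, y))
sum_of_logs = log_likelihood(w, X, y)
print(log_of_product, sum_of_logs)
```

Because the logarithm is monotonically increasing, the \(w\) that maximizes \(\ln L(w)\) also maximizes \(L(w)\). The sum form is also the more practical one to compute, since a product of many probabilities quickly becomes very small.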
Loss function (the negative log-likelihood averaged over the samples):
\[J(w)=-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]
\]
\[w^{*}=\arg\min_{w}J(w)
\]
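A possible NumPy version of this loss is sketched below, assuming the standard sigmoid \(1/(1+e^{-z})\). The `eps` clipping is an added safeguard against \(\ln 0\) and is not part of the formula itself; the function names and toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y, eps=1e-12):
    """J(w) = -(1/m) * sum_i [y_i ln h_i + (1 - y_i) ln(1 - h_i)]."""
    h = sigmoid(X @ w)
    # Assumption: clip h away from 0 and 1 so that log never receives 0.
    h = np.clip(h, eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Hypothetical toy data: 4 samples, 2 features plus the appended 1.
X = np.array([[0.5, 1.5, 1.0],
              [1.0, -2.0, 1.0],
              [3.0, 0.0, 1.0],
              [-1.0, 0.5, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

loss_at_zero = loss(np.zeros(3), X, y)
print(loss_at_zero)
```

With \(w=0\) every prediction is 0.5, so the loss equals \(\ln 2\) regardless of the labels, which makes a convenient sanity check for an implementation.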
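Take the partial derivative of the loss with respect to each component \(w_{j}\) of \(w\) (needed to find the minimum by gradient descent):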
\[\frac{\partial J(w)}{\partial w_{j}}=\frac{\partial}{\partial w_{j}}\left(-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right]\right)
\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y_{i}}{h_{w}(x_{i})}\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}}+\frac{1-y_{i}}{1-h_{w}(x_{i})}\cdot\frac{\partial (1-h_{w}(x_{i}))}{\partial w_{j}}\right]
\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}}
\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial (wx_{i})}\cdot\frac{\partial (wx_{i})}{\partial w_{j}}
\]
\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot h_{w}(x_{i})\cdot(1-h_{w}(x_{i}))\cdot\frac{\partial (wx_{i})}{\partial w_{j}}
\]
\[=\frac{1}{m}\sum_{i=1}^{m}(h_{w}(x_{i})-y_{i})\cdot x_{ij}
\]
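The last two steps of the derivation rely on two facts: for the logistic sigmoid, the derivative of \(h_{w}(x_{i})\) with respect to \(wx_{i}\) is \(h_{w}(x_{i})(1-h_{w}(x_{i}))\), and the derivative of \(wx_{i}\) with respect to \(w_{j}\) is \(x_{ij}\). One way to gain confidence in the final formula is to compare it with a finite-difference approximation of the loss. The sketch below assumes the standard sigmoid \(1/(1+e^{-z})\); the function names, the `step` size, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def gradient(w, X, y):
    """Analytic gradient: (1/m) * sum_i (h_w(x_i) - y_i) * x_ij for every j."""
    h = sigmoid(X @ w)
    return X.T @ (h - y) / X.shape[0]

def numerical_gradient(w, X, y, step=1e-6):
    """Central finite differences of the loss, one coordinate w_j at a time."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = step
        grad[j] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2.0 * step)
    return grad

# Hypothetical toy data: 4 samples, 2 features plus the appended 1.
X = np.array([[0.5, 1.5, 1.0],
              [1.0, -2.0, 1.0],
              [3.0, 0.0, 1.0],
              [-1.0, 0.5, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
w = np.array([0.3, -0.2, 0.1])

analytic = gradient(w, X, y)
numeric = numerical_gradient(w, X, y)
print(np.max(np.abs(analytic - numeric)))
```

The two gradients should agree up to the error of the finite-difference approximation; a large gap would point to a sign or indexing mistake in either the loss or the gradient.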
Update each component \(w_{j}\) of \(w\), where \(\alpha\) is the learning rate:
\[w_{j}:=w_{j}-\alpha\cdot\frac{\partial J(w)}{\partial w_{j}}
\]
Batch gradient descent: every update of each \(w_{j}\) uses all \(m\) samples:
\[w_{j}:=w_{j}-\alpha\cdot\frac{1}{m}\sum_{i=1}^{m}(h_{w}(x_{i})-y_{i})\cdot x_{ij}
\]
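The update rule can be turned into a short training loop. This is a minimal sketch under stated assumptions: the standard sigmoid \(1/(1+e^{-z})\), a zero initial \(w\), a fixed number of iterations instead of a convergence test, and hypothetical defaults `alpha=0.1` and `n_iters=1000`. The toy data set is also made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Repeat w_j := w_j - alpha * (1/m) * sum_i (h_w(x_i) - y_i) * x_ij."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ w)           # predictions for all m samples
        grad = X.T @ (h - y) / m     # all n partial derivatives at once
        w = w - alpha * grad         # simultaneous update of every w_j
    return w

# Hypothetical toy data: one feature, label 1 when the feature is positive.
X_raw = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])  # append the constant 1
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w = batch_gradient_descent(X, y)
pred = (sigmoid(X @ w) >= 0.5).astype(float)
print(w, pred, loss(w, X, y))
```

Writing the gradient as `X.T @ (h - y) / m` computes the sum over all samples for every \(j\) in one matrix product, which is what makes this the batch variant: each step looks at the whole data set before changing \(w\).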