Derivation of Logistic Regression

Given \(m\) samples \((x_{i}, y_{i})\):

\[\{x_{1},x_{2},x_{3},\dots,x_{m}\} \]

\[\{y_{1},y_{2},y_{3},\dots,y_{m}\} \]

Each \(x_{i}\) is an \((n-1)\)-dimensional feature vector with a constant 1 appended at the end, making it \(n\)-dimensional so that it aligns with \(w\) in the product \(wx_{i}\):

\[x_{i}=\{x_{i1},x_{i2},x_{i3},\dots,x_{i(n-1)},1\} \]

\[y_{i}\in\{0,1\} \]

The weight vector \(w\) is \(n\)-dimensional:

\[w=\{w_{1},w_{2},w_{3},\dots,w_{n}\} \]
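A minimal numpy sketch of this setup (the names `X_raw`, `X`, `w` and the array sizes are illustrative assumptions, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 4                                  # m samples, d = n-1 raw features

X_raw = rng.normal(size=(m, d))                # (n-1)-dimensional feature vectors
X = np.hstack([X_raw, np.ones((m, 1))])        # append a constant 1 -> n-dimensional rows
y = rng.integers(0, 2, size=m).astype(float)   # labels in {0, 1}

w = np.zeros(X.shape[1])                       # n-dimensional weight vector
```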

The regression (hypothesis) function is the sigmoid:

\[h_{w}(x_{i})=\frac{1}{1+e^{-wx_{i}}} \]
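In code (continuing the sketch above), the sigmoid is a one-liner; the clipping of \(z\) is an added numerical safeguard against overflow in `exp`, not part of the formula:

```python
def sigmoid(z):
    """h_w(x) = 1 / (1 + e^{-z}), where z = w . x."""
    z = np.clip(z, -500, 500)          # guard against overflow in exp
    return 1.0 / (1.0 + np.exp(-z))
```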

The conditional probability of the label:

\[P(y=1|x;w)=h_{w}(x) \]

\[P(y=0|x;w)=1-h_{w}(x) \]

\[P(y|x;w)=h_{w}(x)^{y}\cdot(1-h_{w}(x))^{1-y} \]
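As a quick sanity check, the compact form reduces to the two cases above; a one-line sketch (the helper name `bernoulli_pmf` is hypothetical):

```python
def bernoulli_pmf(y, h):
    """P(y|x;w) = h^y * (1-h)^(1-y): equals h when y=1 and 1-h when y=0."""
    return h**y * (1.0 - h)**(1 - y)
```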

The likelihood function:

\[L(w)=\prod_{i=1}^{m}P(y_{i}|x_{i};w) =\prod_{i=1}^{m}h_{w}(x_{i})^{y_{i}}\cdot(1-h_{w}(x_{i}))^{1-y_{i}} \]

Taking the logarithm of both sides:

\[\ln L(w)=\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right] \]

We want the \(w\) that maximizes \(\ln L(w)\):

\[w^{*}=\arg\max_{w}\,\ln L(w) \]

The loss function is the negative average log-likelihood:

\[J(w)=-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right] \]
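A direct translation of \(J(w)\) into numpy, reusing `sigmoid` from the sketch above (the small `eps` is an added numerical guard so `log` never receives 0):

```python
def loss(w, X, y, eps=1e-12):
    """J(w): average negative log-likelihood (binary cross-entropy)."""
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```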

Equivalently, we want the \(w\) that minimizes \(J(w)\):

\[w^{*}=\arg\min_{w}\,J(w) \]

Take the partial derivative of the loss with respect to each \(w_{j}\) (so we can minimize by gradient descent):

\[\frac{\partial J(w)}{\partial w_{j}}=-\frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial w_{j}}\left[y_{i}\ln h_{w}(x_{i})+(1-y_{i})\ln(1-h_{w}(x_{i}))\right] \]

\[=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y_{i}}{h_{w}(x_{i})}\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}}+\frac{1-y_{i}}{1-h_{w}(x_{i})}\cdot\frac{\partial (1-h_{w}(x_{i}))}{\partial w_{j}}\right] \]

\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial w_{j}} \]

\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot\frac{\partial h_{w}(x_{i})}{\partial (wx_{i})}\cdot\frac{\partial (wx_{i})}{\partial w_{j}} \]

The sigmoid satisfies \(\frac{\partial h_{w}(x_{i})}{\partial (wx_{i})}=h_{w}(x_{i})(1-h_{w}(x_{i}))\), so:

\[=-\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y_{i}}{h_{w}(x_{i})}-\frac{1-y_{i}}{1-h_{w}(x_{i})}\right)\cdot h_{w}(x_{i})(1-h_{w}(x_{i}))\cdot\frac{\partial (wx_{i})}{\partial w_{j}} \]

Using \(\frac{\partial (wx_{i})}{\partial w_{j}}=x_{ij}\) and expanding the fractions:

\[=-\frac{1}{m}\sum_{i=1}^{m}\left[y_{i}(1-h_{w}(x_{i}))-(1-y_{i})h_{w}(x_{i})\right]\cdot x_{ij} \]

\[=\frac{1}{m}\sum_{i=1}^{m}\left(h_{w}(x_{i})-y_{i}\right)\cdot x_{ij} \]
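Stacking the samples as the rows of `X`, this final expression vectorizes over all \(j\) at once. A sketch reusing the helpers above:

```python
def gradient(w, X, y):
    """dJ/dw_j = (1/m) * sum_i (h_w(x_i) - y_i) * x_ij, for every j at once."""
    m = X.shape[0]
    h = sigmoid(X @ w)
    return X.T @ (h - y) / m
```

One can check this against a finite-difference approximation of \(J\) to confirm the derivation.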

Update each \(w_{j}\) in \(w\), where \(\alpha\) is the learning rate:

\[w_{j}:=w_{j}-\alpha\cdot\frac{\partial J(w)}{\partial w_{j}} \]

Batch gradient descent: use all \(m\) samples in every update of each \(w_{j}\):

\[w_{j}:=w_{j}-\alpha\cdot\frac{1}{m}\sum_{i=1}^{m}\left(h_{w}(x_{i})-y_{i}\right)\cdot x_{ij} \]
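Putting it together, a minimal batch-gradient-descent loop over the sketch data (the step count and \(\alpha\) are arbitrary illustrative choices):

```python
def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Each step updates every w_j using the full-batch gradient over all m samples."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= alpha * gradient(w, X, y)
    return w

w = batch_gradient_descent(X, y)
print(loss(w, X, y))   # should be no larger than the loss at w = 0
```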
