Explanation of logistic regression cost function
\[\begin{array}{c}
\hat{y} = \sigma(w^Tx+b)\quad \text{where}\;\sigma(z) = \frac{1}{1+e^{-z}}\\
\text{interpret}\quad\hat{y} =P(y=1\mid x)\\
\text{if}\quad y=1:\;P(y\mid x)=\hat{y}\\
\text{if}\quad y=0:\;P(y\mid x)=1-\hat{y}\\
y\in\{0,1\}\;\text{because this is binary classification,}\\
\text{so the two cases can be combined into a single expression:}\\
P(y\mid x)=\hat{y}^y\cdot(1-\hat{y})^{1-y}\\
\log P(y\mid x)=y\cdot \log\hat{y}+(1-y)\cdot \log(1-\hat{y})\\
\text{and the loss for a single example is}\;\mathcal{L}(\hat{y},y)=-\log P(y\mid x)\\
\text{because minimizing the loss is equivalent to maximizing}\;P(y\mid x)
\end{array}
\]
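The derivation above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names (`sigmoid`, `likelihood`, `loss`) and the input value `0.8` are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(y_hat, y):
    """Combined form P(y | x) = y_hat^y * (1 - y_hat)^(1 - y), y in {0, 1}."""
    return (y_hat ** y) * ((1.0 - y_hat) ** (1 - y))

def loss(y_hat, y):
    """Single-example loss: -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# Hypothetical value of w^T x + b, chosen only for illustration.
y_hat = sigmoid(0.8)

for y in (0, 1):
    # The loss equals the negative log of the combined likelihood.
    assert math.isclose(loss(y_hat, y), -math.log(likelihood(y_hat, y)))
```

For `y = 1` the combined form reduces to `y_hat`, and for `y = 0` it reduces to `1 - y_hat`, which matches the two cases listed in the derivation.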
For a dataset of m independent examples:
\[P(\text{labels in training set})=\prod^{m}_{i=1}P(y^{(i)}\mid x^{(i)})
\]
We want to find the set of parameters that maximizes this probability (maximum likelihood estimation). Taking the logarithm gives
\[\begin{align}
\log P(\cdots)&=\sum^{m}_{i=1}\log P(y^{(i)}\mid x^{(i)})\\
&=\sum^{m}_{i=1}\left(- \mathcal{L}(\hat{y}^{(i)},y^{(i)})\right)\\
&=-\sum^{m}_{i=1} \mathcal{L}(\hat{y}^{(i)},y^{(i)})
\end{align}
\]
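Maximizing the log-likelihood is therefore the same as minimizing the sum of the per-example losses, so we define the cost function as their average (the factor 1/m only rescales the sum and does not change the minimizer):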
\[Cost:\quad J(w,b)=\frac{1}{m}\sum^{m}_{i=1} \mathcal{L}(\hat{y}^{(i)},y^{(i)})
\]
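As a rough sketch of how the cost relates to the log-likelihood, the snippet below computes both on a tiny made-up dataset and checks that J equals the negative log-likelihood divided by m. The data, the parameter values, and the helper names (`predict`, `cost`, `log_likelihood`) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """y_hat = sigma(w^T x + b) for one example x (a list of features)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def cost(w, b, X, Y):
    """J(w, b): average of the single-example losses over m examples."""
    total = 0.0
    for x, y in zip(X, Y):
        y_hat = predict(w, b, x)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))
    return total / len(X)

def log_likelihood(w, b, X, Y):
    """log of the product of P(y_i | x_i), computed as a sum of logs."""
    ll = 0.0
    for x, y in zip(X, Y):
        y_hat = predict(w, b, x)
        ll += math.log(y_hat if y == 1 else 1.0 - y_hat)
    return ll

# Made-up toy data and parameters, for illustration only.
X = [[0.5, 1.0], [-1.0, 2.0], [1.5, -0.5]]
Y = [1, 0, 1]
w, b = [0.3, -0.2], 0.1

J = cost(w, b, X, Y)
# The cost is the negative log-likelihood scaled by 1/m.
assert math.isclose(J, -log_likelihood(w, b, X, Y) / len(X))
```

Because the two quantities differ only by the sign and the constant factor 1/m, any parameters that lower J also raise the likelihood of the training labels.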