[Andrew Ng] Classification

Chapter 6 Logistic Regression

Hypothesis representation

We want \(0\le h_\theta(x)\le 1\), so we define the sigmoid function (also called the logistic function):

\[g(z)=\frac{1}{1+e^{-z}} \]

\[h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}} \]

\[h_\theta(x)=P(y=1|x;\theta) \]
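Here \(h_\theta(x)\) is interpreted as the estimated probability that \(y=1\) on input \(x\). A minimal numpy sketch of the hypothesis (the function names are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)); works elementwise on numpy arrays."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to include the bias term x_0 = 1."""
    return sigmoid(x @ theta)
```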

Decision boundary

When \(\theta^Tx\ge0\), \(h_\theta(x)=g(\theta^Tx)\ge0.5\), so we predict \(y=1\).

The decision boundary is a property not of the training set, but of the hypothesis and of the parameters.
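For instance, with \(h_\theta(x)=g(\theta_0+\theta_1x_1+\theta_2x_2)\) and \(\theta=(-3,1,1)^T\), we have \(\theta^Tx\ge0\) exactly when \(x_1+x_2\ge3\), so the line \(x_1+x_2=3\) is the decision boundary: we predict \(y=1\) on one side of it and \(y=0\) on the other.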

Cost function

Training set (\(m\) examples, \(n\) features):

\[\{ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})\} \]

\[x=\left[ \begin{matrix} x_0\\ x_1\\ \vdots\\ x_n \end{matrix} \right]\in\mathbb{R}^{n+1},\ x_0=1,\ y\in\{0,1\} \]

If we used the linear regression cost function, \(J(\theta)\) would be non-convex, because the sigmoid function is nonlinear.

\[J(\theta)=\frac{1}{m}\sum_{i=1}^m cost(h_\theta(x^{(i)}),y^{(i)}) \]

Linear regression (the cost used in linear regression):

\[cost(h_\theta(x),y)=\frac{1}{2}[h_\theta(x)-y]^2 \]

Logistic regression

\[\begin{aligned} cost(h_\theta(x),y)&= \begin{cases} -\log(h_\theta(x)) &\text{y=1}\\ -\log(1-h_\theta(x)) &\text{y=0} \end{cases} \\&=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x)) \end{aligned} \]

\[\begin{aligned} J(\theta)&=\frac{1}{m}\sum_{i=1}^mcost(h(x^{(i)}),y^{(i)}) \\&=-\frac{1}{m}\sum_{i=1}^m[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))] \end{aligned} \]

Choose \(\theta\) so that \(J(\theta)\) is minimized; then make predictions on new \(x\) with the fitted \(h_\theta\).
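A vectorized numpy sketch of this cost (the names are illustrative; `X` is assumed to be the \(m\times(n+1)\) design matrix with \(x_0=1\), `y` the vector of 0/1 labels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```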

Gradient descent

repeat until convergence {

\[\begin{aligned} \theta_j&:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &:=\theta_j-\frac{\alpha}{m} \sum_{i=1}^m[h_\theta(x^{(i)})-y^{(i)}]x_j^{(i)} \end{aligned} \]

(simultaneously update for every \(j=0,\cdots,n\))

}

The update rule looks identical to the one for linear regression, but \(h_\theta(x)\) is now \(g(\theta^Tx)\) rather than \(\theta^Tx\), so it is a different algorithm.
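A sketch of the full update loop, under the same assumptions about `X` and `y` as above and with a fixed iteration count instead of a convergence test:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.
    X: m x (n+1) design matrix with x_0 = 1; y: length-m vector of 0/1 labels."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^(i)) for every example
        grad = X.T @ (h - y) / len(y)            # (1/m) * sum [h - y] * x_j
        theta -= alpha * grad                    # simultaneous update of all theta_j
    return theta
```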

Advanced optimization

Optimization algorithms:

  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS

Advantages of the latter three over gradient descent:

  • No need to manually pick \(\alpha\).
  • Often faster than gradient descent.
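A hedged sketch of handing the cost and its gradient to an off-the-shelf optimizer (scipy's `minimize` with BFGS or L-BFGS; the helper names and the commented call are mine):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient; the optimizer picks step sizes itself."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# Assuming X (m x (n+1), with x_0 = 1) and y (0/1 labels) are already built:
# res = minimize(cost_and_grad, np.zeros(X.shape[1]), args=(X, y),
#                jac=True, method="L-BFGS-B")   # or method="BFGS"
# res.x then holds the fitted theta.
```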

Multi-class classification

One-vs-all

Train a logistic regression classifier \(h_\theta^{(i)}(x)\) for each class \(i\) to predict the probability that \(y=i\). The class \(i\) that maximizes \(h_\theta^{(i)}(x)\) is the prediction:

\[\max_i h_\theta^{(i)}(x) \]

In fact, binary logistic regression can be viewed as a special case of this multi-class scheme: with only two classes we don't need to compare the values of two hypothesis functions; we just compare a single hypothesis function against 0.5.
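A minimal sketch of the one-vs-all prediction step, assuming the \(K\) classifiers' parameters have already been fitted (e.g. with the sketches above) and stacked row-wise into a matrix `Theta`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Theta: K x (n+1), one row of fitted parameters per class.
    X: m x (n+1) design matrix with x_0 = 1.
    Returns, for each example, the class i that maximizes h_theta^(i)(x)."""
    probs = sigmoid(X @ Theta.T)        # m x K matrix of h_theta^(i)(x)
    return np.argmax(probs, axis=1)
```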

Words and expressions

maximum likelihood estimation 极大似然估计

transaction 交易

fraudulent 欺骗的
