[Andrew Ng] Classification
Chapter 6 Logistic Regression
Hypothesis representation
We want \(0\le h_\theta(x)\le 1\), so we define the hypothesis with the sigmoid function (logistic function): \(h_\theta(x)=g(\theta^Tx)\), where \(g(z)=\frac{1}{1+e^{-z}}\). The output is interpreted as the estimated probability that \(y=1\) on input \(x\).
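A minimal sketch of this hypothesis in Python (the names `sigmoid` and `hypothesis` are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(np.dot(theta, x))
```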
Decision boundary
When \(\theta^Tx\ge0\), \(h_\theta(x)=g(\theta^Tx)\ge0.5\), so we predict \(y=1\); otherwise we predict \(y=0\).
The decision boundary is a property not of the training set, but of the hypothesis and of the parameters.
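For a concrete (hypothetical) example, with \(\theta=(-3,1,1)^T\) the boundary is the line \(x_1+x_2=3\):

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])  # hypothetical parameters, chosen for illustration

def predict(x1, x2):
    # theta^T x >= 0  <=>  x1 + x2 >= 3, so the boundary is the line x1 + x2 = 3
    z = theta @ np.array([1.0, x1, x2])  # x0 = 1 is the intercept term
    return 1 if z >= 0 else 0

print(predict(2, 2))  # 1: x1 + x2 = 4 >= 3
print(predict(1, 1))  # 0: x1 + x2 = 2 <  3
```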
Cost function
Training set of \(m\) examples with \(n\) features: \(x\in\mathbb{R}^{n+1}\) with \(x_0=1\), and \(y\in\{0,1\}\).
If we used the squared-error cost from linear regression, \(J(\theta)\) would be non-convex, because the sigmoid makes \(h_\theta(x)\) non-linear and the squared error then has many local optima.
Squared error (used in linear regression): \(\mathrm{Cost}(h_\theta(x),y)=\frac{1}{2}(h_\theta(x)-y)^2\)
Logistic regression: \(\mathrm{Cost}(h_\theta(x),y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))\), giving \(J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\big(h_\theta(x^{(i)}),y^{(i)}\big)\)
Choose \(\theta\) to minimize \(J(\theta)\); then, given a new \(x\), output the prediction \(h_\theta(x)\).
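A vectorized sketch of \(J(\theta)\) (the names are mine; `X` is assumed to carry a leading column of ones):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ).

    X: (m, n+1) design matrix with x0 = 1; y: (m,) vector of 0/1 labels.
    """
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x) for every example at once
    return -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
```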
Gradient descent
repeat until convergence {
\(\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}\)
(simultaneously update for every \(j=0,\cdots,n\))
}
The update rule looks identical to linear regression's; the difference is that \(h_\theta(x)\) is now \(g(\theta^Tx)\) instead of \(\theta^Tx\).
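A sketch of the loop above, vectorized so all \(\theta_j\) are updated simultaneously (fixed iteration count instead of a convergence test, for brevity):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    Repeats theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j).
    X: (m, n+1) design matrix with x0 = 1; y: (m,) 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= alpha * (X.T @ (h - y)) / m  # simultaneous update of every theta_j
    return theta
```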
Advanced optimization
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages of the last three over gradient descent:
- No need to manually pick \(\alpha\).
- Often faster than gradient descent.
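In practice these come from an optimization library. A sketch using SciPy's BFGS, which only needs \(J(\theta)\) and its gradient (the `cost_and_grad` function and the toy data are my own illustration):

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient together; minimize() accepts this with jac=True."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# Toy data: 4 examples, one feature, plus the x0 = 1 intercept column.
X = np.array([[1.0, 0.0], [1.0, 1.5], [1.0, 1.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

res = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=True, method='BFGS')
print(res.x)  # fitted theta; note no learning rate alpha was picked by hand
```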
Multi-class classification
one-vs-all
Train a logistic regression classifier \(h_\theta^{(i)}(x)\) for each class \(i\) to predict the probability that \(y=i\). To classify a new \(x\), pick the class \(i\) that maximizes \(h_\theta^{(i)}(x)\).
In fact, binary logistic regression is the two-class special case of this scheme: instead of comparing the values of two hypothesis functions, we only compare a single \(h_\theta(x)\) against 0.5.
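A sketch of one-vs-all built from the binary pieces above (function names are mine; the inner loop is the same gradient-descent update sketched earlier):

```python
import numpy as np

def train_one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    """Fit one binary logistic classifier per class; row i of Theta is theta for class i."""
    m, n = X.shape
    Theta = np.zeros((num_classes, n))
    for i in range(num_classes):
        yi = (y == i).astype(float)  # relabel: class i -> 1, everything else -> 0
        for _ in range(iters):
            h = 1.0 / (1.0 + np.exp(-X @ Theta[i]))
            Theta[i] -= alpha * (X.T @ (h - yi)) / m
    return Theta

def predict_one_vs_all(Theta, x):
    """Pick the class whose classifier outputs the highest probability h_theta^(i)(x)."""
    probs = 1.0 / (1.0 + np.exp(-Theta @ x))
    return int(np.argmax(probs))
```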
Words and expressions
maximum likelihood estimation 极大似然估计
transaction 交易
fraudulent 欺骗的