[Andrew Ng] Regularization
Chapter 7 Regularization
The problem of overfitting
If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
Addressing overfitting
- Reduce the number of features
  - Manually select which features to keep.
  - Use a model selection algorithm.
- Regularization
  - Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\), which gives a simpler hypothesis and a smoother function (see the sketch below).
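A minimal NumPy sketch (my addition, not from the original notes) of why shrinking the parameters smooths the fit: a degree-9 polynomial fit to a few noisy points, once without and once with an L2 penalty. The data and all names here are made up for illustration.

```python
import numpy as np

# Made-up demo: fit a degree-9 polynomial to 10 noisy samples of a sine,
# once without and once with an L2 penalty on the coefficients.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

# Design matrix with columns x^0, x^1, ..., x^9.
X = np.vander(x, 10, increasing=True)

def fit(X, y, lam):
    # Ridge solution theta = (X^T X + lam * M)^(-1) X^T y, where M is
    # the identity with M[0, 0] = 0 so the constant term is not penalized.
    M = np.eye(X.shape[1])
    M[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

theta_plain = fit(X, y, lam=0.0)   # wiggly interpolant, large coefficients
theta_reg = fit(X, y, lam=1e-3)    # smoother fit, smaller coefficients
print(np.abs(theta_plain).max(), np.abs(theta_reg).max())
```

Printing the two maxima typically shows the penalized coefficients are far smaller, which is exactly the "smaller parameters, smoother function" effect described above.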
Cost function
\[J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]\]
\(\lambda\): the regularization parameter. It controls a trade-off between two goals: fitting the training set well, and keeping the parameters small to avoid overfitting.
We don't need to shrink \(\theta_0\), because \(\theta_0\) corresponds to the constant term, which has little influence on overfitting; that is why the penalty sum starts at \(j=1\).
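A minimal NumPy sketch of this cost (my addition, not from the notes); the function name `compute_cost` is illustrative, and it assumes \(X\) is an \(m\times(n+1)\) matrix whose first column is all ones.

```python
import numpy as np

def compute_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X: m x (n+1) design matrix with a leading column of ones.
    theta_0 (the bias) is deliberately left out of the penalty.
    """
    m = len(y)
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)           # skip theta_0
    return (residual @ residual + penalty) / (2 * m)
```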
Linear regression
Gradient descent
repeat until convergence {
\(\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}\)
\(\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]\) ( \(j=1,\cdots,n\) )
}
For \(j\ge 1\) the update can be rewritten as \(\theta_j := \theta_j\left(1-\alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}\). The factor \(1-\alpha\frac{\lambda}{m}\) is less than \(1\) but very close to it, because \(\alpha\) is small and \(m\) is large; multiplying \(\theta_j\) by it shrinks \(\theta_j\) slightly on every iteration.
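A sketch of the full loop in NumPy (an illustration I added; the name `gradient_descent` and the fixed iteration count are assumptions, since the notes only say "repeat until convergence"):

```python
import numpy as np

def gradient_descent(X, y, alpha, lam, iters=1000):
    # X: m x (n+1) design matrix with a leading column of ones.
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m       # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]      # penalty term, skipping theta_0
        theta = theta - alpha * grad           # includes the (1 - alpha*lam/m) shrink
    return theta
```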
Normal equation
\[\theta = \left(X^TX + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1}X^Ty\]
If \(\lambda>0\), we can prove that the matrix \(X^TX + \lambda M\) (where \(M\) is the \((n+1)\times(n+1)\) matrix above) is invertible, even when \(X^TX\) itself is singular.
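A corresponding NumPy sketch (my illustration; `normal_equation` is a made-up name):

```python
import numpy as np

def normal_equation(X, y, lam):
    # theta = (X^T X + lam * M)^(-1) X^T y, where M is the
    # (n+1) x (n+1) identity with its top-left entry zeroed
    # so that theta_0 is not shrunk.
    M = np.eye(X.shape[1])
    M[0, 0] = 0.0
    # For lam > 0 the matrix is invertible, so solve() is safe
    # even when m <= n and X^T X alone would be singular.
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```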
Logistic regression
Gradient descent
repeat until convergence {
\(\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}\)
\(\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]\) ( \(j=1,\cdots,n\) )
}
The update looks identical to the one for linear regression, but here the hypothesis is \(h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}\).
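A single regularized update step in NumPy (my sketch; the helper names are illustrative, and \(y\) is assumed to be a 0/1 vector):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_step(theta, X, y, alpha, lam):
    # One gradient-descent update; same form as linear regression,
    # but the hypothesis is h_theta(x) = sigmoid(theta^T x).
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]   # theta_0 is not penalized
    return theta - alpha * grad
```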
Words and expressions
ameliorate: to make better, improve
wiggly: full of bends and curves; not smooth