Lecture 7: Regularization

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., predict prices on new examples).

Addressing overfitting :

  1. Reduce the number of features
    • Manually select which features to keep
    • Model selection algorithm
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\)
    • Works well when we have a lot of features, each of which contributes a bit to predicting \(y\)

Cost function

\[J(\theta) = \frac{1}{2m}[\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda \sum^n_{j=1}\theta_j^2] \]
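As a concrete illustration, here is a minimal NumPy sketch of this regularized cost. The function name `regularized_cost` and the convention that the first column of `X` is all ones are my own assumptions, not part of the lecture.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for regularized linear regression.

    X is the m x (n+1) design matrix (first column assumed all ones),
    y is the m-vector of targets, lam is the regularization parameter
    lambda. Note that theta[0] is not included in the penalty term.
    """
    m = len(y)
    errors = X @ theta - y                               # h_theta(x^(i)) - y^(i)
    fit_term = (errors @ errors) / (2 * m)               # (1/2m) * sum of squared errors
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)   # (lambda/2m) * sum_j theta_j^2, j >= 1
    return fit_term + penalty
```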

If \(\lambda\) is set to an extremely large value:

  • All parameters \(\theta_1,\dots,\theta_n\) are penalized so heavily that they are driven close to zero, leaving roughly \(h_\theta(x)\approx\theta_0\).
  • The algorithm results in underfitting: it fails to fit even the training data well.
  • So a very large \(\lambda\) does hurt; it does not eliminate overfitting, it replaces it with underfitting. (Gradient descent itself still converges for a suitable learning rate; the problem is the hypothesis it converges to.)

Regularized linear regression

Gradient descent

Repeat{

\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]

}
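A hedged sketch of one pass of this update in NumPy, vectorized over all \(\theta_j\) at once (the helper name `gradient_descent_step` is my own):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta_0 (the bias) is updated without the regularization term;
    every other theta_j also gets the extra (lambda / m) * theta_j.
    """
    m = len(y)
    errors = X @ theta - y                # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m             # (1/m) * sum_i error_i * x_j^(i)
    grad[1:] += (lam / m) * theta[1:]     # regularize everything except theta_0
    return theta - alpha * grad
```

Calling this function in a loop corresponds to the Repeat{ ... } above.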

Normal equation

\[\theta=\left(X^TX+\lambda\underbrace{\left[ \begin{matrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{matrix} \right]}_{(n+1)\times(n+1)}\right)^{-1}X^Ty \]
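For reference, a small NumPy sketch of this closed-form solution (the zeroed top-left entry of the identity matrix keeps \(\theta_0\) unregularized; `np.linalg.solve` is used instead of an explicit inverse):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Regularized normal equation for linear regression.

    L is the (n+1) x (n+1) identity matrix with its top-left entry
    set to 0, matching the matrix in the formula above. With lam > 0,
    X^T X + lam * L is invertible even if X^T X is singular.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```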

Regularized logistic regression

The update rule has the same form as for linear regression, but now the hypothesis is \(h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}\).

Gradient descent

Repeat{

\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]

}
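A minimal sketch of the same update for logistic regression; the only change from the linear-regression version is that the predictions go through the sigmoid (the function names are my own, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression.

    Identical in form to the linear-regression update, except that
    h_theta(x) = sigmoid(theta^T x); theta_0 is still not regularized.
    """
    m = len(y)
    errors = sigmoid(X @ theta) - y       # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m
    grad[1:] += (lam / m) * theta[1:]     # skip the bias term theta_0
    return theta - alpha * grad
```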
