Lecture 7: Regularization

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., predict prices on new examples).

Addressing overfitting :

  1. Reduce the number of features
    • Manually select which features to keep
    • Model selection algorithm
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\)
    • Works well when we have a lot of features, each of which contributes a bit to predicting \(y\)

Cost function

\[J(\theta) = \frac{1}{2m}[\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda \sum^n_{j=1}\theta_j^2] \]
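As a concrete illustration, here is a minimal NumPy sketch of this regularized cost. The function name `regularized_cost` and the convention that the first column of `X` is all ones are my own assumptions, not part of the lecture.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for regularized linear regression.

    X is the m x (n+1) design matrix (first column assumed all ones),
    y is the m-vector of targets, lam is the regularization parameter
    lambda. Note that theta[0] is not included in the penalty term.
    """
    m = len(y)
    errors = X @ theta - y                               # h_theta(x^(i)) - y^(i)
    fit_term = (errors @ errors) / (2 * m)               # (1/2m) * sum of squared errors
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)   # (lambda/2m) * sum_j theta_j^2, j >= 1
    return fit_term + penalty
```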

If \(\lambda\) is set to an extremely large value:

  • All parameters \(\theta_1,\dots,\theta_n\) are penalized so heavily that they are driven close to zero, leaving roughly \(h_\theta(x)\approx\theta_0\).
  • The algorithm results in underfitting: it fails to fit even the training data well.
  • So a very large \(\lambda\) does hurt; it does not eliminate overfitting, it replaces it with underfitting. (Gradient descent itself still converges for a suitable learning rate; the problem is the hypothesis it converges to.)

Regularized linear regression

Gradient descent

Repeat{

\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]

}
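A hedged sketch of one pass of this update in NumPy, vectorized over all \(\theta_j\) at once (the helper name `gradient_descent_step` is my own):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta_0 (the bias) is updated without the regularization term;
    every other theta_j also gets the extra (lambda / m) * theta_j.
    """
    m = len(y)
    errors = X @ theta - y                # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m             # (1/m) * sum_i error_i * x_j^(i)
    grad[1:] += (lam / m) * theta[1:]     # regularize everything except theta_0
    return theta - alpha * grad
```

Calling this function in a loop corresponds to the Repeat{ ... } above.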

Normal equation

\[\theta=\left(X^TX+\lambda\underbrace{\left[ \begin{matrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{matrix} \right]}_{(n+1)\times(n+1)}\right)^{-1}X^Ty \]
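For reference, a small NumPy sketch of this closed-form solution (the zeroed top-left entry of the identity matrix keeps \(\theta_0\) unregularized; `np.linalg.solve` is used instead of an explicit inverse):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Regularized normal equation for linear regression.

    L is the (n+1) x (n+1) identity matrix with its top-left entry
    set to 0, matching the matrix in the formula above. With lam > 0,
    X^T X + lam * L is invertible even if X^T X is singular.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```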

Regularized logistic regression

The update rule has the same form as for linear regression, but now the hypothesis is \(h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}\).

Gradient descent

Repeat{

\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]

}
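A minimal sketch of the same update for logistic regression; the only change from the linear-regression version is that the predictions go through the sigmoid (the function names are my own, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression.

    Identical in form to the linear-regression update, except that
    h_theta(x) = sigmoid(theta^T x); theta_0 is still not regularized.
    """
    m = len(y)
    errors = sigmoid(X @ theta) - y       # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m
    grad[1:] += (lam / m) * theta[1:]     # skip the bias term theta_0
    return theta - alpha * grad
```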
