[Andrew Ng] Regularization

Chapter 7 Regularization

The problem of overfitting

If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

Addressing overfitting

  1. Reduce the number of features
  • Manually select which features to keep.
  • Use a model selection algorithm.
  2. Regularization
  • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\), which gives a simpler hypothesis and smoother functions.

Cost function

\[J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right] \]

\(\lambda\): the regularization parameter. It controls a trade-off between two goals: fitting the training set well, and keeping the parameters small to avoid overfitting.

We don't need to shrink \(\theta_0\), because \(\theta_0\) corresponds to the constant term, which has little influence on overfitting.
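
A minimal NumPy sketch of this regularized cost (the names `regularized_cost`, `X`, `y`, `theta`, `lam` are my own choices, not from the lecture):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for linear regression with L2 regularization.

    X is the (m, n+1) design matrix with a leading column of ones (x_0 = 1),
    y the (m,) targets, theta the (n+1,) parameters, lam the regularization parameter.
    """
    m = len(y)
    residual = X @ theta - y                 # h_theta(x^(i)) - y^(i) for every example
    fit_term = residual @ residual           # sum of squared errors
    reg_term = lam * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return (fit_term + reg_term) / (2 * m)
```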

Linear regression

Gradient descent

repeat until convergence{

\[\begin{aligned} \theta_0&=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta)\\ &=\theta_0-\frac{\alpha}{m} \sum_{i=1}^m[h_\theta(x^{(i)})-y^{(i)}]x_0^{(i)}\\ \theta_j&=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &=\theta_j-\alpha\left[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\\ &=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\frac{\alpha}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned} \]

( \(j=1,\cdots,n\))

}

\(1-\alpha\frac{\lambda}{m}\) is less than \(1\) but very close to \(1\), because \(\alpha\) is small and \(m\) is large. Multiplying \(\theta_j\) by this term shrinks it slightly on every iteration.
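
The update rule above translates directly to NumPy. The sketch below (continuing the names from the earlier cost sketch; the fixed iteration count is my own assumption) applies the regularization term only to \(\theta_1,\dots,\theta_n\):

```python
def gradient_descent(X, y, theta, alpha, lam, num_iters=1000):
    """Batch gradient descent for regularized linear regression."""
    m = len(y)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum (h_theta(x) - y) * x_j, for all j
        grad[1:] += (lam / m) * theta[1:]  # add (lambda/m) * theta_j for j >= 1 only
        theta = theta - alpha * grad       # simultaneous update of every theta_j
    return theta
```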

Normal equation

\[\theta=(X^TX+\lambda \left[ \begin{matrix} 0&&&&\\ &1&&&\\ &&1&&\\ &&&\ddots&\\ &&&&1 \end{matrix} \right]_{(n+1)\times(n+1)} )^{-1}X^Ty \]

If \(\lambda>0\), it can be proved that this matrix is always invertible, so regularization also takes care of the case where \(X^TX\) itself is non-invertible.
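
A sketch of the regularized normal equation in NumPy (names are illustrative; it uses `np.linalg.solve` rather than forming an explicit inverse):

```python
def normal_equation(X, y, lam):
    """Closed-form solution for regularized linear regression."""
    L = np.eye(X.shape[1])  # (n+1) x (n+1) identity ...
    L[0, 0] = 0             # ... with a 0 in the top-left so theta_0 is not regularized
    # Solve (X^T X + lambda * L) theta = X^T y
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```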

Logistic regression

\[\begin{aligned} J(\theta)=-\frac{1}{m}\sum_{i=1}^m[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2 \end{aligned} \]
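A sketch of this regularized logistic cost in NumPy (my own names again; `sigmoid` is the standard logistic function used as the hypothesis):

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost with an L2 penalty on theta_1..theta_n."""
    m = len(y)
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return cross_entropy + lam * np.sum(theta[1:] ** 2) / (2 * m)
```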

Gradient descent

repeat until convergence{

\[\begin{aligned} \theta_0&=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta)\\ &=\theta_0-\frac{\alpha}{m} \sum_{i=1}^m[h_\theta(x^{(i)})-y^{(i)}]x_0^{(i)}\\ \theta_j&=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &=\theta_j-\alpha\left[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\\ &=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\frac{\alpha}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned} \]

( \(j=1,\cdots,n\))

}
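
The update has the same form as for linear regression; only the hypothesis \(h_\theta\) changes. A sketch reusing the `sigmoid` helper from the previous block:

```python
def logistic_gradient_descent(X, y, theta, alpha, lam, num_iters=1000):
    """Batch gradient descent for regularized logistic regression."""
    m = len(y)
    for _ in range(num_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # only h_theta differs from linear regression
        grad[1:] += (lam / m) * theta[1:]          # regularize theta_1..theta_n only
        theta = theta - alpha * grad
    return theta
```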

Words and expressions

ameliorate: to improve

wiggly: curvy, not smooth
