CS 229 notes Supervised Learning

Tags (space-separated): supervised learning, linear algebra


Foreword

This note gives the proof of the normal equation and, before that, some linear algebra identities that will be used in the proof.

The normal equation

Linear algebra preparation

For two matrices A and B such that AB is square, trAB = trBA.

Proof:
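A minimal sketch of the standard entrywise argument:

trAB = \sum_i(AB)_{ii} = \sum_i\sum_j A_{ij}B_{ji} = \sum_j\sum_i B_{ji}A_{ij} = \sum_j(BA)_{jj} = trBA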

 

 

Some properties:
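Assuming the standard trace identities from the CS229 notes are the ones intended here, they are:

trA = trA^T
tr(A + B) = trA + trB
tr(aA) = a\,trA
trABC = trCAB = trBCA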

 

Some facts of matrix derivatives:
\nabla_A trAB = B^T \quad (1)

Proof:
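A sketch of the entrywise computation, writing trAB = \sum_k\sum_l A_{kl}B_{lk}:

(\nabla_A trAB)_{ij} = \frac{\partial}{\partial A_{ij}}\sum_k\sum_l A_{kl}B_{lk} = B_{ji} = (B^T)_{ij}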

 

\nabla_{A^T}f(A) = (\nabla_Af(A))^T \quad (2)
\nabla_A trABA^TC = CAB + C^TAB^T \quad (3)

Proof 1:
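A sketch: (2) is essentially the definition of the gradient with respect to A^T, read entrywise:

(\nabla_{A^T}f(A))_{ij} = \frac{\partial f(A)}{\partial A_{ji}} = (\nabla_Af(A))_{ji} = ((\nabla_Af(A))^T)_{ij}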

 

Proof 2:
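A sketch for (3): write trABA^TC = \sum_{k,l,m,n}A_{kl}B_{lm}A_{nm}C_{nk} and differentiate with respect to A_{ij}; the two occurrences of A give two terms:

(\nabla_A trABA^TC)_{ij} = \sum_{m,n}B_{jm}A_{nm}C_{ni} + \sum_{k,l}C_{ik}A_{kl}B_{lj} = (C^TAB^T)_{ij} + (CAB)_{ij}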

 

\nabla_A|A| = |A|(A^{-1})^T \quad (4)

Proof: expanding |A| along row p gives |A| = \sum_q A_{pq}C_{pq}, so
(\nabla_A |A|)_{pq} = C_{pq} = A^*_{qp} = ((A^*)^T)_{pq} = |A|((A^{-1})^T)_{pq}
(C_{pq} denotes the cofactor of A_{pq}, and A^* = |A|A^{-1} is the adjugate of A.)

Least squares revisited

X = \begin{bmatrix}-(x^{(1)})^T-\\-(x^{(2)})^T-\\\vdots\\-(x^{(m)})^T-\end{bmatrix} (an m \times n matrix if we don't include the intercept term)

\vec y = \begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}

Since h_\theta(x^{(i)}) = (x^{(i)})^T\theta, the i-th entry of X\theta - \vec{y} is h_\theta(x^{(i)}) - y^{(i)}.

Thus,
\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\displaystyle{\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2} = J(\theta).

Combining equations (2) and (3):
\nabla_{A^T}trABA^TC = B^TA^TC^T + BA^TC \quad (5)

Hence

\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta(X\theta-\vec{y})^T(X\theta-\vec{y})\\
 = \frac{1}{2}\nabla_\theta(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y})

Notice that each of these terms is a real number, or you can see it as a 1\times 1 matrix, so each equals its own trace.
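A sketch of the intermediate trace manipulation:

\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta tr(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y})\\
 = \frac{1}{2}\nabla_\theta(tr\,\theta^TX^TX\theta - 2\,tr\,\vec{y}^TX\theta)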

 


since trA = trA^T and the \vec{y}^T\vec{y} term involves no \theta. Then use equation (5) with A^T = \theta, B = B^T = X^TX, and C = I:
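With that substitution (and \nabla_\theta tr\,\vec{y}^TX\theta = X^T\vec{y} by equation (1)), a sketch of the final step:

\nabla_\theta J(\theta) = \frac{1}{2}(X^TX\theta + X^TX\theta - 2X^T\vec{y}) = X^TX\theta - X^T\vec{y}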

 


To minimize J, we set its derivative to zero and obtain the normal equation:
X^TX\theta = X^T\vec{y}
\theta = (X^TX)^{-1}X^T\vec{y}
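As a quick numerical check (not part of the original notes), here is a short NumPy sketch that fits \theta with the normal equation and compares it against NumPy's least-squares solver; the data and variable names are made up for illustration:

```python
import numpy as np

# Synthetic data: m examples, n features, plus an intercept column of ones.
rng = np.random.default_rng(0)
m, n = 100, 3
X_raw = rng.normal(size=(m, n))
X = np.hstack([np.ones((m, 1)), X_raw])        # design matrix with intercept term
true_theta = np.array([2.0, -1.0, 0.5, 3.0])   # arbitrary "true" parameters
y = X @ true_theta + 0.1 * rng.normal(size=m)  # targets with a little noise

# Normal equation: X^T X theta = X^T y.
# Solving the linear system is preferred to forming the explicit inverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq))  # expected: True
```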

posted @ 2017-11-22 21:44 EtoDemerzel