CS 229 notes Supervised Learning

Tags (space-separated): supervised learning, linear algebra


Foreword

This note gives the proof of the normal equation and, before that, some linear algebra identities that will be used in the proof.

The normal equation

Linear algebra preparation

For two matrices A and B such that AB is square, trAB = trBA.

Proof:
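A minimal sketch of the standard entrywise argument:

trAB = \sum_i(AB)_{ii} = \sum_i\sum_j A_{ij}B_{ji} = \sum_j\sum_i B_{ji}A_{ij} = \sum_j(BA)_{jj} = trBA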

 

 

Some properties:
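Assuming the standard trace identities from the CS229 notes are the ones intended here, they are:

trA = trA^T
tr(A + B) = trA + trB
tr(aA) = a\,trA
trABC = trCAB = trBCA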

 

Some facts of matrix derivatives:
\nabla_A trAB = B^T \quad (1)

Proof:
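A sketch of the entrywise computation, writing trAB = \sum_k\sum_l A_{kl}B_{lk}:

(\nabla_A trAB)_{ij} = \frac{\partial}{\partial A_{ij}}\sum_k\sum_l A_{kl}B_{lk} = B_{ji} = (B^T)_{ij}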

 

\nabla_{A^T}f(A) = (\nabla_Af(A))^T \quad (2)
\nabla_A trABA^TC = CAB + C^TAB^T \quad (3)

Proof 1:
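A sketch: (2) is essentially the definition of the gradient with respect to A^T, read entrywise:

(\nabla_{A^T}f(A))_{ij} = \frac{\partial f(A)}{\partial A_{ji}} = (\nabla_Af(A))_{ji} = ((\nabla_Af(A))^T)_{ij}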

 

Proof 2:
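A sketch for (3): write trABA^TC = \sum_{k,l,m,n}A_{kl}B_{lm}A_{nm}C_{nk} and differentiate with respect to A_{ij}; the two occurrences of A give two terms:

(\nabla_A trABA^TC)_{ij} = \sum_{m,n}B_{jm}A_{nm}C_{ni} + \sum_{k,l}C_{ik}A_{kl}B_{lj} = (C^TAB^T)_{ij} + (CAB)_{ij}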

 

\nabla_A|A| = |A|(A^{-1})^T \quad (4)

Proof: expanding |A| along row p gives |A| = \sum_q A_{pq}C_{pq}, so
(\nabla_A |A|)_{pq} = C_{pq} = A^*_{qp} = ((A^*)^T)_{pq} = |A|((A^{-1})^T)_{pq}
(C_{pq} denotes the cofactor of A_{pq}, and A^* = |A|A^{-1} is the adjugate of A.)

Least squares revisited

X = \begin{bmatrix}-(x^{(1)})^T-\\-(x^{(2)})^T-\\\vdots\\-(x^{(m)})^T-\end{bmatrix} (an m \times n matrix if we don't include the intercept term)

\vec y = \begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}

Since h_\theta(x^{(i)}) = (x^{(i)})^T\theta, the i-th entry of X\theta - \vec{y} is h_\theta(x^{(i)}) - y^{(i)}.

Thus,
\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\displaystyle{\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2} = J(\theta).

Combining equations (2) and (3):
\nabla_{A^T}trABA^TC = B^TA^TC^T + BA^TC \quad (5)

Hence

\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta(X\theta-\vec{y})^T(X\theta-\vec{y})\\
 = \frac{1}{2}\nabla_\theta(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y})

Notice that each of these terms is a real number, or you can see it as a 1\times 1 matrix, so each equals its own trace.
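A sketch of the intermediate trace manipulation:

\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta tr(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y})\\
 = \frac{1}{2}\nabla_\theta(tr\,\theta^TX^TX\theta - 2\,tr\,\vec{y}^TX\theta)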

 


since trA = trA^T and the \vec{y}^T\vec{y} term involves no \theta. Then use equation (5) with A^T = \theta, B = B^T = X^TX, and C = I:
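With that substitution (and \nabla_\theta tr\,\vec{y}^TX\theta = X^T\vec{y} by equation (1)), a sketch of the final step:

\nabla_\theta J(\theta) = \frac{1}{2}(X^TX\theta + X^TX\theta - 2X^T\vec{y}) = X^TX\theta - X^T\vec{y}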

 


To minimize J, we set its derivative to zero and obtain the normal equation:
X^TX\theta = X^T\vec{y}
\theta = (X^TX)^{-1}X^T\vec{y}
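As a quick numerical check (not part of the original notes), here is a short NumPy sketch that fits \theta with the normal equation and compares it against NumPy's least-squares solver; the data and variable names are made up for illustration:

```python
import numpy as np

# Synthetic data: m examples, n features, plus an intercept column of ones.
rng = np.random.default_rng(0)
m, n = 100, 3
X_raw = rng.normal(size=(m, n))
X = np.hstack([np.ones((m, 1)), X_raw])        # design matrix with intercept term
true_theta = np.array([2.0, -1.0, 0.5, 3.0])   # arbitrary "true" parameters
y = X @ true_theta + 0.1 * rng.normal(size=m)  # targets with a little noise

# Normal equation: X^T X theta = X^T y.
# Solving the linear system is preferred to forming the explicit inverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq))  # expected: True
```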

posted @ 2017-11-22 21:44 EtoDemerzel