Common matrix-derivative formulas in machine learning
These are a few partial-derivative formulas the instructor presented in class. Since it is not obvious why the results take these forms, substituting a concrete example and computing by hand after class makes them much easier to understand.
- \(\frac{\partial \beta^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}}=\beta\)
- \(\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}}=2 \mathrm{x}\)
- \(\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial \mathrm{x}}=\left(\mathrm{A}+\mathrm{A}^{\mathrm{T}}\right) \mathrm{x}\)
To verify the three formulas above by direct computation, substitute a concrete example. Let:
\[\beta =
\begin{bmatrix}
\beta_1 \\
\beta_2 \\
\beta_3
\end{bmatrix}\\
\mathrm{x}=
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}\\
A =
\begin{bmatrix}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\]
The first formula
\[\beta^T\mathrm{x} = \begin{bmatrix}
\beta_1 &
\beta_2 &
\beta_3
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}
=\beta_1x_1+\beta_2x_2+\beta_3x_3\\
\\
\frac{\partial \beta^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}} = \begin{bmatrix}
\frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_1} \\
\frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_2} \\
\frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
\beta_1 \\
\beta_2 \\
\beta_3
\end{bmatrix}
=
\beta
\]
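Besides computing by hand, the result can be checked numerically. The sketch below (the values chosen for \(\beta\) and \(\mathrm{x}\) are arbitrary examples, not from the text) compares the analytic gradient \(\beta\) against a central finite-difference gradient of \(f(\mathrm{x})=\beta^{\mathrm{T}}\mathrm{x}\):

```python
import numpy as np

# Example values for beta and x (arbitrary choices for illustration).
beta = np.array([1.0, 2.0, 3.0])
x = np.array([0.5, -1.0, 2.0])

def f(v):
    return beta @ v  # beta^T x

# Central finite differences along each coordinate direction.
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)])

print(np.allclose(grad, beta))  # True: the gradient is beta
```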
The second formula
\[\mathrm{x}^T\mathrm{x} =
\begin{bmatrix}
x_1&x_2&x_3
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}
=
x_1^2+x_2^2+x_3^2\\
\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}}
=
\begin{bmatrix}
\frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_1} \\
\frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_2} \\
\frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
2x_1 \\
2x_2 \\
2x_3
\end{bmatrix}
=
2
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}
=
2\mathrm{x}
\]
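The same numerical check works here. A minimal sketch, with an arbitrary example value for \(\mathrm{x}\): the finite-difference gradient of \(f(\mathrm{x})=\mathrm{x}^{\mathrm{T}}\mathrm{x}\) should match \(2\mathrm{x}\):

```python
import numpy as np

# Example point (arbitrary choice for illustration).
x = np.array([0.5, -1.0, 2.0])

def f(v):
    return v @ v  # x^T x

# Central finite differences along each coordinate direction.
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)])

print(np.allclose(grad, 2 * x))  # True: the gradient is 2x
```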
The third formula
\[\begin{aligned}
\mathrm{x}^{T} A \mathrm{x} &=\left[\begin{array}{lll}
x_{1} & x_{2} & x_{3}
\end{array}\right]\left[\begin{array}{lll}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right]\left[\begin{array}{l}
x_{1} \\
x_{2} \\
x_{3}
\end{array}\right] \\
&=a_{11} x_{1}^{2}+a_{21} x_{2} x_{1}+a_{31} x_{3} x_{1}+a_{12} x_{1} x_{2}+a_{22} x_{2}^{2}+a_{32} x_{3} x_{2}+a_{13} x_{1} x_{3}+a_{23} x_{2} x_{3}+a_{33} x_{3}^{2}
\end{aligned}\\
\]
\[\begin{aligned}
\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial \mathrm{x}}
&=\left[\begin{array}{l}
\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial x_{1}} \\
\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial x_{2}} \\
\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial x_{3}}
\end{array}\right] \\
&=\left[\begin{array}{l}
2 a_{11} x_{1}+a_{21} x_{2}+a_{31} x_{3}+a_{12} x_{2}+a_{13} x_{3} \\
a_{21} x_{1}+a_{12} x_{1}+2 a_{22} x_{2}+a_{32} x_{3}+a_{23} x_{3} \\
a_{31} x_{1}+a_{32} x_{2}+a_{13} x_{1}+a_{23} x_{2}+2 a_{33} x_{3}
\end{array}\right] \\
&=\left[\begin{array}{l}
2 a_{11} x_{1}+\left(a_{21}+a_{12}\right) x_{2}+\left(a_{31}+a_{13}\right) x_{3} \\
\left(a_{21}+a_{12}\right) x_{1}+2 a_{22} x_{2}+\left(a_{32}+a_{23}\right) x_{3} \\
\left(a_{31}+a_{13}\right) x_{1}+\left(a_{32}+a_{23}\right) x_{2}+2 a_{33} x_{3}
\end{array}\right] \\
&=\left[\begin{array}{ccc}
2 a_{11} & a_{12}+a_{21} & a_{13}+a_{31} \\
a_{21}+a_{12} & 2 a_{22} & a_{23}+a_{32} \\
a_{31}+a_{13} & a_{32}+a_{23} & 2 a_{33}
\end{array}\right]\left[\begin{array}{l}
x_{1} \\
x_{2} \\
x_{3}
\end{array}\right] \\
&=\left(A+A^{\mathrm{T}}\right) \mathrm{x}
\end{aligned}
\]
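This derivation can also be confirmed numerically for a general (non-symmetric) \(A\). The sketch below uses a randomly generated \(3\times 3\) matrix and point (both arbitrary assumptions) and checks the finite-difference gradient of \(f(\mathrm{x})=\mathrm{x}^{\mathrm{T}}A\mathrm{x}\) against \((A+A^{\mathrm{T}})\mathrm{x}\):

```python
import numpy as np

# Random non-symmetric A and point x (arbitrary example values, fixed seed).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

def f(v):
    return v @ A @ v  # x^T A x

# Central finite differences along each coordinate direction.
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
analytic = (A + A.T) @ x

print(np.allclose(grad, analytic))  # True: the gradient is (A + A^T) x
```

Note that when \(A\) is symmetric, \(A+A^{\mathrm{T}}=2A\), which recovers the second formula as the special case \(A=I\).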