Common Derivative Formulas in Machine Learning

In class the instructor presented several formulas for partial derivatives, but it was not obvious why the results take this form. Working through a concrete example afterwards makes them much easier to understand.

  1. \(\frac{\partial \beta^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}}=\beta\)

  2. \(\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}}=2 \mathrm{x}\)

  3. \(\frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{Ax}}{\partial \mathrm{x}}=\left(\mathrm{A}+\mathrm{A}^{\mathrm{T}}\right) \mathrm{x}\)

To verify the three formulas above, substitute a concrete example and carry out the differentiation directly. Let:

\[\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}\\ \mathrm{x}= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\\ A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]

Formula 1

\[\beta^T\mathrm{x} = \begin{bmatrix} \beta_1 & \beta_2 & \beta_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =\beta_1x_1+\beta_2x_2+\beta_3x_3\\ \\ \frac{\partial \beta^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}} = \begin{bmatrix} \frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_1} \\ \frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_2} \\ \frac{\partial (\beta_1x_1+\beta_2x_2+\beta_3x_3)}{\partial x_3} \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \beta \]
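This can also be checked numerically. Below is a minimal sketch (my addition, not part of the original post) that compares a central-difference gradient against \(\beta\) using NumPy; the random seed and step size `eps` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = rng.standard_normal(3)  # plays the role of the constant vector beta
x = rng.standard_normal(3)     # the point at which we differentiate

def f(v):
    return beta @ v  # the scalar beta^T x

# Central-difference approximation of the gradient, one coordinate at a time
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, beta))  # True: d(beta^T x)/dx = beta
```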

Formula 2

\[\mathrm{x}^T\mathrm{x} = \begin{bmatrix} x_1&x_2&x_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1^2+x_2^2+x_3^2\\ \frac{\partial \mathrm{x}^{\mathrm{T}} \mathrm{x}}{\partial \mathrm{x}} = \begin{bmatrix} \frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_1} \\ \frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_2} \\ \frac{\partial (x_1^2+x_2^2+x_3^2)}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix} = 2 \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 2\mathrm{x} \]
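The same finite-difference check works here. This sketch (again my addition, under the same arbitrary seed and `eps` assumptions) confirms that the numerical gradient matches \(2\mathrm{x}\).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(3)

def f(v):
    return v @ v  # the scalar x^T x = x1^2 + x2^2 + x3^2

eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, 2 * x))  # True: d(x^T x)/dx = 2x
```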

Formula 3

\[\begin{aligned} \mathrm{x}^{T} A \mathrm{x} &=\left[\begin{array}{lll} x_{1} & x_{2} & x_{3} \end{array}\right]\left[\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right]\left[\begin{array}{l} x_{1} \\ x_{2} \\ x_{3} \end{array}\right] \\ &=a_{11} x_{1}^{2}+a_{21} x_{2} x_{1}+a_{31} x_{3} x_{1}+a_{12} x_{1} x_{2}+a_{22} x_{2}^{2}+a_{32} x_{3} x_{2}+a_{13} x_{1} x_{3}+a_{23} x_{2} x_{3}+a_{33} x_{3}^{2} \end{aligned}\\ \]

\[\begin{aligned}\frac{\partial \mathrm{x}^{\mathrm{T}} A \mathrm{x}}{\partial \mathrm{x}} &=\left[\begin{array}{l}\frac{\partial \mathrm{x}^{\mathrm{T}} A \mathrm{x}}{\partial x_{1}} \\ \frac{\partial \mathrm{x}^{\mathrm{T}} A \mathrm{x}}{\partial x_{2}} \\ \frac{\partial \mathrm{x}^{\mathrm{T}} A \mathrm{x}}{\partial x_{3}}\end{array}\right] \\ &=\left[\begin{array}{l}2 a_{11} x_{1}+a_{21} x_{2}+a_{31} x_{3}+a_{12} x_{2}+a_{13} x_{3} \\ a_{21} x_{1}+a_{12} x_{1}+2 a_{22} x_{2}+a_{32} x_{3}+a_{23} x_{3} \\ a_{31} x_{1}+a_{32} x_{2}+a_{13} x_{1}+a_{23} x_{2}+2 a_{33} x_{3}\end{array}\right] \\ &=\left[\begin{array}{l}2 a_{11} x_{1}+\left(a_{21}+a_{12}\right) x_{2}+\left(a_{31}+a_{13}\right) x_{3} \\ \left(a_{21}+a_{12}\right) x_{1}+2 a_{22} x_{2}+\left(a_{32}+a_{23}\right) x_{3} \\ \left(a_{31}+a_{13}\right) x_{1}+\left(a_{32}+a_{23}\right) x_{2}+2 a_{33} x_{3}\end{array}\right] \\ &=\left[\begin{array}{ccc}2 a_{11} & a_{12}+a_{21} & a_{13}+a_{31} \\ a_{21}+a_{12} & 2 a_{22} & a_{23}+a_{32} \\ a_{31}+a_{13} & a_{32}+a_{23} & 2 a_{33}\end{array}\right]\left[\begin{array}{l}x_{1} \\ x_{2} \\ x_{3}\end{array}\right] \\ &=\left(A+A^{\mathrm{T}}\right) \mathrm{x}\end{aligned} \]
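A numerical check is especially reassuring here, since the \(A+A^{\mathrm{T}}\) factor is easy to get wrong when \(A\) is not symmetric. The sketch below (my addition, with an arbitrary random \(A\)) verifies it. Note that when \(A\) is symmetric the formula reduces to \(2A\mathrm{x}\).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))  # deliberately non-symmetric
x = rng.standard_normal(3)

def f(v):
    return v @ A @ v  # the quadratic form x^T A x

eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, (A + A.T) @ x))  # True: d(x^T A x)/dx = (A + A^T) x
```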
