Matrix Trace and Derivatives of the Trace

  Definition of the trace: the trace of an $n \times n$ matrix $A$ is the sum of the entries on its main diagonal, written $\operatorname{tr}(A)$. That is,

    $\operatorname{tr}(A)=\sum\limits_{i=1}^{n} a_{i i}$

  Theorem 1:

    $\operatorname{tr}(A B)=\operatorname{tr}(B A) $

  Proof:

    $\operatorname{tr}(A B)=\sum\limits_{i=1}^{n}(A B)_{i i}=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{m} A_{i j} B_{j i}=\sum\limits_{j=1}^{m} \sum\limits_{i=1}^{n} B_{j i} A_{i j}=\sum\limits_{j=1}^{m}(B A)_{j j}=\operatorname{tr}(B A) $
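
  A quick numerical sanity check of Theorem 1 (a minimal NumPy sketch; the shapes and random seed are arbitrary illustrative choices):

```python
import numpy as np

# A is n x m and B is m x n, so AB (n x n) and BA (m x m) are both square.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

# The traces agree even though AB and BA have different sizes.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```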

  Theorem 2:

    $\operatorname{tr}(A B C)=\operatorname{tr}(C A B)=\operatorname{tr}(B C A) $

  Proof:

    Treat $AB$ (or $BC$) as a single block; the claim then follows from Theorem 1.
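
  The cyclic property is just as easy to check numerically (shapes chosen arbitrarily so that all three products are square):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# All cyclic permutations of ABC share the same trace.
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(C @ A @ B))
assert np.isclose(t, np.trace(B @ C @ A))
```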

  Theorem 3:

    $\frac{\partial \operatorname{tr}(A B)}{\partial A}=\frac{\partial \operatorname{tr}(B A)}{\partial A}=B^{T} $

  where $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix.

  Proof:

    $\operatorname{tr}(A B)=\operatorname{tr}\left(\begin{array}{cccc}a_{11} & a_{12} & \cdots & a_{1 n} \\a_{21} & a_{22} & \cdots & a_{2 n} \\\vdots & \vdots & \ddots & \vdots \\a_{m 1} & a_{m 2} & \cdots & a_{m n}\end{array}\right)\left(\begin{array}{cccc}b_{11} & b_{12} & \cdots & b_{1 m} \\b_{21} & b_{22} & \cdots & b_{2 m} \\\vdots & \vdots & \ddots & \vdots \\b_{n 1} & b_{n 2} & \cdots & b_{n m}\end{array}\right)$

  Considering only the diagonal entries, we have

    $\operatorname{tr}(A B)=\sum\limits_{i=1}^{n} a_{1 i} b_{i 1}+\sum\limits_{i=1}^{n} a_{2 i} b_{i 2}+\ldots+\sum\limits_{i=1}^{n} a_{m i} b_{i m}=\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n} a_{i j} b_{j i}$

    $\frac{\partial \operatorname{tr}(A B)}{\partial a_{i j}}=b_{j i} \Rightarrow \frac{\partial \operatorname{tr}(A B)}{\partial A}=B^{T}$
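
  Theorem 3 can be verified entry by entry with a central-difference approximation of the gradient (a sketch; `numerical_grad` is a hypothetical helper written here for illustration):

```python
import numpy as np

def numerical_grad(f, A, eps=1e-6):
    """Central-difference estimate of d f(A) / dA for a scalar-valued f."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))  # m x n
B = rng.standard_normal((3, 4))  # n x m

# d tr(AB) / dA should equal B^T.
G = numerical_grad(lambda M: np.trace(M @ B), A)
assert np.allclose(G, B.T, atol=1e-5)
```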

  Theorem 4:

    $\frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial A}=\frac{\partial \operatorname{tr}\left(B A^{T}\right)}{\partial A}=B$

  Proof:

    The proof follows the same steps as Theorem 3, so it is omitted.

  Theorem 5:

    $\operatorname{tr}(A)=\operatorname{tr}\left(A^{T}\right)$

  Proof:

    Omitted (transposition leaves the diagonal entries unchanged).

  Theorem 6:

    If $a \in \mathbb{R}$, then $\operatorname{tr}(a)=a$.

  Proof:

    Simply treat $a$ as a $1 \times 1$ matrix.

  Theorem 7:

    $\frac{\partial \operatorname{tr}\left(A B A^{T} C\right)}{\partial A}=C A B+C^{T} A B^{T}$

  Proof: differentiate with respect to each occurrence of $A$ in turn (product rule), treating the other occurrence as constant; this gives

    $\begin{aligned}\frac{\partial \operatorname{tr}\left(A B A^{T} C\right)}{\partial A} &=\frac{\partial \operatorname{tr}\left(A B A^{T} C\right)}{\partial A}+\frac{\partial \operatorname{tr}\left(A^{T} C A B\right)}{\partial A}\quad\quad\text{(product rule; the second trace is cycled via Theorem 1)} \\&=\left(B A^{T} C\right)^{T}+C A B\quad\quad\text{(Theorems 3 and 4)} \\&=C A B+C^{T} A B^{T}\end{aligned}$
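
  The same finite-difference check confirms Theorem 7 (again a sketch with arbitrary shapes; `numerical_grad` is the illustrative helper from the Theorem 3 check):

```python
import numpy as np

def numerical_grad(f, A, eps=1e-6):
    """Central-difference estimate of d f(A) / dA for a scalar-valued f."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
m, n = 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))

# d tr(A B A^T C) / dA should equal C A B + C^T A B^T.
G = numerical_grad(lambda M: np.trace(M @ B @ M.T @ C), A)
assert np.allclose(G, C @ A @ B + C.T @ A @ B.T, atol=1e-5)
```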

  Examples:

    $\begin{array}{l}\operatorname{tr}(A)=\sum\limits_{i=1}^{n} a_{i i} \\ \operatorname{tr}(A B C)=\operatorname{tr}(B C A)=\operatorname{tr}(C A B) \\ \operatorname{tr}(A B)=\operatorname{tr}(B A) \\ \frac{\partial \operatorname{tr}(A B)}{\partial A}=\frac{\partial \operatorname{tr}(B A)}{\partial A}=B^{T} \\ \operatorname{tr}(A)=\operatorname{tr}\left(A^{T}\right) \\ \frac{\partial \operatorname{tr}\left(A^{T} B A\right)}{\partial A}=B A+B^{T} A \\ \frac{\partial \operatorname{tr}\left(A X B X C^{T}\right)}{\partial X}=A^{T} C X^{T} B^{T}+B^{T} X^{T} A^{T} C \\ \frac{\partial \operatorname{tr}\left(A B A^{T}\right)}{\partial A}=A B+A B^{T} \\ \frac{\partial \operatorname{tr}(A X B X)}{\partial X}=A^{T} X^{T} B^{T}+B^{T} X^{T} A^{T} \\ \frac{\partial \operatorname{tr}\left(A X B X^{T}\right)}{\partial X}=A X B+A^{T} X B^{T} \\ \frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial A}=\frac{\partial \operatorname{tr}\left(B A^{T}\right)}{\partial A}=B \\ \frac{\partial \operatorname{tr}\left(A^{T} X B^{T}\right)}{\partial X}=\frac{\partial \operatorname{tr}\left(B X^{T} A\right)}{\partial X}=A B\end{array}$
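
  A few of these identities checked numerically; note in particular that $\operatorname{tr}(A^{T} X B^{T})$ pairs with its transpose $\operatorname{tr}(B X^{T} A)$ in the last line. This sketch reuses the illustrative finite-difference helper:

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of d f(X) / dX for a scalar-valued f."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
n = 3
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))

# d tr(A^T X B^T)/dX = d tr(B X^T A)/dX = AB
assert np.allclose(numerical_grad(lambda M: np.trace(A.T @ M @ B.T), X), A @ B, atol=1e-5)
assert np.allclose(numerical_grad(lambda M: np.trace(B @ M.T @ A), X), A @ B, atol=1e-5)

# d tr(A X B X^T)/dX = A X B + A^T X B^T
assert np.allclose(numerical_grad(lambda M: np.trace(A @ M @ B @ M.T), X),
                   A @ X @ B + A.T @ X @ B.T, atol=1e-5)
```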

 


  Derivative of the L2 Norm of a Vector

  The most fundamental method in regression: least squares.

    $J_{L S}(\vec{x})=\frac{1}{2}\|A \vec{x}-\vec{b}\|^{2}$

  Definition of the vector norm

    $\begin{array}{l}\vec{x}=\left[x_{1}, \cdots, x_{m}\right]^{\mathrm{T}} \\ \|\vec{x}\|_{p}=\left(\sum\limits_{i=1}^{m}\left|x_{i}\right|^{p}\right)^{\frac{1}{p}}, \quad p<+\infty\end{array}$

  The $L_{2}$ norm in particular is

    $\|\vec{x}\|_{2}=\left(\left|x_{1}\right|^{2}+\cdots+\left|x_{m}\right|^{2}\right)^{\frac{1}{2}}=\sqrt{\vec{x}^{\mathrm{T}} \vec{x}}$
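
  For example (a trivial NumPy check; the vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, 4.0])
# sqrt(x^T x) agrees with NumPy's built-in 2-norm; both equal 5.0 here.
assert np.isclose(np.sqrt(x @ x), np.linalg.norm(x, 2))
```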

  Matrix differentiation

    The partial derivative operator defined in column-vector form is called the column-vector partial derivative operator, customarily known as the gradient operator. The $m \times 1$ column-vector partial derivative operator, i.e. the gradient operator, written $\nabla_{x}$, is defined as

    $\nabla_{x}=\frac{\partial}{\partial x}=\left[\frac{\partial}{\partial x_{1}}, \cdots, \frac{\partial}{\partial x_{m}}\right]^{\mathrm{T}}$
  If $\vec{x}$ is an $m \times 1$ column vector and $y$ is a $1 \times m$ row vector, then

    $\begin{array}{l}\frac{\partial y x}{\partial x}=y^{T} \\ \frac{\partial\left(x^{T} A x\right)}{\partial x}=\left(A+A^{T}\right) x\end{array}$
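
  Both identities can be confirmed with a finite-difference check on vectors (a sketch; the names and shapes are illustrative):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of d f(x) / dx for a scalar-valued f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(5)
m = 4
x = rng.standard_normal(m)
y = rng.standard_normal(m)        # plays the role of the row vector y
A = rng.standard_normal((m, m))

# d(y x)/dx = y^T (1-D arrays stand in for row/column vectors here).
assert np.allclose(numerical_grad(lambda v: y @ v, x), y, atol=1e-5)
# d(x^T A x)/dx = (A + A^T) x
assert np.allclose(numerical_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5)
```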

  With these preliminaries in place, we can now carry out the derivation.

    $\begin{aligned}J_{L S}(\vec{x}) &=\frac{1}{2}\|A \vec{x}-\vec{b}\|^{2} \\&=\frac{1}{2}(A \vec{x}-\vec{b})^{T}(A \vec{x}-\vec{b}) \\&=\frac{1}{2}\left(\vec{x}^{T} A^{T}-\vec{b}^{T}\right)(A \vec{x}-\vec{b}) \\&=\frac{1}{2}\left(\vec{x}^{T} A^{T} A \vec{x}-2 \vec{b}^{T} A \vec{x}+\vec{b}^{T} \vec{b}\right)\end{aligned}$

  Note that $\vec{b}$ and $\vec{x}$ are both column vectors, so $\vec{b}^{T} A \vec{x}$ is a scalar, and a scalar equals its own transpose: $\vec{b}^{T} A \vec{x}=\vec{x}^{T} A^{T} \vec{b}$.

  Differentiating with respect to $\vec{x}$ gives:

    $\nabla_{x} J_{L S}(\vec{x})=A^{T} A \vec{x}-A^{T} \vec{b}=A^{T}(A \vec{x}-\vec{b})$
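
  Setting this gradient to zero yields the normal equations $A^{T} A \vec{x}=A^{T} \vec{b}$. A minimal NumPy sketch (random data, arbitrary shapes) comparing the closed-form solution with `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

# Solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# The gradient A^T(Ax - b) vanishes (numerically) at the minimizer.
assert np.isclose(np.linalg.norm(A.T @ (A @ x_normal - b)), 0, atol=1e-10)
```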

 
