Matrix Differentiation

Basic Concepts

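Suppose \(f = f(X)\), where \(f\) is a scalar and \(X\) is an \(m \times n\) matrix. The derivative of a scalar with respect to a matrix is defined entry by entry as: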

\[\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial X_{ij}} \end{bmatrix} \]
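
To make the entrywise definition concrete, the sketch below approximates each \(\frac{\partial f}{\partial X_{ij}}\) with a central difference and collects the results in a matrix of the same shape as \(X\). The helper name `numerical_gradient`, the step size `eps`, and the test function \(f(X) = \sum_{ij} X_{ij}^2\) are illustrative assumptions rather than part of the original text.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Approximate the matrix [df/dX_ij] entry by entry (hypothetical helper)."""
    grad = np.zeros_like(X, dtype=float)
    for i, j in np.ndindex(*X.shape):
        E = np.zeros_like(X, dtype=float)
        E[i, j] = eps                      # perturb only entry (i, j)
        grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return grad

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def f(M):
    return np.sum(M ** 2)                  # assumed example: df/dX_ij = 2 * X_ij

G = numerical_gradient(f, X)
print(G.shape)                             # the derivative has the same shape as X
print(np.allclose(G, 2 * X, atol=1e-5))    # compare with the analytic entries 2 * X_ij
```

A finite-difference check like this is only a sanity test; it costs two function evaluations per entry and is not how derivatives would normally be computed.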

Approach to Differentiation

The core of matrix differentiation is the link between the matrix derivative and the differential:

\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right) \]

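In other words, \(\mathrm{d}f\) is the inner product of the derivative \(\frac{\partial f}{\partial X}\) (\(m \times n\)) and the differential matrix \(\mathrm{d}X\) (\(m \times n\)). The task is therefore to take the differential of \(f\), rewrite it in the form \(\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right)\), and read off \(\frac{\partial f}{\partial X}\) by direct comparison.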
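
As a small worked case of this recipe, take \(f(X) = a^T X b\) with constant vectors \(a\) (length \(m\)) and \(b\) (length \(n\)); this example is chosen here for illustration and does not come from the original text. Then \(\mathrm{d}f = a^T \mathrm{d}X\, b = \mathrm{tr}(a^T \mathrm{d}X\, b) = \mathrm{tr}(b a^T \mathrm{d}X) = \mathrm{tr}((a b^T)^T \mathrm{d}X)\), so comparison gives \(\frac{\partial f}{\partial X} = a b^T\). The sketch below checks this numerically; the sizes and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

def f(M):
    return a @ M @ b                # scalar a^T M b

G = np.outer(a, b)                  # derivative read off from df = tr((a b^T)^T dX)

# f is linear in X, so df = tr(G^T dX) holds for any perturbation dX
dX = rng.standard_normal((m, n))
df = f(X + dX) - f(X)
inner = np.trace(G.T @ dX)

print(np.isclose(df, inner))        # the two sides agree up to rounding
```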

Rules for Matrix Differentials

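  1. Addition and subtraction: \(\mathrm{d}(X \pm Y) = \mathrm{d}X \pm \mathrm{d}Y\)
  2. Product: \(\mathrm{d}(XY) = (\mathrm{d}X)Y + X\,\mathrm{d}Y\)
  3. Transpose: \(\mathrm{d}(X^T) = (\mathrm{d}X)^T\)
  4. Trace: \(\mathrm{d}\,\mathrm{tr}(X) = \mathrm{tr}(\mathrm{d}X)\)
  5. Inverse: \(\mathrm{d}(X^{-1}) = -X^{-1}(\mathrm{d}X)X^{-1}\)
  6. Determinant: \(\mathrm{d}\vert X \vert = \mathrm{tr}(X^* \mathrm{d}X)\), where \(X^*\) is the adjugate of \(X\); for invertible \(X\) this equals \(\vert X \vert\, \mathrm{tr}(X^{-1} \mathrm{d}X)\)
  7. Elementwise product: \(\mathrm{d}(X \odot Y) = \mathrm{d}X \odot Y + X \odot \mathrm{d}Y\)
  8. Elementwise function: \(\mathrm{d}(\sigma (X)) = \sigma ' (X) \odot \mathrm{d}X\), where \(\sigma\) is applied to each entry of \(X\)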
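
The rules above are first-order statements, so they can be sanity-checked by comparing the actual change of a function under a small perturbation with the value the rule predicts. The sketch below does this for the inverse, determinant, and elementwise-function rules. The fixed matrix, the perturbation scale `1e-7`, and the choice \(\sigma = \tanh\) are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # fixed invertible matrix, det = 16
dX = 1e-7 * rng.standard_normal((3, 3))    # small perturbation standing in for dX

X_inv = np.linalg.inv(X)

# Inverse rule: d(X^{-1}) = -X^{-1} dX X^{-1}
inv_change = np.linalg.inv(X + dX) - X_inv
inv_rule = -X_inv @ dX @ X_inv
inv_rel_err = np.linalg.norm(inv_change - inv_rule) / np.linalg.norm(inv_rule)

# Determinant rule: d|X| = tr(X^* dX), with the adjugate computed as |X| X^{-1}
adjugate = np.linalg.det(X) * X_inv
det_change = np.linalg.det(X + dX) - np.linalg.det(X)
det_rule = np.trace(adjugate @ dX)
det_abs_err = abs(det_change - det_rule)

# Elementwise-function rule with sigma = tanh, so sigma'(X) = 1 - tanh(X)^2
tanh_change = np.tanh(X + dX) - np.tanh(X)
tanh_rule = (1.0 - np.tanh(X) ** 2) * dX
tanh_abs_err = np.max(np.abs(tanh_change - tanh_rule))

# All three discrepancies are second order in dX, hence far smaller than dX itself
print(inv_rel_err, det_abs_err, tanh_abs_err)
```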

Trace Operations

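  1. Trace of a scalar: \(a = \mathrm{tr}(a)\)
  2. Transpose: \(\mathrm{tr}(A^T) = \mathrm{tr}(A)\)
  3. Linearity: \(\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)\)
  4. Trace of a matrix product: if \(A, B^T \in \mathbb{R}^{m \times n}\), then \(\mathrm{tr}(AB) = \mathrm{tr}(BA)\)
  5. Trace of a vector product: if \(a \in \mathbb{R}^{n \times 1}, b \in \mathbb{R}^{m \times 1}, W \in \mathbb{R}^{n \times m}\), then \(\mathrm{tr}(a^T W b) = \mathrm{tr}(b a^T W)\)
  6. Matrix product with elementwise product: for \(A, B, C\) of the same shape, \(\mathrm{tr}(A^T (B \odot C)) = \mathrm{tr}((A \odot B)^T C)\)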
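
The less obvious trace identities can be spot-checked on random matrices. The sketch below tests the product identity \(\mathrm{tr}(AB) = \mathrm{tr}(BA)\), the vector form \(\mathrm{tr}(a^T W b) = \mathrm{tr}(b a^T W)\), and the elementwise-product identity. The shapes, seed, and variable names (`P`, `Q`, `R` play the roles of \(A, B, C\) in the last identity) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4

# tr(AB) = tr(BA) for A (m x n) and B (n x m)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
cyclic_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))

# tr(a^T W b) = tr(b a^T W) for a (n x 1), b (m x 1), W (n x m)
a = rng.standard_normal((n, 1))
b = rng.standard_normal((m, 1))
W = rng.standard_normal((n, m))
vector_ok = np.isclose(np.trace(a.T @ W @ b), np.trace(b @ a.T @ W))

# tr(P^T (Q ⊙ R)) = tr((P ⊙ Q)^T R) for same-shaped P, Q, R
P, Q, R = (rng.standard_normal((m, n)) for _ in range(3))
hadamard_ok = np.isclose(np.trace(P.T @ (Q * R)), np.trace((P * Q).T @ R))

print(cyclic_ok, vector_ok, hadamard_ok)
```

A passing check on random inputs does not prove an identity, but a failing one quickly exposes a transcription mistake in a derivation.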

Chain Rule

Suppose \(f = f(Y)\) and \(Y = g(X)\). Using the method above, first write the differential of \(f\) in terms of \(\mathrm{d}Y\):

\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial Y}\right)^T \mathrm{d}Y\right) \]

Then substitute \(\mathrm{d}Y = \mathrm{d}g(X)\) and rearrange until \(\mathrm{d}X\) stands alone on the right inside the trace:

\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial Y}\right)^T \mathrm{d}g(X)\right) = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right) \]
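
As an illustration of the chain rule (an example chosen here, not taken from the original text), let \(Y = AXB\) and \(f(Y) = \mathrm{tr}(Y^T Y)\), the sum of squared entries of \(Y\). Then \(\frac{\partial f}{\partial Y} = 2Y\) and \(\mathrm{d}Y = A\,\mathrm{d}X\,B\), so \(\mathrm{d}f = \mathrm{tr}((2Y)^T A\,\mathrm{d}X\,B) = \mathrm{tr}(B (2Y)^T A\,\mathrm{d}X) = \mathrm{tr}((A^T (2Y) B^T)^T \mathrm{d}X)\), which gives \(\frac{\partial f}{\partial X} = A^T \frac{\partial f}{\partial Y} B^T\). The sketch below checks this numerically; the shapes, seed, and perturbation scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
X = rng.standard_normal((3, 4))

def f(M):
    Y = A @ M @ B                  # Y = g(X) = A X B
    return np.sum(Y ** 2)          # f(Y) = tr(Y^T Y)

Y = A @ X @ B
grad_Y = 2 * Y                     # df/dY
grad_X = A.T @ grad_Y @ B.T        # df/dX obtained through the chain rule

# f is quadratic in X, so the symmetric difference equals tr(grad_X^T dX)
# up to floating-point rounding
dX = 1e-3 * rng.standard_normal(X.shape)
df = (f(X + dX) - f(X - dX)) / 2
inner = np.trace(grad_X.T @ dX)

print(abs(df - inner))             # rounding-level discrepancy
```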

posted @ 2020-02-03 13:37  问李白买酒