Matrix Calculus
Basic Concepts
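Let \(f = f(X)\), where \(f\) is a scalar and \(X\) is an \(m \times n\) matrix. The derivative of a scalar with respect to a matrix is defined entry by entry, so it has the same shape as \(X\):
\[\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial X_{ij}} \end{bmatrix}_{m \times n}
\]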
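The entry-by-entry definition can be approximated numerically, which is handy for checking hand-derived results. The sketch below is only an illustration: the helper name `numerical_grad`, the step size `eps`, and the test function \(f(X) = \sum_{ij} X_{ij}^2\) (whose derivative is \(2X\)) are assumptions made here, not part of the original text.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Approximate df/dX entry by entry with central differences."""
    G = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X, dtype=float)
            E[i, j] = eps  # perturb only the (i, j) entry
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Example: f(X) = sum of squared entries, whose derivative is 2X.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
G = numerical_grad(lambda M: np.sum(M ** 2), X)

print(G.shape == X.shape)                # the derivative has the shape of X
print(np.allclose(G, 2 * X, atol=1e-5))  # matches the analytic result 2X
```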
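Approach
The core of matrix differentiation is the link between the derivative and the differential:
\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right)
\]
This shows that \(\mathrm{d}f\) is the inner product of the derivative \(\frac{\partial f}{\partial X}\) (\(m \times n\)) and the differential matrix \(\mathrm{d}X\) (\(m \times n\)), that is, \(\mathrm{d}f = \sum_{i,j} \frac{\partial f}{\partial X_{ij}} \mathrm{d}X_{ij}\). The procedure is therefore: take the differential of \(f\), rewrite it in the form \(\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right)\), and read off \(\frac{\partial f}{\partial X}\) by comparison.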
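As a small worked case, take \(f(X) = a^T X b\). Its differential is \(\mathrm{d}f = a^T \mathrm{d}X\, b = \mathrm{tr}(b a^T \mathrm{d}X) = \mathrm{tr}((a b^T)^T \mathrm{d}X)\), which suggests \(\frac{\partial f}{\partial X} = a b^T\). The sketch below checks this numerically; the function \(f\), the sizes, and the random inputs are illustrative choices, not taken from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal((m, 1))
b = rng.standard_normal((n, 1))
X = rng.standard_normal((m, n))

def f(X):
    # Scalar function f(X) = a^T X b
    return (a.T @ X @ b).item()

# Candidate derivative read off from df = tr((a b^T)^T dX)
grad = a @ b.T

# Compare the actual change of f with the trace formula for a small dX
dX = 1e-6 * rng.standard_normal((m, n))
df_actual = f(X + dX) - f(X)
df_predicted = np.trace(grad.T @ dX)

print(abs(df_actual - df_predicted) < 1e-10)
```

Because this \(f\) is linear in \(X\), the two values agree up to rounding; for nonlinear functions they agree only to first order in \(\mathrm{d}X\).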
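Rules for Matrix Differentials
- Addition and subtraction: \(\mathrm{d}(X \pm Y) = \mathrm{d}X \pm \mathrm{d}Y\)
- Product: \(\mathrm{d}(XY) = (\mathrm{d}X)Y + X\,\mathrm{d}Y\)
- Transpose: \(\mathrm{d}(X^T) = (\mathrm{d}X)^T\)
- Trace: \(\mathrm{d}\,\mathrm{tr}(X) = \mathrm{tr}(\mathrm{d}X)\)
- Inverse: \(\mathrm{d}X^{-1} = -X^{-1}(\mathrm{d}X)X^{-1}\)
- Determinant: \(\mathrm{d}\vert X \vert = \mathrm{tr}(X^* \mathrm{d}X)\), where \(X^*\) is the adjugate matrix of \(X\); when \(X\) is invertible this equals \(\vert X \vert\, \mathrm{tr}(X^{-1} \mathrm{d}X)\)
- Element-wise product: \(\mathrm{d}(X \odot Y) = \mathrm{d}X \odot Y + X \odot \mathrm{d}Y\)
- Element-wise function: \(\mathrm{d}(\sigma (X)) = \sigma ' (X) \odot \mathrm{d}X\), where \(\sigma\) and its derivative \(\sigma '\) are applied to each entry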
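The less obvious rules above (inverse, determinant, element-wise function) can be sanity-checked to first order with a small perturbation. This is a rough sketch under stated assumptions: the matrix is chosen close to the identity so that it is invertible, the adjugate is computed as \(\vert X \vert X^{-1}\) (valid only for invertible \(X\)), and \(\sigma = \tanh\) is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Assumption: X is a small perturbation of the identity, hence invertible
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))

X_inv = np.linalg.inv(X)
det_X = np.linalg.det(X)
adj_X = det_X * X_inv  # adjugate of an invertible matrix

# Inverse rule: d(X^{-1}) = -X^{-1} dX X^{-1}
d_inv_actual = np.linalg.inv(X + dX) - X_inv
d_inv_rule = -X_inv @ dX @ X_inv

# Determinant rule: d|X| = tr(adj(X) dX)
d_det_actual = np.linalg.det(X + dX) - det_X
d_det_rule = np.trace(adj_X @ dX)

# Element-wise function rule with sigma = tanh, sigma' = 1 - tanh^2
d_tanh_actual = np.tanh(X + dX) - np.tanh(X)
d_tanh_rule = (1 - np.tanh(X) ** 2) * dX

print(np.allclose(d_inv_actual, d_inv_rule, atol=1e-9))
print(abs(d_det_actual - d_det_rule) < 1e-9)
print(np.allclose(d_tanh_actual, d_tanh_rule, atol=1e-9))
```

The remaining differences are second order in \(\mathrm{d}X\), which is why the tolerances are much smaller than the perturbation itself.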
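Trace Identities
- Trace of a scalar: \(a = \mathrm{tr}(a)\)
- Transpose: \(\mathrm{tr}(A^T) = \mathrm{tr}(A)\)
- Linearity: \(\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)\)
- Trace of a matrix product: if \(A, B^T \in \mathbb{R}^{m \times n}\), then \(\mathrm{tr}(AB) = \mathrm{tr}(BA)\)
- Trace of a vector product: if \(a \in \mathbb{R}^{n \times 1}, b \in \mathbb{R}^{m \times 1}, W \in \mathbb{R}^{n \times m}\), then \(\mathrm{tr}(a^T W b) = \mathrm{tr}(b a^T W)\)
- Matrix product with element-wise product: if \(A, B, C\) have the same shape, then \(\mathrm{tr}(A^T (B \odot C)) = \mathrm{tr}((A \odot B)^T C)\), since both sides equal \(\sum_{i,j} A_{ij} B_{ij} C_{ij}\)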
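The identities in this list are exact, so they can be confirmed directly on random matrices. In the sketch below the shapes follow the list; the variable names `P`, `Q`, `R` for the three same-shaped matrices in the element-wise identity are chosen here only to avoid clashing with `A` and `B`.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))  # B^T has the same shape as A
a = rng.standard_normal((n, 1))
b = rng.standard_normal((m, 1))
W = rng.standard_normal((n, m))
P, Q, R = (rng.standard_normal((m, n)) for _ in range(3))

# tr(AB) = tr(BA): an m x m trace equals an n x n trace
cyclic_lhs, cyclic_rhs = np.trace(A @ B), np.trace(B @ A)

# tr(a^T W b) = tr(b a^T W): move b from the back to the front
vector_lhs = (a.T @ W @ b).item()
vector_rhs = np.trace(b @ a.T @ W)

# tr(P^T (Q * R)) = tr((P * Q)^T R), where * is the element-wise product
hadamard_lhs = np.trace(P.T @ (Q * R))
hadamard_rhs = np.trace((P * Q).T @ R)

print(np.isclose(cyclic_lhs, cyclic_rhs))
print(np.isclose(vector_lhs, vector_rhs))
print(np.isclose(hadamard_lhs, hadamard_rhs))
```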
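Chain Rule
Suppose \(f = f(Y)\) and \(Y = g(X)\). Using the method above, first write the differential of \(f\) in terms of \(Y\):
\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial Y}\right)^T \mathrm{d}Y\right)
\]
Then substitute \(\mathrm{d}Y = \mathrm{d}g(X)\) and use the trace identities to bring the result into the standard form, from which \(\frac{\partial f}{\partial X}\) can be read off:
\[\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial Y}\right)^T \mathrm{d}g(X)\right) = \mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T \mathrm{d}X\right)
\]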
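To make the chain rule concrete, consider the illustrative case \(Y = AXB\) with \(f = \mathrm{tr}(Y^T Y)\), neither of which appears in the original text. Here \(\frac{\partial f}{\partial Y} = 2Y\) and \(\mathrm{d}Y = A\,\mathrm{d}X\,B\), so \(\mathrm{d}f = \mathrm{tr}((2Y)^T A\,\mathrm{d}X\,B) = \mathrm{tr}((A^T (2Y) B^T)^T \mathrm{d}X)\), which suggests \(\frac{\partial f}{\partial X} = A^T \frac{\partial f}{\partial Y} B^T\). A minimal numerical check, with arbitrary sizes and random inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

def f(X):
    # f = tr(Y^T Y) with Y = A X B, i.e. the sum of squared entries of Y
    Y = A @ X @ B
    return np.sum(Y ** 2)

Y = A @ X @ B
grad_Y = 2 * Y                   # df/dY
grad_X = A.T @ grad_Y @ B.T      # df/dX obtained via dY = A dX B

# First-order check: the actual change of f against tr(grad_X^T dX)
dX = 1e-6 * rng.standard_normal(X.shape)
df_actual = f(X + dX) - f(X)
df_predicted = np.trace(grad_X.T @ dX)

print(grad_X.shape == X.shape)
print(abs(df_actual - df_predicted) < 1e-7)
```

The small residual is the second-order term \(\mathrm{tr}((A\,\mathrm{d}X\,B)^T (A\,\mathrm{d}X\,B))\), which the differential deliberately ignores.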
