2 Main Layout Conventions of Matrix Calculus
Let \(x\) and \(y\) be column vectors of dimension \(n\) and \(m\), respectively, let \(A\) be an \(m\times n\) matrix, and let \(z\) be a scalar.
Numerator Layout
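Mnemonic: the numerator keeps its shape and the denominator is transposed, so \(\frac{\partial y}{\partial x}\) has the shape of \(y\,x^\top\), i.e. \(m\times n\).
Vector by vector is intuitive: the result is the Jacobian.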
\[\frac{\partial y}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_1}{\partial x_n}\\
\vdots & \ddots &\vdots\\
\frac{\partial y_m}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
\]
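As a numerical sanity check, the sketch below estimates the numerator-layout Jacobian of the linear map \(y = Ax\) with central finite differences and confirms that it is \(m\times n\) and equal to \(A\). The helper names (`matvec`, `jacobian_numerator`), the step size `h`, and the sample values are illustrative assumptions; plain Python lists stand in for vectors and matrices.

```python
# Sketch (assumption): numerator-layout Jacobian by central finite differences.
# Vectors are Python lists; matrices are lists of rows.

def matvec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobian_numerator(f, x, h=1e-6):
    """Entry (i, j) is dy_i/dx_j, so the result has shape m x n."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # m = 2, n = 3
x = [0.5, -1.0, 2.0]
J = jacobian_numerator(lambda v: matvec(A, v), x)
print(len(J), len(J[0]))  # 2 3
```

Because the map is linear, the finite-difference estimate matches \(A\) up to rounding.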
Scalar by matrix requires a transpose, which is awkward: the result is \(n\times m\), the transpose of \(A\)'s shape.
\[\frac{\partial z}{\partial A}=
\begin{pmatrix}
\frac{\partial z}{\partial a_{11}} & \dots & \frac{\partial z}{\partial a_{m1}}\\
\vdots & \ddots &\vdots\\
\frac{\partial z}{\partial a_{1n}} & \dots & \frac{\partial z}{\partial a_{mn}}
\end{pmatrix}
\]
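To see the transpose concretely, take the hypothetical scalar \(z = u^\top A x\) with \(u\in\mathbb{R}^m\) (an assumption for illustration, not from the text). Since \(\partial z/\partial a_{ij} = u_i x_j\), numerator layout places that value at row \(j\), column \(i\), giving \(x u^\top\) (\(n\times m\)). The sketch checks this with finite differences; `z_of`, `grad_numerator`, and the sample values are illustrative.

```python
# Sketch (assumption): numerator-layout derivative of the scalar z = u^T A x
# with respect to the m x n matrix A, estimated by central finite differences.

def z_of(A, u, x):
    """Return z = u^T A x."""
    return sum(u[i] * A[i][j] * x[j] for i in range(len(A)) for j in range(len(A[0])))


def grad_numerator(zfun, A, h=1e-6):
    """Return an n x m matrix whose entry (j, i) is dz/da_ij."""
    m, n = len(A), len(A[0])
    G = [[0.0] * m for _ in range(n)]  # transposed relative to A
    for i in range(m):
        for j in range(n):
            Ap = [row[:] for row in A]
            Ap[i][j] += h
            Am = [row[:] for row in A]
            Am[i][j] -= h
            G[j][i] = (zfun(Ap) - zfun(Am)) / (2 * h)
    return G


u = [1.0, -2.0]         # m = 2
x = [0.5, 1.0, 3.0]     # n = 3
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
G = grad_numerator(lambda M: z_of(M, u, x), A)
print(len(G), len(G[0]))  # 3 2, i.e. n x m
```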
The chain rule is intuitive: the factors appear in the same order as the function composition.
\[\frac{\partial f\circ g}{\partial x} = \frac{\partial f}{\partial g}\frac{\partial g}{\partial x}
\]
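A quick shape check with linear maps (an illustrative assumption): let \(g(x) = Bx\) with \(B\in\mathbb{R}^{m\times n}\) and \(f(y) = Cy\) with \(C\in\mathbb{R}^{k\times m}\). In numerator layout each Jacobian is just the matrix itself, and the factors multiply in composition order:
\[\frac{\partial f\circ g}{\partial x} = \underbrace{C}_{k\times m}\,\underbrace{B}_{m\times n} = CB \in \mathbb{R}^{k\times n},
\]
which is exactly the Jacobian of \(x\mapsto CBx\).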
Denominator Layout
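Mnemonic: the denominator keeps its shape and the numerator is transposed, so \(\frac{\partial y}{\partial x}\) has the shape of \(x\,y^\top\), i.e. \(n\times m\).
Vector by vector is less intuitive: the result is the transpose of the Jacobian.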
\[\frac{\partial y}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_1}\\
\vdots & \ddots &\vdots\\
\frac{\partial y_1}{\partial x_n} & \dots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
\]
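For example, take the linear map \(y = Ax\). Since \(y_i = \sum_j a_{ij}x_j\), we have \(\partial y_i/\partial x_j = a_{ij}\). Denominator layout puts this entry at row \(j\), column \(i\), so
\[\frac{\partial (Ax)}{\partial x} = A^\top \in \mathbb{R}^{n\times m},
\]
whereas numerator layout gives \(A\) itself.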
Scalar by matrix is convenient: the result has the same shape as \(A\) (\(m\times n\)).
\[\frac{\partial z}{\partial A}=
\begin{pmatrix}
\frac{\partial z}{\partial a_{11}} & \dots & \frac{\partial z}{\partial a_{1n}}\\
\vdots & \ddots &\vdots\\
\frac{\partial z}{\partial a_{m1}} & \dots & \frac{\partial z}{\partial a_{mn}}
\end{pmatrix}
\]
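A practical payoff of matching \(A\)'s shape is that the gradient can be subtracted from \(A\) entry by entry, as in a gradient-descent step. The sketch below assumes the illustrative scalar \(z = u^\top A x\) with \(u\in\mathbb{R}^m\), whose denominator-layout derivative is the outer product \(u x^\top\); the names `outer`, `z_of`, the step size `eta`, and the sample values are hypothetical. Because \(z\) is linear in \(A\), one step lowers \(z\) by exactly \(\eta\,\|u\|^2\|x\|^2\), up to rounding.

```python
# Sketch (assumption): the denominator-layout gradient of z = u^T A x w.r.t. A
# is u x^T, which has the same m x n shape as A.

def outer(u, x):
    """Return the m x n outer product u x^T."""
    return [[ui * xj for xj in x] for ui in u]


def z_of(A, u, x):
    """Return z = u^T A x."""
    return sum(u[i] * A[i][j] * x[j] for i in range(len(u)) for j in range(len(x)))


u = [1.0, -2.0]         # m = 2
x = [0.5, 1.0, 3.0]     # n = 3
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

G = outer(u, x)         # dz/dA in denominator layout, same shape as A
eta = 0.1
# Shapes match, so the update is a plain entrywise subtraction.
A_new = [[a - eta * g for a, g in zip(row_a, row_g)] for row_a, row_g in zip(A, G)]

print(z_of(A, u, x), z_of(A_new, u, x))
```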
The chain rule is "reversed", which is awkward: the factors appear in the opposite order to the function composition.
\[\frac{\partial f\circ g}{\partial x} = \frac{\partial g}{\partial x}\frac{\partial f}{\partial g}
\]
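As a shape check with linear maps (an illustrative assumption): with \(g(x) = Bx\), \(B\in\mathbb{R}^{m\times n}\), and \(f(y) = Cy\), \(C\in\mathbb{R}^{k\times m}\), denominator layout gives \(\partial g/\partial x = B^\top\) and \(\partial f/\partial g = C^\top\), so
\[\frac{\partial f\circ g}{\partial x} = \underbrace{B^\top}_{n\times m}\,\underbrace{C^\top}_{m\times k} = (CB)^\top \in \mathbb{R}^{n\times k},
\]
the transpose of the numerator-layout result \(CB\). Only this reversed order makes the shapes conform.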
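Mixed Layouts
Mixing the two conventions is common. For example, CS224n mainly uses numerator layout, but does not transpose scalar-by-matrix derivatives, so the result has the same shape as the matrix.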
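As an illustration (based on the description of the CS224n convention above; \(u\in\mathbb{R}^m\) is an assumption), take \(z = u^\top A x\), for which \(\partial z/\partial a_{ij} = u_i x_j\) and \(\partial z/\partial x_j = \sum_i u_i a_{ij}\):
\[\text{numerator layout: } \frac{\partial z}{\partial x} = u^\top A \;(1\times n),\quad
\frac{\partial z}{\partial A} = x u^\top \;(n\times m);\qquad
\text{mixed: } \frac{\partial z}{\partial A} = u x^\top \;(m\times n).
\]
In the mixed form, the vector derivative keeps the numerator-layout (row) orientation, while the matrix derivative matches the shape of \(A\).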