Notes on Matrix Derivatives

1. pre

ref: https://zhuanlan.zhihu.com/p/263777564

It started when I saw \(\frac{\partial w^TRw}{\partial w}\) in class. I wasn't paying attention, and afterwards I didn't know how to compute the derivative...

2. A simple case

First, take the function \(f\):

\[f(x_1,x_2,x_3)=x_1^2+x_1x_2+x_2x_3 \]

Take the partial derivative with respect to each of \(x_1,x_2,x_3\):

\[\left\{ \begin{aligned} \frac{\partial f}{\partial x_1} & = 2x_1+x_2 \\ \frac{\partial f}{\partial x_2} & = x_1+x_3 \\ \frac{\partial f}{\partial x_3} & = x_2 \end{aligned} \right. \]

The results can be written as a column vector:

\[{\frac{\partial f(x)}{\partial x_{3\times1}}}= {\left[\begin{matrix}{{\frac{\partial f}{\partial x_{1}}}}\\ {{\frac{\partial f}{\partial x_{2}}}}\\ {{\frac{\partial f}{\partial x_{3}}}}\end{matrix}\right]}= {\left[\begin{matrix}{2x_{1}+x_{2}}\\ {x_{1}+x_{3}}\\ {x_{2}}\end{matrix}\right]} \tag{1} \]

or laid out as a row vector:

\[\frac{\partial f(x)}{\partial x_{3\times1}^T}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3} \right] = \left[ 2x_1+x_2, x_1+x_3, x_2 \right] \tag{2} \]
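To sanity-check Eqs. (1) and (2), here is a minimal sketch that approximates each partial derivative of f with central differences and compares it with the hand-derived values. It assumes plain Python lists as stand-ins for column and row vectors (no NumPy), and the evaluation point and step size `h` are arbitrary illustrative choices.

```python
def f(x):
    """f(x1, x2, x3) = x1^2 + x1*x2 + x2*x3 from Section 2."""
    x1, x2, x3 = x
    return x1 ** 2 + x1 * x2 + x2 * x3


def numeric_partials(func, x, h=1e-6):
    """Approximate [df/dx_1, ..., df/dx_n] at x with central differences."""
    parts = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        parts.append((func(xp) - func(xm)) / (2 * h))
    return parts


def analytic_partials(x):
    """The hand-derived partials: [2*x1 + x2, x1 + x3, x2]."""
    x1, x2, x3 = x
    return [2 * x1 + x2, x1 + x3, x2]


x = [1.0, 2.0, 3.0]               # arbitrary evaluation point
p = numeric_partials(f, x)
column_form = [[v] for v in p]    # Eq. (1): 3x1 column vector
row_form = [p]                    # Eq. (2): 1x3 row vector

for num, exact in zip(p, analytic_partials(x)):
    print(f"numeric {num:.6f}  analytic {exact:.6f}")
```

At \(x=(1,2,3)\) the analytic partials are \(4, 4, 2\), and the numeric values should agree to about six decimal places; `column_form` and `row_form` hold the same numbers and differ only in shape.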

3. Two layouts

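So far \(f\) has been a single scalar function. In general, several functions can be stacked into a vector or a matrix, which leads to two conventions: numerator layout and denominator layout.
Numerator layout: the numerator is laid out as a column vector and the denominator as a row vector, as in Eq. (2):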

\[\frac{\partial f_{2\times1}(x)}{\partial x_{3\times1}^{T}}= \left[ \begin{matrix} {{\frac{\partial f_{1}}{\partial x_{1}}}}& {{\frac{\partial f_{1}}{\partial x_{2}}}}& {{\frac{\partial f_{1}}{\partial x_{3}}}}\\ {{\frac{\partial f_{2}}{\partial x_{1}}}}& {{\frac{\partial f_{2}}{\partial x_{2}}}}& {{\frac{\partial f_{2}}{\partial x_{3}}}}\\ \end{matrix}\right]_{2\times3} \tag{3} \]

Denominator layout: the denominator is laid out as a column vector and the numerator as a row vector, as in Eq. (1):

\[\frac{\partial f^T_{2\times1}(x)}{\partial x_{3\times1}}= \left[ \begin{matrix} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{1}} \\ \frac{\partial f_{1}}{\partial x_{2}} & \frac{\partial f_{2}}{\partial x_{2}} \\ \frac{\partial f_{1}}{\partial x_{3}} & \frac{\partial f_{2}}{\partial x_{3}} \end{matrix}\right]_{3\times 2} \tag{4} \]
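The two layouts contain the same partial derivatives and differ only by a transpose. The sketch below assumes a hypothetical second component \(f_2 = x_1x_3\), with the Section 2 function as \(f_1\); it builds the \(2\times3\) numerator-layout matrix of Eq. (3) by central differences and transposes it to get the \(3\times2\) denominator-layout matrix of Eq. (4). The helper names are illustrative only.

```python
def f_vec(x):
    """Hypothetical f: R^3 -> R^2 used only for illustration.

    f1 is the Section 2 function; f2 = x1 * x3 is made up.
    """
    x1, x2, x3 = x
    return [x1 ** 2 + x1 * x2 + x2 * x3, x1 * x3]


def jacobian_numerator(func, x, h=1e-6):
    """Numerator layout, Eq. (3): J[i][j] = df_i/dx_j, shape m x n."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def transpose(M):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]


x = [1.0, 2.0, 3.0]
J_num = jacobian_numerator(f_vec, x)  # 2 x 3, numerator layout (Eq. 3)
J_den = transpose(J_num)              # 3 x 2, denominator layout (Eq. 4)
print(J_num)
print(J_den)
```

At \(x=(1,2,3)\) the exact numerator-layout matrix is \(\begin{bmatrix}4&4&2\\3&0&1\end{bmatrix}\), so `J_num` should be close to it and `J_den` close to its transpose.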

4. Real-valued scalar function of a vector variable

For a real-valued scalar function \(f(\pmb{x})\) of a vector variable \(\pmb{x}=[x_1,x_2,\cdots,x_n]^T\), the derivative takes one of two forms:

4.1. Row-vector partial derivative form (also called the row partial-derivative vector)

\[\text{D}_{x}f(x)= \frac{\partial f(x)}{\partial x^T}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n} \right] \tag{5} \]

4.2. Gradient vector form (also called the column-vector partial derivative form or column partial-derivative vector)

\[\nabla_{x}f(x)= \frac{\partial f(x)}{\partial x}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n} \right]^T \tag{6} \]

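The two are transposes of each other: differentiating with respect to the row vector \(x^T\) is the same as differentiating with respect to the column vector \(x\) and then transposing the result.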

5. Real-valued scalar function of a matrix variable

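Now take a real-valued scalar function \(f(X)\) of a matrix variable \(X_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}\).

First, define the operator \(\text{vec}(X)\), which vectorizes X by stacking its columns: take column 1, column 2, and so on up to column n of X, and concatenate them in order into a single column vector: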

\[\text{vec}(X)= \left[ x_{11},x_{21},\cdots,x_{m1},x_{12},x_{22},\cdots,x_{m2},\cdots,x_{1n},x_{2n},\cdots,x_{mn} \right]^T \tag{7} \]
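A minimal sketch of the column stacking in Eq. (7), assuming a matrix is stored as a Python list of rows; `vec` here is a hypothetical helper, not a library function. Naming each entry after its indices makes the resulting order easy to read off.

```python
def vec(X):
    """Column-stack a matrix given as a list of rows (Eq. 7).

    Column 1 comes first, then column 2, ..., then column n.
    """
    m, n = len(X), len(X[0])
    return [X[i][j] for j in range(n) for i in range(m)]


# x_ij is stored at X[i-1][j-1]; each entry is named after its indices.
X = [[11, 12, 13],
     [21, 22, 23]]
print(vec(X))  # -> [11, 21, 12, 22, 13, 23]
```

If NumPy is available, `X.flatten(order="F")` gives the same column-major order for an ndarray; the default `order="C"` would concatenate rows instead.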

5.1. Row-vector partial derivative form (also called the row partial-derivative vector)

That is, vectorize X with vec to turn it into a vector variable, then apply Eq. (5) to that vector:

\[\begin{aligned} \text{D}_{\text{vec}X}f(X)&= \frac{\partial f(X)}{\partial \text{vec}^T(X)} \\ &= \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}} \right] \end{aligned} \tag{8} \]

5.2. Jacobian matrix form

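That is, transpose X, then take the partial derivative with respect to each entry of \(X^T\) in place, so the result has the same \(n\times m\) layout as \(X^T\):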

\[\begin{aligned} \text{D}_{X}f(X)&= \frac{\partial f(X)}{\partial X^T_{m\times n}} \\ &= \left[ \begin{array}{cccc} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{array} \right]_{n\times m} \end{aligned} \tag{9} \]

5.3. Gradient vector form (also called the column-vector partial derivative form or column partial-derivative vector)

That is, vectorize X with vec to turn it into a vector variable, then apply Eq. (6) to that vector:

\[\begin{aligned} \nabla_{\text{vec}\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \text{vec}\pmb{X}} \\ &= \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}} \right]^T \end{aligned} \tag{10} \]

5.4. Gradient matrix form

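Take the partial derivative with respect to each entry of the original matrix X in place, so the result has the same \(m\times n\) layout as X: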

\[\begin{aligned} \nabla_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} \\ &= \left[ \begin{array}{cccc} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{array} \right]_{m\times n} \end{aligned} \tag{11} \]
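To see how the four forms relate, the sketch below computes all \(mn\) partial derivatives once, as the gradient matrix of Eq. (11), and then rearranges them into Eqs. (8), (9) and (10). It assumes a hypothetical test function \(f(X)=a^TXb\) with fixed vectors a and b, chosen because its exact gradient matrix is the outer product \(ab^T\) (that is, \(\partial f/\partial x_{ij}=a_ib_j\)), which gives something to compare against. Matrices are plain lists of rows.

```python
def vec(M):
    """Column-stack a list-of-rows matrix (Eq. 7)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def gradient_matrix(func, X, h=1e-6):
    """Eq. (11): G[i][j] = df/dx_ij in the same m x n layout as X (central differences)."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


# Hypothetical test function f(X) = a^T X b with fixed a (length m) and b (length n).
a = [1.0, 2.0]
b = [3.0, -1.0, 0.5]


def f(X):
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


X = [[0.3, -0.2, 1.0],
     [0.7, 0.4, -0.5]]                  # m = 2, n = 3

grad_X = gradient_matrix(f, X)          # Eq. (11): 2 x 3
jac_X = transpose(grad_X)               # Eq. (9):  3 x 2
grad_vec = [[v] for v in vec(grad_X)]   # Eq. (10): 6 x 1 column
D_vec = [vec(grad_X)]                   # Eq. (8):  1 x 6 row
```

`grad_X` should be close to \(ab^T = \begin{bmatrix}3&-1&0.5\\6&-2&1\end{bmatrix}\); the other three are just its transpose, its vec, and the transpose of its vec, matching the first point of the summary below.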

6. Quick summary

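  • Eq. (8) and Eq. (10) are transposes of each other, and so are Eq. (9) and Eq. (11).
  • When the function is a scalar, take the partial derivative with respect to each element of the variable and arrange the results in the same layout as the variable in the denominator.
  • When the function is also a vector, the function and the variable should have opposite orientations (column over row, or row over column). If both have the same orientation, transpose one of them first, just as Eq. (5) and Eq. (6) are related by a transpose.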

7. post

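In fact, when A is an \(n\times n\) matrix and X is an \(n\times1\) vector, the following shortcut formulas are handy (here \('\) denotes the derivative with respect to X in denominator layout, Eq. 4):

\[\begin{gathered} (AX)' = A^T \\ (XA)' = A^T \\ (X^TA)' = A \\ (AX^T)' = A \end{gathered}\]

Note that with these shapes only \(AX\) and \(X^TA\) are defined; the products \(XA\) and \(AX^T\) are not, so the second and fourth formulas do not apply as written.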
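A quick numerical check of the two formulas whose products are defined for these shapes, \((AX)'=A^T\) and \((X^TA)'=A\). It is a sketch under stated assumptions: A is a hypothetical non-symmetric \(3\times3\) matrix (so that A and \(A^T\) can be told apart), vectors are flat Python lists, and \('\) is read as the denominator layout of Eq. (4), i.e. entry \((i,j)\) is \(\partial y_j/\partial x_i\).

```python
def denominator_jacobian(func, x, h=1e-6):
    """Denominator layout (Eq. 4): D[i][j] = dy_j/dx_i for y = func(x)."""
    n, m = len(x), len(func(x))
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        yp, ym = func(xp), func(xm)
        for j in range(m):
            D[i][j] = (yp[j] - ym[j]) / (2 * h)
    return D


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical non-symmetric 3x3 A, so that A and A^T are distinguishable.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]


def AX(x):
    """(AX)_k = sum_l a_kl x_l, an n x 1 column stored flat."""
    return [sum(A[k][l] * x[l] for l in range(len(x))) for k in range(len(A))]


def XtA(x):
    """(X^T A)_k = sum_l x_l a_lk, a 1 x n row stored flat."""
    return [sum(x[l] * A[l][k] for l in range(len(x))) for k in range(len(A[0]))]


def close(M, N, tol=1e-6):
    """Entrywise comparison of two list-of-rows matrices."""
    return all(abs(p - q) < tol for rm, rn in zip(M, N) for p, q in zip(rm, rn))


x = [0.2, -1.0, 0.7]  # arbitrary point; derivatives of linear maps do not depend on it
print(close(denominator_jacobian(AX, x), transpose(A)))  # (AX)' = A^T  -> True
print(close(denominator_jacobian(XtA, x), A))            # (X^T A)' = A -> True
```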

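Back to the opening question. Treating \(w^TRw\) as \(w^T(Rw)\) and as \((w^TR)w\), and differentiating one \(w\) at a time with these formulas (product rule), gives \(\frac{\partial w^TRw}{\partial w} = Rw + R^Tw = (R+R^T)w\), which reduces to \(2Rw\) when R is symmetric.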
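The result can be checked numerically. This sketch assumes a hypothetical non-symmetric \(2\times2\) R, because for a symmetric R the answer collapses to \(2Rw\) and the check could not tell the two apart; it compares a central-difference gradient of \(w^TRw\) with \((R+R^T)w\) and with \(2Rw\).

```python
# Hypothetical non-symmetric R, so that (R + R^T) w differs from 2 R w.
R = [[2.0, 1.0],
     [-3.0, 0.5]]
n = len(R)


def quad(w):
    """w^T R w = sum_ij w_i R_ij w_j."""
    return sum(w[i] * R[i][j] * w[j] for i in range(n) for j in range(n))


def numeric_gradient(func, w, h=1e-6):
    """Central-difference gradient (column form, Eq. 6), stored as a flat list."""
    g = []
    for i in range(len(w)):
        wp = list(w)
        wm = list(w)
        wp[i] += h
        wm[i] -= h
        g.append((func(wp) - func(wm)) / (2 * h))
    return g


w = [0.4, -1.5]
numeric = numeric_gradient(quad, w)
# (R + R^T) w, component i = sum_j (R_ij + R_ji) w_j
closed_form = [sum((R[i][j] + R[j][i]) * w[j] for j in range(n)) for i in range(n)]
# 2 R w, which equals the gradient only when R = R^T
two_Rw = [2 * sum(R[i][j] * w[j] for j in range(n)) for i in range(n)]

print(numeric)
print(closed_form)
print(two_Rw)
```

Here \((R+R^T)w = (4.6, -2.3)\) while \(2Rw = (-1.4, -3.9)\); the numeric gradient should match the former.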

These formulas are also easy to prove. For the first one, let

\[A = \begin{bmatrix} a_{11}& a_{12} \\ a_{21}& a_{22} \\ \end{bmatrix} \\ \\ X = \begin{bmatrix} x_{1} \\ x_{2} \\ \end{bmatrix} \]

Using Eq. (3) from Section 3 (numerator layout), we get

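\[\frac{\partial AX }{\partial X^T} = \begin{bmatrix} \frac{\partial (a_{11}x_1 + a_{12}x_2)}{\partial x_1} & \frac{\partial (a_{11}x_1 + a_{12}x_2)}{\partial x_2} \\ \frac{\partial (a_{21}x_1 + a_{22}x_2)}{\partial x_1} & \frac{\partial (a_{21}x_1 + a_{22}x_2)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} a_{11}& a_{12} \\ a_{21}& a_{22} \end{bmatrix} = A \]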

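This is the numerator-layout result. Transposing it gives the denominator-layout form \(\frac{\partial (AX)^T}{\partial X} = A^T\), which is the first formula above. The other formulas follow in the same way.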
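For example, the third formula \((X^TA)'=A\) can be verified with the same \(2\times2\) A and X, this time in denominator layout (Eq. 4), where row i holds the derivatives with respect to \(x_i\) and column j corresponds to the j-th entry of \(X^TA\):

\[X^TA = \begin{bmatrix} a_{11}x_1 + a_{21}x_2 & a_{12}x_1 + a_{22}x_2 \end{bmatrix}\]

\[\frac{\partial X^TA}{\partial X} = \begin{bmatrix} \frac{\partial (a_{11}x_1 + a_{21}x_2)}{\partial x_1} & \frac{\partial (a_{12}x_1 + a_{22}x_2)}{\partial x_1} \\ \frac{\partial (a_{11}x_1 + a_{21}x_2)}{\partial x_2} & \frac{\partial (a_{12}x_1 + a_{22}x_2)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A\]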
ps. I don't know why multi-line formulas end up with several tags. If anyone knows, please teach me QvQ

posted @ 2023-03-10 14:48  NoNoe