1. pre
ref: https://zhuanlan.zhihu.com/p/263777564
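It all started when I saw \(\frac{\partial w^TRw}{\partial w}\) in class but wasn't paying attention, and afterwards I had no idea how to take that derivative...
2. Simple case
First, consider the function \(f\):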
\[f(x_1,x_2,x_3)=x_1^2+x_1x_2+x_2x_3 \tag{e.g.10}
\]
We can take the partial derivative with respect to each of \(x_1,x_2,x_3\):
\[\left\{ \begin{aligned} \frac{\partial f}{\partial x_1} & = 2x_1+x_2 \\ \frac{\partial f}{\partial x_2} & = x_1+x_3 \\ \frac{\partial f}{\partial x_3} & = x_2 \end{aligned} \right.
\]
The results can be written as a column vector:
\[{\frac{\partial f(x)}{\partial x_{3\times1}}}=
{\left[\begin{matrix}{{\frac{\partial f}{\partial x_{1}}}}\\ {{\frac{\partial f}{\partial x_{2}}}}\\ {{\frac{\partial f}{\partial x_{3}}}}\end{matrix}\right]}=
{\left[\begin{matrix}{2x_{1}+x_{2}}\\ {x_{1}+x_{3}}\\ {x_{2}}\end{matrix}\right]} \tag{1}
\]
Alternatively, they can be laid out as a row vector:
\[\frac{\partial f(x)}{\partial x_{3\times1}^T}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3} \right] = \left[ 2x_1+x_2, x_1+x_3, x_2 \right] \\\\ \tag{2}
\]
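As a quick sanity check on formulas 1 and 2, the partial derivatives can be compared against central finite differences. This is only a minimal sketch: the test point and the step size `h` are arbitrary choices of mine, not something from the reference.

```python
def f(x):
    """f(x1, x2, x3) = x1^2 + x1*x2 + x2*x3, the example function above."""
    x1, x2, x3 = x
    return x1 ** 2 + x1 * x2 + x2 * x3


def analytic_gradient(x):
    """Partial derivatives from formula 1, returned as a flat list."""
    x1, x2, x3 = x
    return [2 * x1 + x2, x1 + x3, x2]


def numeric_gradient(func, x, h=1e-6):
    """Central finite differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


point = [1.0, 2.0, 3.0]  # arbitrary test point (an assumption)
print(analytic_gradient(point))  # [4.0, 4.0, 2.0]
print(numeric_gradient(f, point))  # should agree up to rounding error
```

Whether the three numbers are read as a column (formula 1) or a row (formula 2) is purely a matter of layout; the values are the same.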
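3. Two layouts
So far \(f\) has been a single scalar-valued function. Several functions can also be stacked into a vector or a matrix, which leads to the numerator layout and the denominator layout.
Numerator layout: the numerator is a column vector and the denominator is a row vector, as in formula 2: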
\[\frac{\partial f_{2\times1}(x)}{\partial x_{3\times1}^{T}}=
\left[
\begin{matrix}
{{\frac{\partial f_{1}}{\partial x_{1}}}}&
{{\frac{\partial f_{1}}{\partial x_{2}}}}&
{{\frac{\partial f_{1}}{\partial x_{3}}}}\\
{{\frac{\partial f_{2}}{\partial x_{1}}}}&
{{\frac{\partial f_{2}}{\partial x_{2}}}}&
{{\frac{\partial f_{2}}{\partial x_{3}}}}\\
\end{matrix}\right]_{2\times3} \tag{3}
\]
Denominator layout: the denominator is a column vector and the numerator is a row vector, as in formula 1:
\[\frac{\partial f^T_{2\times1}(x)}{\partial x_{3\times1}}= \left[
\begin{matrix}
{{\frac{\partial f_{1}}{\partial x_{1}}}}&
{{\frac{\partial f_{2}}{\partial x_{1}}}}\\
{{\frac{\partial f_{1}}{\partial x_{2}}}}&
{{\frac{\partial f_{2}}{\partial x_{2}}}}\\
{{\frac{\partial f_{1}}{\partial x_{3}}}}&
{{\frac{\partial f_{2}}{\partial x_{3}}}}\\
\end{matrix}\right]_{3\times 2} \\\\ \tag{4}
\]
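To make the two layouts concrete, here is a rough sketch that builds both matrices numerically. The function `f` below is a hypothetical example I made up (it maps 3 inputs to 2 outputs, matching the shapes in formulas 3 and 4), and the finite-difference step is an assumption.

```python
def f(x):
    """Hypothetical vector function f: R^3 -> R^2, chosen for illustration."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + x3 ** 2]


def numerator_layout(func, x, h=1e-6):
    """Matrix as in formula 3: row i holds the partials of f_i (2 x 3 here)."""
    m = len(func(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = func(forward), func(backward)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def denominator_layout(func, x, h=1e-6):
    """Matrix as in formula 4: the transpose of the numerator layout (3 x 2 here)."""
    return [list(col) for col in zip(*numerator_layout(func, x, h))]


point = [1.0, 2.0, 3.0]  # arbitrary test point (an assumption)
num = numerator_layout(f, point)
den = denominator_layout(f, point)
print(len(num), len(num[0]))  # 2 3
print(len(den), len(den[0]))  # 3 2
```

The entries are identical in both matrices; only their arrangement differs, which is all that the choice of layout amounts to.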
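4. Real scalar function of a vector variable
The derivative of a real scalar function \(f(\pmb{x})\) with respect to a vector variable \(\pmb{x}=[x_1,x_2,\cdots,x_n]^T\) takes two forms:
4.1. Row-vector partial derivative form (also called the row partial derivative vector)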
\[\text{D}_{x}f(x)= \frac{\partial f(x)}{\partial x^T}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n} \right] \\\\ \tag{5}
\]
4.2. Gradient vector form (also called the column-vector partial derivative form, or the column partial derivative vector)
\[\nabla_{x}f(x)= \frac{\partial f(x)}{\partial x}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n} \right]^T \\\\ \tag{6}
\]
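These two are transposes of each other: differentiating with respect to the row vector \(x^T\) is the same as differentiating with respect to the column vector \(x\) and then transposing the result.
5. Real scalar function of a matrix variable
The derivative of a real scalar function \(f(X)\) with respect to a matrix variable \(X_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}\) takes four forms.
First, some notation: \(\text{vec}(X)\) vectorizes the matrix \(X\) by stacking its columns. That is, it takes column 1, column 2, and so on up to column n of \(X\) and concatenates them in order into a single column vector: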
\[\text{vec}(X)= \left[ x_{11},x_{21},\cdots,x_{m1},x_{12},x_{22},\cdots,x_{m2},\cdots,x_{1n},x_{2n},\cdots,x_{mn} \right]^T \tag{7}
\]
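The column-stacking order in formula 7 is easy to get backwards, so here is a tiny sketch of it. The helper name `vec` and the nested-list representation of a matrix are my own choices for illustration.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into one flat list."""
    rows, cols = len(matrix), len(matrix[0])
    # Outer loop over columns, inner loop over rows: column-major order.
    return [matrix[i][j] for j in range(cols) for i in range(rows)]


# Entries are written as "row index, column index" so the order is visible.
X = [[11, 12, 13],
     [21, 22, 23]]
print(vec(X))  # [11, 21, 12, 22, 13, 23]
```

Note that this is column-major order; flattening the nested list row by row would give a different vector.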
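5.1. Row-vector partial derivative form (also called the row partial derivative vector)
First vectorize the matrix variable \(X\) with vec to turn it into a vector variable, then apply formula 5 to that vector: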
\[\begin{aligned} \text{D}_{\text{vec}X}f(X)&= \frac{\partial f(X)}{\partial \text{vec}^T(X)} \\ &= \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}} \right] \end{aligned} \tag{8}
\]
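5.2. Jacobian matrix form
Transpose the matrix variable \(X\) first, then take the partial derivative with respect to the element at each position of the transpose. The result has the same layout as \(X^T\).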
\[\begin{aligned} \text{D}_{X}f(X)&= \frac{\partial f(X)}{\partial X^T_{m\times n}} \\ &=
\left[
\begin{array}{cccc}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m 1}} \\
\frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m 2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial x_{1 n}} & \frac{\partial f}{\partial x_{2 n}} & \cdots & \frac{\partial f}{\partial x_{m n}}
\end{array}
\right]_{n\times m} \end{aligned} \tag{9}
\]
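5.3. Gradient vector form (also called the column-vector partial derivative form, or the column partial derivative vector)
First vectorize the matrix variable \(X\) with vec to turn it into a vector variable, then apply formula 6 to that vector: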
\[\begin{aligned} \nabla_{\text{vec}\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \text{vec}\pmb{X}} \\ &= \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}} \right]^T \end{aligned} \tag{10}
\]
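5.4. Gradient matrix form
Take the partial derivative with respect to the element at each position of the original matrix variable \(X\) directly. The result has the same layout as \(X\).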
\[\begin{aligned} \nabla_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} \\ &= \left[
\begin{array}{cccc}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\
\frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{array}
\right]_{m\times n} \end{aligned} \tag{11}
\]
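A small worked example may help tell the four forms apart. The function here is my own pick for illustration (it is not from the reference): take \(m=n=2\) and \(f(X)=x_{11}x_{22}-x_{12}x_{21}\). The four partial derivatives are \(\frac{\partial f}{\partial x_{11}}=x_{22}\), \(\frac{\partial f}{\partial x_{21}}=-x_{12}\), \(\frac{\partial f}{\partial x_{12}}=-x_{21}\), \(\frac{\partial f}{\partial x_{22}}=x_{11}\), so the gradient vector, gradient matrix and Jacobian matrix forms are:
\[\nabla_{\text{vec}X}f(X)= \left[ x_{22},\ -x_{12},\ -x_{21},\ x_{11} \right]^T, \qquad
\nabla_{X}f(X)= \begin{bmatrix}
x_{22} & -x_{21} \\
-x_{12} & x_{11} \\
\end{bmatrix}, \qquad
\text{D}_{X}f(X)= \begin{bmatrix}
x_{22} & -x_{12} \\
-x_{21} & x_{11} \\
\end{bmatrix}
\]
The row-vector form \(\text{D}_{\text{vec}X}f(X)\) is just the first vector without the transpose.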
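6. Summary
- Formulas 8 and 10 are transposes of each other, and so are formulas 9 and 11.
- When the function is a scalar, take the partial derivative with respect to each element of x, arranged in the same layout as x.
- If the function is itself a vector, then it and the x it is differentiated against should have opposite orientations (row against column, or column against row). If both have the same orientation, the transpose relation in formula 6 can be used to convert one of them.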
7. post
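In fact, when \(A\) is an \(n\times n\) matrix and \(X\) is an \(n\times 1\) vector, the following handy formulas hold. The numerator is a column vector when differentiating with respect to \(X^T\) (numerator layout, formula 3) and a row vector when differentiating with respect to \(X\) (denominator layout, formula 4):
\[\frac{\partial (AX)}{\partial X^T} = A \\
\frac{\partial (AX)^T}{\partial X} = A^T \\
\frac{\partial (X^TA)}{\partial X} = A \\
\frac{\partial (X^TA)^T}{\partial X^T} = A^T \\
\]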
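Back to the opening question. Apply the product rule in the gradient (column) form: holding \(Rw\) fixed, the factor \(w^T\) contributes \(Rw\); holding \(w^TR\) fixed, the factor \(w\) contributes \((w^TR)^T = R^Tw\). Together: \(\frac{\partial w^TRw}{\partial w} = Rw + R^Tw = (R+R^T)w\)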
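The result \((R+R^T)w\) can be checked numerically. Below is a minimal sketch; the matrix `R`, the vector `w` and the step size `h` are arbitrary values I picked, with `R` deliberately non-symmetric so that \(R+R^T\) differs from \(2R\).

```python
def quadratic_form(R, w):
    """Compute the scalar w^T R w for a square matrix R (list of rows)."""
    n = len(w)
    return sum(w[i] * R[i][j] * w[j] for i in range(n) for j in range(n))


def analytic_gradient(R, w):
    """Closed form (R + R^T) w, returned as a flat list."""
    n = len(w)
    return [sum((R[i][j] + R[j][i]) * w[j] for j in range(n)) for i in range(n)]


def numeric_gradient(R, w, h=1e-6):
    """Central finite differences of w^T R w, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        forward, backward = list(w), list(w)
        forward[i] += h
        backward[i] -= h
        grad.append((quadratic_form(R, forward) - quadratic_form(R, backward)) / (2 * h))
    return grad


R = [[1.0, 2.0],
     [3.0, 4.0]]  # deliberately non-symmetric (an assumption for the demo)
w = [1.0, -1.0]
print(analytic_gradient(R, w))  # [-3.0, -3.0]
print(numeric_gradient(R, w))   # should agree up to rounding error
```

When \(R\) is symmetric the same formula reduces to \(2Rw\).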
These formulas are also easy to prove. Take the one for \(AX\) as an example and let
\[A = \begin{bmatrix}
a_{11}& a_{12} \\
a_{21}& a_{22} \\
\end{bmatrix} \\
\\
X = \begin{bmatrix}
x_{1} \\
x_{2} \\
\end{bmatrix}
\]
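Here \(AX = \left[ a_{11}x_1 + a_{12}x_2,\ a_{21}x_1 + a_{22}x_2 \right]^T\), so by formula 3 above (3. Two layouts) we get
\[\frac{\partial AX }{\partial X^T} = \begin{bmatrix}
\frac{\partial (a_{11}x_1 + a_{12}x_2) }{\partial x_1}& \frac{\partial (a_{11}x_1 + a_{12}x_2) }{\partial x_2} \\
\frac{\partial (a_{21}x_1 + a_{22}x_2) }{\partial x_1}& \frac{\partial (a_{21}x_1 + a_{22}x_2) }{\partial x_2} \\
\end{bmatrix} = \begin{bmatrix}
a_{11}& a_{12} \\
a_{21}& a_{22} \\
\end{bmatrix} = A
\]
Taking the transpose gives the denominator-layout counterpart \(\frac{\partial (AX)^T}{\partial X} = A^T\).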
The others follow in the same way.
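For instance, here is how the same argument might go for \(X^TA\) with the \(2\times 2\) matrices above, this time using formula 4 (denominator layout). The row vector is \(X^TA = \left[ a_{11}x_1 + a_{21}x_2,\ a_{12}x_1 + a_{22}x_2 \right]\), so
\[\frac{\partial X^TA }{\partial X} = \begin{bmatrix}
\frac{\partial (a_{11}x_1 + a_{21}x_2) }{\partial x_1}& \frac{\partial (a_{12}x_1 + a_{22}x_2) }{\partial x_1} \\
\frac{\partial (a_{11}x_1 + a_{21}x_2) }{\partial x_2}& \frac{\partial (a_{12}x_1 + a_{22}x_2) }{\partial x_2} \\
\end{bmatrix} = \begin{bmatrix}
a_{11}& a_{12} \\
a_{21}& a_{22} \\
\end{bmatrix} = A
\]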
P.S. I don't know why multi-line formulas end up with several tags. If anyone knows, please teach me QvQ