
On the optimization problem in "Unsupervised Deep Embedding for Clustering Analysis"

Author: 凯鲁嘎吉 - 博客园 (cnblogs) http://www.cnblogs.com/kailugaji/

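Deep Embedded Clustering (DEC) and Improved Deep Embedded Clustering (IDEC) were proposed one after the other, but neither paper spells out how the parameters are optimized, so I worked through the derivation myself. My partial derivative with respect to the cluster centers turned out not to match the result given in the two papers, and I could not see where the problem was. What follows is essentially a math exercise: find the partial derivatives of an objective function with respect to two of its parameters.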

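Update 2023.4.10: the derivation from the paper is given in the first comment. The paper is correct and I was wrong: the indices i and j must not be conflated. A similar derivation: https://peterroelants.github.io/posts/cross-entropy-softmax/#Derivative-of-the-cross-entropy-loss-function-for-the-softmax-function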

Problem statement

Given

\[L=\sum\limits_{i}^{N}{\sum\limits_{j}^{c}{{{p}_{ij}}\log \frac{{{p}_{ij}}}{{{q}_{ij}}}}}\]

\[{{q}_{ij}}=\frac{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}}{\sum\nolimits_{l}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}}\]

\[{{p}_{ij}}=\frac{q_{ij}^{2}/\sum\nolimits_{i'}{{{q}_{i'j}}}}{\sum\nolimits_{l}{(q_{il}^{2}/\sum\nolimits_{i'}{{{q}_{i'l}}})}}\]

With ${p}_{ij}$ held fixed, find $\frac{\partial L}{\partial {{z}_{i}}}$ and $\frac{\partial L}{\partial {{\mu }_{j}}}$.
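To make the three definitions concrete, here is a minimal NumPy sketch that evaluates $q_{ij}$, $p_{ij}$ and $L$ on random toy data. The function names (`soft_assignment`, `target_distribution`, `kl_loss`), the sizes N = 6, c = 3 and the 2-D embedding are illustrative assumptions, not taken from the DEC or IDEC code.

```python
import numpy as np

def soft_assignment(z, mu):
    """q_ij: kernel (1 + ||z_i - mu_j||^2)^-1, normalized over the clusters."""
    # Squared Euclidean distances between every point and every center, shape (N, c)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    w = 1.0 / (1.0 + d2)
    return w / w.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: q_ij^2 divided by the column sum of q, then renormalized over clusters."""
    weight = q ** 2 / q.sum(axis=0, keepdims=True)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """L = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))    # toy embedded points z_i (assumed sizes)
mu = rng.normal(size=(3, 2))   # toy cluster centers mu_j (assumed sizes)

q = soft_assignment(z, mu)
p = target_distribution(q)     # treated as a constant when differentiating L
loss = kl_loss(p, q)
print(loss)
```

Each row of `q` and of `p` sums to 1, which is the property the derivation relies on later.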

Solution

1. First, find $\frac{\partial L}{\partial {{z}_{i}}}$

By the chain rule

\[\frac{\partial L}{\partial {{z}_{i}}}=\sum\limits_{j}^{c}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{z}_{i}}}}\]

\[\frac{\partial L}{\partial {{q}_{ij}}}=\frac{\partial \left( {{p}_{ij}}\log \frac{{{p}_{ij}}}{{{q}_{ij}}} \right)}{\partial {{q}_{ij}}}=\frac{\partial \left( {{p}_{ij}}\log {{p}_{ij}}-{{p}_{ij}}\log {{q}_{ij}} \right)}{\partial {{q}_{ij}}}=-\frac{{{p}_{ij}}}{{{q}_{ij}}}\]

\[ \frac{\partial {{q}_{ij}}}{\partial {{z}_{i}}}=\frac{-2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}-{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}\cdot (-2)\cdot \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{l}})}}{{{\left( \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}} \right)}^{2}}} \\ =-2{{q}_{ij}}({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}+2{{q}_{ij}}\frac{\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{l}})}}{\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}} \]

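This uses the quotient rule ${{\left( \frac{A}{B} \right)}^{\prime }}=\frac{{{A}^{\prime }}B-A{{B}^{\prime }}}{{{B}^{2}}}$ and the identity ${{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}/\sum\nolimits_{l}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}={{q}_{ij}}$.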

Therefore,

\[\frac{\partial L}{\partial z_i}=\sum\limits_{j}^{c}{\frac{\partial L}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial z_i}}\\
=\sum\limits_{j}^{c}{-\frac{p_{ij}}{q_{ij}}\left( -2q_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}+2q_{ij}\frac{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-2}(z_i-\mu_l)}}{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}} \right)}\\
=\sum\limits_{j}^{c}{2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}-\sum\limits_{j}^{c}{2p_{ij}\frac{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-2}(z_i-\mu_l)}}{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}}}\\
=\sum\limits_{j}^{c}{2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}-2\frac{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-2}(z_i-\mu_l)}}{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}}\\
=\sum\limits_{j}^{c}{2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}-2\frac{\sum\limits_{j}^{c}{(1+\left\| z_i-\mu_j \right\|^{2})^{-2}(z_i-\mu_j)}}{\sum\limits_{l}^{c}{(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}}\\
=\sum\limits_{j}^{c}{2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}-2\sum\limits_{j}^{c}{q_{ij}(1+\left\| z_i-\mu_j \right\|^{2})^{-1}(z_i-\mu_j)}\\
=\sum\limits_{j}^{c}{2(p_{ij}-q_{ij})(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}\]

This uses $\sum\limits_{j}^{c}{{{p}_{ij}}}=1$.
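The closed form for $\frac{\partial L}{\partial z_i}$ can be sanity-checked numerically. The sketch below compares it with central finite differences on random toy data; the helper names, data sizes and step size `eps` are assumptions chosen for illustration.

```python
import numpy as np

def kernel_and_q(z, mu):
    """Return w_ij = (1 + ||z_i - mu_j||^2)^-1 and q_ij = w_ij / sum_l w_il."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    w = 1.0 / (1.0 + d2)
    return w, w / w.sum(axis=1, keepdims=True)

def kl_loss(p, z, mu):
    """L as a function of z and mu, with p passed in as a constant."""
    _, q = kernel_and_q(z, mu)
    return float(np.sum(p * np.log(p / q)))

def grad_z(p, z, mu):
    """Analytic dL/dz_i = 2 * sum_j (p_ij - q_ij) (z_i - mu_j) w_ij."""
    w, q = kernel_and_q(z, mu)
    diff = z[:, None, :] - mu[None, :, :]            # shape (N, c, d)
    return 2.0 * (((p - q) * w)[:, :, None] * diff).sum(axis=1)

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f over every entry of x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
mu = rng.normal(size=(3, 2))

# Build p once from the initial q, then hold it fixed as the problem requires.
_, q0 = kernel_and_q(z, mu)
weight = q0 ** 2 / q0.sum(axis=0, keepdims=True)
p = weight / weight.sum(axis=1, keepdims=True)

analytic = grad_z(p, z, mu)
numeric = numeric_grad(lambda x: kl_loss(p, x, mu), z)
max_abs_err = float(np.max(np.abs(analytic - numeric)))
print(max_abs_err)
```

A small `max_abs_err` indicates that the analytic expression and the finite-difference estimate agree up to discretization error.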

2. Next, find $\frac{\partial L}{\partial {{\mu }_{j}}}$

By the chain rule

\[\frac{\partial L}{\partial {{\mu }_{j}}}=\sum\limits_{i}^{N}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}}\]

\[\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}=\frac{2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}-{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}\cdot 2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})}{{{\left( \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}} \right)}^{2}}}\\ =2({{z}_{i}}-{{\mu }_{j}}){{q}_{ij}}{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}-2q_{ij}^{2}{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}({{z}_{i}}-{{\mu }_{j}})\\ =2{{q}_{ij}}(1-{{q}_{ij}})({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \]

Therefore,

\[\frac{\partial L}{\partial {{\mu }_{j}}}=\sum\limits_{i}^{N}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}}=\sum\limits_{i}^{N}{\left( -\frac{{{p}_{ij}}}{{{q}_{ij}}}\cdot 2{{q}_{ij}}(1-{{q}_{ij}})({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \right)}=\sum\limits_{i}^{N}{\left( 2{{p}_{ij}}({{q}_{ij}}-1)({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \right)}\]
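In line with the 2023.4.10 update, the step that goes wrong here is the chain rule: $\mu_j$ also appears in the normalizer of every $q_{ik}$ with $k\ne j$, so those cross terms have to be kept. The following is a sketch of the derivation with them included (a reconstruction, not a copy of the derivation in the first comment), writing $w_{ij}=(1+\left\| z_i-\mu_j \right\|^{2})^{-1}$ as a shorthand introduced only for this note.

\[\frac{\partial L}{\partial \mu_j}=\sum\limits_{i}^{N}{\sum\limits_{k}^{c}{\frac{\partial L}{\partial q_{ik}}\frac{\partial q_{ik}}{\partial \mu_j}}}\]

For $k=j$ the factor $\frac{\partial q_{ij}}{\partial \mu_j}=2q_{ij}(1-q_{ij})w_{ij}(z_i-\mu_j)$ is the one computed above. For $k\ne j$ only the denominator of $q_{ik}$ depends on $\mu_j$:

\[\frac{\partial q_{ik}}{\partial \mu_j}=-2q_{ik}q_{ij}w_{ij}(z_i-\mu_j)\quad (k\ne j)\]

Substituting $\frac{\partial L}{\partial q_{ik}}=-\frac{p_{ik}}{q_{ik}}$ and using $\sum\limits_{k}^{c}{p_{ik}}=1$:

\[\frac{\partial L}{\partial \mu_j}=\sum\limits_{i}^{N}{\left( -\frac{p_{ij}}{q_{ij}}\cdot 2q_{ij}(1-q_{ij})w_{ij}(z_i-\mu_j)+\sum\limits_{k\ne j}{\frac{p_{ik}}{q_{ik}}\cdot 2q_{ik}q_{ij}w_{ij}(z_i-\mu_j)} \right)}\\
=\sum\limits_{i}^{N}{2w_{ij}(z_i-\mu_j)\left( -p_{ij}+q_{ij}\sum\limits_{k}^{c}{p_{ik}} \right)}\\
=-2\sum\limits_{i}^{N}{(p_{ij}-q_{ij})(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}\]

This has the same form as the gradient with respect to $z_i$, with the opposite sign and the sum taken over $i$ instead of $j$, and it agrees with the expression in the DEC paper [2] when its degrees-of-freedom parameter is set to 1.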

Result in the original paper
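For reference, the gradients stated in the DEC paper [2] are reproduced here in LaTeX. The paper keeps a degrees-of-freedom parameter $\alpha$ in the kernel; the problem statement in this post corresponds to $\alpha =1$, for which the leading factor becomes 2.

\[\frac{\partial L}{\partial z_i}=\frac{\alpha +1}{\alpha }\sum\limits_{j}{\left( 1+\frac{\left\| z_i-\mu_j \right\|^{2}}{\alpha } \right)^{-1}(p_{ij}-q_{ij})(z_i-\mu_j)}\]

\[\frac{\partial L}{\partial \mu_j}=-\frac{\alpha +1}{\alpha }\sum\limits_{i}{\left( 1+\frac{\left\| z_i-\mu_j \right\|^{2}}{\alpha } \right)^{-1}(p_{ij}-q_{ij})(z_i-\mu_j)}\]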

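I do not know where the problem lies. These derivations do not affect the final experimental results, since the gradients can be obtained by simply calling a function and need not be derived by hand. Still, I suspect the result given in the paper may be incorrect, and I welcome corrections from readers.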
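One way to settle which expression for $\frac{\partial L}{\partial \mu_j}$ is right without redoing the algebra is to compare both candidates with a finite-difference estimate. The sketch below does this on random toy data; the function names, data sizes and step size are illustrative assumptions, and the "paper" formula is the DEC expression with its degrees-of-freedom parameter set to 1.

```python
import numpy as np

def kernel_and_q(z, mu):
    """Return w_ij = (1 + ||z_i - mu_j||^2)^-1 and q_ij = w_ij / sum_l w_il."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    w = 1.0 / (1.0 + d2)
    return w, w / w.sum(axis=1, keepdims=True)

def kl_loss(p, z, mu):
    """L as a function of z and mu, with p passed in as a constant."""
    _, q = kernel_and_q(z, mu)
    return float(np.sum(p * np.log(p / q)))

def grad_mu_post(p, z, mu):
    """Formula derived in this post: sum_i 2 p_ij (q_ij - 1) (z_i - mu_j) w_ij."""
    w, q = kernel_and_q(z, mu)
    diff = z[:, None, :] - mu[None, :, :]            # shape (N, c, d)
    return 2.0 * ((p * (q - 1.0) * w)[:, :, None] * diff).sum(axis=0)

def grad_mu_paper(p, z, mu):
    """DEC formula with alpha = 1: -2 sum_i (p_ij - q_ij) (z_i - mu_j) w_ij."""
    w, q = kernel_and_q(z, mu)
    diff = z[:, None, :] - mu[None, :, :]
    return -2.0 * (((p - q) * w)[:, :, None] * diff).sum(axis=0)

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f over every entry of x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
mu = rng.normal(size=(3, 2))

# Build p once from the initial q, then hold it fixed.
_, q0 = kernel_and_q(z, mu)
weight = q0 ** 2 / q0.sum(axis=0, keepdims=True)
p = weight / weight.sum(axis=1, keepdims=True)

numeric = numeric_grad(lambda m: kl_loss(p, z, m), mu)
err_post = float(np.max(np.abs(grad_mu_post(p, z, mu) - numeric)))
err_paper = float(np.max(np.abs(grad_mu_paper(p, z, mu) - numeric)))
print("post formula error: ", err_post)
print("paper formula error:", err_paper)
```

The formula whose error stays at the level of finite-difference noise is the one consistent with the loss. A check like this cannot show where a derivation goes wrong, but it does show which result to trust.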

References

[1] Deep Clustering Algorithms - 凯鲁嘎吉 博客园

[2] Xie J, Girshick R, Farhadi A. Unsupervised deep embedding for clustering analysis[C]//International conference on machine learning. 2016: 478-487.

[3] Guo X, Gao L, Liu X, et al. Improved deep embedded clustering with local structure preservation[C]//IJCAI. 2017: 1753-1759.

Posted on 2021-01-19 22:07 by 凯鲁嘎吉