
PRML Chapter 10 Exercise Solutions

Chapter 10. Approximate Inference

Changelog (as of 20210127)
  • 20210127: initial commit, with detailed solutions to Exercises 10.2, 10.4, 10.5, 10.6 and 10.31

Exercise 10.2


Hint.

Let \(\mathbb{E}[z_i] = m_i\). After moving all terms to one side, equations (10.13) and (10.15) can be written as

\[\begin{aligned} 0= \left[ \begin{matrix} I & \Lambda_{11}^{-1}\Lambda_{12}\\ \Lambda_{22}^{-1}\Lambda_{21} & I \end{matrix} \right] \left[ \begin{matrix} m_1 - \mu_1 \\ m_2 - \mu_2 \end{matrix} \right] = \left[ \begin{matrix} \Lambda_{11}^{-1} & \\ & \Lambda_{22}^{-1} \end{matrix} \right] \left[ \begin{matrix} \Lambda_{11} & \Lambda_{12}\\ \Lambda_{21} & \Lambda_{22} \end{matrix} \right] \left[ \begin{matrix} m_1 - \mu_1 \\ m_2 - \mu_2 \end{matrix} \right] , \end{aligned} \]

\[\begin{aligned} \Lambda \left[ \begin{matrix} m_1 - \mu_1 \\ m_2 - \mu_2 \end{matrix} \right] =0. \end{aligned} \]

If \(\Lambda\) is nonsingular, the equation has only the zero solution, i.e. \(m_i = \mu_i\).


Comment.

Isn't \(\Lambda\) nonsingular to begin with? What is the point of stating it as an assumption?
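The fixed-point argument in the hint can be checked numerically. The sketch below assumes scalar blocks \(z_1, z_2\) and uses made-up values for \(\mu\) and \(\Lambda\) (they are not from the exercise); it simply alternates the updates (10.13) and (10.15) and looks at where the means end up.

```python
# Illustrative values (assumptions, not from the exercise): a 2-D Gaussian
# with scalar blocks z_1, z_2 and a positive definite precision matrix.
mu = (1.0, -2.0)                    # true mean (mu_1, mu_2)
lam = ((2.0, 0.8), (0.8, 1.5))      # precision matrix Lambda

def iterate_means(m1, m2, steps=200):
    """Alternate the mean updates (10.13) and (10.15)."""
    for _ in range(steps):
        m1 = mu[0] - lam[0][1] / lam[0][0] * (m2 - mu[1])   # (10.13)
        m2 = mu[1] - lam[1][0] / lam[1][1] * (m1 - mu[0])   # (10.15)
    return m1, m2

# Arbitrary starting point; the iterates should approach (mu_1, mu_2).
m1, m2 = iterate_means(m1=10.0, m2=10.0)
print(m1, m2)
```

With these numbers each full sweep shrinks the error by the factor \(\Lambda_{12}\Lambda_{21}/(\Lambda_{11}\Lambda_{22})\), which is below 1 for a positive definite \(\Lambda\), so the only fixed point reached is \(m_i = \mu_i\).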


Exercise 10.4


Hint.

\(L={\rm KL}(p||q) = -\underset{p}{\mathbb{E}}[\ln q] + C\)

\[\begin{aligned} \frac{\partial L}{\partial \mu} &= -\underset{p}{\mathbb{E}}[\Sigma^{-1}(x - \mu)]=0 \Rightarrow \mu^*=\underset{p}{\mathbb{E}}[x]\\ \frac{\partial L}{\partial \Sigma} &= -\frac{1}{2}\Sigma^{-1}\underset{p}{\mathbb{E}}\left[(x - \mu)(x - \mu)^T - \Sigma \right]\Sigma^{-1} = 0 \Rightarrow \Sigma^*= \underset{p}{\mathbb{E}}[(x - \underset{p}{\mathbb{E}}[x])(x - \underset{p}{\mathbb{E}}[x])^T] \end{aligned} \]


Comment.

This is exactly the asymptotic result of maximum likelihood estimation for a Gaussian, i.e. the case \(N \to \infty\).


Exercise 10.5


Hint.

  • Optimizing \(q(z)\)

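By (10.9), \(\ln q^*(z) = \underset{q(\theta)}{\mathbb{E}}[\ln p(x,z,\theta)]+C = \ln p(z|x, \theta_0) + \ln p(x|\theta_0) + C\), where \(q(\theta)\) is a point mass at \(\theta_0\) and all terms independent of \(z\) are absorbed into \(C\). Hence \(q^*(z)\propto p(z|x, \theta_0)\), and after normalization \(q^*(z) = p(z|x, \theta_0)\), which corresponds to the E step of the EM algorithm.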

  • Optimizing \(q(\theta)\)

\[\begin{aligned} \mathcal{L}(q) &= \underset{q(\theta)}{\mathbb{E}}\underset{q(z)}{\mathbb{E}}\left[ \ln p(x, z, \theta) - \ln q(z) - \ln q(\theta) \right] \\ &= \underset{q(z)}{\mathbb{E}}\underset{q(\theta)}{\mathbb{E}}\left[ \ln p(x, z, \theta)\right] + {\rm H}[q(z)] + {\rm H}[q(\theta)]\\ &= \underset{q(z)}{\mathbb{E}}\left[ \ln p(x, z, \theta_0)\right] + C \end{aligned} \]

Maximizing this with respect to \(\theta_0\) corresponds to the M step of the EM algorithm.

Comment.

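\({\rm H}[q(\theta)]\) actually diverges, which gives \(\mathcal{L}(q) = -\infty\) and makes the optimization meaningless. We set this aside for now in order to show the connection with the EM algorithm.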
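To make the correspondence concrete, here is a minimal EM sketch for a toy model chosen purely for illustration: a 1-D mixture \(0.5\,\mathcal{N}(a,1) + 0.5\,\mathcal{N}(b,1)\) with \(\theta = (a, b)\) and a flat prior on \(\theta\). The data values, the model and the helper names are all assumptions, not part of the exercise. The E step sets \(q(z) = p(z|x,\theta_0)\) and the M step maximizes \(\underset{q(z)}{\mathbb{E}}[\ln p(x,z,\theta_0)]\) over \(\theta_0\).

```python
import math

# Toy data (illustrative), roughly two clusters.
data = [-2.1, -1.7, -2.4, -1.9, 2.2, 1.8, 2.5, 1.6, 2.0]

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def log_likelihood(theta):
    a, b = theta
    return sum(math.log(0.5 * normal_pdf(x, a) + 0.5 * normal_pdf(x, b))
               for x in data)

def e_step(theta0):
    """q(z) = p(z | x, theta0): responsibility of the first component."""
    a, b = theta0
    return [normal_pdf(x, a) / (normal_pdf(x, a) + normal_pdf(x, b))
            for x in data]

def m_step(resp):
    """Maximize E_{q(z)}[ln p(x, z, theta)] over theta = (a, b)."""
    na = sum(resp)
    nb = len(data) - na
    a = sum(r * x for r, x in zip(resp, data)) / na
    b = sum((1 - r) * x for r, x in zip(resp, data)) / nb
    return a, b

theta = (-0.5, 0.5)                  # arbitrary initial theta_0
history = [log_likelihood(theta)]
for _ in range(20):
    theta = m_step(e_step(theta))    # one E step followed by one M step
    history.append(log_likelihood(theta))
print(theta, history[-1])
```

The recorded log-likelihood never decreases across iterations, as expected of EM, and the two means settle near the centres of the two clusters.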


Exercise 10.6


Hints.

\(q^{\frac{1-\alpha}{2}}=1+\frac{1-\alpha}{2}\ln q + O((\frac{1-\alpha}{2})^2)\),

\(p^{\frac{1+\alpha}{2}}=p^{1-\frac{1-\alpha}{2}}=p\cdot p^{-\frac{1-\alpha}{2}}=p(1-\frac{1-\alpha}{2}\ln p + O((\frac{1-\alpha}{2})^2))\),

\[\begin{aligned} {\rm D}_{\alpha}(p||q) &=\frac{2}{1-\alpha}\frac{2}{1+\alpha}\left(1-\int p\left(1-\frac{1-\alpha}{2}\ln p + O((\frac{1-\alpha}{2})^2)\right)\left(1+\frac{1-\alpha}{2}\ln q + O((\frac{1-\alpha}{2})^2)\right)\right)\\ &=\frac{2}{1-\alpha}\frac{2}{1+\alpha}\left(1-\int p\left(1 + \frac{1-\alpha}{2}\ln \frac{q}{p}+O((\frac{1-\alpha}{2})^2)\right)\right)\\ &=-\frac{2}{1+\alpha}\int p \ln \frac{q}{p} + O(\frac{1-\alpha}{2})\\ &\to{\rm KL}(p||q)\quad(\alpha \to 1). \end{aligned} \]

For \(\alpha\to -1\), simply let \(\beta=-\alpha\). Then \(\frac{1-\alpha}{2}=\frac{1+\beta}{2}\) and \(\frac{1+\alpha}{2}=\frac{1-\beta}{2}\), so \({\rm D}_{\alpha}(p||q)={\rm D}_{\beta}(q||p)\to{\rm KL}(q||p)\) as \(\beta\to 1\).


Comments.

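Noting that \(\frac{1-\alpha}{2}+\frac{1+\alpha}{2}=1\), we can obtain the first-order Taylor expansion of \(p^{\frac{1+\alpha}{2}}\) at \(\alpha=1\). For \(\alpha \to -1\), a change of variables based on symmetry lets us reuse the previous result directly.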
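The two limits can also be observed numerically. The sketch below assumes discrete distributions, so the integral in \({\rm D}_{\alpha}\) becomes a sum; the probability vectors are made up for illustration.

```python
import math

# Illustrative discrete distributions (assumptions, not from the exercise).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

def kl(a, b):
    """KL(a||b) for discrete distributions with full support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

def alpha_divergence(a, b, alpha):
    """D_alpha(a||b) = 4/(1-alpha^2) * (1 - sum a^((1+alpha)/2) b^((1-alpha)/2))."""
    s = sum(ai ** ((1 + alpha) / 2) * bi ** ((1 - alpha) / 2)
            for ai, bi in zip(a, b))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)

# Approach alpha = 1 and alpha = -1 from inside the interval (-1, 1).
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, alpha_divergence(p, q, 1 - eps), alpha_divergence(p, q, -1 + eps))
print(kl(p, q), kl(q, p))
```

As `eps` shrinks, the first column of divergences approaches \({\rm KL}(p||q)\) and the second approaches \({\rm KL}(q||p)\), with an error that decreases roughly linearly in `eps`.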


Exercise 10.31


Hint.

  • Concavity of \(f(x)=-\ln (e^{x/2} + e^{-x/2})= -\ln \left(2\cosh(\frac{x}{2})\right)\)

\(f'(x) = -\frac{1}{2} \tanh(\frac{x}{2}),f''(x) = -\frac{1}{4} (1 - \tanh^2(\frac{x}{2})) < 0\)

  • Convexity of \(g(x)=f(x^{1/2})\)

\(g'(x) = -\frac{1}{4}x^{-1/2}\tanh(\frac{x^{1/2}}{2}),\)

\(g''(x)=\frac{1}{16}x^{-3/2}\left[2\tanh(\frac{x^{1/2}}{2}) - x^{1/2} (1 - \tanh^2(\frac{x^{1/2}}{2}))\right].\)

Let \(y= x^{1/2}\). Using the half-angle formula \(\tanh(\frac{y}{2})=\frac{\sinh y}{1+\cosh y}\) together with \(1 - \tanh^2(\frac{y}{2}) = \frac{2}{1+\cosh y}\), the term in square brackets becomes

\(\frac{2\sinh y}{1+\cosh y} -\frac{2y }{1+\cosh y} = \frac{2(\sinh y -y)}{1+\cosh y}\). Since \(\sinh y \geq y \geq \tanh y\) for all \(y\geq 0\), we have \(g''(x) \geq 0\).

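  • Proof of (10.144)

By the first-order condition for convexity, \(g(x) \geq g(\xi) + g'(\xi)(x - \xi)\). Substitute \(x \to x^2, \xi \to \xi^2\), and note that \(f\) is even, so \(g(x^2) = f(|x|) = f(x)\), and that \(\ln \sigma(x) = \frac{x}{2} + f(x)\). This gives \(\ln \sigma(x) - \frac{x}{2} \geq \ln \sigma(\xi) - \frac{\xi}{2} + g'(\xi^2)(x^2 - \xi^2)\), which is equivalent to \(\sigma (x) \geq \sigma(\xi) \exp(\frac{1}{2}(x - \xi) + g'(\xi^2)(x^2 - \xi^2))\), i.e. \(\lambda(\xi) = - g'(\xi^2)\). The book contains an error here, which should be corrected according to the errata.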
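The resulting bound is easy to sanity-check on a grid. The sketch below uses \(\lambda(\xi) = -g'(\xi^2) = \frac{1}{4\xi}\tanh(\frac{\xi}{2})\) as derived above; the value of \(\xi\) and the grid are arbitrary choices made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lam(xi):
    """lambda(xi) = -g'(xi^2) = tanh(xi/2) / (4*xi), defined for xi != 0."""
    return math.tanh(xi / 2) / (4 * xi)

def lower_bound(x, xi):
    """Right-hand side of the bound: sigma(xi) exp((x-xi)/2 - lambda(xi)(x^2-xi^2))."""
    return sigmoid(xi) * math.exp((x - xi) / 2 - lam(xi) * (x * x - xi * xi))

xi = 1.5                                   # arbitrary variational parameter
grid = [k / 10 for k in range(-60, 61)]    # x from -6 to 6 in steps of 0.1
gaps = [sigmoid(x) - lower_bound(x, xi) for x in grid]
print(min(gaps))
```

The gap stays nonnegative over the grid (up to rounding) and vanishes at \(x = \pm\xi\), where the bound is tight.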

posted @ 2021-01-27 14:07  Rotopia