Deep Learning Notes, Chapter 3: Probability and Information Theory
1. Marginal Prob
\[\begin{equation}
P(x=a) = \sum_b P(x=a,y=b)
\end{equation}
\]
\(\text{For continuous variables, we have:}\)
\[\begin{equation}
p(x) = \int p(x,y)dy
\end{equation}
\]
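As a quick numerical check, here is a minimal sketch (assuming NumPy and a made-up 2x3 joint table `P_xy`) that recovers the marginal \(P(x)\) by summing out \(y\):

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# Marginal P(x): sum out y (the column axis).
P_x = P_xy.sum(axis=1)

print(P_x)        # [0.4 0.6]
print(P_x.sum())  # should be 1 (up to float error)
```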
2. Chain Rule of Conditional Prob
\[\begin{equation}
P(x^{(1)},...,x^{(n)}) = P(x^{(1)})\prod_{i=2}^n P(x^{(i)}|x^{(1)},...,x^{(i-1)})
\end{equation}
\]
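The factorization can be checked numerically. A minimal sketch, assuming a hypothetical joint over three binary variables drawn at random with NumPy, showing that the conditionals multiply back to the joint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint P(x1, x2, x3) over binary variables, normalized to sum to 1.
P = rng.random((2, 2, 2))
P /= P.sum()

P_x1 = P.sum(axis=(1, 2))                           # P(x1)
P_x2_given_x1 = P.sum(axis=2) / P_x1[:, None]       # P(x2 | x1)
P_x3_given_x1x2 = P / P.sum(axis=2, keepdims=True)  # P(x3 | x1, x2)

# Chain rule: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)
reconstructed = (P_x1[:, None, None]
                 * P_x2_given_x1[:, :, None]
                 * P_x3_given_x1x2)

print(np.allclose(P, reconstructed))  # True
```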
3. Conditional Independence
\(\text{Two random variables }x, y \text{ are conditionally independent given }z \text{ if:}\)
\[\begin{equation}
p(x,y|z) = p(x|z)p(y|z)
\end{equation}
\]
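A minimal sketch, assuming NumPy and a joint built by construction as \(p(z)p(x|z)p(y|z)\), confirming that \(p(x,y|z)\) then factors into \(p(x|z)p(y|z)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a hypothetical joint p(x, y, z) that is conditionally independent
# by construction: p(x, y, z) = p(z) p(x|z) p(y|z).
p_z = np.array([0.3, 0.7])
p_x_given_z = rng.random((2, 2))
p_x_given_z /= p_x_given_z.sum(axis=0, keepdims=True)  # columns sum to 1 over x
p_y_given_z = rng.random((2, 2))
p_y_given_z /= p_y_given_z.sum(axis=0, keepdims=True)  # columns sum to 1 over y

# p_xyz[x, y, z]
p_xyz = p_x_given_z[:, None, :] * p_y_given_z[None, :, :] * p_z[None, None, :]

# Check p(x, y | z) == p(x | z) p(y | z) for every z.
p_xy_given_z = p_xyz / p_xyz.sum(axis=(0, 1), keepdims=True)
factored = p_x_given_z[:, None, :] * p_y_given_z[None, :, :]
print(np.allclose(p_xy_given_z, factored))  # True
```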
4. Covariance
\[\begin{equation}
\mathrm{Cov}(f(x),g(y)) = \mathbb{E}\left[(f(x)-\mathbb{E}[f(x)])(g(y)-\mathbb{E}[g(y)])\right]
\end{equation}
\]
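A minimal sketch of the sample estimate, assuming NumPy, synthetic correlated data, and \(f, g\) taken as the identity for simplicity; the manual estimate is compared against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated samples: y depends linearly on x plus noise.
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)

fx = x  # f(x) = x here; any function of x works
gy = y  # g(y) = y here; any function of y works

# Sample estimate of Cov(f(x), g(y)) = E[(f(x) - E[f(x)])(g(y) - E[g(y)])]
cov_manual = np.mean((fx - fx.mean()) * (gy - gy.mean()))
print(cov_manual)                        # close to 2.0 for this setup
print(np.cov(fx, gy, bias=True)[0, 1])   # same estimate via np.cov
```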
5. KL Divergence and Cross Entropy
\[\begin{align}
D_{KL}(P||Q) &= \sum_i P(i)\log{\frac{P(i)}{Q(i)}}\\
H(P,Q) &= -\sum_i P(i)\log{Q(i)}
\end{align}
\]
\(\text{Therefore, we have:}\)
\[H(P,Q) = H(P) +D_{KL}(P||Q)
\]
\(\text{where } H(P) = -\sum_i P(i)\log{P(i)}\)
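A minimal sketch, assuming NumPy and two made-up discrete distributions, computing \(D_{KL}(P||Q)\), \(H(P)\), and \(H(P,Q)\) and checking the identity above:

```python
import numpy as np

# Two hypothetical discrete distributions over the same 4 outcomes.
P = np.array([0.1, 0.4, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.25, 0.25])

kl = np.sum(P * np.log(P / Q))     # D_KL(P || Q)
h_p = -np.sum(P * np.log(P))       # entropy H(P)
h_pq = -np.sum(P * np.log(Q))      # cross entropy H(P, Q)

print(kl, h_p, h_pq)
print(np.isclose(h_pq, h_p + kl))  # True: H(P, Q) = H(P) + D_KL(P || Q)
```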