Information Theory

  1. Self-information (can be understood as the amount of information gained when the event occurs): \[I\left( x \right) =  - \log P\left( x \right)\]. (Why the logarithm? If x and y are independent, the information of the joint event should satisfy I(x,y) = I(x) + I(y), and indeed I(x,y) = -log P(x,y) = -log(P(x)P(y)) = -log P(x) - log P(y) = I(x) + I(y).)
  2. Entropy (measures the uncertainty of the whole probability distribution): \[H\left( x \right) = {E_{x \sim P}}\left[ {I\left( x \right)} \right] =  - {E_{x \sim P}}\left[ {\log P\left( x \right)} \right] =  - \sum\limits_{i = 1}^n {p\left( {{x_i}} \right)\log \left( {p\left( {{x_i}} \right)} \right)} \]
  3. KL divergence (also called relative entropy; it measures the difference between two distributions and is non-negative): \[{D_{KL}}\left( {P||Q} \right) = {E_{x \sim P}}\left[ {\log \frac{{P\left( x \right)}}{{Q\left( x \right)}}} \right] = {E_{x \sim P}}\left[ {\log P\left( x \right) - \log Q\left( x \right)} \right] = \sum\limits_{i = 1}^n {p\left( {{x_i}} \right)\log \left( {\frac{{p\left( {{x_i}} \right)}}{{q\left( {{x_i}} \right)}}} \right)} \]. Suppose P is the true distribution and Q is the distribution produced by the model (the formula can be read as the difference between Q and P seen from P's perspective). We want to minimize the KL divergence with respect to Q; since P is fixed, minimizing over Q is equivalent to minimizing the cross-entropy.
  4. Cross-entropy (minimizing the cross-entropy with respect to Q is equivalent to minimizing the KL divergence): \[H\left( {P,Q} \right) =  - {E_{x \sim P}}\left[ {\log Q\left( x \right)} \right] =  - \sum\limits_{i = 1}^n {p\left( {{x_i}} \right)\log \left( {q\left( {{x_i}} \right)} \right)} \]. A small numerical sketch of items 1-4 follows the list.
  5. Conditional entropy (the remaining uncertainty of random variable Y given random variable X): \[H\left( {Y|X} \right) = \sum\limits_{i = 1}^n {P\left( {X = {x_i}} \right)H\left( {Y|X = {x_i}} \right)} \]
  6. Information gain (how much knowing X reduces the uncertainty of Y; equivalent to mutual information): \[g\left( {Y,X} \right) = H\left( Y \right) - H\left( {Y|X} \right)\]
  7. Information gain ratio: \[{g_R}\left( {Y,X} \right) = \frac{{g\left( {Y,X} \right)}}{{H\left( X \right)}}\]
  8. Gini index (another measure of the impurity of a distribution over K classes): \[Gini\left( p \right) = 1 - \sum\limits_{k = 1}^K {p_k^2} \]. A sketch of items 5-8 also follows the list.
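
A minimal numerical sketch of items 1-4, assuming two made-up discrete distributions `p` and `q` and natural logarithms (the formulas above do not fix a base); it also checks the identity H(P,Q) = H(P) + D_KL(P||Q):

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_i p_i * log(p_i)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log(q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])   # "true" distribution P (made-up values)
q = np.array([0.4, 0.4, 0.2])   # model distribution Q (made-up values)

print(entropy(p))                        # uncertainty of P
print(kl_divergence(p, q))               # non-negative; 0 iff P == Q
print(cross_entropy(p, q))               # equals H(P) + D_KL(P||Q)
print(entropy(p) + kl_divergence(p, q))  # same value as the line above
```

The last two printed values agree, which is exactly why, with P fixed, minimizing the cross-entropy over Q is the same as minimizing the KL divergence.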
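
A similar sketch of items 5-8, using base-2 logarithms (conventional for decision trees) and a made-up toy feature `x` with class label `y`; the helper names below are mine, not from any library:

```python
import numpy as np
from collections import Counter

def entropy_from_labels(labels):
    """Empirical entropy of a list of discrete labels (base 2)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def conditional_entropy(y, x):
    """H(Y|X) = sum_i P(X=x_i) * H(Y | X=x_i)."""
    n = len(y)
    h = 0.0
    for value in set(x):
        subset = [yi for yi, xi in zip(y, x) if xi == value]
        h += (len(subset) / n) * entropy_from_labels(subset)
    return h

def information_gain(y, x):
    """g(Y, X) = H(Y) - H(Y|X), i.e. the mutual information."""
    return entropy_from_labels(y) - conditional_entropy(y, x)

def gain_ratio(y, x):
    """g_R(Y, X) = g(Y, X) / H(X)."""
    return information_gain(y, x) / entropy_from_labels(x)

def gini(labels):
    """Gini(p) = 1 - sum_k p_k^2, from the empirical class frequencies."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

# Toy feature X and class label Y (made-up values for illustration).
x = ['a', 'a', 'b', 'b', 'b', 'c']
y = [ 1,   1,   0,   0,   1,   0 ]

print(conditional_entropy(y, x))  # uncertainty of Y left after knowing X
print(information_gain(y, x))     # reduction in uncertainty of Y due to X
print(gain_ratio(y, x))           # information gain normalized by H(X)
print(gini(y))                    # impurity of the label distribution
```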