Self-information (the amount of information carried by the occurrence of an event): \[I(x) = -\log P(x)\]. Why the log? If x and y are independent, the information of observing both should be additive: I(x,y) = I(x) + I(y). Indeed, \[I(x,y) = -\log P(x,y) = -\log\big(P(x)P(y)\big) = -\log P(x) - \log P(y) = I(x) + I(y)\]
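The additivity can be checked numerically; the probabilities 0.5 and 0.25 below are illustrative values, not from the text:

```python
import math

def self_information(p):
    """Self-information I(x) = -log p(x), in nats."""
    return -math.log(p)

p_x, p_y = 0.5, 0.25  # illustrative probabilities of two independent events
i_joint = self_information(p_x * p_y)                   # I(x, y) = -log P(x)P(y)
i_sum = self_information(p_x) + self_information(p_y)   # I(x) + I(y)
print(abs(i_joint - i_sum) < 1e-12)  # additivity holds
```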
Entropy (measures the uncertainty of an entire probability distribution): \[H(X) = E_{x \sim P}\big[I(x)\big] = -E_{x \sim P}\big[\log P(x)\big] = -\sum_{i=1}^{n} p(x_i)\log p(x_i)\]
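A minimal sketch of the entropy formula (in nats, using the natural log; the two distributions are made up for illustration). The uniform distribution maximizes entropy, while a nearly deterministic one has entropy close to zero:

```python
import math

def entropy(p):
    """H(X) = -sum_i p(x_i) log p(x_i); terms with p = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # illustrative: maximum uncertainty
skewed  = [0.97, 0.01, 0.01, 0.01]   # illustrative: nearly deterministic
print(entropy(uniform))  # log 4, about 1.386, the maximum for 4 outcomes
print(entropy(skewed))   # much smaller
```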
KL divergence (also called relative entropy; measures the difference between two distributions and is non-negative): \[D_{KL}(P\,\|\,Q) = E_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] = E_{x \sim P}\big[\log P(x) - \log Q(x)\big] = \sum_{i=1}^{n} p(x_i)\log\frac{p(x_i)}{q(x_i)}\]. Suppose P is the true distribution and Q is the distribution produced by the model (the formula above can be read as the gap between Q and P seen from P's perspective). We minimize the KL divergence with respect to Q; since P is fixed, minimizing over Q is equivalent to minimizing the cross-entropy.
Cross-entropy (minimizing it with respect to Q is equivalent to minimizing the KL divergence): \[H(P,Q) = -E_{x \sim P}\big[\log Q(x)\big] = -\sum_{i=1}^{n} p(x_i)\log q(x_i)\]
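The equivalence rests on the identity H(P,Q) = H(P) + D_KL(P||Q): since H(P) does not depend on Q, minimizing either side over Q gives the same result. A small sketch verifying both the identity and non-negativity (the two distributions are illustrative):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(P||Q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """H(P,Q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # illustrative "true" distribution P
q = [0.5, 0.3, 0.2]  # illustrative model distribution Q
print(kl_divergence(p, q) >= 0)                                          # True
print(abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12)  # True
```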
Conditional entropy (the remaining uncertainty of random variable Y given random variable X): \[H(Y|X) = \sum_{i=1}^{n} P(X = x_i)\, H(Y|X = x_i)\]
Information gain (how much knowing X reduces the uncertainty of Y; equivalent to mutual information): \[g(Y,X) = H(Y) - H(Y|X)\]
Information gain ratio: \[g_R(Y,X) = \frac{g(Y,X)}{H(X)}\]
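The three definitions above can be computed together from a joint distribution. A minimal sketch, where the 2×2 joint table is an illustrative example, not from the text:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Illustrative joint distribution P(X=x_i, Y=y_j): rows index x, columns index y.
joint = [[0.3, 0.1],
         [0.1, 0.5]]

p_x = [sum(row) for row in joint]         # marginal P(X)
p_y = [sum(col) for col in zip(*joint)]   # marginal P(Y)

# H(Y|X) = sum_i P(X=x_i) H(Y | X=x_i), where each inner term
# uses the conditional distribution P(Y | X=x_i) = P(X=x_i, Y) / P(X=x_i).
h_y_given_x = sum(
    px * entropy([pxy / px for pxy in row])
    for px, row in zip(p_x, joint) if px > 0
)

gain = entropy(p_y) - h_y_given_x   # g(Y, X) = H(Y) - H(Y|X)
gain_ratio = gain / entropy(p_x)    # g_R(Y, X) = g(Y, X) / H(X)
print(gain >= 0)                    # mutual information is non-negative
print(0 <= gain_ratio)
```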
Gini index: \[\mathrm{Gini}(p) = 1 - \sum_{k=1}^{K} p_k^2\]
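A short sketch of the Gini index over K class probabilities: it is 0 for a pure node and largest for the uniform distribution, which is why decision trees such as CART use it as an impurity measure:

```python
def gini(p):
    """Gini(p) = 1 - sum_k p_k^2; 0 for a pure node, larger when mixed."""
    return 1 - sum(pk ** 2 for pk in p)

print(gini([1.0, 0.0]))  # 0.0: a pure node
print(gini([0.5, 0.5]))  # 0.5: maximum impurity for two classes
```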
posted @ 2019-06-18 10:18 xd_xumaomao