entropy, cross entropy

Information Entropy

Information entropy measures the information missing before reception, i.e., the level of uncertainty of a random variable X.
Information entropy definition:

$$ H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) $$

where X is a random variable and p(x_i) is the probability of X = x_i. When the log is log_2, the unit of H(X) is the bit. When the log is log_10, the unit is the dit.
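As a quick illustration, here is a minimal Python sketch of the definition (the function name and the coin distributions below are my own, not from the post):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum(p * log(p)); zero-probability terms contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.47
```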

Example

English character

X is a random variable. It could be any one of the 26 characters a, b, c, ..., x, y, z, each with equal probability. The information entropy of X:

$$ H(X) = -\sum_{i=1}^{26} \frac{1}{26} \log_2 \frac{1}{26} \approx 4.7 $$

This means the information entropy of an English character is about 4.7 bits, so 5 binary digits are enough to encode one English character.

ASCII code

X is a random variable. It could be any one of the 128 ASCII codes, each with equal probability. The information entropy of X:

$$ H(X) = -\sum_{i=1}^{128} \frac{1}{128} \log_2 \frac{1}{128} = 7 $$

This means the information entropy of an ASCII character is 7 bits, so 7 binary digits are enough to encode one ASCII code. We use a byte, which is 8 bits, to represent an ASCII code; the extra bit was used for error checking (parity).
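For a uniform distribution over n outcomes the sum collapses to log_2(n), so both examples can be checked in one line each (a small sketch I added):

```python
import math

# Uniform distribution over n outcomes: H(X) = log2(n).
print(math.log2(26))   # ~4.70 bits for one of 26 English letters
print(math.log2(128))  # 7.0 bits for one of 128 ASCII codes
```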

================================================

Cross Entropy in Machine Learning

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an “unnatural” probability distribution q, rather than the “true” distribution p.
Cross entropy definition:

$$ S(p, q) = -\sum_{x} p(x) \log q(x) $$

where x ranges over the values in the set, p is the target (“true”) distribution, and q is the estimated, “unnatural” distribution.

The more similar p and q are, the smaller S(p, q) is, so S(·) can be used as a training objective. TensorFlow provides an op for this, tf.nn.softmax_cross_entropy_with_logits_v2().
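As a rough sketch of how that op is typically called (this assumes the TensorFlow 1.x API; in TensorFlow 2.x the closest equivalent is tf.nn.softmax_cross_entropy_with_logits, and the logits below are made-up values):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# One-hot labels and raw, unnormalized scores (logits) for one training instance.
# The op applies softmax to `logits` internally, so pass raw scores, not probabilities.
labels = tf.constant([[0., 1., 0., 0., 0.]])
logits = tf.constant([[1.0, 3.0, 0.5, 1.0, 2.0]])  # made-up example scores

loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))  # per-instance cross entropy; average over the batch when training
```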

Example

The training instance's one-hot label is y_target = [0, 1, 0, 0, 0]. The distribution predicted by your algorithm is y_tmp = [0.1, 0.1, 0.2, 0.1, 0.5].

You want y_tmp to approximate y_target. In other words, your goal is to make y_tmp[0] smaller, y_tmp[1] greater, y_tmp[2] smaller, and so on. That is a complex task with a lot to consider.

How about just making S(y_target, y_tmp) smaller? One shot, all done. Better.
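Here is a small check of this example in plain Python (my own sketch; it uses the natural log, as ML losses usually do, and y_better is a made-up prediction closer to y_target, just to show the loss shrinking):

```python
import math

def cross_entropy(p, q):
    """S(p, q) = -sum(p * log(q)); terms where p == 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

y_target = [0, 1, 0, 0, 0]
y_tmp    = [0.1, 0.1, 0.2, 0.1, 0.5]
y_better = [0.05, 0.8, 0.05, 0.05, 0.05]  # hypothetical prediction closer to y_target

print(cross_entropy(y_target, y_tmp))     # ~2.30
print(cross_entropy(y_target, y_better))  # ~0.22, smaller as desired
```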

Ref

Cross entropy - Wikipedia
https://en.wikipedia.org/wiki/Cross_entropy

A Friendly Introduction to Cross-Entropy Loss
https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/#entropy

Entropy - Wikipedia
https://en.wikipedia.org/wiki/Entropy
