In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p: \mathcal{X} \to [0, 1]$:

$$H(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x) = \mathbb{E}[-\log p(X)]$$
(Definition quoted from the Wikipedia article *Entropy (information theory)*.)
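As a quick numerical illustration of this definition, here is a minimal Python sketch (the function name `entropy_bits` and the example distribution are illustrative, not from the text); it uses log base 2, so the result is in bits, and the convention $0 \log 0 = 0$ is handled explicitly.

```python
import math

def entropy_bits(p):
    """Shannon entropy H(X) = -sum_x p(x) * log2 p(x), in bits."""
    # By convention 0 * log(0) = 0, so zero-probability outcomes contribute nothing.
    return -sum(px * math.log2(px) for px in p if px > 0)

# A fair four-sided die: each outcome carries log2(4) = 2 bits on average.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0
```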
Suppose we send a binary code to a friend, where each bit is 1 with probability P.
If P = 0, the code will be all zeros. What information can we send to the friend? Very little: the entropy H is very low, and the information I is also very low.
If P = 1, the code will be all ones, and again both H and I are very low.
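To make the two extreme cases concrete, here is a small sketch of the binary (Bernoulli) entropy as a function of P, in bits; the function name is mine. At P = 0 and P = 1 the entropy is 0, and it peaks at 1 bit per symbol when P = 0.5.

```python
import math

def binary_entropy_bits(P):
    """Entropy (in bits) of a bit that is 1 with probability P and 0 with probability 1 - P."""
    return -sum(q * math.log2(q) for q in (P, 1.0 - P) if q > 0)  # treat 0 * log 0 as 0

for P in (0.0, 0.5, 1.0):
    print(P, binary_entropy_bits(P))
# P = 0.0 -> 0.0 bits (all zeros: nothing new to tell the friend)
# P = 0.5 -> 1.0 bits (maximum uncertainty per bit)
# P = 1.0 -> 0.0 bits (all ones: again nothing new)
```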
Written with the natural logarithm, the entropy of X is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \ln p(x)$$
The joint entropy of random variables X and Y is

$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln p(x, y)$$
- Log base-2 for computer science (entropy measured in bits).
- Log base-e for physics and mathematics (entropy measured in nats).
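The choice of base only changes the unit: since $\log_2 x = \ln x / \ln 2$, entropy in bits equals entropy in nats divided by $\ln 2$. A quick check with a made-up distribution:

```python
import math

p = [0.5, 0.25, 0.25]                            # arbitrary example distribution
H_bits = -sum(px * math.log2(px) for px in p)    # base-2 log  -> bits
H_nats = -sum(px * math.log(px) for px in p)     # natural log -> nats
print(H_bits, H_nats / math.log(2))              # both print 1.5
```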
The conditional entropy is defined similarly:

$$H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln p(y \mid x)$$
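A minimal sketch of these two formulas, assuming a small made-up 2 x 2 joint table `p_xy` (purely illustrative); it computes the joint entropy and the conditional entropy in nats, using the marginal $p(x)$ to form $p(y \mid x) = p(x, y)/p(x)$.

```python
import math

# Hypothetical joint distribution p(x, y) over X = {0, 1}, Y = {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal p(x) = sum_y p(x, y), needed for p(y|x) = p(x, y) / p(x).
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}

# H(X, Y) = -sum_{x,y} p(x, y) ln p(x, y)
H_joint = -sum(p * math.log(p) for p in p_xy.values() if p > 0)

# H(Y | X) = -sum_{x,y} p(x, y) ln p(y | x)
H_y_given_x = -sum(p * math.log(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)

print(H_joint, H_y_given_x)  # about 1.193 and 0.500 nats; note H(X, Y) = H(X) + H(Y | X)
```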
Then we can calculate the mutual information:
$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln \frac{p(x, y)}{p(x)\, p(y)}$$
This is exactly a KL divergence: $I(X; Y) = \mathrm{KL}\big(p(x, y) \,\|\, p(x)\, p(y)\big)$, the divergence between the joint distribution and the product of the marginals.
How close are X and Y to being independent? If the mutual information is small, then they are nearly independent; $I(X; Y) = 0$ exactly when X and Y are independent.
Also, $I(X; Y) = H(Y) - H(Y \mid X)$.
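To tie the pieces together, here is a sketch (same hypothetical joint table as above) that computes $I(X; Y)$ both from the double-sum / KL form and from the identity $I(X; Y) = H(Y) - H(Y \mid X)$; the two values agree.

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # hypothetical joint table
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Definition: I(X;Y) = sum_{x,y} p(x,y) ln[ p(x,y) / (p(x) p(y)) ]  (a KL divergence).
I_def = sum(p * math.log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

# Identity: I(X;Y) = H(Y) - H(Y|X).
H_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
H_y_given_x = -sum(p * math.log(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)

print(I_def, H_y - H_y_given_x)  # both about 0.193 nats
```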