Entropy (information theory)

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$, the entropy is

$$H(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x) = \mathbb{E}[-\log p(X)].$$
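As a quick illustration of the definition, here is a minimal Python sketch (the 4-symbol distribution is made up for the example):

```python
import math

def entropy(p, log=math.log2):
    """H(X) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(px * log(px) for px in p if px > 0)

# Hypothetical distribution over a 4-symbol alphabet.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))  # 1.75 bits, the average "surprise" per symbol
```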
Suppose we want to send a friend a long sequence of bits, where each bit is 1 with probability P. If P = 0, the code will be all zeros. What information can we send to the friend? None at all: the entropy H is zero, and so is the information I. If P = 1, the code will be all ones, and again H and I are zero.
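To make the P = 0 and P = 1 cases concrete, here is a small sketch of the binary entropy as a function of P (assuming, as in the example above, that each transmitted bit is 1 with probability P):

```python
import math

def binary_entropy(P):
    """H(P) = -P log2(P) - (1 - P) log2(1 - P), in bits."""
    if P in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -P * math.log2(P) - (1 - P) * math.log2(1 - P)

for P in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"P = {P}: H = {binary_entropy(P):.3f} bits")
# H is 0 at P = 0 and P = 1 (nothing to tell the friend) and peaks at 1 bit when P = 0.5.
```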
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \ln p(x)$$
The joint entropy of random variables X and Y is
$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln p(x, y)$$
- Log base 2 for computer science (entropy in bits).
- Log base e for physics and mathematics (entropy in nats).
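The two conventions differ only by the constant factor ln 2 (1 bit = ln 2 nats); a quick check, reusing a made-up distribution:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]  # hypothetical distribution

H_bits = -sum(px * math.log2(px) for px in p)  # base 2: entropy in bits
H_nats = -sum(px * math.log(px) for px in p)   # base e: entropy in nats

print(H_bits, H_nats / math.log(2))  # identical values: only the unit differs
```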
The conditional entropy is defined similarly:
$$H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln p(y \mid x)$$
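For a concrete, made-up joint distribution p(x, y), both quantities follow directly from the definitions; the chain rule H(X, Y) = H(X) + H(Y | X) is a handy sanity check:

```python
import math

# Hypothetical joint distribution p(x, y) over X = {0, 1}, Y = {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x) and its entropy H(X).
p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}
H_x = -sum(v * math.log(v) for v in p_x.values() if v > 0)

# Joint entropy H(X, Y) = -sum p(x, y) ln p(x, y).
H_xy = -sum(v * math.log(v) for v in p_xy.values() if v > 0)

# Conditional entropy H(Y | X) = -sum p(x, y) ln p(y | x), with p(y | x) = p(x, y) / p(x).
H_y_given_x = -sum(v * math.log(v / p_x[x]) for (x, _), v in p_xy.items() if v > 0)

print(H_xy, H_x + H_y_given_x)  # chain rule: both print ~1.280 nats
```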
Then we can calculate the mutual information:
$$I(X, Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \ln \frac{p(x, y)}{p(x)\, p(y)}$$
This is exactly a KL divergence: $I(X, Y) = \mathrm{KL}\big(p(x, y) \,\|\, p(x)\, p(y)\big)$, the divergence between the joint distribution and the product of the marginals.
How close are X and Y to being independent? If the mutual information is small, they are nearly independent; $I(X, Y) = 0$ exactly when X and Y are independent.
Also, $I(X, Y) = H(Y) - H(Y \mid X)$: the mutual information is the reduction in uncertainty about Y once X is known.
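Putting the last few identities together, a short sketch (same made-up joint distribution as above, repeated so the snippet stands alone) that computes I(X, Y) from the KL form, checks that it matches H(Y) - H(Y | X), and shows that it vanishes for an independent pair:

```python
import math

def mutual_information(p_xy):
    """I(X, Y) = sum_{x,y} p(x, y) ln( p(x, y) / (p(x) p(y)) ) -- the KL form."""
    xs = {x for x, _ in p_xy}
    ys = {y for _, y in p_xy}
    p_x = {x: sum(p_xy.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(p_xy.get((x, y), 0.0) for x in xs) for y in ys}
    return sum(v * math.log(v / (p_x[x] * p_y[y]))
               for (x, y), v in p_xy.items() if v > 0)

# Dependent pair (hypothetical joint distribution): I > 0.
p_dep = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
print(mutual_information(p_dep))  # ~0.086 nats

# Independent pair: p(x, y) = p(x) p(y), so I(X, Y) = 0.
p_ind = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(p_ind))  # 0.0

# Cross-check with I(X, Y) = H(Y) - H(Y | X) for the dependent case.
p_x = {x: sum(v for (a, _), v in p_dep.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p_dep.items() if b == y) for y in (0, 1)}
H_y = -sum(v * math.log(v) for v in p_y.values())
H_y_given_x = -sum(v * math.log(v / p_x[x]) for (x, _), v in p_dep.items())
print(H_y - H_y_given_x)  # ~0.086 nats again
```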