In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$, the entropy is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$
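This definition translates directly into a few lines of Python. A minimal sketch, assuming the distribution is given as a list of probabilities (the helper name `entropy` is ours):

```python
import math

def entropy(p, base=2):
    """Shannon entropy -sum p(x) log p(x) of a discrete distribution.

    `p` is a sequence of probabilities summing to 1. Outcomes with
    p(x) = 0 contribute nothing, by the convention 0 * log 0 = 0.
    """
    return -sum(px * math.log(px, base) for px in p if px > 0)

# A fair coin has exactly 1 bit of entropy; a loaded one has less.
print(entropy([0.5, 0.5]))    # -> 1.0
print(entropy([0.9, 0.1]))    # -> ~0.469
```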
If $p = 0$, the code will be all zeros. What information can we send to the friend? Very little: the entropy is very low, and the information is also very low.
If $p = 1$, the code will be all ones, which again carries very little information, and $H = 0$.
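Both extremes can be checked with the binary entropy function. A small sketch (the name `binary_entropy` is our own helper):

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) source, in bits."""
    if p == 0.0 or p == 1.0:
        return 0.0  # a deterministic code carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.0))  # -> 0.0: all-zero code, nothing to learn
print(binary_entropy(1.0))  # -> 0.0: all-one code, nothing to learn
print(binary_entropy(0.5))  # -> 1.0: maximal uncertainty, one full bit
```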
The joint entropy of random variables $X$ and $Y$ is

$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y).$$
- Log base-2 for computer science (entropy measured in bits).
- Log base-$e$ for physics and mathematics (entropy measured in nats).
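The joint entropy is computed the same way as the ordinary entropy, just summing over pairs. A sketch, assuming the joint distribution is given as a 2-D table of $p(x, y)$ values (the name `joint_entropy` is ours):

```python
import math

def joint_entropy(pxy):
    """H(X, Y) in bits, from a joint distribution given as a 2-D table."""
    return -sum(p * math.log2(p)
                for row in pxy for p in row if p > 0)

# Hypothetical joint distribution of two independent fair coins.
pxy = [[0.25, 0.25],
       [0.25, 0.25]]
print(joint_entropy(pxy))  # -> 2.0 bits: 1 bit per coin
```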
The conditional entropy is defined similarly:

$$H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x).$$
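Using $p(y \mid x) = p(x, y) / p(x)$, this can be sketched from the same joint table as before (the name `conditional_entropy` is our own helper):

```python
import math

def conditional_entropy(pxy):
    """H(Y | X) in bits; each row of `pxy` holds p(x, y) for a fixed x."""
    h = 0.0
    for row in pxy:
        px = sum(row)          # marginal p(x)
        for p in row:
            if p > 0:
                h -= p * math.log2(p / px)  # p(y|x) = p(x,y) / p(x)
    return h

# If Y is an exact copy of X, knowing X leaves no uncertainty about Y.
pxy = [[0.5, 0.0],
       [0.0, 0.5]]
print(conditional_entropy(pxy))  # -> 0.0
```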
Then we can calculate the mutual information:

$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y \mid X).$$
This is closely related to the KL divergence between distributions $p$ and $q$:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}.$$

In fact, the mutual information is exactly the KL divergence between the joint distribution and the product of the marginals: $I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\|\, p(x)\, p(y)\big)$.
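The KL form gives a direct way to compute mutual information. A sketch under the same joint-table representation as above (the helper names are ours):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def mutual_information(pxy):
    """I(X; Y) as the KL divergence between p(x, y) and p(x) p(y)."""
    px = [sum(row) for row in pxy]           # marginal of X
    py = [sum(col) for col in zip(*pxy)]     # marginal of Y
    joint = [p for row in pxy for p in row]
    product = [a * b for a in px for b in py]
    return kl_divergence(joint, product)

# Y a perfect copy of X: I(X; Y) = H(X) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))   # -> 1.0
# Two independent fair coins: I(X; Y) = 0.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # -> 0.0
```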
How close are $X$ and $Y$ to being independent? If the mutual information is small, then they are almost independent; it is zero exactly when they are independent.