

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p: \mathcal{X} \to [0, 1]$:

$$H(X) := -\sum\limits_{x \in \mathcal{X}} p(x) \log p(x) = \mathbb{E}[-\log p(X)]$$

— Entropy (information theory)
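As a quick sketch, the definition above can be computed directly. This minimal example assumes log base 2, so entropy is measured in bits:

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a biased coin carries less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.47
```

The `if p > 0` guard implements the usual convention $0 \log 0 = 0$.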

If $P = 0$, the code will be all zeros.

What information can we send to the friend? Very little: the outcome is certain, so the entropy $H$ is zero, and so is the information $I$.

If $P = 1$, the code will be all ones; again $H$ and $I$ are zero.
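The two endpoint cases can be checked against the binary (Bernoulli) entropy, here in bits; the function name is my own for illustration:

```python
import math

def binary_entropy(P):
    """Entropy in bits of a coin that lands 1 with probability P."""
    if P in (0.0, 1.0):
        return 0.0  # deterministic outcome: no surprise, nothing to tell the friend
    return -(P * math.log2(P) + (1 - P) * math.log2(1 - P))

for P in (0.0, 0.1, 0.5, 1.0):
    print(P, binary_entropy(P))
```

Entropy is 0 at $P = 0$ and $P = 1$, and peaks at 1 bit for $P = 0.5$.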

$$H(X) = -\sum\limits_{x \in \mathcal{X}} p(x) \ln p(x)$$

The joint entropy of random variables $X$ and $Y$ is

$$H(X, Y) = -\sum\limits_{x \in \mathcal{X}}\sum\limits_{y \in \mathcal{Y}} p(x, y) \ln p(x, y)$$
  • Log base 2 for computer science (entropy in bits).
  • Log base $e$ for physics and mathematics (entropy in nats).
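A sketch of the joint entropy for a small hypothetical joint distribution (the table `p_xy` is made up for illustration), using the natural log as in the formula above:

```python
import math

# Hypothetical joint distribution p(x, y) over {0, 1} x {0, 1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# H(X, Y) = -sum_{x,y} p(x, y) ln p(x, y), measured in nats
H_xy = -sum(p * math.log(p) for p in p_xy.values() if p > 0)
print(H_xy)
```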

The conditional entropy is defined similarly:

$$H(Y|X) = -\sum\limits_{x \in \mathcal{X}}\sum\limits_{y \in \mathcal{Y}} p(x, y) \ln p(y|x)$$
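Continuing the same hypothetical joint table, $p(y|x)$ can be obtained from the joint and the marginal via $p(y|x) = p(x, y) / p(x)$:

```python
import math

# Hypothetical joint distribution p(x, y)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal p(x), needed for p(y|x) = p(x, y) / p(x)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} p(x, y) ln p(y|x)
H_y_given_x = -sum(p * math.log(p / p_x[x])
                   for (x, y), p in p_xy.items() if p > 0)
print(H_y_given_x)
```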

Then we can calculate the mutual information:

$$I(X, Y) = \sum\limits_{x \in \mathcal{X}}\sum\limits_{y \in \mathcal{Y}} p(x, y) \ln {p(x, y) \over p(x)\, p(y)}$$

This is exactly the KL divergence between the joint distribution and the product of the marginals: $I(X, Y) = \mathrm{KL}\big(p(x, y) \,\|\, p(x)\,p(y)\big)$.

How close are $X$ and $Y$ to being independent? If the mutual information is small, they are almost independent; it is zero exactly when they are independent.
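A sketch of both points, again on the same hypothetical joint table: mutual information computed as the KL divergence between the joint and the product of marginals, which vanishes when the joint factorizes:

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X, Y) = sum p(x,y) ln [ p(x,y) / (p(x) p(y)) ] = KL(p(x,y) || p(x) p(y))
I = sum(p * math.log(p / (p_x[x] * p_y[y]))
        for (x, y), p in p_xy.items() if p > 0)
print(I)  # positive: this X and Y are correlated

# For an independent joint, p(x, y) = p(x) p(y), and every log term is ln 1 = 0
q_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
I_indep = sum(q * math.log(q / (p_x[x] * p_y[y])) for (x, y), q in q_xy.items())
print(I_indep)  # 0.0
```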

Also, $I(X, Y) = H(Y) - H(Y|X)$.
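This identity can be verified numerically on the same hypothetical joint table used above (equality holds up to floating-point error):

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
H_y_given_x = -sum(p * math.log(p / p_x[x])
                   for (x, y), p in p_xy.items() if p > 0)
I = sum(p * math.log(p / (p_x[x] * p_y[y]))
        for (x, y), p in p_xy.items() if p > 0)

# I(X, Y) = H(Y) - H(Y|X): knowing X removes H(Y) - H(Y|X) nats of
# uncertainty about Y
print(I, H_y - H_y_given_x)
```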