"Entropy is the minimum descriptive complexity of a random variable"
"Mutual information is the communication rate in the presence of noise"
In communication there is a minimum achievable rate for data compression (the entropy) and a maximum achievable rate for reliable transmission (the channel capacity).
Kolmogorov Complexity: the complexity of a string of data is the length of the shortest binary computer program that computes the string.
Entropy
Entropy is a measure of the uncertainty of a random variable.
Let $X$ be a discrete random variable with alphabet $\mathcal{X}$ (the set of all possible outcomes) and probability mass function $p(x) = \Pr\{X = x\},\ x \in \mathcal{X}$.
Entropy is defined by
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$$
The entropy is the theoretical lower bound, in bits, on the expected code length needed to losslessly encode the outcomes of $X$ with a binary code.
Entropy of a fair coin toss:
$$H(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}$$
Interpretation: A fair coin toss carries 1 bit of information, meaning it’s maximally uncertain—you gain 1 full bit of information every time you observe the outcome.
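As a quick numerical check of the definition, here is a minimal Python sketch (not from the text); the helper name `entropy` and the example pmfs are just illustrative:

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a discrete pmf given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))  # biased coin: ~0.469 bits, less uncertain
print(entropy([1.0]))       # deterministic outcome: 0 bits
```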
For other logarithm bases $b$ we notate entropy as $H_b(X)$.
Entropy as Expectation
If $X \sim p(x)$, the expected value of the random variable $g(X)$ is:
$$E_p[g(X)] = \sum_{x \in \mathcal{X}} g(x)\, p(x)$$
Using $g(X) = \log \frac{1}{p(X)}$ we can interpret the entropy of $X$ as an expectation:
$$H(X) = E_p\!\left[\log \frac{1}{p(X)}\right]$$
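To make the expectation reading concrete, the following sketch (a made-up pmf over three symbols, not from the text) compares the exact sum with a Monte Carlo estimate of $E_p[\log_2 \frac{1}{p(X)}]$:

```python
import math
import random

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}

# Exact entropy: H(X) = sum_x p(x) * log2(1 / p(x))
H = sum(p * math.log2(1 / p) for p in pmf.values())

# Monte Carlo estimate of E[log2(1 / p(X))]: average the "surprise" of sampled outcomes.
random.seed(0)
samples = random.choices(list(pmf.keys()), weights=list(pmf.values()), k=100_000)
H_mc = sum(math.log2(1 / pmf[x]) for x in samples) / len(samples)

print(H, H_mc)  # both close to 1.5 bits
```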
Immediate Properties
$H(X) \geq 0$, since $0 \leq p(x) \leq 1$ implies $\log \frac{1}{p(x)} \geq 0$.
$H_b(X) = (\log_b a)\, H_a(X)$: changing the base of the logarithm only rescales the entropy.
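The base-change property can be checked numerically; this sketch uses an illustrative pmf and the usual convention that base-2 entropy is measured in bits and base-e entropy in nats:

```python
import math

pmf = [0.5, 0.25, 0.25]

H_bits = -sum(p * math.log2(p) for p in pmf)  # base-2 entropy, in bits
H_nats = -sum(p * math.log(p) for p in pmf)   # base-e entropy, in nats

# Changing the base only rescales: H_b(X) = (log_b a) * H_a(X), here b = e, a = 2.
print(H_nats, math.log(2) * H_bits)  # both ~1.0397 nats
```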
Joint Entropy and Conditional Entropy
Joint Entropy
Let $(X, Y)$ be a pair of discrete random variables with joint distribution $p(x, y)$. Their joint entropy is:
$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) = -E[\log p(X, Y)]$$
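A small sketch of the joint-entropy sum, using a hypothetical joint pmf over two binary variables (the numbers are made up for illustration):

```python
import math

# Hypothetical joint pmf p(x, y) over X in {0, 1}, Y in {0, 1}; entries sum to 1.
p_xy = {
    (0, 0): 0.5,   (0, 1): 0.25,
    (1, 0): 0.125, (1, 1): 0.125,
}

H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
print(H_XY)  # 1.75 bits
```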
Conditional Entropy
If $(X, Y) \sim p(x, y)$, the conditional entropy $H(Y \mid X)$ is:
$$H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y \mid X = x) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) = -E[\log p(Y \mid X)]$$
The reduction in uncertainty $H(Y) - H(Y \mid X)$ gives an information-theoretic measure of the correlation between $X$ and $Y$; it is the mutual information defined below.
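The first form of the definition, $H(Y \mid X) = \sum_x p(x) H(Y \mid X = x)$, reads as a weighted average of the entropies of the conditional distributions. A sketch using the same hypothetical joint pmf as above:

```python
import math

# Same illustrative joint pmf as in the joint-entropy snippet.
p_xy = {
    (0, 0): 0.5,   (0, 1): 0.25,
    (1, 0): 0.125, (1, 1): 0.125,
}

# Marginal p(x).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = sum_x p(x) * H(Y | X = x): weighted average of the entropies of p(y|x).
H_Y_given_X = 0.0
for x, px in p_x.items():
    cond = [p_xy[(x, y)] / px for y in (0, 1)]
    H_cond = -sum(q * math.log2(q) for q in cond if q > 0)
    H_Y_given_X += px * H_cond

print(H_Y_given_X)  # ~0.9387 bits
```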
Chain Rule
$$H(X, Y) = H(X) + H(Y \mid X)$$
Proof:
$$\begin{aligned}
H(X, Y) &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \big(p(x)\, p(y \mid x)\big) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x) \;-\; \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
&= -\sum_{x \in \mathcal{X}} p(x) \log p(x) \;-\; \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
&= H(X) + H(Y \mid X)
\end{aligned}$$
Equivalently, $\log p(X, Y) = \log p(X) + \log p(Y \mid X)$, and taking the expectation of both sides gives the chain rule.
Note however that $H(Y \mid X) \neq H(X \mid Y)$ in general, although $H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$.
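A numerical check of the chain rule and of the asymmetry noted above, again on the same hypothetical joint pmf; here the conditional entropies are computed via the expectation form $-\sum_{x,y} p(x,y) \log p(y \mid x)$:

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Same illustrative joint pmf as in the previous snippets, with its marginals.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
p_x = {x0: sum(p for (x, _), p in p_xy.items() if x == x0) for x0 in (0, 1)}
p_y = {y0: sum(p for (_, y), p in p_xy.items() if y == y0) for y0 in (0, 1)}

# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x) and H(X|Y) = -sum_{x,y} p(x,y) log p(x|y).
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())
H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())

print(H(p_xy.values()), H(p_x.values()) + H_Y_given_X)  # chain rule: both 1.75
print(H_Y_given_X, H_X_given_Y)                          # ~0.939 vs ~0.796: not equal
print(H(p_x.values()) - H_X_given_Y, H(p_y.values()) - H_Y_given_X)  # both ~0.0157
```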
Relative Entropy
Also called the Kullback-Leibler distance, the relative entropy between two probability mass functions $p(x)$ and $q(x)$ is:
$$D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$$
However, it is not symmetric and does not satisfy the triangle inequality, so it is not a true distance (metric).
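A brief sketch of the asymmetry, with two made-up pmfs over a binary alphabet (the helper name `kl` is illustrative; it assumes $q(x) > 0$ wherever $p(x) > 0$):

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits, for pmfs given as aligned lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl(p, q), kl(q, p))  # ~0.737 vs ~0.531: D(p||q) != D(q||p)
print(kl(p, p))            # 0.0: D(p||p) = 0
```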
Mutual Information
The mutual information $I(X; Y)$ is the relative entropy between the joint distribution $p(x, y)$ and the product distribution $p(x)\,p(y)$:
$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = D\big(p(x, y) \,\|\, p(x)\, p(y)\big)$$
Note that $I(X; Y) = I(Y; X) \geq 0$ in general, with equality to zero if and only if $X$ and $Y$ are independent.
For jointly Gaussian variables $X$ and $Y$ with Pearson correlation coefficient $\rho$, the mutual information is:
$$I(X; Y) = -\frac{1}{2} \log\!\big(1 - \rho^2\big)$$
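Finally, a sketch computing $I(X; Y)$ as $D(p(x,y)\,\|\,p(x)p(y))$ on the same hypothetical joint pmf, plus a plug-in of the Gaussian formula (evaluated in bits with $\log_2$; the value of $\rho$ is arbitrary):

```python
import math

# I(X;Y) = D( p(x,y) || p(x) p(y) ), using the same illustrative joint pmf as above.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
p_x = {x0: sum(p for (x, _), p in p_xy.items() if x == x0) for x0 in (0, 1)}
p_y = {y0: sum(p for (_, y), p in p_xy.items() if y == y0) for y0 in (0, 1)}

I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)
print(I_XY)  # ~0.0157 bits, matching H(X) - H(X|Y) from the chain-rule snippet

# Jointly Gaussian case: I(X;Y) = -1/2 * log(1 - rho^2).
rho = 0.8
print(-0.5 * math.log2(1 - rho**2))  # ~0.737 bits
```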