Differential Entropy
Prerequisite:
Let $f$ be a function defined on the closed interval $[a,b]$. Let $\Delta$ be a positive number and $x_i$ be an arbitrary number inside the interval $[a+(i-1)\Delta,\ a+i\Delta]$. Define the Riemann integral as $\int_a^b f(x)\,dx = \lim_{\Delta\to 0}\sum_i f(x_i)\Delta$ whenever the limit exists. Please refer to http://en.wikipedia.org/wiki/Riemann_integral for a more rigorous treatment of the Riemann integral.
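As a quick numerical illustration of this definition, the following Python sketch approximates $\int_0^1 x^2\,dx = \frac{1}{3}$ with left-endpoint Riemann sums (the helper name riemann_sum is ours, not from any library):

```python
import numpy as np

def riemann_sum(f, a, b, delta):
    """Approximate the Riemann integral of f over [a, b] by sampling
    one point x_i (here the left endpoint) in each length-delta step."""
    x = np.arange(a, b, delta)
    return np.sum(f(x) * delta)

# The sums approach the exact value 1/3 as delta -> 0.
for delta in [0.1, 0.01, 0.001]:
    print(delta, riemann_sum(lambda x: x**2, 0.0, 1.0, delta))
```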
Recall:
If $X$ is a discrete random variable, the entropy is $H(X) = -\sum_{x \in \mathcal{X}} p(x)\log p(x)$, where $X$ takes values in a discrete alphabet $\mathcal{X}$ under a probability mass function $p(x)$.
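For comparison with the continuous case below, here is a minimal Python computation of $H(X)$ for a finite pmf (with the usual convention $0\log 0 = 0$):

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability mass function p (any array-like)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                     # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

print(H([0.5, 0.5]))    # 1.0 bit  (fair coin)
print(H([0.25] * 4))    # 2.0 bits (uniform over four symbols)
```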
Definition of differential entropy:
Now, if $X$ is a continuous random variable, let $f(x)$ be the probability density function of $X$ on a continuous set $\mathcal{X}$, e.g. $\mathbb{R}$. Let $S = \{x : f(x) > 0\}$ be the support of $f$. Define the differential entropy $h(X) = -\int_S f(x)\log f(x)\,dx$.
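The definition can be evaluated numerically. A sketch using scipy quadrature, with logarithms base 2 so the answer is in bits (the helper name h_num is ours):

```python
import numpy as np
from scipy.integrate import quad

def h_num(f, lo, hi):
    """Differential entropy (bits) of a density f supported on [lo, hi]."""
    integrand = lambda x: -f(x) * np.log2(f(x)) if f(x) > 0 else 0.0
    val, _ = quad(integrand, lo, hi)
    return val

# Uniform density on [0, 2]: h(X) = log2(2) = 1 bit.
print(h_num(lambda x: 0.5, 0.0, 2.0))
```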
Two examples illustrating the properties of $h(X)$:
Example 1: Let $X$ be uniformly distributed on $[0,a]$, so $f(x) = \frac{1}{a}$ for $x \in [0,a]$. Then $h(X) = -\int_0^a \frac{1}{a}\log\frac{1}{a}\,dx = \log a$, which is negative when $a < 1$.
Example 2: Let $X \sim \mathcal{N}(0,\sigma^2)$. Then $h(X) = \frac{1}{2}\log(2\pi e\sigma^2)$.
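Both closed forms can be checked against scipy's built-in entropies. A sketch (scipy returns nats, hence the division by $\ln 2$; the particular values $a = 1/2$ and $\sigma = 3$ are just for illustration):

```python
import numpy as np
from scipy.stats import norm, uniform

# Example 1 with a = 1/2: h(X) = log2(1/2) = -1 bit (negative!).
a = 0.5
print(uniform(scale=a).entropy() / np.log(2), np.log2(a))

# Example 2 with sigma = 3: h(X) = 0.5 * log2(2*pi*e*sigma^2).
sigma = 3.0
print(norm(scale=sigma).entropy() / np.log(2),
      0.5 * np.log2(2 * np.pi * np.e * sigma**2))
```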
Exercise 1:
Construct a random variable $X$ with probability density function $f(x)$ such that $f$ exists (i.e., is a valid density) but $h(X)$ does not exist.
Suggested answer: Consider $f(x) = \frac{c}{x\log^2 x}$ for $x \in [2,\infty)$, for some normalizing constant $c$.
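A numerical sanity check of the suggested answer (with natural-log normalization $c = \ln 2$, which is our computation, not from the notes). Substituting $u = \ln x$ keeps the quadrature stable; the partial integrals of $-f\log_2 f$ over $[2,T]$ grow without bound, roughly like $\log\log T$, so $h(X)$ diverges:

```python
import numpy as np
from scipy.integrate import quad

c = np.log(2.0)   # integral of 1/(x ln^2 x) over [2, inf) is 1/ln 2, so c = ln 2

def f(x):
    return c / (x * np.log(x) ** 2)

print(quad(f, 2.0, np.inf)[0])   # ~ 1.0: f is a valid density

# With u = ln x, the partial entropy over [2, T] becomes the integral of
#   (c/u^2) * (u/ln 2 + 2*log2(u) - log2(c))  over u in [ln 2, ln T].
def g(u):
    return (c / u**2) * (u / np.log(2.0) + 2.0 * np.log2(u) - np.log2(c))

for T in [1e2, 1e8, 1e32, 1e128]:
    print(T, quad(g, np.log(2.0), np.log(T))[0])   # grows without bound
```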
For any continuous random variable $X$ discussed in the rest of this note, the corresponding $h(X)$ exists.
An interpretation of $h(X)$:
Suppose the support of $X$ is inside the closed interval $[a,b]$. From the Prerequisite above, $h(X) = -\int_a^b f(x)\log f(x)\,dx = \lim_{\Delta\to 0}\sum_i -f(x_i)\Delta\log f(x_i)$, where $\Delta$ is a positive number and $x_i$ is an arbitrary number inside the interval $[a+(i-1)\Delta,\ a+i\Delta]$. Note that $p_i := \Pr\{a+(i-1)\Delta \le X < a+i\Delta\} = \int_{a+(i-1)\Delta}^{a+i\Delta} f(x)\,dx \approx f(x_i)\Delta$.
Define the entropy of the quantized version $X^\Delta := x_i$ if $a+(i-1)\Delta \le X < a+i\Delta$ as $H(X^\Delta) = -\sum_i p_i\log p_i$. Since $f$ is a probability density function, $\sum_i f(x_i)\Delta \to \int_a^b f(x)\,dx = 1$ as $\Delta\to 0$. Therefore, it is intuitive to guess that the statement "$H(X^\Delta) + \log\Delta \to h(X)$ as $\Delta\to 0$" is true. A rigorous proof of the above statement is provided in Cover's book on pp. 247 and 248. In their proof, the mean value theorem is used to obtain $f(x_i)\Delta = \int_{a+(i-1)\Delta}^{a+i\Delta} f(x)\,dx$ for sufficiently small $\Delta$. In addition, their proof applies to the more general case in which the support of $X$ can be $\mathbb{R}$.
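The convergence $H(X^\Delta) + \log\Delta \to h(X)$ is easy to observe numerically. A sketch for $X \sim \mathcal{N}(0,1)$, where $h(X) = \frac{1}{2}\log_2(2\pi e) \approx 2.047$ bits (the truncation to $[-12,12]$ is our approximation of the real line):

```python
import numpy as np
from scipy.stats import norm

h_exact = 0.5 * np.log2(2 * np.pi * np.e)   # h(X) for X ~ N(0, 1), in bits

for Delta in [1.0, 0.1, 0.01, 0.001]:
    edges = np.arange(-12.0, 12.0 + Delta, Delta)   # [-12, 12] holds almost all mass
    p = np.diff(norm.cdf(edges))                    # p_i = P(X in bin i)
    p = p[p > 0]
    H_quantized = -np.sum(p * np.log2(p))           # H(X^Delta)
    print(Delta, H_quantized + np.log2(Delta), h_exact)
```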
Why is $h(X)$ called "differential entropy"?
It is clear that $H(X^\Delta) \to \infty$ and $\log(1/\Delta) \to \infty$ as $\Delta \to 0$. The quantity $\log(1/\Delta)$ can be viewed as the entropy of a discrete uniform random variable taking $1/\Delta$ equally likely values. Therefore, $h(X) = \lim_{\Delta\to 0}\big[H(X^\Delta) - \log(1/\Delta)\big]$ can be viewed as the limit of the difference of entropies between two random variables, which may explain why $h(X)$ is called "differential entropy".
On n-bit quantization of X:
If we quantize $X$ such that each quantization step has the same length $\Delta$, the number of bits required on average to specify which step $X$ falls into is approximately $h(X) + \log\frac{1}{\Delta}$ when $\Delta$ is small (logarithms are base 2 when counting bits). The n-bit quantization of $X$ is equivalent to setting $\Delta = 2^{-n}$ for quantization. The entropy of an n-bit quantization of $X$ is therefore $H(X^\Delta) \approx h(X) + n$, so $h(X) + n$ is the number of bits on average required to describe an n-bit quantization of $X$. In Example 1, the number of bits required to describe an n-bit quantization of $X$ is $\log a + n$. In other words, $\log a + n$ bits suffice to describe $X$ in Example 1 to n-bit accuracy. Similarly, the number of bits required to describe $X$ in Example 2 to n-bit accuracy is $\frac{1}{2}\log(2\pi e\sigma^2) + n$.
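Checking this for Example 1 (here with $a = 4$, our choice for illustration, so $h(X) = 2$ bits and an n-bit quantization should cost about $n + 2$ bits):

```python
import numpy as np
from scipy.stats import uniform

a, h = 4.0, 2.0                      # X ~ Unif[0, 4], h(X) = log2(4) = 2 bits
for n in [1, 4, 8, 12]:
    Delta = 2.0 ** (-n)
    edges = np.arange(0.0, a + Delta, Delta)
    p = np.diff(uniform(scale=a).cdf(edges))
    p = p[p > 0]
    print(n, -np.sum(p * np.log2(p)), h + n)   # H(X^Delta) equals h + n exactly here
```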
Properties of $h(X)$:
1. $h(X)$ can be negative.
2. $h(X)$ may not exist.
3. Define the joint differential entropy $h(X_1,\dots,X_n) = -\int f(x_1,\dots,x_n)\log f(x_1,\dots,x_n)\,dx_1\cdots dx_n$.
4. Define the conditional differential entropy $h(X|Y) = -\int f(x,y)\log f(x|y)\,dx\,dy$.
5. Define the relative entropy $D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx$ and the mutual information $I(X;Y) = D\big(f(x,y)\,\big\|\,f(x)f(y)\big)$ (a numerical illustration follows this list).
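For a bivariate normal pair these definitions can be exercised numerically. With correlation $\rho$, the identity $I(X;Y) = h(X) + h(Y) - h(X,Y)$ (which follows from definitions 3-5) gives the closed form $-\frac{1}{2}\log(1-\rho^2)$. A sketch using scipy (entropies returned in nats, converted to bits):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.8
hX = norm().entropy()                                          # h(X) in nats
hXY = multivariate_normal(mean=[0, 0],
                          cov=[[1, rho], [rho, 1]]).entropy()  # h(X, Y) in nats
I = (2 * hX - hXY) / np.log(2)                                 # I(X;Y) in bits

print(I, -0.5 * np.log2(1 - rho**2))                           # both ~ 0.737
```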
Jensen's Inequality for a continuous random variable $X$:
$E[\varphi(X)] \ge \varphi(E[X])$ for "reasonable" convex $\varphi$.
Consequences of Jensen's Inequality:
1. $D(f\|g) \ge 0$, with equality if and only if $f = g$ almost everywhere.
2. $I(X;Y) \ge 0$.
3. $h(X|Y) \le h(X)$, i.e., conditioning reduces entropy (checked numerically after this list).
4. Data processing inequality: If $X \to Y \to Z$ forms a Markov chain, then $I(X;Y) \ge I(X;Z)$.
5. Chain rule for $h$: $h(X_1,\dots,X_n) = \sum_{i=1}^n h(X_i \mid X_1,\dots,X_{i-1})$.
6. Chain rule for $I$: $I(X_1,\dots,X_n;Y) = \sum_{i=1}^n I(X_i;Y \mid X_1,\dots,X_{i-1})$.
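Consequence 3 can be checked on the same bivariate normal example as above, using the two-variable chain rule $h(X|Y) = h(X,Y) - h(Y)$:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.8
hY = norm().entropy() / np.log(2)                       # h(Y) ~ 2.047 bits
hXY = multivariate_normal(mean=[0, 0],
                          cov=[[1, rho], [rho, 1]]).entropy() / np.log(2)
hX_given_Y = hXY - hY                                   # h(X|Y) = h(X,Y) - h(Y)
print(hX_given_Y, hY)   # h(X|Y) ~ 1.31 < h(X) = h(Y) ~ 2.05 by symmetry
```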
Exercise 2 (EXTRA CREDIT):
Show that $h(aX) = h(X) + \log\lvert a\rvert$, where $a \ne 0$.
An upper bound on $h(X)$ for $X$ with $E[X^2] \le \sigma^2$:
If $E[X^2] \le \sigma^2$, then $h(X) \le \frac{1}{2}\log(2\pi e\sigma^2)$.
Proof: Let $f$ be the probability density function of a real continuous random variable $X$. Then, $h(X) = -\int f(x)\log f(x)\,dx$ by definition. Now, let $\phi$ be the probability density function of a real normal random variable with mean $0$ and variance $\sigma^2$. More specifically, $\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/(2\sigma^2)}$ for all $x \in \mathbb{R}$. We can write $h(X) = -\int f(x)\log\frac{f(x)}{\phi(x)}\,dx - \int f(x)\log\phi(x)\,dx = -D(f\|\phi) - \int f(x)\log\phi(x)\,dx$. Note that $D(f\|\phi) \ge 0$ and $-\int f(x)\log\phi(x)\,dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{E[X^2]}{2\sigma^2}\log e$. Therefore,
$h(X) \le \frac{1}{2}\log(2\pi\sigma^2) + \frac{E[X^2]}{2\sigma^2}\log e$.
Now, $E[X^2] \le \sigma^2$, so $\frac{E[X^2]}{2\sigma^2}\log e \le \frac{1}{2}\log e$. Consequently, $h(X) \le \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log e = \frac{1}{2}\log(2\pi e\sigma^2)$.
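Two quick checks of the bound, comparing distributions that all satisfy $E[X^2] = \sigma^2$ (the uniform and Laplace entropy formulas used in the comments are standard closed forms, not derived in these notes):

```python
import numpy as np

sigma = 1.0
bound = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)   # ~ 2.047 bits

b = sigma * np.sqrt(3.0)             # Unif[-b, b] has E[X^2] = b^2 / 3 = sigma^2
h_uniform = np.log2(2 * b)           # h = log2(2b) ~ 1.793 bits

lam = sigma / np.sqrt(2.0)           # Laplace(scale=lam) has E[X^2] = 2 lam^2 = sigma^2
h_laplace = np.log2(2 * np.e * lam)  # h = log2(2 e lam) ~ 1.943 bits

print(h_uniform, h_laplace, bound)   # both strictly below the Gaussian bound
```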
The achievability of the upper bound on $h(X)$ for $X$ with $E[X^2] \le \sigma^2$:
Compute $h(\phi) = -\int \phi(x)\log\phi(x)\,dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\int \phi(x)\,x^2\,dx}{2\sigma^2}\log e$. Since $\int \phi(x)\,x^2\,dx = \sigma^2$, $h(\phi) = \frac{1}{2}\log(2\pi e\sigma^2)$. When $X$ is the random variable with probability density function $\phi$, $h(X) = \frac{1}{2}\log(2\pi e\sigma^2)$, so the upper bound is achieved.
The entropy of a multivariate normal distribution:
The probability density function of $(X_1,\dots,X_n) \sim \mathcal{N}(\mu, K)$ with mean $\mu$ and covariance matrix $K$ is $f(x) = \frac{1}{(2\pi)^{n/2}\lvert K\rvert^{1/2}}\,e^{-\frac{1}{2}(x-\mu)^T K^{-1}(x-\mu)}$. Then, $h(X_1,\dots,X_n) = \frac{1}{2}\log\big((2\pi e)^n \lvert K\rvert\big)$. The proof is contained in Cover's book on p.250.
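A numerical check of this formula against scipy, for an arbitrary symmetric positive definite $K$ of our choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
n = K.shape[0]

formula = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))
scipy_h = multivariate_normal(mean=np.zeros(n), cov=K).entropy() / np.log(2)
print(formula, scipy_h)   # agree (in bits)
```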