Differential Entropy
Prerequisite:
Let $f$ be a function defined on $[a,b]$. Let $\Delta$ be a positive number and $x_i$ be an arbitrary number inside the interval $[a+(i-1)\Delta,\ a+i\Delta]$. Define the Riemann integral $\int_a^b f(x)\,dx$ as
$\int_a^b f(x)\,dx = \lim_{\Delta\to 0} \sum_i f(x_i)\,\Delta$
whenever the limit exists. Please refer to http://en.wikipedia.org/wiki/Riemann_integral for a more rigorous treatment of the Riemann integral.
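As a quick numerical illustration of this limit, here is a minimal Python sketch approximating $\int_0^1 x^2\,dx = \frac{1}{3}$ by the sum $\sum_i f(x_i)\Delta$; the integrand, the interval, and the choice of $x_i$ as the left endpoint of each subinterval are arbitrary choices made only for illustration.

    # Riemann-sum approximation of the integral of f over [a, b]:
    # sum_i f(x_i) * Delta, with x_i taken as the left endpoint of each subinterval.
    def riemann_sum(f, a, b, delta):
        total, x = 0.0, a
        while x < b:
            total += f(x) * delta
            x += delta
        return total

    f = lambda x: x ** 2              # example integrand (arbitrary choice)
    for delta in [0.1, 0.01, 0.001]:
        print(delta, riemann_sum(f, 0.0, 1.0, delta))   # approaches 1/3 as delta -> 0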
Recall:
If $X$ is a discrete random variable, the entropy is
$H(X) = -\sum_{x\in\mathcal{X}} p(x)\log p(x),$
where $X$ takes values in a discrete alphabet $\mathcal{X}$ under a probability mass function $p(x)$.
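For comparison with the continuous case defined next, a minimal Python sketch evaluating $H(X)$ in bits for an example probability mass function (the particular pmf is an arbitrary choice):

    from math import log2

    def discrete_entropy(pmf):
        # H(X) = -sum_x p(x) log2 p(x), in bits; terms with p(x) = 0 contribute 0
        return -sum(p * log2(p) for p in pmf if p > 0)

    print(discrete_entropy([0.5, 0.25, 0.25]))   # 1.5 bits
    print(discrete_entropy([0.25] * 4))          # 2.0 bits (uniform over 4 values)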
Definition of differential entropy:
Now, if $X$ is a continuous random variable, let $f(x)$ be the probability density function of $X$ on a continuous set $\mathcal{X}$, e.g. $\mathcal{X}=\mathbb{R}$. Let $S = \{x : f(x) > 0\}$ be the support of $f$. Define the differential entropy
$h(X) = -\int_S f(x)\log f(x)\,dx.$
Two examples illustrating the properties of $h(X)$:
Example 1:

Example 2:
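Assuming these are the uniform and the normal examples used at this point in Cover's book (which this note follows closely), they work out as follows.
Uniform: if $X$ is uniform on $[0,a]$, then $f(x) = \frac{1}{a}$ on $[0,a]$ and
$h(X) = -\int_0^a \frac{1}{a}\log\frac{1}{a}\,dx = \log a,$
which is $0$ for $a=1$ and negative for $a<1$.
Normal: if $X \sim \mathcal{N}(\mu,\sigma^2)$, then
$h(X) = \frac{1}{2}\log(2\pi e\sigma^2),$
by the same computation carried out in the achievability argument later in this note.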
Exercise 1:
Construct a random variable $X$ with probability density function $f$ such that $\int_S f(x)\,dx$ exists but $h(X)$ does not exist.
Suggested answer: Consider $f(x) = \frac{c}{x\log^2 x}$ for $x \ge 2$ (and $f(x)=0$ otherwise), for some normalizing constant $c$.
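A quick check that a density of this form works (this is the standard construction from the literature; whether it is exactly the intended suggested answer is an assumption): the substitution $u = \log x$ shows that $\int_2^\infty \frac{dx}{x\log^2 x}$ is finite, so $c$ can be chosen to make $f$ integrate to $1$; on the other hand,
$-\int_2^\infty f(x)\log f(x)\,dx = \int_2^\infty \frac{c}{x\log^2 x}\log\frac{x\log^2 x}{c}\,dx \ge \frac{c}{2}\int_{x_0}^\infty \frac{dx}{x\log x} = \infty$
for a suitable $x_0$, because $\log\frac{x\log^2 x}{c} \ge \frac{1}{2}\log x$ for all sufficiently large $x$. Hence $h(X) = +\infty$, i.e. the differential entropy does not exist as a finite number.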
For any continuous random variable $X$ discussed in the rest of this note, the corresponding $h(X)$ exists.
An interpretation of $h(X)$:
Suppose the support of $f$ is inside the closed interval $[a,b]$. From the Prerequisite above,
$h(X) = -\int_a^b f(x)\log f(x)\,dx = -\lim_{\Delta\to 0}\sum_i f(x_i)\log f(x_i)\,\Delta,$
where $\Delta$ is a positive number and $x_i$ is an arbitrary number inside the interval $[a+(i-1)\Delta,\ a+i\Delta]$. Note that $p_i = \int_{a+(i-1)\Delta}^{a+i\Delta} f(x)\,dx \approx f(x_i)\Delta$ is the probability that $X$ falls into the $i$-th interval. Define the entropy of the quantized version $X^\Delta$ as $H(X^\Delta) = -\sum_i p_i\log p_i$. Since $f$ is a probability density function, $\sum_i f(x_i)\Delta \to \int_a^b f(x)\,dx = 1$ as $\Delta\to 0$. Therefore, it is intuitive to guess that the statement "$H(X^\Delta) + \log\Delta \to h(X)$ as $\Delta\to 0$" is true. A rigorous proof of the above statement is provided in Cover's book on p.247 and 248. In their proof, the mean value theorem is used to obtain $p_i = f(x_i)\Delta$ for some $x_i$ in the $i$-th interval, for sufficiently small $\Delta$. In addition, their proof applies to a more general case in which the support of $f$ can be $\mathbb{R}$.
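To make the guess above concrete, here is a minimal Python sketch that quantizes a density with bin width $\Delta$ and checks numerically that $H(X^\Delta) + \log_2\Delta$ approaches $h(X)$; the standard normal density is only an illustrative choice, with $h(X) = \frac{1}{2}\log_2(2\pi e) \approx 2.05$ bits.

    import math

    def normal_pdf(x, sigma=1.0):
        return math.exp(-x * x / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)

    def quantized_entropy(pdf, lo, hi, delta):
        # H(X^Delta) = -sum_i p_i log2 p_i with p_i ~= f(x_i) * Delta
        # (x_i taken at the midpoint of the i-th bin)
        H, x = 0.0, lo
        while x < hi:
            p = pdf(x + delta / 2) * delta
            if p > 0:
                H -= p * math.log2(p)
            x += delta
        return H

    h_true = 0.5 * math.log2(2 * math.pi * math.e)   # differential entropy of N(0,1), in bits
    for delta in [0.5, 0.1, 0.01]:
        H = quantized_entropy(normal_pdf, -10.0, 10.0, delta)
        print(delta, H + math.log2(delta), h_true)   # the two columns should get close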
Why is $h(X)$ called "differential entropy"?
It is clear that $H(X^\Delta) + \log\Delta \to h(X)$ and $\log\Delta \to -\infty$ as $\Delta\to 0$. The quantity $-\log\Delta$ can be viewed as $\log\frac{1}{\Delta}$, which is similar to the entropy of some discrete uniform random variable (one taking about $\frac{1}{\Delta}$ equally likely values). Therefore, $h(X) = \lim_{\Delta\to 0}\big(H(X^\Delta) - \log\frac{1}{\Delta}\big)$ can be viewed as the limit of the difference of entropies between two random variables, which may explain why $h(X)$ is called "differential entropy".
On n-bit quantization of X:
If we quantize $X$ such that each quantization step has the same length $2^{-n}$, the number of bits required to specify which step $X$ falls into is approximately $h(X) + n$ when $2^{-n}$ is small. The n-bit quantization of $X$ is equivalent to setting $\Delta = 2^{-n}$ for quantization. The entropy of an n-bit quantization of $X$ is $H(X^\Delta) \approx h(X) - \log\Delta = h(X) + n$. Therefore, $h(X) + n$ is the number of bits on the average required to describe an n-bit quantization of $X$. In Example 1, the number of bits required to describe an n-bit quantization of $X$ is $n + h(X)$ with the $h(X)$ computed in that example; in other words, about $n + h(X)$ bits suffice to describe $X$ in Example 1 to n-bit accuracy. Similarly, the number of bits required to describe $X$ in Example 2 to n-bit accuracy is $n + h(X)$ with the $h(X)$ of Example 2.
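For a concrete number, consider the uniform example from Cover's book (on the assumption that it is the Example 1 referred to above): if $X$ is uniform on $[0,\tfrac{1}{8}]$, then $h(X) = \log\tfrac{1}{8} = -3$ bits, so describing an n-bit quantization of $X$ requires about
$n + h(X) = n - 3$
bits on the average; intuitively, the first three binary digits of $X$ after the point are always $0$ and need not be described.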
Properties of $h(X)$:
1. $h(X)$ can be negative.
2. $h(X)$ may not exist.
3. Define the joint differential entropy $h(X_1,\dots,X_n) = -\int f(x_1,\dots,x_n)\log f(x_1,\dots,x_n)\,dx_1\cdots dx_n$.
4. Define the conditional differential entropy $h(X|Y) = -\int f(x,y)\log f(x|y)\,dx\,dy$.
5. Define the relative entropy $D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx$ and the mutual information $I(X;Y) = D\big(f(x,y)\,\|\,f(x)f(y)\big)$.
Jensen's Inequality for a continuous random variable $X$:
$E[\varphi(X)] \ge \varphi(E[X])$ for "reasonable" convex $\varphi$.
Consequences of Jensen's Inequality:
1. $D(f\|g) \ge 0$ (a derivation from Jensen's inequality is sketched after this list).
2. $I(X;Y) \ge 0$.
3. $h(X|Y) \le h(X)$.
4. Data processing inequality: If $X \to Y \to Z$ forms a Markov chain, then $I(X;Y) \ge I(X;Z)$.
5. Chain rule for $h$: $h(X_1,\dots,X_n) = \sum_{i=1}^{n} h(X_i \mid X_1,\dots,X_{i-1})$.
6. Chain rule for $I$: $I(X_1,\dots,X_n;Y) = \sum_{i=1}^{n} I(X_i;Y \mid X_1,\dots,X_{i-1})$.
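As an illustration of how consequence 1 follows from Jensen's inequality (a sketch, using the concavity of $\log$, i.e. the convexity of $-\log$):
$-D(f\|g) = \int_S f(x)\log\frac{g(x)}{f(x)}\,dx = E_f\Big[\log\frac{g(X)}{f(X)}\Big] \le \log E_f\Big[\frac{g(X)}{f(X)}\Big] = \log\int_S g(x)\,dx \le \log 1 = 0,$
where $S$ is the support of $f$. Hence $D(f\|g)\ge 0$; consequence 2 then follows because $I(X;Y) = D\big(f(x,y)\,\|\,f(x)f(y)\big)$.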
Exercise 2 (EXTRA CREDIT):
Show that
where
.
An upper bound of $h(X)$ for $X$ with $E[X^2] \le \sigma^2$:
If $E[X^2] \le \sigma^2$, then $h(X) \le \frac{1}{2}\log(2\pi e\sigma^2)$.
Proof: Let $f$ be the probability density function of a real continuous random variable. Then, $h(X) = -\int_{\mathbb{R}} f(x)\log f(x)\,dx$ by Definition. Now, let $g$ be the probability density function of a real normal random variable with mean $0$ and variance $\sigma^2$. More specifically, $g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\big(-\frac{x^2}{2\sigma^2}\big)$ for all $x\in\mathbb{R}$. We can write $\log g(x) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{x^2}{2\sigma^2}\log e$. Note that $\int_{\mathbb{R}} x^2 f(x)\,dx = E[X^2] \le \sigma^2$ and $\int_{\mathbb{R}} f(x)\,dx = 1$. Therefore, $-\int_{\mathbb{R}} f(x)\log g(x)\,dx \le \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log e = \frac{1}{2}\log(2\pi e\sigma^2)$.
Now, $0 \le D(f\|g) = \int_{\mathbb{R}} f(x)\log\frac{f(x)}{g(x)}\,dx = -h(X) - \int_{\mathbb{R}} f(x)\log g(x)\,dx$. Consequently, $h(X) \le -\int_{\mathbb{R}} f(x)\log g(x)\,dx \le \frac{1}{2}\log(2\pi e\sigma^2)$.
The achievability of the upper bound of $h(X)$ for $X$ with $E[X^2] \le \sigma^2$:
Compute $-\int_{\mathbb{R}} g(x)\log g(x)\,dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\log e}{2\sigma^2}\int_{\mathbb{R}} x^2 g(x)\,dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log e = \frac{1}{2}\log(2\pi e\sigma^2)$. Since $\int_{\mathbb{R}} x^2 g(x)\,dx = \sigma^2$ and $\int_{\mathbb{R}} g(x)\,dx = 1$, when $X$ is the random variable with probability density function $g$, $h(X) = \frac{1}{2}\log(2\pi e\sigma^2)$, so the upper bound is achieved.
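As a numerical sanity check of the bound and its achievability, the following Python sketch integrates $-\int f\log_2 f$ for three zero-mean densities with variance $1$ (uniform, Laplace, and normal; the choices are arbitrary but standard) and compares the results with $\frac{1}{2}\log_2(2\pi e) \approx 2.05$ bits; only the normal density should attain the bound.

    import math

    def diff_entropy(pdf, lo, hi, dx=1e-4):
        # numerical approximation of h = -integral f log2 f over [lo, hi], in bits
        h, x = 0.0, lo
        while x < hi:
            f = pdf(x)
            if f > 0:
                h -= f * math.log2(f) * dx
            x += dx
        return h

    sigma = 1.0
    uniform = lambda x: 1 / (2 * math.sqrt(3)) if abs(x) <= math.sqrt(3) else 0.0   # variance 1
    laplace = lambda x: (1 / math.sqrt(2)) * math.exp(-math.sqrt(2) * abs(x))       # variance 1
    normal  = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)               # variance 1

    bound = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    for name, pdf in [("uniform", uniform), ("laplace", laplace), ("normal", normal)]:
        print(name, diff_entropy(pdf, -20.0, 20.0), "bound:", bound)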
The entropy of a multivariate normal distribution:
The probability density function of $X \sim \mathcal{N}_n(\mu, K)$ with mean $\mu$ and covariance matrix $K$ is
$f(x) = \frac{1}{(2\pi)^{n/2}|K|^{1/2}}\exp\!\big(-\tfrac{1}{2}(x-\mu)^T K^{-1}(x-\mu)\big)$ for $x \in \mathbb{R}^n$.
Then, $h(X) = \frac{1}{2}\log\big((2\pi e)^n |K|\big)$. The proof is contained in Cover's book on p.250.
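As a sketch of how this formula can be checked numerically (the covariance matrix below is an arbitrary choice), the closed form $\frac{1}{2}\log_2\big((2\pi e)^n|K|\big)$ is compared with a Monte Carlo estimate of $-E[\log_2 f(X)]$ computed from the density above.

    import numpy as np

    rng = np.random.default_rng(0)
    K = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # example covariance matrix (arbitrary choice)
    mu = np.zeros(2)
    n = K.shape[0]

    # closed form: h(X) = 0.5 * log2((2*pi*e)^n * |K|), in bits
    _, logdet = np.linalg.slogdet(K)
    h_closed = 0.5 * (n * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

    # Monte Carlo estimate of -E[log2 f(X)] using the multivariate normal density
    samples = rng.multivariate_normal(mu, K, size=200_000)
    Kinv = np.linalg.inv(K)
    quad = np.einsum('ij,jk,ik->i', samples - mu, Kinv, samples - mu)
    log2_f = -0.5 * (n * np.log2(2 * np.pi) + logdet / np.log(2)) - 0.5 * quad / np.log(2)
    h_mc = -log2_f.mean()

    print(h_closed, h_mc)    # the two values should agree to a few decimal places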