Scribe Notes 5-3


Differences between differential entropy and entropy (for a discrete r.v.)

1. Differential entropy can be negative (see the numeric check after this list);

2. In general, differential entropy has no upper bound. For a given variance, the Gaussian distribution maximizes differential entropy.

3. A change of mean does not change differential entropy, but scaling does: h(aX) = h(X) + log|a|.
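A minimal numeric check of these points (my own sketch; the helper names h_uniform and h_gaussian are illustrative, not from the lecture): for X ~ Uniform(0, a), h(X) = log a, which is negative when a < 1, while for a Gaussian h(X) depends only on the variance, not on the mean.

    # Sketch, in bits: differential entropies of Uniform(0, a) and N(mu, var).
    import numpy as np

    def h_uniform(a):
        # X ~ Uniform(0, a): f(x) = 1/a on [0, a], so h(X) = log2(a)
        return np.log2(a)

    def h_gaussian(var):
        # X ~ N(mu, var): h(X) = 0.5 * log2(2*pi*e*var), independent of the mean mu
        return 0.5 * np.log2(2 * np.pi * np.e * var)

    print(h_uniform(0.5))                    # -1.0: differential entropy can be negative
    print(h_gaussian(1.0), h_gaussian(4.0))  # scaling X by 2 (variance x4) adds exactly 1 bit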

 

Typical set for continuous random variables

The definition is similar to that of the weakly typical set for discrete random variables.

Def: A set A_ε^(n) of sequences is said to be a typical set with respect to the p.d.f. f(x) if it consists of all sequences (x_1, ..., x_n) ∈ S^n s.t.

       | -(1/n) log f(x_1, ..., x_n) - h(X) | ≤ ε, or equivalently,

       2^(-n(h(X)+ε)) ≤ f(x_1, ..., x_n) ≤ 2^(-n(h(X)-ε)).
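As an illustrative check (my own; the Gaussian source and the sample sizes are arbitrary choices), one can verify numerically that -(1/n) log f(X_1, ..., X_n) concentrates around h(X) as n grows, which is exactly what membership in the typical set requires:

    # Sketch: for i.i.d. X_i ~ N(0, 1), -(1/n) log2 f(X_1, ..., X_n) should approach
    # h(X) = 0.5 * log2(2*pi*e) ~ 2.05 bits as n grows (AEP for continuous r.v.).
    import numpy as np

    rng = np.random.default_rng(0)
    h_true = 0.5 * np.log2(2 * np.pi * np.e)

    for n in (10, 100, 10000):
        x = rng.standard_normal(n)
        # log2 of the joint density f(x_1, ..., x_n) under N(0, 1)
        log2_f = np.sum(-0.5 * np.log2(2 * np.pi) - x**2 / (2 * np.log(2)))
        print(n, -log2_f / n, h_true)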

Some properties of the typical set for continuous r.v.

1. Pr(A_ε^(n)) > 1 - ε for n sufficiently large     (since -(1/n) log f(X_1, ..., X_n) → h(X) in probability, by the weak law of large numbers)

2. Vol(A_ε^(n)) ≤ 2^(n(h(X)+ε)) for all n

3. Vol(A_ε^(n)) ≥ (1 - ε) 2^(n(h(X)-ε)) for n sufficiently large

To prove 2 & 3, use the definition and the following inequality, obtained by integrating the bounds on f over A_ε^(n):

2^(-n(h(X)+ε)) Vol(A_ε^(n)) ≤ Pr(A_ε^(n)) ≤ 2^(-n(h(X)-ε)) Vol(A_ε^(n)).
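As a sketch of how the bounds then follow (the standard argument, not spelled out in the original notes): combine the inequality above with Pr(A_ε^(n)) ≤ 1 for property 2, and with Pr(A_ε^(n)) > 1 - ε for n large (property 1) for property 3:

1 ≥ Pr(A_ε^(n)) ≥ 2^(-n(h(X)+ε)) Vol(A_ε^(n))   =>   Vol(A_ε^(n)) ≤ 2^(n(h(X)+ε)),

1 - ε < Pr(A_ε^(n)) ≤ 2^(-n(h(X)-ε)) Vol(A_ε^(n))   =>   Vol(A_ε^(n)) ≥ (1 - ε) 2^(n(h(X)-ε)).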

 

Error estimation

For a discrete r.v., we have Fano's inequality: H(X|Y) ≤ H(P_e) + P_e log(|𝒳| - 1), where P_e = Pr(X̂ ≠ X) and 𝒳 denotes the alphabet of X.

However, for a continuous r.v. we cannot use Fano's idea to estimate the error, because the alphabet is continuous, |𝒳| = ∞, and the bound becomes vacuous.

We use the mean squared error instead.

Two facts:  I.  For any constant estimate X̂, E[(X - X̂)²] ≥ E[(X - μ)²] = Var(X), where μ denotes the mean of X

                 II. The differential entropy for X ~ N(μ, σ²) is h(X) = (1/2) log(2πeσ²)

Then,           E[(X - X̂)²] ≥ E[(X - μ)²]          // mean of X gives the best estimate

                                       = Var(X)

                                       ≥ 2^(2h(X)) / (2πe)          // Gaussian distribution maximizes the differential entropy

                                                                        for a given variance. The unit of h(X) here is bits.

 

                   "=" iff X is Gaussian and X̂ is equal to E(X).

Therefore, E[(X - X̂)²] ≥ (1/(2πe)) · 2^(2h(X)).
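A quick numeric check of this bound (my own illustration): for a Gaussian the bound is met with equality, and for a non-Gaussian distribution such as a uniform it is strict.

    # Sketch: check Var(X) >= 2^(2 h(X)) / (2*pi*e), with h(X) in bits.
    import numpy as np

    two_pi_e = 2 * np.pi * np.e

    # Gaussian with variance 4: h = 0.5 * log2(2*pi*e*4); the bound is tight (both sides = 4)
    h_gauss = 0.5 * np.log2(two_pi_e * 4.0)
    print(4.0, 2 ** (2 * h_gauss) / two_pi_e)    # 4.0  4.0

    # Uniform(0, 1): Var = 1/12 ~ 0.0833, h = log2(1) = 0; bound = 1/(2*pi*e) ~ 0.0585
    print(1 / 12, 2 ** 0 / two_pi_e)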

 

Channel Coding (PS6 Q1)

Channel capacity C = max_{p(x)} I(X;Y)

(a) I(X;Y) ≤ H(X) ≤ 1. (Converse)

     When X is uniformly distributed, H(X) = 1 and I(X;Y) is exactly equal to 1. (Achievability)

     Then the channel capacity = 1.

(b) Similar to (a), the channel capacity = 1.

(e) Calculating I(X;Y) from its definition, we get I(X;Y) = (1 - p)H(X), which is maximized by a uniform X. Therefore, the channel capacity = 1 - p.
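Part (e) has the form of a binary erasure channel with erasure probability p (this identification is my inference from I(X;Y) = (1 - p)H(X); the original problem statement is not reproduced here). A brute-force sweep over input distributions confirms the capacity numerically:

    # Sketch: numerically maximize I(X;Y) over p(X) for a binary erasure channel
    # with erasure probability p, and compare the result with 1 - p.
    import numpy as np

    def mutual_information(px, W):
        # px: input distribution; W[x, y] = p(y | x)
        pxy = px[:, None] * W                     # joint distribution p(x, y)
        py = pxy.sum(axis=0)                      # output marginal p(y)
        mask = pxy > 0
        return np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask]))

    p = 0.3                                       # erasure probability (arbitrary choice)
    W = np.array([[1 - p, p, 0.0],                # inputs {0, 1}, outputs {0, erasure, 1}
                  [0.0,   p, 1 - p]])

    best = max(mutual_information(np.array([q, 1 - q]), W)
               for q in np.linspace(0.001, 0.999, 999))
    print(best, 1 - p)                            # both ~ 0.7, achieved by the uniform input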

Another example:

Channel: p(0|0) = p(1|0) = p(1|1) = p(2|1) = p(2|2) = p(3|2) = p(3|3) = p(0|3) = 0.5, i.e., each input x is received as x or x+1 (mod 4) with equal probability.

I(X;Y) = H(Y) - H(Y|X) = H(Y) - 1, so maximizing I(X;Y) is equivalent to maximizing H(Y).

When X is uniformly distributed, Y is also uniform and H(Y) = 2 is maximized. Thus, the channel capacity = 2 - 1 = 1.
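A small self-contained numeric check for this channel (my own illustration):

    # Sketch: I(X;Y) for the 4-ary channel above with a uniform input.
    # I(X;Y) = H(Y) - H(Y|X) = H(Y) - 1, and a uniform X makes Y uniform, so H(Y) = 2.
    import numpy as np

    W = 0.5 * np.array([[1, 1, 0, 0],      # W[x, y] = p(y | x); input x -> {x, x+1 mod 4}
                        [0, 1, 1, 0],
                        [0, 0, 1, 1],
                        [1, 0, 0, 1]])
    px = np.full(4, 0.25)                   # uniform input distribution

    py = px @ W                             # output marginal (uniform)
    H_Y = -np.sum(py * np.log2(py))         # = 2 bits
    H_Y_given_X = 1.0                       # every row of W puts probability 0.5 on two outputs
    print(H_Y - H_Y_given_X)                # 1.0 -> channel capacity = 1 bit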

Comments (2)

Silas said

at 5:01 pm on Mar 9, 2009

It is not clear to me why the last sentence of the Error Estimation section holds. Can someone help me with a rigorous proof?

Cho Yiu Ng said

at 8:18 pm on Mar 10, 2009

Can anyone modify the proof to give that result?
