
Homework 1


Q1. (Sanov's Theorem:) Let $X^n = (X_1, \ldots, X_n)$ be drawn i.i.d. from the distribution $q(x)$, over alphabet $\mathcal{X}$. Let $E$ be a set of type-classes of strings of length $n$. Prove that the probability of observing a string in a type-class in the set $E$ satisfies

$\Pr(X^n \in E) \le (n+1)^{|\mathcal{X}|} 2^{-n D(p^* \| q)},$

where $p^*$ is the p.m.f. in $E$ that is closest to $q$ in K.-L. divergence.
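As a quick numeric sanity check of the bound (my own sketch, not part of the assignment: the Bernoulli parameter and the one-sided event are just illustrative choices), one can compare the exact probability of an event against the Sanov bound for a binary alphabet, where the set $E$ is all types with empirical frequency of ones at least some threshold:

```python
import math

def kl(p, q):
    """Binary K.-L. divergence D(p || q) in bits."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)
    return term(p, q) + term(1 - p, 1 - q)

def sanov_check(n, q=0.3, thresh=0.5):
    """Compare Pr(X^n in E) with (n+1)^|X| * 2^{-n D(p* || q)}
    for the event E = {types with fraction of ones >= thresh}."""
    # Exact probability of E under i.i.d. Bernoulli(q):
    exact = sum(math.comb(n, k) * q**k * (1 - q)**(n - k)
                for k in range(n + 1) if k / n >= thresh)
    # For this one-sided event (thresh > q), the p.m.f. in E closest
    # to q in K.-L. divergence is the boundary point p* = thresh.
    bound = (n + 1) ** 2 * 2 ** (-n * kl(thresh, q))  # |X| = 2
    return exact, bound

for n in (10, 50, 200):
    exact, bound = sanov_check(n)
    assert exact <= bound
```

For small $n$ the polynomial factor $(n+1)^{|\mathcal{X}|}$ makes the bound loose (even larger than 1), but the exponential term dominates as $n$ grows.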

Q2. (Estimating type:) The true p.m.f. $p(x)$ of a random source $X$ is unknown, and needs to be estimated by observing a sequence of $n$ symbols $x^n$ drawn i.i.d. from $p(x)$. Let $\hat{p}(x)$ be the estimate of $p(x)$,

obtained simply by taking the type of the string $x^n$.

Obtain an upper bound on the expected variance of $\hat{p}(x)$ from $p(x)$ (with expectation over $x^n$), i.e., $E\left[\sum_x |p(x) - \hat{p}(x)|^2\right]$? (An order-of-magnitude bound in terms of $n$ is all that is expected.)

Q3. (Universal source code:) A universal source code of rate $R$ is a code such that for any memoryless source $p(x)$ satisfying $H(p) < R$, the decoding error is less than $\epsilon$. Prove that such codes exist.

Q4. (Non-positivity of "triple" mutual information:) As in class, for any three random variables $X$, $Y$, and $Z$, define $I(X;Y;Z)$ as $I(X;Y) - I(X;Y|Z)$.

(a). Find a joint distribution $p(x,y,z)$ such that $I(X;Y;Z)$ is positive.

(b). Find a joint distribution $p(x,y,z)$ such that $I(X;Y;Z)$ is negative.
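When hunting for candidate distributions, it helps to be able to evaluate $I(X;Y;Z)$ mechanically. Here is a small utility (my own sketch; the representation of the joint p.m.f. as a Python dict is just a convenience) that expands both mutual informations into entropies and computes them from a joint distribution table:

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (bits) of a p.m.f. given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint p.m.f. {(x, y, z): prob} onto the given axes."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in axes)] += p
    return m

def triple_mi(joint):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), expanded into entropies:
       I(X;Y)   = H(X) + H(Y) - H(X,Y)
       I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    H = lambda axes: entropy(marginal(joint, axes))
    i_xy = H((0,)) + H((1,)) - H((0, 1))
    i_xy_given_z = H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))
    return i_xy - i_xy_given_z

# Sanity check: three independent fair bits give I(X;Y;Z) = 0.
indep = {(x, y, z): 1 / 8 for x in (0, 1) for y in (0, 1) for z in (0, 1)}
assert abs(triple_mi(indep)) < 1e-12
```

Feeding in a candidate distribution for (a) or (b) and checking the sign is then a one-liner.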

Q5. (Feedback does not improve the source coding rate:) Suppose feedback were allowed in a source coding system -- could a lower rate still guarantee "small" error? More formally, suppose the encoder Alice possesses a discrete memoryless source that generates $n$ i.i.d. symbols $X_1, \ldots, X_n$, each with entropy $H(X)$. Alice generates bit $Y_1$ (which may be a function of $X_1, \ldots, X_n$), and transmits this to Bob. Bob processes this bit and transmits back a symbol $Z_1$ from an arbitrary alphabet ($Z_1$ may be a function only of $Y_1$) back to Alice. Alice generates bit $Y_2$ (which may be a function of $X_1, \ldots, X_n$ and $Z_1$), and transmits this to Bob, who then processes this bit and transmits back a symbol $Z_2$ from an arbitrary alphabet (it may be a function of $Y_1$ and $Y_2$) back to Alice.

At the $i$th step, Alice's $i$th bit $Y_i$ may be a function of $X_1, \ldots, X_n$ and $Z_1, \ldots, Z_{i-1}$, and Bob's feedback symbol $Z_i$ may be a function of $Y_1, \ldots, Y_i$.

This continues for $nR$ steps ($R$ is the rate).

After $nR$ steps, Bob decodes the source as the sequence of bits $\hat{X}_1, \ldots, \hat{X}_n$, which may be a function of $Y_1, \ldots, Y_{nR}$ and $Z_1, \ldots, Z_{nR}$. Prove that if $R < H(X)$, the probability of error is bounded away from zero.

Q6. ("Practical" source code:) Attempt Q3 of PS1. Hints:

(a) How many "elementary" operations does it take to compute ?

(b) When you are searching a dictionary (an English one :) for a word, do you use binary search?

(c) Think of the leaves of the encoding binary tree as the pages of a dictionary...

Q7. Attempt any three of the following questions from Cover/Thomas, Chapter 2.

1, 2, 40, 16, 8, 10.

MEI Yuchen said

at 11:31 pm on Feb 1, 2009

what is the definition for "expected variance of p from q"? Thanks

sidjaggi said

at 12:21 am on Feb 2, 2009

good question -- added clarifying remarks...

Mak said

at 2:20 pm on Feb 2, 2009

How should we read Q4? What does "define I(X;Y;Z) as I(X;Y;Z)" mean?

The other problems look very hard too. Can anyone share some hints?

sidjaggi said

at 6:04 pm on Feb 2, 2009

Mak, thanks for pointing out the typo -- corrected.

Hints: Note that Q1 and Q3 are based on material from Cover/Thomas (look in the index). However, I'd prefer it if you tried yourself, or talked to friends, before looking at the book. For other problems, I can be persuaded to release hints on this wiki :) However, for that to happen, I need to see at least an "active discussion" (whatever that means), with some attempts on attacking the problem, before I give some hints.

MEI Yuchen said

at 12:10 am on Feb 3, 2009

I do not quite understand Q5.
What is the meaning of "Alice generates H(X) bit Y1"?
As far as I can see, after Alice processes the sequence of length n, a length nR binary sequence should be generated and passed to Bob.
How come only one bit is transmitted at one time?
Moreover, what is Z? How does Bob get Z from Y? How does Alice use Z together with X to generate Y?

I strongly agree with Mak. The problems are very difficult.

sidjaggi said

at 12:36 am on Feb 3, 2009

Hi Yuchen,

The idea is that each bit of Alice's encoded codeword can, in principle, depend on the previous bits that Bob feeds back to her. Similarly, each bit of Bob's feedback can depend on the previous bits that Alice has fed to him. The definitions in the question set up conditions on which bits can depend on which bits. Since this gives more freedom to possible encoding/decoding schemes, the question is whether Alice can get by without transmitting nR bits. Question -- do you really think it'll help substantially, or not?

It is true that some of the questions may seem hard. As I said earlier, I'm willing to give hints, if people use this wiki to discuss their attempts thus far.
Note that the homework has been online for two weeks -- some discussion of it earlier might have generated hints earlier :) Also many questions are from the standard text-book, or are slight variants thereof.

Also, you're fully expected to talk to your classmates -- two heads are better than one, and many heads are even better (though unlikely with unbiased coins :)

Lastly, the homeworks are intended to get you thinking about information theory -- I've tried to choose questions that are relevant to classwork, and have some teaching value -- giving you softball questions without a purpose helps no-one learn. Even if you don't get a couple of questions, it's only a homework set -- it's not the end of the world :)

MEI Yuchen said

at 12:47 am on Feb 3, 2009

Thanks a lot.
At least I do understand the question now and can start attempting.

Tong said

at 8:17 pm on Feb 4, 2009

In Q2, is p(x) treated as a constant inside the E[ ], or not?

sidjaggi said

at 11:38 pm on Feb 4, 2009

Hmm, not quite sure what you mean by the question.

The observed \hat{p}(x) can take many values, and for each value that \hat{p}(x) takes, there is an associated probability of observing that \hat{p}(x) (say we call that probability \hat{q}(x)). Then the expected variance equals the sum (over all possible values of \hat{p}(x)) of the quantity \hat{q}(x) |p(x) - \hat{p}(x)|^2.

hope that makes it clearer?
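The definition above can be checked numerically. Here is a small sketch (my own illustration, assuming a binary alphabet for concreteness) that computes the expected variance exactly by summing over every possible type $\hat{p} = k/n$, weighted by the probability $\hat{q}$ of observing that type:

```python
import math

def expected_variance(p, n):
    r"""E[ sum_x |p(x) - \hat{p}(x)|^2 ] for a Bernoulli(p) source,
    where \hat{p} is the type of an n-symbol string: sum over the
    possible types k/n, weighted by their probability \hat{q}."""
    total = 0.0
    for k in range(n + 1):
        q_hat = math.comb(n, k) * p**k * (1 - p)**(n - k)  # \hat{q}(k/n)
        p_hat = k / n
        total += q_hat * ((p - p_hat)**2 + ((1 - p) - (1 - p_hat))**2)
    return total
```

Evaluating this for a few values of $n$ (with $p$ fixed) gives a feel for how the expected variance scales, which is exactly the order-of-magnitude behaviour Q2 asks about.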