Q1. (Sanov's Theorem:) Let $X_1, \ldots, X_n$ be drawn i.i.d. from the distribution $q$, over alphabet $\mathcal{X}$. Let $E$ be a set of type-classes of strings of length $n$. Prove that the probability of observing a string in a type-class in the set $E$ satisfies
$$q^n(E) \leq (n+1)^{|\mathcal{X}|} 2^{-n D(p^* \| q)},$$
where $p^*$ is the p.m.f. in $E$ that is closest to $q$ in K.-L. divergence.
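(Not part of the problem -- just a numerical sanity check that may help in parsing the statement. The sketch below, in Python, assumes a binary alphabet, a source $q = \mathrm{Bernoulli}(0.3)$, and takes $E$ to be all types with at least a fraction $0.7$ of ones; all of these choices are illustrative, not the problem's.)

```python
from math import comb, log2

def kl(p, q):
    """Binary K.-L. divergence D(p || q) in bits."""
    return sum(a * log2(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)

q, thr = 0.3, 0.7   # source is Bernoulli(q); E = all types with fraction of ones >= thr
for n in (20, 50, 100, 200):
    ks = [k for k in range(n + 1) if k / n >= thr]                 # type classes in E
    pr_E = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in ks)   # exact Pr[type of X^n lies in E]
    p_star = ks[0] / n            # the type in E closest to q in K.-L. divergence
    bound = (n + 1) ** 2 * 2 ** (-n * kl(p_star, q))               # (n+1)^{|X|} 2^{-n D(p*||q)}
    print(f"n={n:4d}   Pr[E] = {pr_E:.3e}   bound = {bound:.3e}")
```

For every $n$ the exact probability should sit below the polynomial-times-exponential bound, and both decay with the same exponential rate.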
Q2. (Estimating type:) The true p.m.f. $p$ of a random source is unknown, and needs to be estimated by observing a sequence of $n$ symbols $X_1, \ldots, X_n$ drawn i.i.d. from $p$. Let $\hat{p}$ be the estimate of $p$,
obtained simply by taking the type of the string $X_1, \ldots, X_n$.
Obtain an upper bound on the expected variance of $\hat{p}$ from $p$ (with expectation over $X_1, \ldots, X_n$), i.e., bound $E\big[|p(x) - \hat{p}(x)|^2\big]$. (An order-of-magnitude bound in terms of $n$ is all that is expected.)
Q3. (Universal source code:) A universal source code of rate $R$ is a code such that for any memoryless source $X$ satisfying $H(X) < R$, the decoding error is less than any $\epsilon > 0$ for sufficiently large block-length $n$. Prove that such codes exist.
Q4. (Non-positivity of ``triple'' mutual information:) As in class, for any three random variables $X$, $Y$, and $Z$, define $I(X;Y;Z)$ as $I(X;Y) - I(X;Y|Z)$.
(a). Find a joint distribution $p(x,y,z)$ such that $I(X;Y;Z)$ is positive.
(b). Find a joint distribution $p(x,y,z)$ such that $I(X;Y;Z)$ is negative.
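(Not a hint toward (a) or (b) -- just a small helper, with names of my own choosing, for numerically evaluating the definition above on candidate joint p.m.f.s. It assumes numpy and a joint p.m.f. stored as an array of shape (|X|, |Y|, |Z|).)

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits, from a joint p.m.f. matrix pxy."""
    px = pxy.sum(axis=1, keepdims=True)            # marginal of X, as a column
    py = pxy.sum(axis=0, keepdims=True)            # marginal of Y, as a row
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

def triple_info(pxyz):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), from a joint p.m.f. array pxyz."""
    i_xy = mutual_info(pxyz.sum(axis=2))           # marginalize out Z
    i_xy_given_z = 0.0
    for z in range(pxyz.shape[2]):
        pz = pxyz[:, :, z].sum()
        if pz > 0:
            i_xy_given_z += pz * mutual_info(pxyz[:, :, z] / pz)
    return i_xy - i_xy_given_z

# usage: fill in a candidate p of shape (2, 2, 2) summing to 1, then print(triple_info(p))
```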
Q5. (Feedback does not improve the source coding rate:) Suppose feedback were allowed in a source coding system -- could a lower rate still guarantee ``small'' error? More formally, suppose the encoder Alice possesses a discrete memoryless source that generates $n$ i.i.d. symbols $X_1, \ldots, X_n$, each with entropy $H(X)$. Alice generates bit $Y_1$ (which may be a function of $X_1, \ldots, X_n$), and transmits this to Bob. Bob processes this bit and transmits back a symbol $Z_1$ from an arbitrary alphabet ($Z_1$ may be a function of only $Y_1$) back to Alice. Alice generates bit $Y_2$ (which may be a function of $X_1, \ldots, X_n$ and $Z_1$), and transmits this to Bob, who then processes this bit and transmits back a symbol $Z_2$ from an arbitrary alphabet (it may be a function of $Y_1$ and $Y_2$) back to Alice.
At the $i$th step, Alice's $i$th bit $Y_i$ may be a function of $X_1, \ldots, X_n$ and $Z_1, \ldots, Z_{i-1}$, and Bob's feedback symbol $Z_i$ may be a function of $Y_1, \ldots, Y_i$.
This continues for $nR$ steps ($R$ is the rate).
After $nR$ steps, Bob decodes the source as the sequence $\hat{X}_1, \ldots, \hat{X}_n$, which may be a function of $Y_1, \ldots, Y_{nR}$ and $Z_1, \ldots, Z_{nR}$. Prove that if $R < H(X)$, the probability of error is bounded away from zero.
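(The code below is not part of the problem; it is only a schematic, with placeholder function names of my own, of who may see what at each step. It may help in parsing the dependency structure described above.)

```python
def run_protocol(x, nR, alice_encoder, bob_feedback, bob_decoder):
    """Schematic of the Q5 interaction (placeholder callables, no real coding scheme).

    x             : Alice's source sequence (X_1, ..., X_n)
    alice_encoder : maps (x, Z_1..Z_{i-1}) to Alice's i-th bit Y_i
    bob_feedback  : maps (Y_1..Y_i) to Bob's i-th feedback symbol Z_i
    bob_decoder   : maps (Y_1..Y_{nR}, Z_1..Z_{nR}) to Bob's estimate of x
    """
    ys, zs = [], []
    for i in range(nR):
        y_i = alice_encoder(x, tuple(zs))     # Y_i may depend on X^n and Z_1, ..., Z_{i-1}
        ys.append(y_i)
        z_i = bob_feedback(tuple(ys))         # Z_i may depend on Y_1, ..., Y_i
        zs.append(z_i)
    return bob_decoder(tuple(ys), tuple(zs))  # Bob's decoded source sequence
```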
Q6. (``Practical'' Source code:) Attempt Q3 of PS1. Hints:
(a) How many ``elementary'' operations does it take to compute ?
(b) When you are searching a dictionary (an English one :) for a word, do you use binary search?
(c) Think of the leaves of the encoding binary tree as the pages of a dictionary...
Q7. Attempt any three of the following questions from Cover/Thomas, Chapter 2.
1, 2, 40, 16, 8, 10.
Comments (9)
MEI Yuchen said
at 11:31 pm on Feb 1, 2009
what is the definition for "expected variance of p from q"? Thanks
sidjaggi said
at 12:21 am on Feb 2, 2009
good question -- added clarifying remarks...
Mak said
at 2:20 pm on Feb 2, 2009
How should we read Q4? What does "define I(X;Y;Z) as I(X;Y;Z)" mean?
The other problems look very hard too. Can anyone share some hints?
sidjaggi said
at 6:04 pm on Feb 2, 2009
Mak, thanks for pointing out the typo -- corrected.
Hints: Note that Q1 and Q3 are based on material from Cover/Thomas (look in the index). However, I'd prefer it if you tried yourself, or talked to friends, before looking at the book. For other problems, I can be persuaded to release hints on this wiki :) However, for that to happen, I need to see at least an "active discussion" (whatever that means), with some attempts on attacking the problem, before I give some hints.
MEI Yuchen said
at 12:10 am on Feb 3, 2009
I do not quite understand Q5.
What is the meaning of "Alice generates H(X) bit Y1"?
As far as I can see, after Alice processes the sequence of length n, a length nR binary sequence should be generated and passed to Bob.
How come only one bit is transmitted at one time?
Moreover, what is Z? How does Bob get Z from Y? How does Alice use Z together with X to generate Y?
I strongly agree with Mak. The problems are very difficult.
sidjaggi said
at 12:36 am on Feb 3, 2009
Hi Yuchen,
The idea is that each bit of Alice's encoded codeword can, in principle, depend on the previous bits that Bob feeds back to her. Similarly, each bit of Bob's feedback can depend on the previous bits that Alice has fed to him. The definitions in the question set up conditions on which bits can depend on which bits. Since this gives more freedom to possible encoding/decoding schemes, the question is whether Alice can get by without transmitting nR bits. Question -- do you really think it'll help substantially, or not?
It is true that some of the questions may seem hard. As I said earlier, I'm willing to give hints, if people use this wiki to discuss their attempts thus far.
Note that the homework has been online for two weeks -- some discussion of it earlier might have generated hints earlier :) Also many questions are from the standard text-book, or are slight variants thereof.
Also, you're fully expected to talk to your classmates -- two heads are better than one, and many heads are even better (though unlikely with unbiased coins :)
Lastly, the homeworks are intended to get you thinking information theory -- I've tried to choose questions that are relevant to classwork, and have some teaching value -- giving you softball questions without a purpose helps no-one learn. Even if you don't get a couple of questions, it's only a homework set -- it's not the end of the world :)
MEI Yuchen said
at 12:47 am on Feb 3, 2009
Thanks a lot.
At least I do understand the question now and can start attempting.
Tong said
at 8:17 pm on Feb 4, 2009
In Q2, is the p(x) the same as a constant inside the E[·], or not?
sidjaggi said
at 11:38 pm on Feb 4, 2009
Hmm, not quite sure what you mean by the question.
The observed \hat{p}(x) can take many values, and for each value that \hat{p}(x) takes, there is an associated probability of observing that \hat{p}(x) (say we call that probability \hat{q}(x)). Then the expected variance equals the sum (over all possible values of \hat{p}(x)) of the quantity \hat{q}(x)|p(x) - \hat{p}(x)|^2.
hope that makes it clearer?
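(To make the formula above concrete: a short Python calculation, assuming a Bernoulli(p) source and looking at the single symbol x = 1. Each possible value of \hat{p}(1) is k/n, and the associated weight \hat{q} is then just a binomial probability. The function name and parameters are mine, purely for illustration.)

```python
from math import comb

def expected_variance(p, n):
    """E[(p(1) - p_hat(1))^2] when p_hat is the type of n i.i.d. Bernoulli(p) symbols."""
    total = 0.0
    for k in range(n + 1):
        q_hat = comb(n, k) * p**k * (1 - p)**(n - k)   # probability of observing the type with k ones
        total += q_hat * (p - k / n) ** 2              # q_hat * |p(x) - p_hat(x)|^2
    return total

print([expected_variance(0.3, n) for n in (10, 50, 100)])   # compare values for a few n
```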