Source Coding Theorem/Channel Coding Theorem
Q1. (Converse for source coding theorem:) A source generates n i.i.d. random variables $X_1, \ldots, X_n$ (collectively $X^n$), each with entropy $H(X)$. The encoder Alice encodes this into a string $Y^{nR}$ of $nR$ bits ($R$ is the rate). The decoder Bob decodes the source as a reconstruction $\hat{X}^n$, which is a function of these $nR$ bits. For any (Alice, Bob) pair, justify each step in the following sequence of equalities and inequalities.
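One standard such chain, built around Fano's inequality (writing $Y^{nR}$ for the encoded bit string, $\hat{X}^n$ for Bob's reconstruction, $\mathcal{X}$ for the source alphabet, and $P_e = \Pr(\hat{X}^n \neq X^n)$), is:
$$
\begin{aligned}
nR &\ge H(Y^{nR})\\
&\ge H(Y^{nR}) - H(Y^{nR} \mid X^n)\\
&= I(X^n; Y^{nR})\\
&= H(X^n) - H(X^n \mid Y^{nR})\\
&\ge H(X^n) - H(X^n \mid \hat{X}^n)\\
&\ge nH(X) - \bigl(1 + P_e \, n \log_2 |\mathcal{X}|\bigr).
\end{aligned}
$$
Rearranging gives $P_e \ge \dfrac{H(X) - R - 1/n}{\log_2 |\mathcal{X}|}$, which is bounded away from zero for all large $n$ whenever $R < H(X)$.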
Use this to prove a lower bound on the probability of error if R < H(X). Under what conditions are the inequalities tight? Is it crucial that we use fixed-length block coding here?
Key concepts in Q1: Converse to a coding theorem, chain of entropy equalities and inequalities, positive bound on probability of error.
Q2. (Achievability for source coding theorem:) Consider the following encoding scheme for the source in Q1. For each source sequence $X^n$, Alice assigns a string $Y^{nR}(X^n)$ of $nR$ bits, chosen i.i.d. and uniformly at random. (Note that for $R < 1$ this may be a many-to-one assignment.) This assignment is known a priori to both Alice and Bob, and is called the codebook. To decode, given a string $Y^{nR}$, Bob examines the codebook to see if there is a unique typical $X^n$ that generates this $Y^{nR}$. If so, he outputs that $X^n$. Else he declares a decoding error. (A toy numerical sketch of this scheme appears after the remark below.)
(a) Why is this a terrible scheme? As described, what is the space complexity of encoding and decoding? How about the complexity of checking whether the code is "good" (has "small" error)? Also, is a non-zero probability of error necessary for this problem of compressing $X^n$?
(b) Why is this not such a terrible scheme? Prove that the probability of error is asymptotically negligible (in fact, that it decreases exponentially in the block-length n). Proceed as follows:
(i) Prove that the probability of transmitting an atypical $X^n$ is "small". (How small? Can you use ideas from PS1 and Q4 of PS2 to precisely quantify this?)
(ii) Prove that for any typical $X^n$, the probability that its codeword $Y^{nR}(X^n)$ is "bad", i.e., that another typical sequence $\tilde{X}^n \neq X^n$ also gets encoded to the same $Y^{nR}(X^n)$, is "small". How small? Use a combinatorial argument to bound this. Can you again provide an exponentially decaying bound?
(iii) How many typical $X^n$ are there (bounds)? Therefore what are bounds on the probability of observing a typical $X^n$?
(iv) Combining (i), (ii) and (iii) (via the union bound), can you come up with an upper bound (hopefully exponentially decaying) on the probability of the code being "bad", i.e., that it has a probability of error greater than some target $\epsilon$?
(v) Using (iv), can you obtain a bound on the fraction of codes that are bad? Are we therefore done?
Remark: Admittedly this is a very convoluted way of proving a theorem that we can prove much more easily in several different ways (how? you should be able to think of at least one, having proven it in an earlier problem set...). Nonetheless, this is a teaser for other, more complicated theorems to come, where this technique is in fact the only one we know of to prove corresponding achievability theorems.
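As a complement to the remark above, here is a minimal brute-force sketch of the random-binning scheme in Python, for a Bernoulli(p) source. All parameters (p, n, R, eps) and helper names are illustrative choices, not part of the problem; the point is only to make the codebook, the typicality decoder, and the exponential bookkeeping cost of part (a) concrete.

# A toy, brute-force sketch of the random-binning scheme of Q2, for a
# Bernoulli(p) source.  All parameters here are illustrative choices.
import itertools
import math
import random

random.seed(0)

p   = 0.1    # source is i.i.d. Bernoulli(p)
n   = 12     # block length (tiny, so we can enumerate all 2^n sequences)
R   = 0.75   # rate in bits per source symbol; note R > H(p) ~ 0.469
eps = 0.1    # typicality slack on the empirical frequency of ones

num_bins = 2 ** round(n * R)   # number of distinct nR-bit codewords

def is_typical(x):
    """Crude (weak-)typicality test: empirical frequency of ones close to p."""
    return abs(sum(x) / n - p) <= eps

# Codebook: every source sequence x^n gets an i.i.d. uniform bin index
# (i.e., an nR-bit string).  Note the exponential space: 2^n entries.
all_seqs = list(itertools.product((0, 1), repeat=n))
codebook = {x: random.randrange(num_bins) for x in all_seqs}

typical_set = [x for x in all_seqs if is_typical(x)]

def decode(bin_index):
    """Bob's decoder: output the unique typical sequence in this bin, if any."""
    candidates = [x for x in typical_set if codebook[x] == bin_index]
    return candidates[0] if len(candidates) == 1 else None  # None = declare error

trials, errors = 5000, 0
for _ in range(trials):
    x = tuple(1 if random.random() < p else 0 for _ in range(n))
    if decode(codebook[x]) != x:
        errors += 1

h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(f"H(p) ~ {h:.3f} bits, rate R = {R}")
print(f"|typical set| = {len(typical_set)}, bins = {num_bins}")
print(f"estimated error probability ~ {errors / trials:.3f}")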
Key concepts in Q2: Achievability for a coding theorem, complexity of design and implementation, random coding, Sanov's theorem, bound on divergence, combinatorial probabilistic bound, bounds on typical sets and probabilities of typical sequences, probability of a random code being bad, fraction of bad random codes.
Q3. (Intuition for channel coding theorem:) A channel is discrete and memoryless, with per-symbol transition probabilities given by the conditional distribution p(y|x). A source generates m i.i.d. uniformly random bits $B^m = (B_1, \ldots, B_m)$. The encoder Alice encodes these into a string of n symbols $X^n$ ($m/n = R < 1$ is the rate). After passing through the channel, the decoder Bob receives $Y^n$, and decodes the source as the sequence of bits $\hat{B}^m$. An error $\mathcal{E}$ is said to have been made if $\hat{B}^m \neq B^m$. The probability of this event is denoted by $P_e$.
(a) For each transmitted $X^n$, what is the size of the corresponding set of $Y^n$ vectors conditionally typical given $X^n$?
(b) What is the maximal total number of possible $Y^n$ vectors?
(c) What would you guess the capacity C (maximal rate) of the channel to be, therefore? Is there a reason why this guess may be somewhat optimistic?
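A back-of-the-envelope version of the counting argument that parts (a)-(c) are driving at (a heuristic, not a proof): each transmitted $X^n$ produces one of roughly $2^{nH(Y|X)}$ conditionally typical output sequences, while only about $2^{nH(Y)}$ output sequences are typical in total, so at most about
$$
\frac{2^{nH(Y)}}{2^{nH(Y \mid X)}} = 2^{n I(X;Y)}
$$
codewords can have (nearly) disjoint output "fans". Maximizing over the input distribution suggests a capacity of $C = \max_{p(x)} I(X;Y)$ bits per channel use. One reason this guess may look optimistic is that it assumes the fans of different codewords can be packed with essentially no overlap; Q6 shows that, asymptotically, random codebooks achieve this anyway.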
Key concepts in Q3: Channel, discrete, memoryless, transition probabilities, rate, error, capacity of a channel.
Q4. (Comparison of the source coding theorem and the channel coding theorem:) (a) (Source coding) What's the benefit of source coding? That is, what's the ratio between the number of bits required by a naive coding scheme (uncoded), and the best possible coding scheme?
(b) (Channel coding) Consider the following naive scheme for channel coding. To transmit a single bit over a BSC(p) (Binary Symmetric Channel with crossover probability p), Alice repeats it k times (repetition code). To decode, Bob uses the majority rule. Prove that, for transmitting a single bit over k channel uses, this is indeed the best possible scheme. (Hint: Consider the "Hamming distance" between the two possible codewords...) What's the probability of error of this scheme? What if this scheme was used to transmit n bits? What's the block probability of error, i.e., the probability of error of the block of n bits? What if this probability is required to be exponentially small in n -- how large should k be, and therefore what would the rate of the code be? How does this compare with the rate we hope to get, via Q3?
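For a quick numerical feel for part (b), here is a small Python sketch (the crossover probability p = 0.11 and the list of k values are illustrative choices, not part of the problem) that computes the exact majority-decoding error probability of the k-fold repetition code on a BSC(p):

# Exact error probability of the k-fold repetition code on a BSC(p) with
# majority decoding (k is taken odd so there are no ties).
from math import comb

def repetition_error_prob(k, p):
    """P(majority of k i.i.d. BSC(p) flips is wrong), for odd k."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range((k + 1) // 2, k + 1))

p = 0.11
for k in (1, 3, 5, 11, 21, 51):
    pe = repetition_error_prob(k, p)
    print(f"k = {k:3d}  rate = 1/{k:<3d}  per-bit error ~ {pe:.2e}")

# To drive the *block* error of n independent bits below a constant, the
# union bound asks for per-bit error ~ 1/n, i.e. k ~ log n and rate ~ 1/log n,
# which tends to zero -- far below the constant rate 1 - H(p) suggested by Q3.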
Key concepts in Q4: Repetition code, Hamming distance, majority rule decoding = minimum distance decoding, probability of error computations.
Q5. (Converse for channel coding theorem:) As in Q1, use a chain of inequalities to come up with an upper bound on the channel capacity. Proceed as follows.
(a) First prove that $I(X^n; Y^n) \le nC$, where $C = \max_{p(x)} I(X;Y)$ is the per-symbol capacity. Hint: build a chain of equalities and inequalities relating $I(X^n; Y^n)$ to the single-letter quantities $I(X_i; Y_i)$, using the fact that the channel is memoryless.
Justify each step!
(b) Now use (a) to prove that if the probability of error vanishes as $n \to \infty$, then $R \le C$. Hint: relate $nR = H(B^m)$ to $I(X^n; Y^n)$ via Fano's inequality and the data-processing inequality.
Justify each step!
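For concreteness, one standard way to assemble the chain in (b) (writing $B^m$ for the $m = nR$ message bits of Q3, $\hat{B}^m$ for Bob's reconstruction, and $P_e = \Pr(\hat{B}^m \neq B^m)$) is:
$$
\begin{aligned}
nR = H(B^m) &= H(B^m \mid \hat{B}^m) + I(B^m; \hat{B}^m)\\
&\le \bigl(1 + P_e \, nR\bigr) + I(B^m; \hat{B}^m) \qquad \text{(Fano's inequality)}\\
&\le 1 + P_e \, nR + I(X^n; Y^n) \qquad \text{(data processing along } B^m \to X^n \to Y^n \to \hat{B}^m)\\
&\le 1 + P_e \, nR + nC \qquad \text{(part (a))},
\end{aligned}
$$
so that $R \le \dfrac{C + 1/n}{1 - P_e}$; letting $n \to \infty$ with $P_e \to 0$ forces $R \le C$.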
Q6. (Achievability for channel coding theorem:) Consider the following encoding scheme for the channel coding setup in Q3. For each message (sequence of $m = nR$ source bits), Alice assigns a codeword of $n$ channel-input symbols, chosen i.i.d. uniformly at random. This assignment is known a priori to both Alice and Bob, and is called the codebook. To decode, given the received string $Y^n$, Bob examines the codebook to see if there is a unique codeword $X^n$ that is jointly typical with this $Y^n$, and if there's a unique message that encodes to this codeword. If so, he outputs that message. Else he declares a decoding error. (A toy numerical sketch of this scheme appears after part (v) below.)
(a) Prove that the probability of error is asymptotically negligible (in fact, that it decreases exponentially in the block-length n). Proceed as follows:
(i) Prove that the probability of the transmitted-received pair $(X^n, Y^n)$ being jointly atypical is "small". (How small?)
(ii) Prove that for any message, the probability that its codeword is "bad", i.e., that another message also gets encoded to the same codeword, is "small". How small? Use a combinatorial argument to bound this. Can you again provide an exponentially decaying bound?
(iii) Conditioned on (i) and (ii), what's the probability that at least two codewords in the codebook are both jointly typical with the received $Y^n$? How many codewords are there in the codebook? Therefore what is the probability, over all codewords in all codes, that there's a decoding error?
(iv) Combining (i), (ii) and (iii) (via the union bound), can you come up with an upper bound (hopefully exponentially decaying) on the probability of the code being "bad", i.e., that it has a probability of error greater than some target $\epsilon$?
(v) Using (iv), can you obtain a bound on the fraction of codes that are bad? Are we therefore done?
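Here is a toy Python sketch of this random-coding scheme, specialised to a BSC(p) with a crude joint-typicality decoder (for a BSC, a pair is treated as jointly typical when the fraction of positions where the codeword and the received word disagree is within eps of p). All parameters (p, n, R, eps) and helper names are illustrative choices, not part of the problem:

# A toy sketch of the random-coding argument of Q6, specialised to a BSC(p)
# with uniform random codewords and a crude joint-typicality decoder.
import random

random.seed(1)

p, n, R, eps = 0.05, 24, 0.25, 0.08
num_msgs = 2 ** round(n * R)          # 2^{nR} messages

def disagreement(x, y):
    """Fraction of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / n

def jointly_typical(x, y):
    return abs(disagreement(x, y) - p) <= eps

# Random codebook: each message gets an i.i.d. uniform codeword of length n.
codebook = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(num_msgs)]

def decode(y):
    """Output the message of the unique jointly typical codeword;
    declare an error (None) if there is no candidate or more than one."""
    hits = [m for m, x in enumerate(codebook) if jointly_typical(x, y)]
    return hits[0] if len(hits) == 1 else None

trials, errors = 2000, 0
for _ in range(trials):
    m = random.randrange(num_msgs)
    y = tuple(b ^ (random.random() < p) for b in codebook[m])   # pass through BSC(p)
    if decode(y) != m:
        errors += 1

print(f"n = {n}, rate R = {R}, messages = {num_msgs}")
print(f"estimated block error probability ~ {errors / trials:.3f}")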
Q7. (Feedback does not improve the source coding rate:) Suppose noiseless feedback were allowed in a source coding system -- could a lower rate still guarantee "small" error? More formally, suppose the encoder Alice possesses a discrete memoryless source that generates i.i.d. symbols $X_1, \ldots, X_n$, each with entropy $H(X)$. Alice generates her first transmitted symbol (which may be a function of $X^n$), and transmits this to Bob over the noisy channel {p(y|x)}. Bob processes the symbol he receives and transmits back (noiselessly) a symbol from an arbitrary (but fixed) alphabet (this feedback symbol must be a (possibly random) function of only what Bob has received so far) back to Alice. For instance, it may simply equal the symbol Bob received. Alice then generates her second transmitted symbol (which may be a function of $X^n$ and the first feedback symbol), and transmits this to Bob over the noisy channel, who then processes this symbol and transmits back (again noiselessly) a feedback symbol (it may be a function of both symbols he has received so far) back to Alice.
At the $i$th step, Alice's $i$th transmitted symbol may be a function of $X^n$ and all the feedback symbols she has received so far, and Bob's $i$th feedback symbol may be a function of all the channel outputs he has received so far.
This continues for $nR$ steps ($R$ is the rate).
After the $nR$ steps, Bob decodes the source as a reconstruction $\hat{X}^n$, which may be a function of all the channel outputs he has received and all the feedback symbols he has sent. Prove that, just as in Q1, if $R < H(X)$ the probability of error is bounded away from zero.
Comments (10)
MEI Yuchen said
at 12:15 am on Feb 6, 2009
In which chapter of Cover/Thomas can I find the source coding theorem?
I have no idea how to prove the achievability of the source coding theorem.
sidjaggi said
at 1:25 am on Feb 6, 2009
Both the achievability and the converse proofs we used were "novel". They are, however, both very similar to the proof of the channel coding theorem...
Leonard Wong said
at 3:49 am on Feb 7, 2009
I think there is a typo in (b)(iv): the probability of error should be < (...) rather than > (...)
Leonard Wong said
at 3:58 am on Feb 7, 2009
(in question 2)
Silas said
at 6:34 pm on Feb 9, 2009
Question 6 is almost identical to question 2. Is there an error?
Tong said
at 5:55 pm on Feb 12, 2009
Sorry, I missed the lecture on source coding.
I am confused about Q2 (a) and (b). Is there any hint?
Cho Yiu Ng said
at 6:04 pm on Feb 12, 2009
Basically, you are asking about the whole of Question 2. First of all, do you understand how the codebook is generated?
Tong said
at 6:09 pm on Feb 12, 2009
Use the function Y(X^n) to generate the codebook, right?
Cho Yiu Ng said
at 7:26 pm on Feb 12, 2009
No. The codebook is generated as follows. For each sequence X^n, we randomly and independently assign a codeword and this codeword is denoted by Y(X^n). The collection of the pairs (X^n, Y(X^n)) is the codebook.
Next, given a codeword Y^nR, how is it decoded for a given codebook generated in the above manner?
Tong said
at 5:49 pm on Feb 19, 2009
Question 4(b): how do we judge that the scheme is the best?