Channel Coding Theorem
Overview
We analyze a communication system in which U is the message, X^n is the encoded input sequence, Y^n is the output sequence after transmission through the channel, and Û is the decoded message.
The information channel capacity of a discrete memoryless channel is defined as the maximum mutual information in any single use of the channel, where the maximization is over all possible input probability distributions {p(x)} on X, i.e.,

C = max_{p(x)} I(X; Y).
In channel coding, the converse states that every achievable rate satisfies R ≤ max_{p(x)} I(X;Y) + ε, while achievability states that every rate R ≤ max_{p(x)} I(X;Y) − ε is achievable with vanishing probability of error.
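To make the maximization in this definition concrete, here is a minimal Python sketch (an illustration added to these notes, not from the textbook) that brute-forces C = max_{p(x)} I(X;Y) for a small binary-input channel; the transition matrix W below is a hypothetical, arbitrary choice.

import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel matrix W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W          # joint p(x, y)
    p_y = p_xy.sum(axis=0)           # marginal p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]))

# Hypothetical asymmetric binary channel (example values only)
W = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Brute-force search over p(X=1) to approximate C = max_{p(x)} I(X;Y)
grid = np.linspace(0, 1, 10001)
best = max((mutual_information(np.array([1 - q, q]), W), q) for q in grid)
print(f"C ~= {best[0]:.4f} bits/use, achieved at p(X=1) ~= {best[1]:.3f}")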


For each (typical) input n-sequence, there are approximately 2^{nH(Y|X)} possible Y sequences, all of them roughly equally likely, as illustrated in Fig. 2. We wish to ensure that no two X sequences produce the same Y output sequence; otherwise, we will not be able to decide which X sequence was sent. The total number of possible (typical) Y sequences is ≈ 2^{nH(Y)}. This set has to be divided into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences. The total number of disjoint sets is therefore at most 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}. Hence, we can send at most ≈ 2^{nI(X;Y)} distinguishable sequences of length n.
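As a quick numeric illustration of this counting argument (added here, with assumed values), the following Python sketch evaluates the exponents for a BSC with crossover probability p = 0.11 and uniform inputs, for which H(Y) = 1 and H(Y|X) = H(p).

import math

def H2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.11          # assumed block length and crossover probability
H_Y = 1.0                  # uniform input to a BSC makes Y uniform
H_Y_given_X = H2(p)        # noise entropy per symbol
I = H_Y - H_Y_given_X      # I(X;Y) = 1 - H(p)

print(f"typical Y sequences      ~ 2^{n * H_Y:.0f}")
print(f"fan-out per typical input~ 2^{n * H_Y_given_X:.1f}")
print(f"distinguishable inputs   ~ 2^{n * I:.1f}   (i.e., 2^(nI(X;Y)))")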
Proof of the converse of the channel coding theorem (ref. the textbook Elements of Information Theory, from p. 206)
For convenience, in the following proof we will use W to denote the input message U and Ŵ to denote the decoded message Û.
Firstly, we want to prove Lemma 7.9.2: if Y^n is the result of passing X^n through a discrete memoryless channel used without feedback, then I(X^n; Y^n) ≤ nC for every input distribution p(x^n):

I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1, ..., Y_{i−1}, X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)                         (7.94)
            ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i)             (7.95)
            = Σ_{i=1}^{n} I(X_i; Y_i)
            ≤ nC                                                        (7.97)
By the definition of a discrete memoryless channel, Y_i depends only on X_i and is conditionally independent of everything else; this gives step (7.94) above. Step (7.95) follows from the fact that the entropy of a collection of random variables is at most the sum of their individual entropies, and (7.97) follows from the definition of capacity. Thus, we have proved that using the channel many times does not increase the information capacity in bits per transmission.
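As a sanity check on this lemma (an illustration added here, with made-up parameters), the following Python sketch computes I(X^2;Y^2) for two uses of a BSC(p) driven by a deliberately correlated input pair and verifies that it does not exceed 2C = 2(1 − H(p)).

import numpy as np

p = 0.2                                   # assumed crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])    # BSC transition matrix W[x, y] = p(y|x)

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)

C = 1 - H2(p)                             # BSC capacity per channel use

# Correlated input distribution on (X1, X2): an arbitrary illustrative choice
p_x = np.array([[0.4, 0.1],
                [0.1, 0.4]])              # p_x[x1, x2]

# p(y1, y2 | x1, x2) = W[x1, y1] * W[x2, y2]  (memoryless channel)
p_y_given_x = np.einsum('ac,bd->abcd', W, W)      # indices: x1, x2, y1, y2
p_joint = p_x[:, :, None, None] * p_y_given_x     # p(x1, x2, y1, y2)
p_y = p_joint.sum(axis=(0, 1))                    # p(y1, y2)

mask = p_joint > 0
I = np.sum(p_joint[mask] *
           np.log2((p_y_given_x / p_y[None, None, :, :])[mask]))

print(f"I(X^2; Y^2) = {I:.4f} bits   <=   2C = {2*C:.4f} bits")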
Secondly, we want to prove that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C. If the maximal probability of error tends to zero, the average probability of error for the sequence of codes also goes to zero [i.e., λ^{(n)} → 0 implies P_e^{(n)} → 0]. For a fixed encoding rule X^n(·) and a fixed decoding rule Ŵ = g(Y^n), we have the Markov chain W → X^n(W) → Y^n → Ŵ. For each n, let W be drawn according to a uniform distribution over {1, 2, ..., 2^{nR}}. Since W has a uniform distribution, Pr(Ŵ ≠ W) = P_e^{(n)} = (1/2^{nR}) Σ_i λ_i. So, we have
nR = H(W)                                      (a)
   = H(W | Ŵ) + I(W; Ŵ)                        (b)
   ≤ 1 + P_e^{(n)} nR + I(W; Ŵ)                (c)
   ≤ 1 + P_e^{(n)} nR + I(X^n; Y^n)            (d)
   ≤ 1 + P_e^{(n)} nR + nC                     (e)
where (a) follows from the assumption that W is uniformly distributed, (b) is an identity, (c) is Fano's inequality for W taking on at most 2^{nR} values, (d) is the data-processing inequality, and (e) is Lemma 7.9.2 from the first part. Then, dividing by n, we obtain

R ≤ P_e^{(n)} R + 1/n + C.

When n → ∞, the first two terms on the right-hand side tend to 0 (since P_e^{(n)} → 0), so R ≤ C. Rearranging the same inequality, we also get

P_e^{(n)} ≥ 1 − C/R − 1/(nR).

This shows that if R > C, the probability of error is bounded away from 0 for sufficiently large n, so we cannot construct codes for large n with P_e^{(n)} → 0. Hence, we cannot achieve an arbitrarily low probability of error at rates above capacity.
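To see this bound numerically (illustrative values only, not from the notes), the short sketch below evaluates P_e^{(n)} ≥ 1 − C/R − 1/(nR) for an assumed capacity C = 0.5 bits/use and a rate R = 0.6 above it.

# Lower bound on the error probability from the converse:
# P_e^(n) >= 1 - C/R - 1/(nR)  for any (2^{nR}, n) code of rate R.
C = 0.5    # assumed capacity (bits/use), illustrative value
R = 0.6    # attempted rate above capacity

for n in (10, 100, 1000, 10000):
    lower_bound = 1 - C / R - 1 / (n * R)
    print(f"n = {n:>6}:  P_e^(n) >= {lower_bound:.4f}")
# The bound approaches 1 - C/R ~= 0.1667, so the error probability
# cannot be driven to zero at any rate R > C.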
Achievability
In order to maximize the mutual information I(X;Y), the codebook will be generated according to the maximizing input distribution p* (defined below). The code is a pair C = (f, g) with nR = H(X) + ε, where f is the encoding function and g is the decoding function.
Probability of each codeword:

Pr(X^n = x^n) = ∏_{i=1}^{n} p*(x_i),

i.e., the symbols of each codeword are drawn i.i.d. according to p*.
Example: Binary Symmetric Channel BSC(p) with crossover probability p, i.e., Y = X ⊕ Z with Z ~ Bernoulli(p): each transmitted bit is flipped with probability p and received correctly with probability 1 − p.
Message set size: |U| = 2^{nR}
Decoding rule: if there exists a unique u such that f(u) = x^n and y^n ∈ A_ε^{(n)}(Y^n | X^n = x^n), then g(y^n) = u; otherwise, declare an error.
Error events:
1. E_channel: the received sequence is not conditionally typical with the transmitted codeword, i.e., y^n ∉ A_ε^{(n)}(Y^n | X^n = f(u)). By the law of large numbers, Pr(E_channel) → 0 as n → ∞.
2. For a fixed particular y^n, the probability that the independently drawn codeword of some other message u′ ≠ u is also typical with y^n is, for the BSC, approximately 2^{nH(p)} / 2^n = 2^{−n(1−H(p))}. By the union bound over the 2^{nR} − 1 wrong codewords,

Pr(error) ≤ Pr(E_channel) + 2^{nR} · 2^{−n(1−H(p))},

which tends to 0 provided R < 1 − H(p), the capacity of the BSC(p).
Codebook generation: let p* be the maximizing probability distribution over X. For every message u, we choose the codeword independently:
∀u: choose f(u) = x^n ∈ X^n by drawing each symbol x_i i.i.d. according to p*(x).
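The following Python sketch is a small Monte Carlo illustration of this random-coding construction for the BSC, assuming illustrative parameters (n = 100, p = 0.05, R = 0.12, ε = 0.06) that are not from the notes: codewords are drawn i.i.d. uniform (the maximizing p* for the BSC), and the decoder outputs u only if f(u) is the unique codeword whose Hamming distance to y^n lies in the window n(p ± ε), mimicking the conditional typicality test.

import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters: BSC(0.05), small block length
n, p, eps = 100, 0.05, 0.06
R = 0.12                              # rate, well below C = 1 - H(0.05) ~= 0.71
M = 2 ** round(n * R)                 # number of messages |U| = 2^{nR} = 4096

# Random codebook: f(u) drawn i.i.d. from p* = Bernoulli(1/2),
# the maximizing input distribution of the BSC
codebook = rng.integers(0, 2, size=(M, n), dtype=np.uint8)

lo, hi = n * max(p - eps, 0.0), n * (p + eps)   # typicality window on d(x^n, y^n)

def decode(y):
    """Return u if f(u) is the unique codeword 'typical' with y, else None (error)."""
    dists = np.count_nonzero(codebook != y, axis=1)
    candidates = np.flatnonzero((dists >= lo) & (dists <= hi))
    return candidates[0] if len(candidates) == 1 else None

trials, errors = 500, 0
for _ in range(trials):
    u = rng.integers(0, M)                              # message, uniform on {0, ..., M-1}
    x = codebook[u]                                     # transmitted codeword f(u)
    y = x ^ (rng.random(n) < p).astype(np.uint8)        # BSC: flip each bit with prob. p
    errors += decode(y) != u
print(f"estimated error rate at R = {R}: {errors / trials:.3f}")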

Channel with feedback

With feedback, the encoder may choose the i-th channel input as a function of the message and the previously received outputs: X_i = X_i(U, Y_1^{i−1}). The bound I(X^n; Y^n) ≤ nC from Lemma 7.9.2 no longer applies directly, so we bound I(U; Y^n) instead:
I(U; Y^n) = H(Y^n) − H(Y^n | U)
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1^{i−1}, U)
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1^{i−1}, U, X_i)      (X_i is a function of (U, Y_1^{i−1}))
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)                    (the channel is memoryless)
          ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i) = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC
Combining this with Fano's inequality exactly as before,

nR ≤ 1 + P_e^{(n)} nR + I(U; Y^n) ≤ 1 + P_e^{(n)} nR + nC,

so dividing by n and letting n → ∞ again gives R ≤ C. Hence feedback does not increase the capacity of a discrete memoryless channel.