Channel Coding Theorem
Overview
We analyze a communication system in which U is the message, X^n is the encoded input sequence, Y^n is the output sequence after transmission through the channel, and Û is the decoded message.
The information channel capacity of a discrete memoryless channel is defined as the maximum mutual information in any single use of the channel, where the maximization is over all possible input probability distributions {p(x)} on X, i.e.,

C = max_{p(x)} I(X; Y).
In channel coding, the converse states that every achievable rate satisfies R ≤ max_{p(x)} I(X;Y) + ε, while achievability states that every rate R ≤ max_{p(x)} I(X;Y) − ε is achievable with vanishing probability of error.
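To make the maximization in this definition concrete, here is a minimal Python sketch (an illustration added to these notes, not from the textbook) that brute-forces C = max_{p(x)} I(X;Y) for a small binary-input channel; the transition matrix W below is a hypothetical, arbitrary choice.

import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel matrix W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W          # joint p(x, y)
    p_y = p_xy.sum(axis=0)           # marginal p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]))

# Hypothetical asymmetric binary channel (example values only)
W = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Brute-force search over p(X=1) to approximate C = max_{p(x)} I(X;Y)
grid = np.linspace(0, 1, 10001)
best = max((mutual_information(np.array([1 - q, q]), W), q) for q in grid)
print(f"C ~= {best[0]:.4f} bits/use, achieved at p(X=1) ~= {best[1]:.3f}")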


For each (typical) input n-sequence, there are approximately 2^{nH(Y|X)} possible Y sequences, all of them roughly equally likely, as illustrated in Fig. 2. We wish to ensure that no two X sequences produce the same Y output sequence; otherwise, we will not be able to decide which X sequence was sent. The total number of possible (typical) Y sequences is ≈ 2^{nH(Y)}. This set has to be divided into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences. The total number of disjoint sets is therefore at most 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}. Hence, we can send at most ≈ 2^{nI(X;Y)} distinguishable sequences of length n.
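As a quick numeric illustration of this counting argument (added here, with assumed values), the following Python sketch evaluates the exponents for a BSC with crossover probability p = 0.11 and uniform inputs, for which H(Y) = 1 and H(Y|X) = H(p).

import math

def H2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.11          # assumed block length and crossover probability
H_Y = 1.0                  # uniform input to a BSC makes Y uniform
H_Y_given_X = H2(p)        # noise entropy per symbol
I = H_Y - H_Y_given_X      # I(X;Y) = 1 - H(p)

print(f"typical Y sequences      ~ 2^{n * H_Y:.0f}")
print(f"fan-out per typical input~ 2^{n * H_Y_given_X:.1f}")
print(f"distinguishable inputs   ~ 2^{n * I:.1f}   (i.e., 2^(nI(X;Y)))")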
Proof of the converse of the channel coding theorem (ref. the textbook Elements of Information Theory, from p. 206)
For convenience, in the following proof we will use W to denote the input message U and Ŵ to denote the decoded message Û.
Firstly, we want to prove Lemma 7.9.2: if Y^n is the result of passing X^n through a discrete memoryless channel used without feedback, then I(X^n; Y^n) ≤ nC for every input distribution p(x^n):

I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1, ..., Y_{i−1}, X^n)
            = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)                         (7.94)
            ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i)             (7.95)
            = Σ_{i=1}^{n} I(X_i; Y_i)
            ≤ nC                                                        (7.97)
By the definition of a discrete memoryless channel, Y_i depends only on X_i and is conditionally independent of everything else; this gives step (7.94) above. Step (7.95) follows from the fact that the entropy of a collection of random variables is at most the sum of their individual entropies, and (7.97) follows from the definition of capacity. Thus, we have proved that using the channel many times does not increase the information capacity in bits per transmission.
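As a sanity check on this lemma (an illustration added here, with made-up parameters), the following Python sketch computes I(X^2;Y^2) for two uses of a BSC(p) driven by a deliberately correlated input pair and verifies that it does not exceed 2C = 2(1 − H(p)).

import numpy as np

p = 0.2                                   # assumed crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])    # BSC transition matrix W[x, y] = p(y|x)

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)

C = 1 - H2(p)                             # BSC capacity per channel use

# Correlated input distribution on (X1, X2): an arbitrary illustrative choice
p_x = np.array([[0.4, 0.1],
                [0.1, 0.4]])              # p_x[x1, x2]

# p(y1, y2 | x1, x2) = W[x1, y1] * W[x2, y2]  (memoryless channel)
p_y_given_x = np.einsum('ac,bd->abcd', W, W)      # indices: x1, x2, y1, y2
p_joint = p_x[:, :, None, None] * p_y_given_x     # p(x1, x2, y1, y2)
p_y = p_joint.sum(axis=(0, 1))                    # p(y1, y2)

mask = p_joint > 0
I = np.sum(p_joint[mask] *
           np.log2((p_y_given_x / p_y[None, None, :, :])[mask]))

print(f"I(X^2; Y^2) = {I:.4f} bits   <=   2C = {2*C:.4f} bits")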
Secondly, we want to prove that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C. If the maximal probability of error tends to zero, the average probability of error for the sequence of codes also goes to zero [i.e., λ^{(n)} → 0 implies P_e^{(n)} → 0]. For a fixed encoding rule X^n(·) and a fixed decoding rule Ŵ = g(Y^n), we have the Markov chain W → X^n(W) → Y^n → Ŵ. For each n, let W be drawn according to a uniform distribution over {1, 2, ..., 2^{nR}}. Since W has a uniform distribution, Pr(Ŵ ≠ W) = P_e^{(n)} = (1/2^{nR}) Σ_i λ_i. So, we have
nR = H(W)                                      (a)
   = H(W | Ŵ) + I(W; Ŵ)                        (b)
   ≤ 1 + P_e^{(n)} nR + I(W; Ŵ)                (c)
   ≤ 1 + P_e^{(n)} nR + I(X^n; Y^n)            (d)
   ≤ 1 + P_e^{(n)} nR + nC                     (e)
where (a) follows from the assumption that W is uniformly distributed, (b) is an identity, (c) is Fano's inequality for W taking on at most 2^{nR} values, (d) is the data-processing inequality, and (e) is Lemma 7.9.2 from the first part. Then, dividing by n, we obtain

R ≤ P_e^{(n)} R + 1/n + C.

When n → ∞, the first two terms on the right-hand side tend to 0 (since P_e^{(n)} → 0), so R ≤ C. Rearranging the same inequality, we also get

P_e^{(n)} ≥ 1 − C/R − 1/(nR).

This shows that if R > C, the probability of error is bounded away from 0 for sufficiently large n, so we cannot construct codes for large n with P_e^{(n)} → 0. Hence, we cannot achieve an arbitrarily low probability of error at rates above capacity.
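To see this bound numerically (illustrative values only, not from the notes), the short sketch below evaluates P_e^{(n)} ≥ 1 − C/R − 1/(nR) for an assumed capacity C = 0.5 bits/use and a rate R = 0.6 above it.

# Lower bound on the error probability from the converse:
# P_e^(n) >= 1 - C/R - 1/(nR)  for any (2^{nR}, n) code of rate R.
C = 0.5    # assumed capacity (bits/use), illustrative value
R = 0.6    # attempted rate above capacity

for n in (10, 100, 1000, 10000):
    lower_bound = 1 - C / R - 1 / (n * R)
    print(f"n = {n:>6}:  P_e^(n) >= {lower_bound:.4f}")
# The bound approaches 1 - C/R ~= 0.1667, so the error probability
# cannot be driven to zero at any rate R > C.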
Achievability
In order to maximize the mutual information I(X;Y), the codebook will be generated according to the maximizing input distribution p* (defined below). The code is a pair C = (f, g) with nR = H(X) + ε, where f is the encoding function and g is the decoding function.
Probability of each codeword:

Pr(X^n = x^n) = ∏_{i=1}^{n} p*(x_i),

i.e., the symbols of each codeword are drawn i.i.d. according to p*.
Example: Binary Symmetric Channel BSC(p) with crossover probability p, i.e., Y = X ⊕ Z with Z ~ Bernoulli(p): each transmitted bit is flipped with probability p and received correctly with probability 1 − p.
Message set size: |U| = 2^{nR}
Decoding rule: if there exists a unique u such that f(u) = x^n and y^n ∈ A_ε^{(n)}(Y^n | X^n = x^n), then g(y^n) = u; otherwise, declare an error.
Error events:
1. E_channel: the received sequence is not conditionally typical with the transmitted codeword, i.e., y^n ∉ A_ε^{(n)}(Y^n | X^n = f(u)). By the law of large numbers, Pr(E_channel) → 0 as n → ∞.
2. For a fixed particular y^n, the probability that the independently drawn codeword of some other message u′ ≠ u is also typical with y^n is, for the BSC, approximately 2^{nH(p)} / 2^n = 2^{−n(1−H(p))}. By the union bound over the 2^{nR} − 1 wrong codewords,

Pr(error) ≤ Pr(E_channel) + 2^{nR} · 2^{−n(1−H(p))},

which tends to 0 provided R < 1 − H(p), the capacity of the BSC(p).
Codebook generation: let p* be the maximizing probability distribution over X. For every message u, we choose the codeword independently:
∀u: choose f(u) = x^n ∈ X^n by drawing each symbol x_i i.i.d. according to p*(x).
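The following Python sketch is a small Monte Carlo illustration of this random-coding construction for the BSC, assuming illustrative parameters (n = 100, p = 0.05, R = 0.12, ε = 0.06) that are not from the notes: codewords are drawn i.i.d. uniform (the maximizing p* for the BSC), and the decoder outputs u only if f(u) is the unique codeword whose Hamming distance to y^n lies in the window n(p ± ε), mimicking the conditional typicality test.

import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters: BSC(0.05), small block length
n, p, eps = 100, 0.05, 0.06
R = 0.12                              # rate, well below C = 1 - H(0.05) ~= 0.71
M = 2 ** round(n * R)                 # number of messages |U| = 2^{nR} = 4096

# Random codebook: f(u) drawn i.i.d. from p* = Bernoulli(1/2),
# the maximizing input distribution of the BSC
codebook = rng.integers(0, 2, size=(M, n), dtype=np.uint8)

lo, hi = n * max(p - eps, 0.0), n * (p + eps)   # typicality window on d(x^n, y^n)

def decode(y):
    """Return u if f(u) is the unique codeword 'typical' with y, else None (error)."""
    dists = np.count_nonzero(codebook != y, axis=1)
    candidates = np.flatnonzero((dists >= lo) & (dists <= hi))
    return candidates[0] if len(candidates) == 1 else None

trials, errors = 500, 0
for _ in range(trials):
    u = rng.integers(0, M)                              # message, uniform on {0, ..., M-1}
    x = codebook[u]                                     # transmitted codeword f(u)
    y = x ^ (rng.random(n) < p).astype(np.uint8)        # BSC: flip each bit with prob. p
    errors += decode(y) != u
print(f"estimated error rate at R = {R}: {errors / trials:.3f}")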

Channel with feedback

With feedback, the encoder may choose the i-th channel input as a function of the message and the previously received outputs: X_i = X_i(U, Y_1^{i−1}). The bound I(X^n; Y^n) ≤ nC from Lemma 7.9.2 no longer applies directly, so we bound I(U; Y^n) instead:
I(U; Y^n) = H(Y^n) − H(Y^n | U)
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1^{i−1}, U)
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1^{i−1}, U, X_i)      (X_i is a function of (U, Y_1^{i−1}))
          = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)                    (the channel is memoryless)
          ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i) = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC
Combining this with Fano's inequality exactly as before,

nR ≤ 1 + P_e^{(n)} nR + I(U; Y^n) ≤ 1 + P_e^{(n)} nR + nC,

so dividing by n and letting n → ∞ again gives R ≤ C. Hence feedback does not increase the capacity of a discrete memoryless channel.