Channel Coding Theorem
Introduction
The mathematical model for a communication system is shown in Fig. 1. A message U from some finite alphabet is encoded into the input sequence X^n, which, when transmitted over the channel,
produces the output sequence Y^n according to the channel transition probability p(y|x). From the output sequence Y^n, the receiver decodes the recovered message \hat{U}. For error-free communication, the transmitted message U should
be the same as the recovered message \hat{U}. Since each input sequence X^n induces a probability distribution on the output sequence Y^n, two different input sequences may lead the decoder to the
same recovered message, in which case an error is declared. Intuitively, larger rates increase the probability of error, since such an event becomes more likely.
The next section discusses the maximum rate (i.e., the capacity of the channel)
at which an arbitrarily small probability of error can be achieved.
Fig. 1 Communication System
Channel Capacity
The "information" channel capacity of a discrete memoryless channel is defined as

C = \max_{p(x)} I(X; Y),

where the maximum is taken over all possible input distributions p(x).
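As a concrete illustration of this definition (an addition to these notes, not part of the original derivation), the Python sketch below numerically maximizes I(X; Y) over input distributions for a binary symmetric channel; the crossover probability 0.1 and the grid search are arbitrary illustrative choices, and the closed-form capacity 1 - H_b(0.1) is printed as a check.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel matrix W[x][y] = p(y|x)."""
    p_xy = p_x[:, None] * W            # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)             # output marginal p(y)
    mask = p_xy > 0                    # avoid log(0) terms, which contribute nothing
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]))

# Binary symmetric channel with crossover probability 0.1 (illustrative choice).
eps = 0.1
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

# Brute-force search over input distributions p(x) = (q, 1 - q).
qs = np.linspace(0.0, 1.0, 10001)
capacities = [mutual_information(np.array([q, 1 - q]), W) for q in qs]
C_numeric = max(capacities)
q_star = qs[int(np.argmax(capacities))]

h_b = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy function
print(f"numerical C = {C_numeric:.4f} bits at p(x=0) = {q_star:.2f}")
print(f"closed form 1 - H_b(eps) = {1 - h_b(eps):.4f} bits")
```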
Converse for the channel coding theorem
1) First we prove that, for a discrete memoryless channel used without feedback, I(X^n; Y^n) \leq nC. The proof is as follows:

I(X^n; Y^n) = H(Y^n) - H(Y^n | X^n)
            = H(Y^n) - \sum_{i=1}^{n} H(Y_i | Y_1, ..., Y_{i-1}, X^n)        (a)
            = H(Y^n) - \sum_{i=1}^{n} H(Y_i | X_i)                           (b)
            \leq \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i | X_i)         (c)
            = \sum_{i=1}^{n} I(X_i; Y_i)
            \leq nC                                                          (d)
where (a) follows from the chain rule for entropy;
(b) follows from the fact that Y_i depends only on X_i and is conditionally independent of everything else;
(c) follows from the fact that the entropy of a collection of random variables is less than the sum of their individual entropies;
(d) follows from the definition of the capacity.
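A quick numerical sanity check of this lemma (an added illustration; the binary symmetric channel, the block length n = 2, and the two input distributions are assumptions made only for this example): brute-force computation shows that I(X^2; Y^2) stays below 2C for a correlated input distribution and equals 2C for the i.i.d. capacity-achieving input.

```python
import itertools
import numpy as np

eps = 0.1                                          # BSC crossover probability (illustrative)
W = np.array([[1 - eps, eps], [eps, 1 - eps]])     # channel matrix p(y|x)

def block_mutual_information(p_x2):
    """I(X^2; Y^2) in bits; p_x2[(x1, x2)] is the joint input distribution."""
    pairs = list(itertools.product([0, 1], repeat=2))
    p_joint = {}                                   # p(x1, x2, y1, y2)
    for x in pairs:
        for y in pairs:
            # memoryless channel: p(y1, y2 | x1, x2) = p(y1|x1) p(y2|x2)
            p_joint[x + y] = p_x2[x] * W[x[0], y[0]] * W[x[1], y[1]]
    p_y2 = {y: sum(p_joint[x + y] for x in pairs) for y in pairs}
    I = 0.0
    for x in pairs:
        for y in pairs:
            p = p_joint[x + y]
            if p > 0:
                I += p * np.log2(p / (p_x2[x] * p_y2[y]))
    return I

h_b = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)
C = 1 - h_b(eps)

iid_uniform = {x: 0.25 for x in itertools.product([0, 1], repeat=2)}       # capacity-achieving input
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}          # correlated input

print(f"nC                 = {2 * C:.4f} bits")
print(f"I(X^2;Y^2), iid    = {block_mutual_information(iid_uniform):.4f} bits")
print(f"I(X^2;Y^2), corr.  = {block_mutual_information(correlated):.4f} bits")
```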
2) Then we prove that every (2^{nR}, n) code with average probability of error P_e^{(n)} satisfies nR \leq 1 + P_e^{(n)} nR + nC.
Proof:

nR = H(U)                                          (a)
   = H(U | \hat{U}) + I(U; \hat{U})                (b)
   \leq 1 + P_e^{(n)} nR + I(U; \hat{U})           (c)
   \leq 1 + P_e^{(n)} nR + I(X^n; Y^n)             (d)
   \leq 1 + P_e^{(n)} nR + nC                      (e)
where (a) follows from the assumption that U is uniformly distributed over the 2^{nR} messages;
(b) follows from the definition of mutual information;
(c) follows from Fano's inequality;
(d) follows from the data-processing inequality;
(e) follows from the above lemma.
Thus, we obtain

P_e^{(n)} \geq 1 - C/R - 1/(nR).

From this inequality, we see that if R > C, the probability of error is bounded away from 0 for sufficiently large n. Hence, to achieve an arbitrarily small probability of error, we must have R \leq C.
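To make the converse concrete, the short sketch below (an added illustration; the BSC crossover probability 0.1 and the attempted rate R = 0.7 are arbitrary choices) evaluates the lower bound P_e^{(n)} \geq 1 - C/R - 1/(nR) for a rate above capacity, showing that the bound approaches 1 - C/R > 0 as n grows.

```python
import numpy as np

h_b = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)

C = 1 - h_b(0.1)          # capacity of a BSC(0.1), about 0.531 bits/use (illustrative channel)
R = 0.7                   # an attempted rate above capacity

for n in [10, 100, 1000, 10000]:
    lower_bound = 1 - C / R - 1 / (n * R)      # converse bound on the error probability
    print(f"n = {n:>5}:  Pe >= {max(lower_bound, 0):.4f}")
```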
Achievability for the channel coding theorem
The basic idea of the achievability proof is random code selection: we compute the average probability of error over a random choice of codebooks, which symmetrizes the analysis, and then use this average to show the existence of at least one good code.
Now we prove the achievability part of the channel coding theorem, that is:
For a discrete memoryless channel, all rates below the capacity C are achievable. Specifically, for every rate R < C, there exists a sequence of (2^{nR}, n) codes with average probability of error P_e^{(n)} converging to 0.
The proof is as follows.
1) Random codebook generation: Fix a p(x) that achieves the capacity C. Randomly and independently generate 2^{nR} sequences X^n(m) (i.e., codewords), each according to \prod_{i=1}^{n} p(x_i);
then the probability of a particular codebook \mathcal{C} is

P(\mathcal{C}) = \prod_{m=1}^{2^{nR}} \prod_{i=1}^{n} p(x_i(m));
2) The chosen codebook \mathcal{C} and the channel transition probabilities p(y|x) are known to both sender and receiver before any transmission takes place;
3) A message U is chosen according to the uniform distribution on {1, 2, ..., 2^{nR}}, i.e., Pr(U = w) = 2^{-nR};
4) Encoding: To send message w, transmit the codeword X^n(w), w = 1, 2, ..., 2^{nR};
5) Decoding: We use joint typicality decoding. Let Y^n be the received sequence. The receiver declares that message \hat{w} was sent if \hat{w} is the unique index such that

(X^n(\hat{w}), Y^n) \in A_\epsilon^{(n)};

otherwise an error is declared;
6) Analysis of the probability of error: There are two different sources of error when we use jointly typical decoding: either the output Y^n is not jointly typical with the
transmitted codeword, or there is some other codeword that is jointly typical with Y^n;
7) We can therefore calculate the average probability of error, averaged over all codewords, channel outputs, and codebooks; that is,

P(E) = \sum_{\mathcal{C}} P(\mathcal{C}) P_e^{(n)}(\mathcal{C})                                        (a)
     = \sum_{\mathcal{C}} P(\mathcal{C}) (1/2^{nR}) \sum_{w=1}^{2^{nR}} \lambda_w(\mathcal{C})          (b)
     = (1/2^{nR}) \sum_{w=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C}) \lambda_w(\mathcal{C})
     = \sum_{\mathcal{C}} P(\mathcal{C}) \lambda_1(\mathcal{C}) = P(E | U = 1)                          (c)

where (a) is the average probability of error over all codebooks, with P_e^{(n)}(\mathcal{C}) the probability of error for a particular codebook \mathcal{C};
(b) expands the average probability of error over the codewords in the codebook, where \lambda_w(\mathcal{C}) denotes the probability of error given that message w was sent using codebook \mathcal{C}, and the message is uniformly distributed;
(c) follows from the fact that, by the symmetry of the code construction, the average probability of error averaged over all codebooks does not depend on the particular index. Without loss of
generality, we may assume that message U = 1 was sent.
Define the events E_m = {(X^n(m), Y^n) \in A_\epsilon^{(n)}}, m = 1, 2, ..., 2^{nR}. Given that message 1 was sent, an error occurs if E_1^c occurs (the transmitted codeword and the received sequence are not jointly typical) or if E_m occurs for some m \neq 1.
By the union of events bound, we have

P(E | U = 1) = P(E_1^c \cup E_2 \cup E_3 \cup ... \cup E_{2^{nR}}) \leq P(E_1^c) + P(\bigcup_{m=2}^{2^{nR}} E_m).
(a) The conditional typicality lemma shows that P(E_1^c) \to 0 as n \to \infty, so P(E_1^c) \leq \epsilon for n sufficiently large;
(b) For the second term, the proof is as follows:

P(\bigcup_{m=2}^{2^{nR}} E_m) \leq \sum_{m=2}^{2^{nR}} P(E_m)                                (a)
                              \leq \sum_{m=2}^{2^{nR}} 2^{-n(I(X;Y) - 3\epsilon)}            (b)
                              = (2^{nR} - 1) 2^{-n(I(X;Y) - 3\epsilon)}
                              \leq 2^{-n(I(X;Y) - R - 3\epsilon)}

where (a) follows from the union of events bound;
(b) follows from the joint typicality lemma, since X^n(m) and Y^n are independent for m \neq 1;
if R < I(X;Y) - 3\epsilon = C - 3\epsilon (which holds for sufficiently small \epsilon whenever R < C), this bound goes to 0 as n \to \infty.
Thus, if R < C, the average probability of error, averaged over the codebooks, goes to zero as n goes to infinity. This implies that there must exist a sequence of (2^{nR}, n) codes with P_e^{(n)} \to 0 (in fact, by Markov's inequality, for any c < 1 at least a fraction c of all possible codebooks must have probability of error less than 1/(1-c) times the average probability derived above).
This proves that every rate R < C is achievable.
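The simulation below is a rough illustration of steps 1)-7) above, not part of the original notes: it draws a fresh random codebook in every trial (so the estimate is averaged over codebooks as well as messages and channel noise) and uses a simplified typicality test specialized to the binary symmetric channel, namely that the empirical fraction of disagreements between Y^n and a candidate codeword lies within \delta of the crossover probability. The block length, rate, and \delta are assumptions chosen small enough to make the simulation feasible; the result is not meant to reproduce the exact exponents in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (small enough to simulate directly).
eps = 0.1                     # BSC crossover probability; C = 1 - H_b(0.1) ≈ 0.531 bits/use
n = 60                        # block length
R = 0.2                       # rate in bits per channel use, below capacity
M = 2 ** round(n * R)         # number of messages, 2^{nR} = 4096
delta = 0.1                   # typicality slack on the empirical crossover fraction

trials = 500
errors = 0
for _ in range(trials):
    # Fresh random codebook each trial, i.i.d. Bernoulli(1/2) entries
    # (so the error estimate is also averaged over codebooks, as in step 7).
    codebook = rng.integers(0, 2, size=(M, n))
    w = rng.integers(M)                           # uniformly chosen message
    noise = rng.random(n) < eps                   # BSC noise pattern
    y = codebook[w] ^ noise                       # received sequence Y^n

    # Typicality-style decoding: accept codewords whose disagreement fraction
    # with y is within delta of eps; decode only if exactly one codeword passes.
    frac = (codebook ^ y).mean(axis=1)
    candidates = np.flatnonzero(np.abs(frac - eps) <= delta)
    if len(candidates) != 1 or candidates[0] != w:
        errors += 1

print(f"M = {M} codewords, n = {n}, R = {R}")
print(f"estimated average error probability: {errors / trials:.3f}")
```

With these parameters the estimated error probability should come out small (roughly on the order of a percent), consistent with R = 0.2 being well below C ≈ 0.531.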
Feedback does not increase the channel capacity
For a communication system with feedback, we assume that a feedback signal Z_i is sent back immediately and noiselessly to the transmitter, where Z_i is a function of Y_i. The transmitter then decides which symbol to send based on the feedback signals and the message. We define a (2^{nR}, n) feedback code as a sequence of mappings X_i(U, Z^{i-1}), where each X_i is a function only of the message U and the previously received feedback signals Z^{i-1} = (Z_1, ..., Z_{i-1}).
We denote by C_FB the capacity with feedback of a discrete memoryless channel, defined as the supremum of all rates achievable by feedback codes.
Now we prove that C_FB = C, which means that feedback cannot increase the capacity of the channel.
First, we prove that nR \leq 1 + P_e^{(n)} nR + I(U; Y^n):

nR = H(U)                                          (a)
   = H(U | \hat{U}) + I(U; \hat{U})                (b)
   \leq 1 + P_e^{(n)} nR + I(U; \hat{U})           (c)
   \leq 1 + P_e^{(n)} nR + I(U; Y^n)               (d)
where (a) follows from the assumption that U is uniformly distributed over the 2^{nR} messages;
(b) follows from the definition of mutual information;
(c) follows from Fano's inequality;
(d) follows from the data-processing inequality.
Then, we prove that I(U; Y^n) \leq nC:

I(U; Y^n) = H(Y^n) - H(Y^n | U)
          = \sum_{i=1}^{n} H(Y_i | Y^{i-1}) - \sum_{i=1}^{n} H(Y_i | Y^{i-1}, U)                  (a)
          = \sum_{i=1}^{n} H(Y_i | Y^{i-1}) - \sum_{i=1}^{n} H(Y_i | Y^{i-1}, Z^{i-1}, U)         (b)
          = \sum_{i=1}^{n} H(Y_i | Y^{i-1}) - \sum_{i=1}^{n} H(Y_i | Y^{i-1}, Z^{i-1}, U, X_i)    (c)
          = \sum_{i=1}^{n} H(Y_i | Y^{i-1}) - \sum_{i=1}^{n} H(Y_i | X_i)                         (d)
          \leq \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i | X_i) = \sum_{i=1}^{n} I(X_i; Y_i)   (e)
          \leq nC                                                                                  (f)
where (a) follows from the chain rule for entropy;
(b) follows from the fact that Z_i is a function of Y_i;
(c) follows from the fact that X_i is a function of U and Z^{i-1};
(d) follows from the fact that Y_i depends only on X_i and is conditionally independent of U, Y^{i-1}, and Z^{i-1} given X_i;
(e) follows from the fact that the entropy of a collection of random variables is no more than the sum of their individual entropies;
(f) follows from the definition of capacity.
Thus, we obtain

nR \leq 1 + P_e^{(n)} nR + nC.

To achieve an arbitrarily small probability of error, we must have R \leq C; thus we have proved that C_FB \leq C.
On the other hand, since a non-feedback code is a special case of a feedback code, we also have C \leq C_FB.
Hence, C_FB = C.
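As a sanity check on C_FB = C (an added illustration, not from the original notes): for a binary erasure channel with erasure probability p, the simplest feedback strategy, repeating each information bit until the feedback shows it arrived unerased, is error-free and attains a rate approaching 1 - p, which is already the non-feedback capacity of that channel; the theorem above says that no feedback scheme can do better. The erasure probability and the number of bits below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.3            # BEC erasure probability (illustrative); C = 1 - p = 0.7 bits/use
k = 50_000         # number of information bits to deliver

bits = rng.integers(0, 2, size=k)
channel_uses = 0
for b in bits:
    # With noiseless feedback Z_i = Y_i, the sender simply repeats the current
    # bit until the feedback shows it was not erased.
    while True:
        channel_uses += 1
        erased = rng.random() < p
        if not erased:
            break          # receiver has this bit; move on to the next one

rate = k / channel_uses
print(f"empirical rate with feedback: {rate:.3f} bits/use (capacity 1 - p = {1 - p:.3f})")
```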
For more details, please refer to Chapter 7 of "Elements of Information Theory".
Comments (1)
sidjaggi said
at 12:33 am on Nov 15, 2011
Very good job, Qike! Must've helped you in your midterms ;)
Really, a nice and clean explanation for most things! The one conceptual thing you could have given a bit more detail in was the fact that if average probability of error over all codebooks goes to zero, then in fact most codebooks have small probability of error.