# Channel Coding Theorem

## Introduction

The mathematical model for a communication system is shown in Fig. 1. The message U, drawn from some finite alphabet, is encoded into the input sequence X^{n}, which, when transmitted over the channel, produces the output sequence Y^{n} according to the transition probability p(y|x). From the output sequence Y^{n}, the decoder recovers the message Û. For error-free communication, the transmitted message U should be the same as the recovered message Û. Since each input sequence X^{n} induces a probability distribution on the output sequence Y^{n}, an error occurs when two different input sequences lead to the same recovered message. Intuitively, larger rates increase the probability of error, since such collisions become more likely.
The next section discusses the maximum rate (i.e., the capacity of the channel) at which an arbitrarily small probability of error can be achieved.

Fig. 1 Communication System


**Channel Capacity**


The "information" channel capacity of a discrete memoryless channel is defined as

C = max_{p(x)} I(X;Y),

where the maximum is taken over all possible input distributions p(x).
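As a concrete illustration (not part of the original notes), this definition can be evaluated numerically for the binary symmetric channel, whose capacity is known in closed form as 1 - H(p). The helper names and the crossover probability p = 0.1 below are my own illustrative choices; this is a minimal sketch, not a general-purpose capacity solver.

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(px, channel):
    """I(X;Y) in bits for input distribution px and transition matrix channel[x][y]."""
    px = np.asarray(px)
    channel = np.asarray(channel)
    py = px @ channel                       # output distribution p(y)
    ixy = 0.0
    for x in range(len(px)):
        for y in range(channel.shape[1]):
            if px[x] > 0 and channel[x, y] > 0:
                ixy += px[x] * channel[x, y] * np.log2(channel[x, y] / py[y])
    return ixy

# Binary symmetric channel with crossover probability 0.1 (illustrative)
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]

# Grid search over input distributions; the maximum is achieved by the uniform input
C = max(mutual_information([q, 1 - q], bsc) for q in np.linspace(0.01, 0.99, 99))
print(C, 1 - binary_entropy(p))   # both should be close to 0.531 bits
```

The grid search stands in for the maximization over p(x) in the definition; for the BSC the symmetry makes the uniform input optimal, which the printed values confirm.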


**Converse for channel coding theorem**

1). First we prove the lemma I(X^{n};Y^{n}) ≤ nC; the proof is as follows:

I(X^{n};Y^{n}) = H(Y^{n}) - H(Y^{n}|X^{n})
             = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|Y_{1},...,Y_{i-1},X^{n})        (a)
             = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|X_{i})                          (b)
             ≤ Σ_{i=1}^{n} H(Y_{i}) - Σ_{i=1}^{n} H(Y_{i}|X_{i}) = Σ_{i=1}^{n} I(X_{i};Y_{i})   (c)
             ≤ nC,                                                            (d)

where (a) follows from the chain rule for entropy;

(b) follows from the fact that Y_{i} depends only on X_{i} and is conditionally independent of everything else;

(c) follows from the fact that the entropy of a collection of random variables is less than the sum of their individual entropies;

(d) follows from the definition of the capacity.
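The bound I(X^{n};Y^{n}) ≤ nC can be sanity-checked by brute force for a small block length. The sketch below is my own construction (not from the notes): it computes I(X^{2};Y^{2}) for a BSC(0.1) with a deliberately correlated input pair, for which the bound must still hold.

```python
import itertools
import numpy as np

def entropy(probs):
    """Entropy in bits of a finite distribution, ignoring zero masses."""
    probs = np.asarray([q for q in probs if q > 0])
    return float(-np.sum(probs * np.log2(probs)))

p = 0.1                                  # BSC crossover probability (my choice)
W = np.array([[1 - p, p], [p, 1 - p]])   # transition matrix W[x, y] = p(y|x)
C = 1.0 - entropy([p, 1 - p])            # BSC capacity: 1 - H(p)

# A *correlated* input distribution on (X_1, X_2); the lemma must still hold.
px = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Output distribution p(y1, y2) = sum over x of p(x1,x2) p(y1|x1) p(y2|x2)
py = {}
for (x1, x2), q in px.items():
    for y1, y2 in itertools.product([0, 1], repeat=2):
        py[(y1, y2)] = py.get((y1, y2), 0.0) + q * W[x1, y1] * W[x2, y2]

# I(X^2;Y^2) = H(Y^2) - H(Y^2|X^2), with H(Y^2|X^2) = sum_x p(x) H(Y^2|x)
HY = entropy(py.values())
HY_given_X = sum(q * entropy([W[x1, y1] * W[x2, y2]
                              for y1, y2 in itertools.product([0, 1], repeat=2)])
                 for (x1, x2), q in px.items())
I2 = HY - HY_given_X
print(I2 <= 2 * C)   # prints True: I(X^2;Y^2) <= 2C
```

Because the inputs are correlated here, the inequality is strict: the output pair is also correlated, so H(Y^{2}) falls below H(Y_{1}) + H(Y_{2}), exactly the slack in step (c).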

2) Then we prove that if the average probability of error P_{e}^{(n)} → 0, the rate must satisfy R ≤ C.

Proof:

nR = H(U)                                          (a)
   = H(U|Û) + I(U;Û)                               (b)
   ≤ 1 + P_{e}^{(n)} nR + I(U;Û)                   (c)
   ≤ 1 + P_{e}^{(n)} nR + I(X^{n};Y^{n})           (d)
   ≤ 1 + P_{e}^{(n)} nR + nC,                      (e)

where (a) follows from the assumption that U is uniformly distributed over {1, 2, ..., 2^{nR}};

(b) follows from the definition of mutual information;

(c) follows from Fano's inequality;

(d) follows from data-processing inequality;

(e) follows from the above lemma.

Thus, we obtain that

P_{e}^{(n)} ≥ 1 - C/R - 1/(nR).

From this inequality, we know that if R > C, the probability of error is bounded away from 0 for sufficiently large n. Hence, to achieve an arbitrarily small probability of error, it must hold that R ≤ C.
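Rearranging nR ≤ 1 + P_{e}^{(n)} nR + nC gives the lower bound P_{e}^{(n)} ≥ 1 - C/R - 1/(nR). A quick numerical sketch (the values C = 0.531, roughly the capacity of a BSC(0.1), and R = 0.8 are my own illustrative choices) shows the bound approaching the positive constant 1 - C/R as n grows:

```python
# Converse lower bound on the error probability when R > C:
# rearranging nR <= 1 + P_e * nR + nC gives P_e >= 1 - C/R - 1/(n R).
C = 0.531   # capacity of a BSC(0.1), about 1 - H(0.1) bits (illustrative)
R = 0.8     # an attempted rate above capacity (illustrative)

for n in [10, 100, 1000, 10000]:
    bound = 1 - C / R - 1 / (n * R)
    print(n, bound)   # increases toward 1 - C/R, so the error cannot vanish
```

The finite-n correction 1/(nR) comes from the "+1" term in Fano's inequality; it disappears as n → ∞, leaving the constant gap 1 - C/R.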


**Achievability for channel coding theorem**

The basic idea of the achievability proof is random code selection: we calculate the average probability of error over a random choice of codebooks, which symmetrizes the analysis and can then be used to show the existence of at least one good code.

Now, we state and prove the achievability part of the channel coding theorem:

For a discrete memoryless channel, all rates below the capacity C are achievable. Specifically, for every rate R < C, there exists a sequence of (2^{nR}, n) codes with average probability of error P_{e}^{(n)} converging to 0.

The proof is as follows.

1). **Random codebook generation**: Fix p(x) that achieves the capacity C. Randomly and independently generate 2^{nR} sequences X^{n}(m) (*i.e.* codewords), each according to p(x^{n}) = ∏_{i=1}^{n} p(x_{i}); then the probability of a particular codebook C is

P(C) = ∏_{w=1}^{2^{nR}} ∏_{i=1}^{n} p(x_{i}(w)).

2). The chosen codebook C and the channel transition probabilities p(y|x) are known to both sender and receiver before any transmission takes place;

3). A message U is chosen according to the uniform distribution, i.e., Pr(U = w) = 2^{-nR} for w = 1, 2, ..., 2^{nR};

4). Encoding: To send message w, transmit the codeword X^{n}(w), w = 1, 2, ..., 2^{nR};

5). Decoding: We use joint typicality decoding. Let Y^{n} be the received sequence. The receiver declares that message ŵ was sent if ŵ is the unique index such that

(X^{n}(ŵ), Y^{n}) ∈ A_{ε}^{(n)};

otherwise an error is declared;

6). Analysis of the probability of error: With jointly typical decoding there are two different sources of error: either the output Y^{n} is not jointly typical with the transmitted codeword, or there is some other codeword that is jointly typical with Y^{n};

7). Thus we can calculate the average probability of error, averaged over all codewords, channel outputs, and codebooks; that is

P(E) = Σ_{C} P(C) P_{e}^{(n)}(C)                              (a)
     = Σ_{C} P(C) (1/2^{nR}) Σ_{w=1}^{2^{nR}} λ_{w}(C)        (b)
     = P(E | U = 1),                                          (c)

where (a) follows from averaging over all codebooks, P_{e}^{(n)}(C) being the probability of error for a particular codebook C;

(b) follows from averaging over all codewords in the codebook, with λ_{w}(C) the conditional probability of error given that message w was sent, the message being uniformly distributed;

(c) follows from the fact that, by the symmetry of the code construction, the average probability of error averaged over all codebooks does not depend on the particular index sent; without loss of generality, we can assume that message 1 was sent.

Define the events E_{i} = {(X^{n}(i), Y^{n}) ∈ A_{ε}^{(n)}}, i = 1, 2, ..., 2^{nR}. Given that message 1 was sent, an error occurs if E_{1}^{c} occurs (the transmitted codeword and the output are not jointly typical) or if E_{2} ∪ E_{3} ∪ ... ∪ E_{2^{nR}} occurs (a wrong codeword is jointly typical with Y^{n}). By the union of events bound, we have

P(E | U = 1) ≤ P(E_{1}^{c}) + Σ_{i=2}^{2^{nR}} P(E_{i}).

(a). The conditional typicality lemma shows that P(E_{1}^{c}) → 0 as n → ∞;

(b). For the second term, the proof is as follows:

P(E_{2} ∪ ... ∪ E_{2^{nR}}) ≤ Σ_{i=2}^{2^{nR}} P(E_{i})            (a)
                            ≤ (2^{nR} - 1) 2^{-n(I(X;Y) - 3ε)}      (b)
                            ≤ 2^{-n(I(X;Y) - 3ε - R)},

where (a) follows from the union of events bound;

(b) follows from the joint typicality lemma: for i ≠ 1, X^{n}(i) is independent of Y^{n}, so P(E_{i}) ≤ 2^{-n(I(X;Y) - 3ε)};

and since p(x) was chosen to achieve capacity, I(X;Y) = C; thus if R < C, we can pick ε small enough that R < I(X;Y) - 3ε, and this term also goes to 0 as n → ∞.
Thus, if R < C, the average probability of error, averaged over the codebooks, goes to zero as n goes to infinity. This implies there must exist a sequence of (2^{nR}, n) codes with P_{e}^{(n)} → 0 (in fact, for any c < 1, a fraction c of all possible codebooks must have probability of error less than 1/(1-c) times the average probability derived above).

This proves that R<C is achievable.
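The random-coding argument can be illustrated end to end with a small simulation. This sketch is entirely my own construction: since exact joint-typicality decoding is awkward to implement, it substitutes minimum-Hamming-distance decoding, which is maximum-likelihood for a BSC, and uses a rate R = 0.25 below the capacity C ≈ 0.531 of a BSC(0.1). The average error over fresh random codebooks should shrink as the block length n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, R, p, trials=200):
    """Average error of random codes on a BSC(p) with minimum-distance decoding."""
    M = 2 ** int(np.ceil(n * R))                    # 2^{nR} messages
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))  # i.i.d. Bernoulli(1/2) codewords
        w = int(rng.integers(M))                    # message chosen uniformly
        y = codebook[w] ^ (rng.random(n) < p)       # pass X^n(w) through the BSC
        w_hat = int(np.argmin((codebook ^ y).sum(axis=1)))  # ML decoding for a BSC
        errors += (w_hat != w)
    return errors / trials

p = 0.1   # BSC(0.1); C = 1 - H(0.1) is about 0.531 bits
for n in [10, 20, 40]:
    print(n, simulate(n, R=0.25, p=p))   # R < C: the error rate should fall with n
```

Note that the decay is averaged over codebooks, mirroring step 7) of the proof; rerunning the same experiment with R above capacity should show the error probability no longer decaying, consistent with the converse.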

**Feedback does not increase the channel capacity**

For a communication system with feedback, we assume that a feedback signal Z_{i} is sent back immediately and noiselessly to the transmitter, where Z_{i} is a function of Y^{i}. The transmitter then decides which symbol to send based on the message and the feedback signals received so far. We define a (2^{nR}, n) feedback code as a sequence of mappings X_{i}(U, Z^{i-1}), where X_{i} is a function only of the message U and the previous feedback signals Z^{i-1}.

We denote by C_{FB} the capacity with feedback of a discrete memoryless channel, defined as the supremum of all rates achievable by feedback codes.

Now, we try to prove that C_{FB} = C, which means that feedback cannot increase the capacity of the channel.

First, we prove that:

nR = H(U)                                          (a)
   = H(U|Û) + I(U;Û)                               (b)
   ≤ 1 + P_{e}^{(n)} nR + I(U;Û)                   (c)
   ≤ 1 + P_{e}^{(n)} nR + I(U;Y^{n}),              (d)

where (a) follows from the assumption that U is uniformly distributed;

(b) follows from the definition of mutual information;

(c) follows from Fano's inequality;

(d) follows from the data-processing inequality.

Then, we prove that I(U;Y^{n}) ≤ nC:

I(U;Y^{n}) = H(Y^{n}) - H(Y^{n}|U)
           = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|Y^{i-1}, U)                        (a)
           = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|Y^{i-1}, U, Z^{i-1})               (b)
           = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|Y^{i-1}, U, Z^{i-1}, X_{i})        (c)
           = H(Y^{n}) - Σ_{i=1}^{n} H(Y_{i}|X_{i})                             (d)
           ≤ Σ_{i=1}^{n} H(Y_{i}) - Σ_{i=1}^{n} H(Y_{i}|X_{i}) = Σ_{i=1}^{n} I(X_{i};Y_{i})   (e)
           ≤ nC,                                                               (f)

where (a) follows from the chain rule for entropy;

(b) follows from the fact that Z_{i} is a function of Y^{i};

(c) follows from the fact that X_{i} is a function of U and Z^{i-1};

(d) follows from the fact that Y_{i} depends only on X_{i} and is conditionally independent of U, Y^{i-1}, Z^{i-1};

(e) follows from the fact that the entropy of a collection of random variables is no more than the sum of their individual entropies;

(f) follows from the definition of capacity.

Thus, we obtain:

nR ≤ 1 + P_{e}^{(n)} nR + nC.

To achieve an arbitrarily small probability of error, it must hold that R ≤ C; thus we have proved that C_{FB} ≤ C.

On the other hand, since a nonfeedback code is a special case of a feedback code, we have C ≤ C_{FB}.

Hence, C_{FB} = C.

For more details, please refer to Chapter 7 of "Elements of Information Theory" by Cover and Thomas.

## Comments (1)

## sidjaggi said

at 12:33 am on Nov 15, 2011

Very good job, Qike! Must've helped you in your midterms ;)

Really, a nice and clean explanation for most things! The one conceptual thing you could have given a bit more detail in was the fact that if average probability of error over all codebooks goes to zero, then in fact most codebooks have small probability of error.
