Scribe Note 6 week 10_2


1 Feedback Capacity

Figure 1. Discrete memoryless channel with feedback

 

A channel with feedback is illustrated in Figure 1. We assume that all the received symbols are sent back immediately and noiselessly to the transmitter, which can then use them to decide which symbol to send next. Can we do better with feedback? The surprising answer is no, which we shall now prove. We define a $(2^{nR}, n)$ feedback code as a sequence of mappings $x_i(W, Y^{i-1})$, where each $x_i$ is a function only of the message $W \in \{1, 2, \ldots, 2^{nR}\}$ and the previous received values $Y_1, Y_2, \ldots, Y_{i-1}$, and a sequence of decoding functions $g: \mathcal{Y}^n \to \{1, 2, \ldots, 2^{nR}\}$.

Thus,

 

$$P_e^{(n)} = \Pr\{g(Y^n) \neq W\},$$

 

when $W$ is uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$.

 

Theorem 1 (Feedback capacity) 

 

$$C_{FB} = C = \max_{p(x)} I(X; Y),$$

 

where $C_{FB}$ is the capacity with feedback.

 

Proof: Since a non-feedback code is a special case of a feedback code, any rate that can be achieved without feedback can be achieved with feedback and hence

 

$$C_{FB} \geq C.$$

 

Let $W$ be uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$. Then $\Pr\{W \neq \hat{W}\} = P_e^{(n)}$ and

 

$$\begin{aligned}
nR &= H(W) && \text{(a)} \\
&= H(W \mid \hat{W}) + I(W; \hat{W}) \\
&\leq 1 + P_e^{(n)} nR + I(W; \hat{W}) && \text{(b)} \\
&\leq 1 + P_e^{(n)} nR + I(W; Y^n), && \text{(c)}
\end{aligned}$$

 

where (a) follows from the fact that $W$ is uniformly distributed;

(b) follows from Fano's inequality;

(c) follows from the data-processing inequality.

 

$$\begin{aligned}
I(W; Y^n) &= H(Y^n) - H(Y^n \mid W) \\
&= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_1, Y_2, \ldots, Y_{i-1}, W) && \text{(a)} \\
&= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_1, Y_2, \ldots, Y_{i-1}, W, X_i) && \text{(b)} \\
&= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i) && \text{(c)} \\
&\leq \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i \mid X_i) && \text{(d)} \\
&= \sum_{i=1}^{n} I(X_i; Y_i) \leq nC, && \text{(e)}
\end{aligned}$$

 

 

where (a) follows from the chain rule;

(b) follows from the fact that $X_i$ is a function of $Y_1, Y_2, \ldots, Y_{i-1}$ and $W$;

(c) follows from the fact that, conditional on $X_i$, $Y_i$ is independent of $W$ and of past samples of $Y$;

 

(d) follows from the chain rule and the fact that conditioning reduces entropy;

 

(e) follows from the definition of capacity for a discrete memoryless channel. Putting these together, we obtain

 

$$nR \leq P_e^{(n)} nR + 1 + nC,$$

 

and dividing by $n$ and letting $n \to \infty$, we conclude that $R \leq C$.

 

Thus, we cannot achieve any higher rates with feedback than we can without feedback, and $C_{FB} = C$.
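As a sanity check on $C = \max_{p(x)} I(X; Y)$, the following minimal sketch maximizes $I(X; Y)$ over input distributions for a binary symmetric channel by grid search; the crossover probability 0.1 is an illustrative assumption. By Theorem 1, the value it finds is also that channel's feedback capacity.

```python
import numpy as np

def mutual_information(px, channel):
    """I(X;Y) in bits for input distribution px and channel matrix p(y|x)."""
    pxy = px[:, None] * channel              # joint distribution p(x, y)
    py = pxy.sum(axis=0)                     # output marginal p(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = pxy * np.log2(pxy / (px[:, None] * py))
    return np.nansum(terms)

# Binary symmetric channel with crossover probability eps (illustrative choice)
eps = 0.1
channel = np.array([[1 - eps, eps],
                    [eps, 1 - eps]])

# Grid search over input distributions p(x) = (q, 1 - q)
qs = np.linspace(0.001, 0.999, 999)
capacity = max(mutual_information(np.array([q, 1 - q]), channel) for q in qs)

analytic = 1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps)  # 1 - H(eps)
print(f"numerical  C = {capacity:.4f} bits/transmission")
print(f"analytic   C = {analytic:.4f} bits/transmission")
```

The maximum is attained at the uniform input distribution, and feedback cannot raise it.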

 

 

 

2 Source-channel separation

 

Let us define the setup under consideration. We have a source $V$ that generates symbols from an alphabet $\mathcal{V}$. We will not make any assumptions about the stochastic process produced by $V$ other than that it takes values in a finite alphabet and satisfies the AEP. We want to send the sequence of symbols $V^n = V_1, V_2, \ldots, V_n$ over the channel so that the receiver can reconstruct the sequence. To do this, we map the sequence onto a codeword $X^n(V^n)$ and send the codeword over the channel. The receiver looks at his received sequence $Y^n$ and makes an estimate $\hat{V}^n$ of the sequence $V^n$ that was sent. The receiver makes an error if $\hat{V}^n \neq V^n$. We define the probability of error as

$$\Pr(\hat{V}^n \neq V^n) = \sum_{y^n} \sum_{v^n} p(v^n)\, p(y^n \mid x^n(v^n))\, I\big(g(y^n) \neq v^n\big),$$

where $I$ is the indicator function and $g(y^n)$ is the decoding function. The system is illustrated in Figure 2.

Figure 2. Joint source and channel coding

 

 

Theorem 2 (Source-channel coding theorem) If $V_1, V_2, \ldots, V_n$ is a finite-alphabet stochastic process that satisfies the AEP and $H(\mathcal{V}) < C$, then there exists a source-channel code with probability of error $\Pr(\hat{V}^n \neq V^n) \to 0$. Conversely, for any stationary stochastic process, if $H(\mathcal{V}) > C$, the probability of error is bounded away from zero, and it is not possible to send the process over the channel with arbitrarily low probability of error.

 

Proof: 

 

Achievability. Since we have assumed that the stochastic process satisfies the AEP, there exists a typical set $A_\epsilon^{(n)}$ of size at most $2^{n(H(\mathcal{V})+\epsilon)}$ which contains most of the probability. We will encode only the source sequences belonging to the typical set; all other sequences will result in an error. This will contribute at most $\epsilon$ to the probability of error.

 

We index all the sequences belonging to $A_\epsilon^{(n)}$. Since there are at most $2^{n(H+\epsilon)}$ such sequences, $n(H+\epsilon)$ bits suffice to index them. We can transmit the desired index to the receiver with probability of error less than $\epsilon$ if $H(\mathcal{V}) + \epsilon = R < C$.

 

The receiver can reconstruct $V^n$ by enumerating the typical set $A_\epsilon^{(n)}$ and choosing the sequence corresponding to the estimated index. This sequence will agree with the transmitted sequence with high probability. To be precise,

 

$$\Pr(\hat{V}^n \neq V^n) \leq \Pr\big(V^n \notin A_\epsilon^{(n)}\big) + \Pr\big(g(Y^n) \neq V^n \mid V^n \in A_\epsilon^{(n)}\big) \leq \epsilon + \epsilon = 2\epsilon$$

for $n$ sufficiently large. Hence, we can reconstruct the sequence with low probability of error for $n$ sufficiently large if $H(\mathcal{V}) < C$.

 

Converse: We wish to show that $\Pr(\hat{V}^n \neq V^n) \to 0$ implies that $H(\mathcal{V}) \leq C$ for any sequence of source-channel codes

 

$$X^n(V^n): \mathcal{V}^n \to \mathcal{X}^n, \qquad g_n(Y^n): \mathcal{Y}^n \to \mathcal{V}^n.$$

 

 

Thus $X^n(\cdot)$ is an arbitrary assignment of codewords to data sequences $V^n$, and $g_n(\cdot)$ is any decoding function (an assignment of estimates $\hat{V}^n$ to output sequences $Y^n$). By Fano's inequality, we must have

$$H(V^n \mid \hat{V}^n) \leq 1 + \Pr(\hat{V}^n \neq V^n)\, n \log|\mathcal{V}|.$$

 

Hence for the code,

 

$$\begin{aligned}
H(\mathcal{V}) &\leq \frac{1}{n} H(V_1, V_2, \ldots, V_n) && \text{(a)} \\
&= \frac{1}{n} H(V^n \mid \hat{V}^n) + \frac{1}{n} I(V^n; \hat{V}^n) \\
&\leq \frac{1}{n}\big(1 + \Pr(\hat{V}^n \neq V^n)\, n \log|\mathcal{V}|\big) + \frac{1}{n} I(V^n; \hat{V}^n) && \text{(b)} \\
&\leq \frac{1}{n}\big(1 + \Pr(\hat{V}^n \neq V^n)\, n \log|\mathcal{V}|\big) + \frac{1}{n} I(X^n; Y^n) && \text{(c)} \\
&\leq \frac{1}{n} + \Pr(\hat{V}^n \neq V^n) \log|\mathcal{V}| + C, && \text{(d)}
\end{aligned}$$

 

 

where (a) follows from the definition of the entropy rate of a stationary process;

 

(b) follows from Fano's inequality;

 

(c) follows from the data processing inequality;

 

(d) follows from the memorylessness of the channel.

 

Now letting $n \to \infty$, we have $\Pr(\hat{V}^n \neq V^n) \log|\mathcal{V}| \to 0$, and hence $H(\mathcal{V}) \leq C$.

 

The source-channel separation theorem enables us to consider the problem of source coding separately from the problem of channel coding. The source coder tries to find the most efficient representation of the source, and the channel coder encodes the message to combat the noise and errors introduced by the channel. The separation theorem says that the separate encoders can achieve the same rates as the joint encoder.
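As a quick illustration of the separation criterion $H(\mathcal{V}) < C$, the following minimal sketch compares the entropy rate of an i.i.d. binary source with the capacity of a binary symmetric channel; the particular parameters (a Bernoulli(0.11) source and crossover probability 0.02) are illustrative assumptions.

```python
import numpy as np

def binary_entropy(q):
    """H(q) in bits; the entropy rate of an i.i.d. Bernoulli(q) source."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

# Hypothetical numbers: Bernoulli(0.11) source sent over a BSC(0.02)
H_source = binary_entropy(0.11)        # entropy rate, ~0.500 bits/source symbol
C_channel = 1 - binary_entropy(0.02)   # BSC capacity, ~0.859 bits/transmission

# Separation criterion from Theorem 2: reliable transmission is possible iff H(V) < C
print(f"H(V) = {H_source:.3f} bits/symbol, C = {C_channel:.3f} bits/transmission")
print("reliable transmission possible" if H_source < C_channel else "reliable transmission not possible")
```

In this regime a separate design (an entropy-achieving source code followed by a capacity-approaching channel code) already suffices; no joint design is needed to approach these rates.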

 

3 Gaussian channels

 

The most important continuous-alphabet channel is the Gaussian channel. This is a time-discrete channel with output $Y_i$ at time $i$, where $Y_i$ is the sum of the input $X_i$ and the noise $Z_i$. The noise $Z_i$ is drawn i.i.d. from a Gaussian distribution with variance $N$. Thus,

$$Y_i = X_i + Z_i, \qquad Z_i \sim \mathcal{N}(0, N).$$

The noise $Z_i$ is assumed to be independent of the signal $X_i$.

 

• If the noise variance is zero, the receiver receives the transmitted symbol perfectly. Since X  can take on any real value, the channel can transmit an arbitrary real number with no error. Thus, the capacity is infinite.

 

• If the noise variance is nonzero and there is no constraint on the input, we can intuitively choose an infinite subset of inputs arbitrarily far apart (with distances much larger than the noise standard deviation), so that they are distinguishable at the output with arbitrarily small probability of error. The channel still has infinite capacity.

 

The most common limitation on the input is an average energy or power constraint: for any codeword $(x_1, x_2, \ldots, x_n)$, we require

$$\frac{1}{n} \sum_{i=1}^{n} x_i^2 \leq P.$$

 

This communication channel models many practical channels, including radio and satellite links. By the central limit theorem, the cumulative effect of a large number of small random effects is approximately normal, so the Gaussian model is both realistic and "simple" to analyze.

 

Theorem 3 The capacity of a Gaussian channel with power constraint $P$ and noise variance $N$ is

$$C = \frac{1}{2} \log\left(1 + \frac{P}{N}\right).$$

 

Consider any codeword of length $n$. The received vector is normally distributed with mean equal to the true codeword and variance equal to the noise variance. With high probability, the received vector is contained in a sphere of radius $\sqrt{n(N+\epsilon)}$ around the true codeword. If we assign everything within this sphere to the given codeword, then when this codeword is sent there will be an error only if the received vector falls outside the sphere, which has low probability. Similarly, we can choose other codewords and their corresponding decoding spheres. The volume of an $n$-dimensional sphere is of the form $C_n r^n$, where $r$ is the radius of the sphere. Each decoding sphere has radius approximately $\sqrt{nN}$. These spheres are scattered throughout the space of received vectors. The received vectors have energy no greater than $n(P+N)$, so they lie in a sphere of radius $\sqrt{n(P+N)}$. The maximum number of nonintersecting decoding spheres in this volume is no more than

$$\frac{C_n \big(n(P+N)\big)^{n/2}}{C_n (nN)^{n/2}} = \left(1 + \frac{P}{N}\right)^{n/2}.$$

Therefore, the rate of the code is at most $\frac{1}{n} \log\left(1 + \frac{P}{N}\right)^{n/2} = \frac{1}{2} \log\left(1 + \frac{P}{N}\right)$.

Figure 3. Sphere packing for the Gaussian channel

 

For the details of the proof (achievability and converse), please refer to Chapter 9 of the textbook.
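To see the numbers, here is a minimal sketch; the power constraint $P = 15$, noise variance $N = 1$, and block length $n = 100$ are illustrative assumptions. It evaluates the sphere-packing count and checks that the resulting per-symbol rate matches the capacity formula of Theorem 3.

```python
import numpy as np

# Illustrative assumptions: power constraint P, noise variance N, block length n
P, N = 15.0, 1.0
n = 100

# Sphere packing: at most (n(P+N))^{n/2} / (nN)^{n/2} = (1 + P/N)^{n/2}
# nonintersecting decoding spheres fit inside the sphere of received vectors.
log2_num_spheres = (n / 2) * np.log2(1 + P / N)
rate_bound = log2_num_spheres / n          # bits per transmission

capacity = 0.5 * np.log2(1 + P / N)        # Theorem 3
print(f"sphere-packing rate bound = {rate_bound:.3f} bits/transmission")
print(f"(1/2) log2(1 + P/N)       = {capacity:.3f} bits/transmission")
```

Both quantities equal 2 bits per transmission here, since $P/N = 15$ gives $\frac{1}{2}\log_2 16 = 2$.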

 

Assume that we have a set of Gaussian channels in parallel. The output of each channel is the sum of the input and Gaussian noise. For channel $j$,

$$Y_j = X_j + Z_j, \qquad j = 1, 2, \ldots, k,$$

with $Z_j \sim \mathcal{N}(0, N_j)$, and the noise is assumed to be independent from channel to channel. We assume that there is a common power constraint on the total power used, that is,

$$E\left[\sum_{j=1}^{k} X_j^2\right] \leq P.$$

 

The information capacity of the channel is

$$C = \max_{f(x_1, x_2, \ldots, x_k):\ \sum_j E[X_j^2] \leq P} I(X_1, X_2, \ldots, X_k; Y_1, Y_2, \ldots, Y_k).$$

 

We can reduce the problem to finding the power allotment that maximizes the total capacity

$$C = \sum_{j=1}^{k} \frac{1}{2} \log\left(1 + \frac{P_j}{N_j}\right)$$

subject to the constraint that $\sum_{j=1}^{k} P_j = P$. Using Lagrange multipliers, we get $P_j = \nu - N_j$. Since each $P_j$ must be nonnegative, we use the KKT conditions to verify that the solution is

$$P_j = (\nu - N_j)^+, \qquad \text{where } \nu \text{ is chosen so that } \sum_{j=1}^{k} (\nu - N_j)^+ = P.$$
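The resulting allocation assigns more power to the less noisy channels and is commonly known as water-filling. As a minimal numerical sketch (the three noise levels and the total power $P = 10$ below are illustrative assumptions), the water level $\nu$ can be found by bisection:

```python
import numpy as np

def water_filling(noise, P, tol=1e-9):
    """Return the power allocation P_j = (nu - N_j)^+ with sum_j P_j = P."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + P     # the water level nu lies in [lo, hi]
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise, 0.0).sum() > P:
            hi = nu                           # too much water: lower the level
        else:
            lo = nu                           # too little water: raise the level
    power = np.maximum(lo - noise, 0.0)
    return power, lo

# Hypothetical example: three parallel channels, total power P = 10
noise = np.array([1.0, 4.0, 9.0])
power, nu = water_filling(noise, P=10.0)
capacity = 0.5 * np.log2(1 + power / noise).sum()

print("water level nu   =", np.round(nu, 3))
print("power allocation =", np.round(power, 3))   # quieter channels get more power
print("total capacity   =", round(capacity, 3), "bits/transmission")
```

For these numbers the noisiest channel receives no power at all, which is exactly the behavior the $(\nu - N_j)^+$ solution predicts.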