

Scribe Notes 7 (by Xixuan Wu)

Generally, source coding removes redundancy, while channel coding adds redundancy.

Describing an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect. It is therefore necessary to define the "goodness" of a representation of a source. The problem is: given a source distribution and a distortion measure, what is the minimum expected distortion achievable at a particular rate? Equivalently, what is the minimum rate required to achieve a particular distortion?

 

1. Quantization

The representation of X is denoted X̂(X).

 

1.1 1-Bit Quantization Example

Let X ~ N(0, 1), and assume the distortion measure is squared error, d(x, x̂) = (x - x̂)². We have only one bit to represent X, so clearly the bit should indicate whether X > 0 or not. To minimize squared error, each reproduced symbol should be the conditional mean of its region:

X̂(x) = √(2/π) if x ≥ 0, and -√(2/π) if x < 0.

The reconstruction points are at ±√(2/π) ≈ ±0.79.
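As a quick sanity check, here is a minimal Monte Carlo sketch (assuming NumPy) that estimates the conditional mean E[X | X ≥ 0] and the expected distortion of this one-bit quantizer:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# Conditional mean of the positive region; should approach sqrt(2/pi) ~ 0.7979.
a = x[x >= 0].mean()
print("E[X | X >= 0] ~", a, " sqrt(2/pi) =", np.sqrt(2 / np.pi))

# One-bit quantizer: reproduce each sample by its region's conditional mean.
xhat = np.where(x >= 0, a, -a)
print("expected squared error ~", ((x - xhat) ** 2).mean())  # 1 - 2/pi ~ 0.3634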

 

1.2 2-Bit Quantization Example

As in PS7 Q1, if we use one bit to express X drawn uniformly from {0, 1, 2, 3}, we can choose 0 to represent X = 0 or 1, and 1 to represent X = 2 or 3. When decoding, we decode 0 to X̂ = 0 and decode 1 to X̂ = 2.

So the expected distortion is E[d] = 0.25·1 + 0.25·1 = 0.5.

If X is drawn from {0, 1, 2, 3}² and we use 2 bits to express it, this amounts to selecting 4 reconstruction points in the 4×4 grid (original figure omitted).

The left arrangement applied the 1-bit scalar quantizer to each coordinate, but for some positions its distortion exceeds 1. The staggered arrangement on the right keeps the distortion at most 1 everywhere, so quantizing the pair jointly beats quantizing each symbol separately; a comparison is sketched below.
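Since the figure is missing, here is a small illustrative sketch in Python. It assumes the per-pair distortion is the sum of absolute coordinate errors (an assumption; the notes do not pin down the measure), and the staggered codebook below is one hypothetical choice that achieves worst-case distortion 1:

from itertools import product

points = list(product(range(4), repeat=2))           # the 4x4 grid {0,1,2,3}^2

def worst_case(codebook):
    # Each point is encoded to its nearest codeword under L1 distance.
    return max(min(abs(x - a) + abs(y - b) for a, b in codebook)
               for x, y in points)

product_code = [(0, 0), (0, 2), (2, 0), (2, 2)]      # scalar quantizer per axis
staggered    = [(0, 1), (1, 3), (2, 0), (3, 2)]      # hypothetical staggered choice

print(worst_case(product_code))  # 2: e.g. (3, 3) is at L1 distance 2
print(worst_case(staggered))     # 1: every grid point within distance 1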

 

2. Lloyd Algorithm

Given a set of reconstruction points, the optimal encoding maps each x to the closest reconstruction point x̂. The resulting partition of the space is called a Voronoi partition, and the same idea underlies K-means clustering. The Lloyd algorithm first selects K centers at random, then partitions the space by assigning every point to its nearest center. It then moves each center to the mean of its region, and repeats. After some iterations the centers stop changing, and the algorithm has converged; it only reaches a local minimum of the distortion, not necessarily the global minimum.
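A minimal sketch of the Lloyd iteration in one dimension with squared-error distortion (assuming NumPy; the sample data and K are illustrative):

import numpy as np

def lloyd(samples, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.choice(samples, size=k, replace=False)  # random initial centers
    for _ in range(iters):
        # Assignment step: nearest-center (Voronoi) partition of the samples.
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        # Re-centering step: each center moves to the mean of its region.
        new = np.array([samples[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):    # centers stopped moving: local optimum
            break
        centers = new
    return np.sort(centers)

x = np.random.default_rng(1).standard_normal(100_000)
print(lloyd(x, 2))   # approaches [-0.798, 0.798] = ±sqrt(2/pi) for K = 2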

 

3. Rate distortion function

3.1 Rate distortion encoder and decoder

A (2^{nR}, n)-rate distortion code consists of an encoding function f_n : X^n → {1, 2, ..., 2^{nR}} and a decoding (reproduction) function g_n : {1, 2, ..., 2^{nR}} → X̂^n. The distortion of the code is E d(X^n, g_n(f_n(X^n))), where X̂^n = g_n(f_n(X^n)) is the reproduced sequence.

3.2 Distortion measure

A distortion measure is a mapping d : X × X̂ → R⁺. It is said to be bounded if d_max = max_{x,x̂} d(x, x̂) < ∞.

The distortion between sequences x^n and x̂^n is d(x^n, x̂^n) = (1/n) ∑_{i=1}^n d(x_i, x̂_i).

 

3.3 Rate distortion function

A rate distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{nR}, n)-rate distortion codes (f_n, g_n) with lim_{n→∞} E d(X^n, g_n(f_n(X^n))) ≤ D. The rate distortion region is the closure of the set of achievable pairs (R, D). R(D) is the infimum of rates R such that (R, D) is in the rate distortion region of the source for a given distortion D, and D(R) is the infimum of all distortions D such that (R, D) is in the rate distortion region for a given rate R.

 

The information rate distortion function is:

R^(I)(D) = min over q(x̂|x) with ∑_{x,x̂} p(x) q(x̂|x) d(x, x̂) ≤ D of I(X; X̂).

The rate distortion theorem states that R(D) = R^(I)(D) for an i.i.d. source.

Why is this? Consider the following sphere-covering argument (original figure omitted).

The size of the typical set of X^n is about 2^{nH(X)}. Looking at the test channel in reverse, each codeword X̂^n covers a "ball" of about 2^{nH(X|X̂)} source sequences. So the number of balls needed to cover the typical set is 2^{nH(X)} / 2^{nH(X|X̂)} = 2^{nI(X;X̂)}.

So the rate is about (1/n) log 2^{nI(X;X̂)} = I(X; X̂) bits per symbol.

 

3.3.1 Rate distortion function of a binary source

The rate distortion function R(D) for a Bernoulli(p) source (with p ≤ 1/2) with Hamming distortion is given by:

R(D) = H(p) - H(D) for 0 ≤ D ≤ p, and R(D) = 0 for D > p.

 

Proof: Let ⊕ denote modulo-2 addition, so X ⊕ X̂ = 1 means X ≠ X̂. For D ≤ p:

I(X; X̂) = H(X) - H(X | X̂) = H(p) - H(X ⊕ X̂ | X̂) ≥ H(p) - H(X ⊕ X̂) ≥ H(p) - H(D),

where the last inequality holds because Pr(X ≠ X̂) ≤ D and H is increasing on [0, 1/2].

The lower bound is achieved by a joint distribution that meets the distortion constraint: let X̂ → X be a binary symmetric channel with crossover probability D, and choose Pr(X̂ = 1) = (p - D)/(1 - 2D) so that Pr(X = 1) = p. If D ≥ p, just set X̂ = 0 with probability 1, so that R = 0. The resulting function decreases from H(p) at D = 0 to zero at D = p.
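A minimal sketch (assuming NumPy; function names are illustrative) that evaluates this formula:

import numpy as np

def H(q):
    """Binary entropy in bits; H(0) = H(1) = 0."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def rd_binary(D, p=0.5):
    """R(D) = H(p) - H(D) for 0 <= D <= p (with p <= 1/2), else 0."""
    return max(H(p) - H(D), 0.0) if D < p else 0.0

print(rd_binary(0.1, p=0.5))   # H(0.5) - H(0.1) = 1 - 0.469 ~ 0.531 bits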

 

3.3.2 Rate distortion function of a Gaussian source

The rate distortion function R(D) for a N(0, σ²) source with squared-error distortion is given by:

R(D) = (1/2) log(σ²/D) for 0 ≤ D ≤ σ², and R(D) = 0 for D > σ².

Proof:

I(X; X̂) = h(X) - h(X | X̂)
= (1/2) log(2πeσ²) - h(X - X̂ | X̂)
≥ (1/2) log(2πeσ²) - h(X - X̂)
≥ (1/2) log(2πeσ²) - (1/2) log(2πe·E(X - X̂)²)
≥ (1/2) log(2πeσ²) - (1/2) log(2πeD) = (1/2) log(σ²/D),

using that conditioning reduces entropy and that the Gaussian maximizes differential entropy for a given second moment.

The joint distribution that achieves this lower bound is the test channel X = X̂ + Z: if D ≤ σ², we choose X̂ ~ N(0, σ² - D) and Z ~ N(0, D) independent of X̂. If D > σ², we choose X̂ = 0 with probability 1, so that R(D) = 0.

The resulting function decreases from ∞ at D = 0 to zero at D = σ².
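A minimal Monte Carlo sketch of this test channel (assuming NumPy; parameters are illustrative), checking that it meets the distortion constraint:

import numpy as np

sigma2, D = 1.0, 0.25
rng = np.random.default_rng(0)

# Test channel X = Xhat + Z with Xhat ~ N(0, sigma2 - D), Z ~ N(0, D) independent.
xhat = rng.normal(0.0, np.sqrt(sigma2 - D), 1_000_000)
z    = rng.normal(0.0, np.sqrt(D),          1_000_000)
x    = xhat + z

print("E(X - Xhat)^2 ~", ((x - xhat) ** 2).mean())   # ~ D = 0.25
print("R(D) =", 0.5 * np.log2(sigma2 / D), "bits")   # 0.5 * log2(4) = 1 bit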

 

 

3.3.3 Rate distortion function of independent normal sources

The information rate distortion function of the vector (X₁, ..., X_m), with independent X_i ~ N(0, σ_i²) and distortion d(x^m, x̂^m) = ∑_i (x_i - x̂_i)², is:

R(D) = ∑_{i=1}^m (1/2) log(σ_i²/D_i), where D_i = λ if λ < σ_i², and D_i = σ_i² if λ ≥ σ_i².

For the lower bound,

I(X^m; X̂^m) ≥ ∑_{i=1}^m I(X_i; X̂_i) ≥ ∑_{i=1}^m R_i(D_i),

where D_i = E(X_i - X̂_i)² and R_i(D_i) = max{(1/2) log(σ_i²/D_i), 0} is the rate distortion function of the i-th component.

So R(D) = min over {D_i : ∑_i D_i ≤ D} of ∑_i max{(1/2) log(σ_i²/D_i), 0}. To solve this optimization problem, we use Lagrange multipliers. The solution is the choice of D_i given above.

This is a kind of reverse water-filling: no bits are used to describe random variables with variances less than the constant λ, which is chosen so that ∑_i D_i = D.
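A minimal sketch of the reverse water-filling computation (assuming NumPy; the variances are illustrative), finding λ by bisection so that ∑_i D_i = D:

import numpy as np

def reverse_waterfill(variances, D, tol=1e-12):
    """Return per-component distortions D_i = min(lam, sigma_i^2) with sum = D."""
    v = np.asarray(variances, dtype=float)
    lo, hi = 0.0, v.max()               # D <= sum(v) guarantees lam <= max variance
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if np.minimum(lam, v).sum() > D:
            hi = lam
        else:
            lo = lam
    Di = np.minimum(lo, v)
    rate = 0.5 * np.log2(v / Di).sum()  # bits; terms with D_i = sigma_i^2 contribute 0
    return Di, rate

Di, R = reverse_waterfill([1.0, 0.5, 0.1], D=0.6)
print(Di, R)   # lam = 0.25: D_i = [0.25, 0.25, 0.1], R = 1.5 bits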

 

3.4 Converse to the rate distortion theorem

To prove: we cannot achieve distortion less than D if we describe X at a rate less than R(D), where R(D) is the information rate distortion function given above.

Proof: We prove R ≥ R(D) for (possibly randomized) mappings f_n and g_n, as long as f_n takes on at most 2^{nR} values. Let X̂^n = g_n(f_n(X^n)). Then

nR ≥ H(f_n(X^n))   (a)
≥ H(f_n(X^n)) - H(f_n(X^n) | X^n) = I(X^n; f_n(X^n))   (b)
≥ I(X^n; X̂^n)   (c)
= H(X^n) - H(X^n | X̂^n)
= ∑_{i=1}^n H(X_i) - H(X^n | X̂^n)   (d)
= ∑_i H(X_i) - ∑_i H(X_i | X̂^n, X_{i-1}, ..., X_1)   (e)
≥ ∑_i H(X_i) - ∑_i H(X_i | X̂_i)   (f)
= ∑_i I(X_i; X̂_i)
≥ ∑_i R(E d(X_i, X̂_i))   (g)
≥ n R((1/n) ∑_i E d(X_i, X̂_i))   (h)
= n R(E d(X^n, X̂^n))   (i)
≥ n R(D),

where the last step holds because R(D) is nonincreasing in D and E d(X^n, X̂^n) ≤ D. The steps are justified as follows:

(a): the range of f_n is at most 2^{nR}.

(b): entropy is nonnegative, so H(f_n(X^n) | X^n) ≥ 0.

(c): data-processing inequality.

(d): the X_i are independent.

(e): chain rule for entropy.

(f): conditioning reduces entropy.

(g): definition of the rate distortion function.

(h): convexity of the rate distortion function and Jensen's inequality.

(i): definition of distortion for blocks of length n.

 

4. Strongly typical sequences

 

The achievability proof of the rate distortion theorem uses strong typicality. For a fixed codebook C and distortion D, the source sequences X^n fall into 3 categories:

1. Nontypical sequences X^n ∉ A_ε^{*(n)}: we can choose n large enough so that Pr(X^n ∉ A_ε^{*(n)}) < ε. Since the per-symbol distortion is bounded by d_max, the nontypical sequences contribute at most ε·d_max to the expected distortion.

2. Typical sequences X^n such that there exists a codeword X̂^n that is jointly typical with X^n. Since the source sequence and the codeword are strongly jointly typical, the continuity of the distortion as a function of the joint distribution ensures that they are also distortion typical. So the distortion between these X^n and their codewords is bounded by D + ε·d_max, and their contribution to the expected distortion is at most 1·(D + ε·d_max) = D + ε·d_max.

3. Typical sequences X^n such that there is no codeword X̂^n that is jointly typical with X^n. Let Pe be the total probability of such sequences; their contribution to the expected distortion is at most Pe·d_max.

The first and last classes are not well represented by the rate distortion code, but the probability of the first class is small. For Pe:

Pe = ∑_{x^n} p(x^n) [1 - Pr((x^n, X̂^n) ∈ A_ε^{*(n)})]^{2^{nR}},

and since (1 - y)^m ≤ e^{-my} and Pr((x^n, X̂^n) ∈ A_ε^{*(n)}) ≥ 2^{-n(I(X;X̂)+δ)} for typical x^n,

Pe ≤ exp(-2^{n(R - I(X;X̂) - δ)}), which is also very small if R > I(X; X̂) + δ and n is large enough.

So the probability that X^n is badly represented is very small: the probability of the second class is close to 1, and summing the three contributions gives E d(X^n, X̂^n) ≤ D + (2ε + Pe)·d_max, which is close to D.
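As an illustration of the random-coding argument, here is a small experiment (assuming NumPy; all parameters are illustrative) for a Bernoulli(1/2) source with Hamming distortion. At blocklength n = 20 and rate R = 0.4, a random codebook already achieves average distortion somewhat above the limit D satisfying R(D) = 1 - H(D) = 0.4, and the gap closes as n grows:

import numpy as np

rng = np.random.default_rng(0)
n, R, trials = 20, 0.4, 2000           # blocklength, rate, number of source blocks
codebook = rng.integers(0, 2, size=(int(2 ** (n * R)), n))   # 2^{nR} random codewords

dist = 0.0
for _ in range(trials):
    x = rng.integers(0, 2, size=n)                  # Bernoulli(1/2) source block
    d = (codebook != x).mean(axis=1)                # per-symbol Hamming distortion
    dist += d.min() / trials                        # encode to the closest codeword

print(dist)   # typically ~0.2 here; the theoretical limit at R = 0.4 is D ~ 0.146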

 

5. Computation of the rate distortion function

The problem is to compute R(D) by minimizing I(X; X̂) over the convex set of all conditional distributions q(x̂|x) ≥ 0 with ∑_{x̂} q(x̂|x) = 1 for all x, subject to the distortion constraint ∑_{x,x̂} p(x) q(x̂|x) d(x, x̂) ≤ D.

With a Lagrange multiplier λ for the distortion constraint and ν(x) for the normalization constraints, and writing r(x̂) = ∑_x p(x) q(x̂|x), form

J(q) = ∑_{x,x̂} p(x) q(x̂|x) log[q(x̂|x)/r(x̂)] + λ ∑_{x,x̂} p(x) q(x̂|x) d(x, x̂) + ∑_x ν(x) ∑_{x̂} q(x̂|x).

Differentiating with respect to q(x̂|x) (the terms coming from the dependence of r on q cancel), we have

p(x) log[q(x̂|x)/r(x̂)] + λ p(x) d(x, x̂) + ν(x) = 0, which means q(x̂|x) = r(x̂) e^{-λ d(x,x̂)} e^{-ν(x)/p(x)}.

Imposing the normalization ∑_{x̂} q(x̂|x) = 1, we get: q(x̂|x) = r(x̂) e^{-λ d(x,x̂)} / ∑_{x̂'} r(x̂') e^{-λ d(x,x̂')}.

Multiplying this by p(x) and summing over all x, we get: r(x̂) = r(x̂) ∑_x p(x) e^{-λ d(x,x̂)} / ∑_{x̂'} r(x̂') e^{-λ d(x,x̂')}.

If r(x̂) > 0, we can cancel it from both sides; by the KKT conditions we get:

∑_x p(x) e^{-λ d(x,x̂)} / ∑_{x̂'} r(x̂') e^{-λ d(x,x̂')} = 1 if r(x̂) > 0, and ≤ 1 if r(x̂) = 0.

 

If the distance measure is the relative entropy, we have the following lemma (Lemma 10.8.1): the distribution r(x̂) that minimizes D(p(x)q(x̂|x) ∥ p(x)r(x̂)) is the marginal r*(x̂) = ∑_x p(x) q(x̂|x), and the minimum value is the mutual information I(X; X̂).

We use this lemma (10.131) to rewrite R(D):

R(D) = min_{r(x̂)} min_{q(x̂|x)} ∑_{x,x̂} p(x) q(x̂|x) log[q(x̂|x)/r(x̂)],

where the inner minimum is over conditional distributions satisfying the distortion constraint.

If A is the set of all joint distributions with marginal p(x) that satisfy the distortion constraint, and if B is the set of product distributions p(x)r(x̂) with arbitrary r(x̂), we can write: R(D) = min_{q ∈ A} min_{p' ∈ B} D(q ∥ p').

Using Lagrange multipliers as above, for fixed r we obtain: q(x̂|x) = r(x̂) e^{-λ d(x,x̂)} / ∑_{x̂'} r(x̂') e^{-λ d(x,x̂')}.

For this conditional distribution q(x̂|x), we then compute the output distribution r(x̂) that minimizes the mutual information (10.132-10.133), which is r(x̂) = ∑_x p(x) q(x̂|x). So we can repeat the process, alternately minimizing the objective over q(·|·) and over r(·); this alternating minimization (the Blahut-Arimoto algorithm) converges to the optimum.
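A minimal sketch of this alternating minimization, the Blahut-Arimoto algorithm (assuming NumPy; the source, distortion matrix, and λ values are illustrative). Each λ ≥ 0 yields one point on the R(D) curve:

import numpy as np

def blahut_arimoto(p_x, d, lam, iters=500):
    """Alternating minimization for R(D): q(xh|x) prop. to r(xh)*exp(-lam*d(x,xh))."""
    nx, nxh = d.shape
    r = np.full(nxh, 1.0 / nxh)                    # initial output distribution
    for _ in range(iters):
        q = r[None, :] * np.exp(-lam * d)          # unnormalized q(xh|x)
        q /= q.sum(axis=1, keepdims=True)
        r = p_x @ q                                # r(xh) = sum_x p(x) q(xh|x)
    D = float(np.sum(p_x[:, None] * q * d))        # expected distortion
    R = float(np.sum(p_x[:, None] * q *
                     np.log2(q / r[None, :])))     # mutual information in bits
    return R, D

# Bernoulli(1/2) source with Hamming distortion; sweep lam to trace the R(D) curve.
p = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
for lam in (1.0, 2.0, 4.0):
    R, D = blahut_arimoto(p, d, lam)
    print(f"lam={lam}: R={R:.3f}, D={D:.3f}")      # should match R(D) = 1 - H(D)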