**Scribe Notes 7 (by Luk Hon Tung)**

For typing reasons, X’ is written in place of X̂ (the reconstruction) throughout these scribe notes.

**1. Introduction**

Source coding theory and channel coding theory. Suppose we have some information X. From source coding theory we know that it can be compressed to no fewer than H(X) bits per symbol if we want to recover it exactly. But if we allow some distortion on decoding, how much further can we compress the information?

**2. Quantization / Rate-Distortion theory**

Suppose we have some information X^{n}, which is encoded to f(X^{n}); usually the compression rate satisfies H(f(X^{n})) < H(X^{n}). Then f(X^{n}) is decoded to obtain X’^{n}; usually X^{n} ≠ X’^{n} because of the lossy compression. The distortion of the information is a function of X^{n} and X’^{n}, denoted d(X^{n}, X’^{n}). The goal is to minimize the compression rate R = H(f(X^{n})) subject to the expected distortion E_{X^{n}}[d(X^{n}, X’^{n})] ≤ D, where D is a given constant.

d(·) is additive, and d(X^{n}, X’^{n}) is defined as

d(X^{n}, X’^{n}) = (1/n) ∑_{i=1}^{n} d(X_{i}, X’_{i}).
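As a quick sketch of this definition (function name and the default Hamming measure are my own choices for illustration), the per-symbol average of an additive distortion can be computed as:

```python
def avg_distortion(x, x_rec, d=lambda a, b: int(a != b)):
    """Additive distortion: d(x^n, x'^n) = (1/n) * sum_i d(x_i, x'_i).

    The default single-letter distortion is Hamming (0/1); squared error
    can be passed as d=lambda a, b: (a - b) ** 2.
    """
    assert len(x) == len(x_rec)
    return sum(d(a, b) for a, b in zip(x, x_rec)) / len(x)

# Example: one mismatch out of four symbols -> average Hamming distortion 0.25
print(avg_distortion([1, 2, 3, 4], [1, 2, 3, 1]))
```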

**3. 1-bit quantization example**

Let X ∼ *N*(0, σ^{2}), and assume a squared-error distortion measure. We wish to find the function X’(X) such that X’ takes on at most 2^{R} values and minimizes E(X – X’(X))^{2}. If we are given one bit to represent X, it is clear that the bit should distinguish whether or not X > 0. To minimize squared error, each reproduced symbol should be the conditional mean of its region. Thus,

X’(x) = +√(2/π) σ if x ≥ 0, and X’(x) = −√(2/π) σ if x < 0.

For the 1-bit case with a standard normal distribution, the reconstruction points are at −√(2/π) ≈ −0.79 and +√(2/π) ≈ 0.79.
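A Monte Carlo sanity check of this quantizer (a sketch; the function name is my own): the reconstruction points ±σ√(2/π) give mean squared error σ²(1 − 2/π) ≈ 0.363 σ².

```python
import math
import random

def quantize_1bit(x, sigma=1.0):
    """Map x to the conditional mean of its half-line: ±sigma*sqrt(2/pi)."""
    a = sigma * math.sqrt(2.0 / math.pi)  # ~0.7979 for sigma = 1
    return a if x >= 0 else -a

random.seed(0)
sigma = 1.0
samples = [random.gauss(0.0, sigma) for _ in range(200_000)]
mse = sum((x - quantize_1bit(x, sigma)) ** 2 for x in samples) / len(samples)

print(round(mse, 3))                          # empirical distortion
print(round(sigma**2 * (1 - 2 / math.pi), 3))  # theoretical value, ~0.363
```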

**4. Problem Set 7 Q1.**

(a)

Scheme: Encode 1, 2 to 0 and encode 3, 4 to 1. Decode 0 to 0 and decode 1 to 2.

The expected distortion E[distortion] = 0.5.

(b) Suppose (X_{1},X_{2})∈{1,2,3,4}^{2} are two independent symbols.

Distribution of reconstruction points should be as follows. *(figure omitted)*

Example: distribution of reconstruction points for a 2-D Gaussian random variable. *(figure omitted)*

**5. Lloyd Algorithm**

Given a set {X’(X)} of reconstruction points, the distortion is minimized by mapping a source random variable X to the representation X’(X) that is closest to it. The set of regions defined by this mapping is called a Voronoi or Dirichlet partition defined by the reconstruction points.

The reconstruction points should minimize the conditional expected distortion over their respective assignment regions.

Problems with the algorithm:

Its complexity is exponential (in the dimension), and it converges only to a local minimum, which may not be the global minimum.
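The two alternating steps above can be sketched for the scalar case as follows (a minimal illustration; function names are my own, and initialization from the data is an arbitrary choice):

```python
import random

def lloyd_1d(samples, k, iters=50):
    """Lloyd's algorithm for designing a k-level scalar quantizer.

    Alternates two steps:
      1. Assign each sample to its nearest reconstruction point
         (the Voronoi partition).
      2. Move each reconstruction point to the mean of its region.
    """
    points = sorted(random.sample(samples, k))  # initialize from the data
    for _ in range(iters):
        regions = [[] for _ in range(k)]
        for x in samples:
            j = min(range(k), key=lambda j: (x - points[j]) ** 2)
            regions[j].append(x)
        # Centroid update; keep the old point if a region happens to be empty.
        points = [sum(r) / len(r) if r else points[j]
                  for j, r in enumerate(regions)]
    return sorted(points)

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(50_000)]
pts = lloyd_1d(data, k=2)
print([round(p, 2) for p in pts])  # close to the ±sqrt(2/pi) ≈ ±0.80 points
```

With k = 2 on standard normal samples, the iteration recovers the 1-bit reconstruction points from Section 3.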

**6. Covering problem / Reverse test channel**

The size of the typical set of X^{n} is 2^{nH(X)}, and the size of each distortion ball around a codeword X’^{n} is 2^{nH(X|X’)}. So the number of balls needed to cover the typical set is

2^{nH(X)} / 2^{nH(X|X’)} = 2^{n(H(X) − H(X|X’))} = 2^{nI(X;X’)}.

The rate is therefore

R(D) = min_{p(x’|x): E[d(X,X’)] ≤ D} I(X; X’).

For a Bernoulli(p) source with Hamming distortion, the rate can be simplified to

R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p), and R(D) = 0 for D > min(p, 1 − p).

Proof sketch:

To achieve the rate H(p) − H(D), (X, X’) should have the following joint distribution (the reverse test channel): take X’ ∼ Bernoulli((p − D)/(1 − 2D)) and pass it through a BSC(D), i.e., X = X’ ⊕ Z with Z ∼ Bernoulli(D) independent of X’. This choice makes X ∼ Bernoulli(p) and gives I(X; X’) = H(X) − H(X|X’) = H(p) − H(D).
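A numeric check of the reverse test channel (a sketch; the values p = 0.3, D = 0.1 are arbitrary): choosing P(X’ = 1) = (p − D)/(1 − 2D) and flipping with probability D reproduces the Bernoulli(p) marginal and the rate H(p) − H(D).

```python
import math

def H(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

p, D = 0.3, 0.1
q = (p - D) / (1 - 2 * D)      # P(X' = 1) for the reverse test channel

# X = X' xor Z with Z ~ Bernoulli(D): the output marginal is Bernoulli(p)
p_x1 = q * (1 - D) + (1 - q) * D
print(round(p_x1, 6))           # -> 0.3

# I(X; X') = H(X) - H(X|X') = H(p) - H(D)
rate = H(p) - H(D)
print(round(rate, 4))
```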

The rate distortion function when p = 1/2 is R(D) = 1 − H(D) for 0 ≤ D ≤ 1/2.

For a N(0, σ^{2}) Gaussian source with squared-error distortion, the rate can be simplified to R = (1/2) log(σ^{2}/D), and the rate distortion function is

R(D) = (1/2) log(σ^{2}/D) for 0 ≤ D ≤ σ^{2}, and R(D) = 0 for D > σ^{2}.
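Both closed forms can be evaluated directly (a sketch; the helper names are my own):

```python
import math

def H(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def rd_bernoulli(D, p):
    """R(D) = H(p) - H(D) for 0 <= D <= min(p, 1-p), else 0 (bits)."""
    return max(0.0, H(p) - H(D)) if D < min(p, 1 - p) else 0.0

def rd_gaussian(D, var):
    """R(D) = (1/2) log2(var / D) for 0 < D <= var, else 0 (bits)."""
    return 0.5 * math.log2(var / D) if 0 < D < var else 0.0

print(round(rd_bernoulli(0.1, p=0.5), 4))  # 1 - H(0.1)
print(rd_gaussian(0.25, var=1.0))          # 0.5 * log2(4) = 1.0
```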

**7. Simultaneous Description of Independent Gaussian Random Variables**

Consider the case of representing m independent (but not identically distributed) normal random sources X_{1}, . . . , X_{m}, where X_{i} ∼ N(0, σ_{i}^{2}), with squared-error distortion.

We choose a constant λ and describe only those random variables with variances greater than λ; no bits are used to describe random variables with variance less than λ (λ plays the role of a Lagrange multiplier, or "water level"). This is the reverse water-filling solution: D_{i} = λ if σ_{i}^{2} > λ and D_{i} = σ_{i}^{2} otherwise, with λ chosen so that ∑_{i} D_{i} = D, giving R(D) = ∑_{i} max(0, (1/2) log(σ_{i}^{2}/λ)).
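The water level λ can be found by bisection (a sketch; the function name is my own): pick λ so that ∑_{i} min(λ, σ_{i}^{2}) equals the distortion budget D, then spend (1/2) log(σ_{i}^{2}/λ) bits on each source with σ_{i}^{2} > λ.

```python
import math

def reverse_waterfill(variances, D, tol=1e-12):
    """Reverse water-filling for m independent Gaussian sources.

    Finds lambda with sum_i min(lambda, sigma_i^2) = D, then allocates
    D_i = min(lambda, sigma_i^2) and R_i = max(0, 0.5*log2(sigma_i^2/D_i)).
    Assumes 0 < D <= sum(variances).
    """
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if sum(min(lam, v) for v in variances) < D:
            lo = lam       # water level too low: total distortion under budget
        else:
            hi = lam
    lam = (lo + hi) / 2
    rates = [max(0.0, 0.5 * math.log2(v / lam)) for v in variances]
    return lam, rates

# Two sources with variances 1 and 4, total distortion budget D = 1:
# lambda = 0.5, total rate = 0.5*log2(1/0.5) + 0.5*log2(4/0.5) = 0.5 + 1.5 = 2 bits
lam, rates = reverse_waterfill([1.0, 4.0], D=1.0)
print(round(lam, 6), round(sum(rates), 6))
```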

**8. Converse to the Rate Distortion Theorem**
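The labels (a)–(j) below refer to the standard chain of inequalities for the converse (as in Cover & Thomas); a sketch of the chain:

```latex
\begin{align*}
nR &\overset{(a)}{\ge} H(f_n(X^n))
    \overset{(b)}{\ge} I(X^n; f_n(X^n))
    \overset{(c)}{\ge} I(X^n; X'^n) \\
   &= H(X^n) - H(X^n \mid X'^n)
    \overset{(d)}{=} \sum_{i=1}^n H(X_i) - H(X^n \mid X'^n) \\
   &\overset{(e)}{=} \sum_{i=1}^n H(X_i)
      - \sum_{i=1}^n H(X_i \mid X'^n, X_{i-1}, \dots, X_1) \\
   &\overset{(f)}{\ge} \sum_{i=1}^n H(X_i) - \sum_{i=1}^n H(X_i \mid X'_i)
    = \sum_{i=1}^n I(X_i; X'_i) \\
   &\overset{(g)}{\ge} \sum_{i=1}^n R\!\left(E\,d(X_i, X'_i)\right)
    \overset{(h)}{\ge} n R\!\left(\frac{1}{n}\sum_{i=1}^n E\,d(X_i, X'_i)\right) \\
   &\overset{(i)}{=} n R\!\left(E\,d(X^n, X'^n)\right)
    \overset{(j)}{\ge} n R(D).
\end{align*}
```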

(a) follows from the fact that the range of fn is at most 2^{nR}

(b) follows from the fact that H(f_{n}(X^{n})|X^{n}) ≥ 0

(c) follows from the data-processing inequality

(d) follows from the fact that the X_{i} are independent

(e) follows from the chain rule for entropy

(f) follows from the fact that conditioning reduces entropy

(g) follows from the definition of the rate distortion function

(h) follows from the convexity of the rate distortion function and Jensen’s inequality

(i) follows from the definition of distortion for blocks of length n

(j) follows from the fact that R(D) is a nonincreasing function of D and E d(X^{n}, X’^{n}) ≤ D

**9. Class of source sequences in the rate distortion theorem**

*Nontypical sequences x^{n} ∉ A_{ε}^{∗}(n).* The total probability of these sequences can be made less than ε by choosing n large enough. Since the individual distortion between any two sequences is bounded by d_{max}, the nontypical sequences can contribute at most ε d_{max} to the expected distortion.

*Typical sequences x^{n} ∈ A_{ε}^{∗}(n) such that there exists a codeword X’^{n}(w) that is jointly typical with x^{n}.* In this case, since the source sequence and the codeword are strongly jointly typical, the continuity of the distortion as a function of the joint distribution ensures that they are also distortion typical. Hence, the distortion between these x^{n} and their codewords is bounded by D + ε d_{max}, and since the total probability of these sequences is at most 1, they contribute at most D + ε d_{max} to the expected distortion.

*Typical sequences x^{n} ∈ A_{ε}^{∗}(n) such that there does not exist a codeword X’^{n} that is jointly typical with x^{n}.* Let P_{e} be the total probability of these sequences. Since the distortion for any individual sequence is bounded by d_{max}, these sequences contribute at most P_{e} d_{max} to the expected distortion.
