
# Scribe_Note_7


Scribe Notes 7 (by Luk Hon Tung)

For typing reasons, X' is used in place of the hat notation X̂ (the reconstruction symbol) throughout these scribe notes.

1. Introduction

So far we have seen source coding theory and channel coding theory. Suppose you have some information X; by source coding theory we know that it can be compressed to no fewer than H(X) bits per symbol if we want it to be recovered exactly. But if some distortion is allowed on decoding, how much further can the information be compressed?

2. Quantization / Rate-distortion theory

Suppose you have a source sequence X^n. It is encoded to f(X^n), where usually H(f(X^n)) < H(X^n), so compression occurs. Then f(X^n) is decoded to X'^n, and usually X'^n ≠ X^n because of the lossy compression. The distortion is a function of X^n and X'^n, denoted d(X^n, X'^n). The goal is to minimize the compression rate R = H(f(X^n)) subject to the expected distortion satisfying E[d(X^n, X'^n)] ≤ D, where D is a given constant.

d(·) is additive, and d(X^n, X'^n) is defined as

d(x^n, x'^n) = (1/n) Σ_{i=1}^{n} d(x_i, x'_i).
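A minimal sketch of this additive per-letter distortion (the helper `block_distortion` and the two example measures are illustrative, not from the notes):

```python
def block_distortion(xn, xn_hat, d):
    """d(x^n, x'^n) = (1/n) * sum_i d(x_i, x'_i): the block distortion
    is the average of the single-letter distortions."""
    assert len(xn) == len(xn_hat)
    return sum(d(x, xh) for x, xh in zip(xn, xn_hat)) / len(xn)

# Two common single-letter measures.
hamming = lambda x, xh: 0 if x == xh else 1
squared = lambda x, xh: (x - xh) ** 2

# Example: a length-4 source block and its reconstruction differ in one place.
print(block_distortion([1, 2, 3, 4], [1, 2, 3, 3], hamming))  # 0.25
print(block_distortion([1, 2, 3, 4], [1, 2, 3, 3], squared))  # 0.25
```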

3. 1-bit quantization example

Let X ~ N(0, σ^2), and assume a squared-error distortion measure. We wish to find the function X'(X) such that X' takes on at most 2^R values and minimizes E[(X − X'(X))^2]. If we are given one bit to represent X, it is clear that the bit should distinguish whether or not X > 0. To minimize squared error, each reproduced symbol should be the conditional mean of its region. Thus,

X'(x) = +σ·sqrt(2/π) if x ≥ 0, and X'(x) = −σ·sqrt(2/π) if x < 0.

For the 1-bit case, the reconstruction points are at −0.79 and 0.79 for the standard normal distribution.
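A quick Monte Carlo sketch of the 1-bit quantizer above (the sample size and seed are arbitrary choices for illustration):

```python
import math
import random

random.seed(0)
sigma = 1.0
n = 200_000
xs = [random.gauss(0.0, sigma) for _ in range(n)]

# One-bit quantizer: the bit says whether X > 0; each region is then
# reproduced by its conditional mean, +/- sigma * sqrt(2/pi).
c = sigma * math.sqrt(2.0 / math.pi)
xhat = [c if x >= 0 else -c for x in xs]
mse = sum((x - xh) ** 2 for x, xh in zip(xs, xhat)) / n

print(round(c, 4))                                 # 0.7979
print(round(sigma**2 * (1 - 2 / math.pi), 4))      # theoretical MSE: 0.3634
# The empirical mse above should be close to the theoretical value.
```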

4. Problem Set 7 Q1.

(a)

Scheme: encode symbols 1 and 2 to bit 0, and symbols 3 and 4 to bit 1. Decode bit 0 to 2 and bit 1 to 4.

For a uniform source on {1, 2, 3, 4}, each reconstruction is off by one with probability 1/2, so the expected distortion is E[distortion] = 0.5.
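A sketch checking the expected distortion of a one-bit scheme of this kind (assumptions: X uniform on {1, 2, 3, 4}, bit 0 decoded to 2, bit 1 decoded to 4, squared-error distortion; the exact decode map is one reading of the scheme above):

```python
from fractions import Fraction

symbols = [1, 2, 3, 4]
encode = lambda x: 0 if x <= 2 else 1   # group {1,2} -> bit 0, {3,4} -> bit 1
decode = {0: 2, 1: 4}                   # one representative per group

# E[d] averaged over the uniform source, with squared-error distortion.
exp_dist = sum(Fraction(1, 4) * (x - decode[encode(x)]) ** 2 for x in symbols)
print(exp_dist)  # 1/2
```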

(b) Suppose (X1, X2) ∈ {1, 2, 3, 4}^2 are two independent symbols.

[Figure omitted: distribution of the reconstruction points]

[Figure omitted: example distribution of reconstruction points for a 2-D Gaussian random variable]

5. Lloyd Algorithm

Given a set {X’(X)} of reconstruction points, the distortion is minimized by mapping a source random variable X to the representation X’(X) that is closest to it. The set of regions defined by this mapping is called a Voronoi or Dirichlet partition defined by the reconstruction points.

The reconstruction points should minimize the conditional expected distortion over their respective assignment regions.

Problems with the algorithm:

The complexity is exponential, and it gives only a local minimum, which may not be the global minimum.
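A minimal one-dimensional sketch of the Lloyd iteration described above (assuming a Gaussian source and squared error; the function `lloyd` and its parameters are illustrative names, not from the notes):

```python
import random

random.seed(1)

def lloyd(samples, points, iters=25):
    """Lloyd algorithm in 1-D: alternately (1) assign each sample to
    the nearest reconstruction point (the Voronoi partition) and
    (2) move each point to the conditional mean of its region."""
    points = list(points)
    for _ in range(iters):
        regions = {i: [] for i in range(len(points))}
        for s in samples:
            i = min(range(len(points)), key=lambda j: (s - points[j]) ** 2)
            regions[i].append(s)
        for i, r in regions.items():
            if r:  # keep a point unchanged if its region is empty
                points[i] = sum(r) / len(r)
    return sorted(points)

samples = [random.gauss(0.0, 1.0) for _ in range(20_000)]
pts = lloyd(samples, points=[-2.0, 2.0])
print([round(p, 2) for p in pts])  # close to [-0.79, 0.79] for 1 bit
```

Note that this converges to a local minimum for the given starting points, consistent with the caveat above.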

6. Covering problem / Reverse test channel

The size of the typical set A_ε^(n) is about 2^{nH(X)}, and the size of each distortion ball is about 2^{nH(X|X')}. So the number of balls needed to cover the typical set is

2^{nH(X)} / 2^{nH(X|X')} = 2^{n(H(X) − H(X|X'))} = 2^{nI(X;X')}.

The rate is therefore R(D) = min_{p(x'|x): E[d(X,X')] ≤ D} I(X; X').

For a Bernoulli(p) source with Hamming distortion, the rate simplifies to

R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p), and R(D) = 0 otherwise.

Proof sketch:

To achieve the rate H(p) − H(D), (X, X') should have the joint distribution induced by the reverse test channel: X is obtained from X' through a binary symmetric channel with crossover probability D, so that Pr(X ≠ X') = D and I(X; X') = H(X) − H(X|X') = H(p) − H(D).

When p = 1/2, the rate distortion function is R(D) = 1 − H(D) for 0 ≤ D ≤ 1/2.

For an N(0, σ^2) Gaussian source with squared-error distortion, the rate simplifies to (1/2) log(σ^2 / D).

And the rate distortion function is

R(D) = (1/2) log(σ^2 / D) for 0 ≤ D ≤ σ^2, and R(D) = 0 for D > σ^2.
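The two rate distortion functions can be sketched directly (rates in bits; function names are illustrative):

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rd_bernoulli(p, D):
    """R(D) = H(p) - H(D) for 0 <= D < min(p, 1-p), else 0."""
    return h2(p) - h2(D) if D < min(p, 1 - p) else 0.0

def rd_gaussian(var, D):
    """R(D) = (1/2) log2(sigma^2 / D) for 0 < D < sigma^2, else 0."""
    return 0.5 * math.log2(var / D) if D < var else 0.0

print(round(rd_bernoulli(0.5, 0.11), 3))  # 0.5  (= 1 - H(0.11))
print(round(rd_gaussian(1.0, 0.25), 3))   # 1.0  (halving D costs 1/2 bit)
```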

7. Simultaneous Description of Independent Gaussian Random Variables

Consider the case of representing m independent (but not identically distributed) normal random sources X1, . . . , Xm, where Xi is N(0, σ_i^2), with squared-error distortion.

We choose a constant λ and describe only those random variables with variances greater than λ; no bits are used to describe random variables with variance less than λ (λ is the Lagrange multiplier). Each described source i receives distortion D_i = λ and rate R_i = (1/2) log(σ_i^2 / λ), while sources with σ_i^2 ≤ λ get D_i = σ_i^2 and R_i = 0; λ is chosen so that Σ_i D_i equals the distortion budget.
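A sketch of this allocation, finding λ by bisection (the function `reverse_waterfill`, the example variances, and the budget D are illustrative assumptions):

```python
import math

def reverse_waterfill(variances, D):
    """Find lambda so that sum_i min(lambda, sigma_i^2) = D, then give
    source i the rate R_i = max(0, 0.5 * log2(sigma_i^2 / lambda))."""
    lo, hi = 0.0, max(variances)
    for _ in range(100):  # bisection on lambda (assumes 0 < D < sum of variances)
        lam = (lo + hi) / 2
        if sum(min(lam, v) for v in variances) > D:
            hi = lam  # total distortion too large: lower lambda
        else:
            lo = lam
    rates = [max(0.0, 0.5 * math.log2(v / lam)) for v in variances]
    return lam, rates

lam, rates = reverse_waterfill([4.0, 1.0, 0.25], D=1.5)
# The source with variance 0.25 < lambda gets rate 0 (not described).
print(round(lam, 3), [round(r, 3) for r in rates])  # 0.625 [1.339, 0.339, 0.0]
```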

8. Converse to the Rate Distortion Theorem

The chain of inequalities (writing X'^n for the reconstruction of X^n):

nR ≥ H(f_n(X^n))                                                  (a)
   ≥ H(f_n(X^n)) − H(f_n(X^n) | X^n) = I(X^n; f_n(X^n))           (b)
   ≥ I(X^n; X'^n)                                                 (c)
   = H(X^n) − H(X^n | X'^n)
   = Σ_i H(X_i) − H(X^n | X'^n)                                   (d)
   = Σ_i H(X_i) − Σ_i H(X_i | X'^n, X_{i−1}, …, X_1)              (e)
   ≥ Σ_i H(X_i) − Σ_i H(X_i | X'_i) = Σ_i I(X_i; X'_i)            (f)
   ≥ Σ_i R(E[d(X_i, X'_i)])                                       (g)
   ≥ n · R((1/n) Σ_i E[d(X_i, X'_i)])                             (h)
   = n · R(E[d(X^n, X'^n)])                                       (i)
   ≥ n · R(D)                                                     (j)

(a) follows from the fact that the range of fn is at most 2nR

(b) follows from the fact that H(fn(Xn)|Xn) ≥ 0

(c) follows from the data-processing inequality

(d) follows from the fact that the Xi are independent

(e) follows from the chain rule for entropy

(f) follows from the fact that conditioning reduces entropy

(g) follows from the definition of the rate distortion function

(h) follows from the convexity of the rate distortion function and Jensen’s inequality

(i) follows from the definition of distortion for blocks of length n

(j) follows from the fact that R(D) is a nonincreasing function of D and Ed(Xn, X’n) ≤ D

9. Class of source sequences in the rate distortion theorem

Nontypical sequences

The total probability of these sequences can be made less than ε by choosing n large enough. Since the individual distortion between any two sequences is bounded by d_max, the nontypical sequences can contribute at most ε·d_max to the expected distortion.

Typical sequences x^n ∈ A_ε^(n) such that there exists a codeword X'^n(w) that is jointly typical with x^n. In this case, since the source sequence and the codeword are strongly jointly typical, the continuity of the distortion as a function of the joint distribution ensures that they are also distortion typical. Hence, the distortion between these x^n and their codewords is bounded by D + ε·d_max, and since the total probability of these sequences is at most 1, these sequences contribute at most D + ε·d_max to the expected distortion.

Typical sequences x^n ∈ A_ε^(n) such that there does not exist a codeword X'^n that is jointly typical with x^n. Let Pe be the total probability of these sequences. Since the distortion for any individual sequence is bounded by d_max, these sequences contribute at most Pe·d_max to the expected distortion.
