**Quantization/Rate-Distortion Theory**

**Q1.** (*Toy example*) (a) (*Scalar quantization*) A source generates a symbol $X$ from the set $\{0,1,2,3\}$ with uniform probability. The source encoder needs to describe the source to the decoder, but is allowed to use only a single bit. The decoder's task is to reconstruct the source as a symbol $\hat{X}$ (which must also be in the set $\{0,1,2,3\}$) while minimizing the expected *distortion*. The distortion between two symbols here is defined as the mean-squared error (MSE) between $X$ and $\hat{X}$. Propose a scheme with a distortion that is as small as possible.

(b) (*Vector quantization*) Now, suppose the source generates two elements $(X_1, X_2)$ i.i.d. from the same source. The encoder encodes this as two bits, and the decoder decodes these as $(\hat{X}_1, \hat{X}_2)$. Propose a scheme with an MSE distortion that is as small as possible.
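Since the alphabets in Q1 are tiny, both parts can be checked by exhaustive search: for any fixed codebook the optimal encoder is nearest-neighbor, so it suffices to enumerate codebooks. A minimal brute-force sketch (the variable names are my own); comparing `pair_best / 2` with `scalar_best` is instructive:

```python
from itertools import combinations, product

symbols = [0, 1, 2, 3]

def mse(codebook, source):
    # Nearest-neighbor encoding: each source vector maps to the closest
    # codeword; average the resulting squared errors over the uniform source.
    total = sum(min(sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                    for c in codebook) for x in source)
    return total / len(source)

# (a) Scalar case: 1 bit -> 2 reproduction points from {0,1,2,3}.
scalar_src = [(x,) for x in symbols]
scalar_best = min(mse([(a,), (b,)], scalar_src)
                  for a, b in combinations(symbols, 2))

# (b) Vector case: 2 bits per pair -> 4 reproduction points from {0,...,3}^2.
pair_src = list(product(symbols, repeat=2))
pair_best = min(mse(cb, pair_src) for cb in combinations(pair_src, 4))

print(scalar_best)    # best per-symbol MSE, scalar scheme
print(pair_best / 2)  # best per-symbol MSE, vector scheme
```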

**Q2.** (*Distortion measures/Lloyd's algorithm*) (a) Give examples of "common" distortion measures. What fundamental property do they have in common?

(b) Given that a set of points needs to be reproduced with minimum average distortion,

(i) If the set of reproduction points is fixed, how would you determine the optimal encoding rule (i.e., the rule mapping each input to a reproduction point)? How does this partition the set of all possible inputs? (We call these partitions Voronoi cells.) What is the implicit assumption on the distortion measure here?

(ii) If the set of Voronoi cells is fixed, how would you determine the set of optimal reproduction points? What is the implicit assumption on the source alphabet here?

(iii) Consider starting with an arbitrary initial set of reproduction points, and iteratively applying (i) and (ii) above. Can you say something about the convergence properties (convergence in what?) of the corresponding iterations? Does this suggest an algorithm (the famous Lloyd algorithm) for (scalar/vector) quantization? (What would be the final step you'd have to do, after you have a stable output?)

(iv) (*Two problems with Lloyd's algorithm*) (a) Suppose it is required to reproduce a source using *R* bits per sample. How does the complexity of the algorithm scale with the number of samples *n*?

(b) Can you construct an example of a source distribution and an initial choice of reproduction points such that Lloyd's algorithm does NOT converge to the optimal set of reconstruction points?
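The alternation in (i)-(iii) can be sketched in one dimension as follows; this is a minimal Lloyd iteration on an empirical sample (the function name, initialization, and fixed iteration count are my own choices, not a prescribed implementation):

```python
import random

def lloyd(samples, k, iters=100):
    """One-dimensional Lloyd's algorithm under squared-error distortion.

    Alternates (i) nearest-neighbor partitioning of the samples and
    (ii) centroid (conditional-mean) updates of the reproduction points.
    """
    points = random.sample(samples, k)          # arbitrary initial codebook
    for _ in range(iters):
        # (i) Optimal encoder for the fixed codebook: Voronoi partition.
        cells = {i: [] for i in range(k)}
        for x in samples:
            i = min(range(k), key=lambda i: (x - points[i]) ** 2)
            cells[i].append(x)
        # (ii) Optimal decoder for the fixed partition: cell centroids.
        points = [sum(c) / len(c) if c else points[i]
                  for i, c in cells.items()]
    dist = sum(min((x - p) ** 2 for p in points) for x in samples) / len(samples)
    return points, dist

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]
codebook, d = lloyd(samples, k=4)
```

Note that the distortion is non-increasing across iterations, which is exactly the convergence property asked about in (iii); convergence to the *global* optimum, however, is not guaranteed (part (iv)(b)).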

**Q3.** (*Rate-distortion intuition/duality*) (a) Given a probability distribution *p(x)* over an i.i.d. source, and a convex distortion measure d(.,.), how would you define the rate-distortion function of the source?

(b) (*Intuition about the rate-distortion function/duality with channel coding*) Consider a (virtual) "test channel" (reverse channel) that noisily transmits the symbols $\hat{X}$ to the symbols $X$ with the probability distribution $p(x|\hat{x})$ (note that this is NOT a typo -- we do NOT mean a channel that noisily transmits the symbols $X$ to the symbols $\hat{X}$ with the probability distribution $p(\hat{x}|x)$). In this case, can you draw an analogy with Q3 of PS3? Besides the reverse direction, is there another important distinction? (*Hint: covering versus packing*.)

(c) Given (b), can you guess the form of the rate-distortion function for the source in (a)?

**Q4.** (*Proof of rate-distortion theorem/separation*) (a) Using strong typicality, prove the achievability of the function in Q3(c). (What encoding/decoding scheme would you use?)

(b) How would you prove the converse to the rate-distortion theorem? Can you also prove that there is separation between rate-distortion coding and channel coding?

**Q5.** (*Particular rate-distortion problems*) (a) For a Bernoulli(p) source with Hamming distortion, derive a lower bound on R(D). Can you prove that this is also an upper bound? (Hint: What does the corresponding test-channel look like?)
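If your derivation matches the standard result $R(D) = H_b(p) - H_b(D)$ for $D \le \min(p, 1-p)$ (and $R(D) = 0$ otherwise), a quick numerical sanity check is:

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def rd_bernoulli(p, D):
    """R(D) for a Bernoulli(p) source under Hamming distortion."""
    if D >= min(p, 1 - p):
        return 0.0    # reproducing the all-zeros (or all-ones) sequence suffices
    return h2(p) - h2(D)
```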

(b) Do the same as in (a) for a Gaussian source with variance $\sigma^2$.

(c) How about if you have $k$ independent Gaussian sources with variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$, respectively?
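For part (c), the standard answer is reverse water-filling: spend distortion $\lambda$ on every component with $\sigma_i^2 > \lambda$ and describe the remaining components with zero rate. A numerical sketch under that assumption (the bisection scheme and iteration count are arbitrary choices of mine):

```python
from math import log2

def reverse_waterfill(variances, D):
    """Rate (in bits) for k independent Gaussians at total distortion D.

    Finds the water level lam with sum_i min(lam, sigma_i^2) = D, then
    returns R = sum_i 0.5 * log2(sigma_i^2 / D_i), D_i = min(lam, sigma_i^2).
    """
    lo, hi = 0.0, max(variances)
    for _ in range(100):                      # bisection on the water level
        lam = (lo + hi) / 2
        if sum(min(lam, v) for v in variances) < D:
            lo = lam
        else:
            hi = lam
    return sum(0.5 * log2(v / min(lam, v)) for v in variances if v > lam)
```

For example, with variances $(1, 4)$ and $D = 0.5$ the water level is $\lambda = 0.25$, giving $\tfrac12\log_2 4 + \tfrac12\log_2 16 = 3$ bits.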

**Q6.** (*Blahut-Arimoto algorithm for computing rate-distortion functions*) (a) Let $p(x)$ be the input source distribution, $q(\hat{x}|x)$ be the conditional probability distribution that achieves the rate-distortion function, $q(\hat{x})$ be the corresponding output distribution, $d(x,\hat{x})$ be the distortion measure, $D$ be the distortion, and $\lambda$ be a dummy variable (the Lagrange multiplier). Using the method of Lagrange multipliers, can you prove that the rate-distortion function of the source satisfies the equations

$$ q(\hat{x}|x) = \frac{q(\hat{x})\, e^{-\lambda d(x,\hat{x})}}{\sum_{\hat{x}'} q(\hat{x}')\, e^{-\lambda d(x,\hat{x}')}} \quad \text{and} \quad q(\hat{x}) = \sum_x p(x)\, q(\hat{x}|x), $$

and the equation $D = \sum_{x,\hat{x}} p(x)\, q(\hat{x}|x)\, d(x,\hat{x})$? Why are these equations hard to solve?

(b) If $A$ and $B$ are two convex sets, and $d(\cdot,\cdot)$ is a "reasonable" distance measure, can you think of a scheme to find points $a \in A$ and $b \in B$ such that $d(a,b)$ is as small as possible?

(c) Let $p(x,y)$ be a given probability distribution (that induces the distributions $p(x)$, $p(y)$ and $p(y|x)$), and let $D(\cdot\|\cdot)$ be the Kullback-Leibler divergence. Prove that

$$ \min_{r(y)} D\big(p(x)p(y|x)\,\big\|\,p(x)r(y)\big) = D\big(p(x)p(y|x)\,\big\|\,p(x)p(y)\big) = I(X;Y). $$

(d) Use (c) to prove that

$$ R(D) = \min_{r(\hat{x})}\ \min_{q(\hat{x}|x):\,\mathbb{E}[d(X,\hat{X})]\le D} D\big(p(x)q(\hat{x}|x)\,\big\|\,p(x)r(\hat{x})\big). $$

(e) Reformulate (d) as $\min_{a \in A,\, b \in B} d(a,b)$ for some convex sets $A$ and $B$.

(f) How can you use (a) -- (e) to set up an iterative procedure (similar in flavour to Lloyd's algorithm) to compute the rate-distortion function of the source? What is the role of $\lambda$?
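A minimal sketch of the resulting alternating minimization (the Blahut-Arimoto iteration) for a discrete source; the symbol names mirror part (a), and each fixed $\lambda$ traces out one point of the $R(D)$ curve:

```python
from math import exp, log2

def blahut_arimoto(p, d, lam, iters=200):
    """Compute one (D, R) point on the rate-distortion curve.

    p: source distribution over x; d[x][xh]: distortion matrix; lam: the
    Lagrange multiplier. Alternates the two fixed-point equations of Q6(a).
    """
    n, m = len(p), len(d[0])
    q = [1.0 / m] * m                         # initial output distribution
    for _ in range(iters):
        # q(xh|x) proportional to q(xh) * exp(-lam * d(x, xh))
        cond = []
        for x in range(n):
            w = [q[xh] * exp(-lam * d[x][xh]) for xh in range(m)]
            s = sum(w)
            cond.append([wi / s for wi in w])
        # q(xh) = sum_x p(x) q(xh|x)
        q = [sum(p[x] * cond[x][xh] for x in range(n)) for xh in range(m)]
    D = sum(p[x] * cond[x][xh] * d[x][xh] for x in range(n) for xh in range(m))
    R = sum(p[x] * cond[x][xh] * log2(cond[x][xh] / q[xh])
            for x in range(n) for xh in range(m) if cond[x][xh] > 0)
    return D, R

# Bernoulli(1/2) source with Hamming distortion: R should match 1 - h_b(D).
D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], lam=2.0)
```

Checking the returned pair against the closed form from Q5(a) is a useful way to validate the iteration.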

(g) What changes in the above if you wish to use a similar procedure to compute the capacity of a channel?
