• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • Buried in cloud files? We can help with Spring cleaning!

    Whether you use Dropbox, Drive, G-Suite, OneDrive, Gmail, Slack, Notion, or all of the above, Dokkio will organize your files for you. Try Dokkio (from the makers of PBworks) for free today.

  • Dokkio (from the makers of PBworks) was #2 on Product Hunt! Check out what people are saying by clicking here.


Scribe Note 7-1

Page history last edited by lt010@... 10 years, 5 months ago


a)We have defined the information rate distortion function as 



where the minimization is over all conditional distributions  Formula  for which the joint distribution Formula  satisfies the expected distortion constraint. This is a standard minimization problem of a convex function over the convex set of all Formula satisfying Formula for all Formula and Formula

We can use the method of Lagrange multipliers to find the solution.  We set up the functional

Differentiating with respect to Formula , setting Formula, we obtain

Since Formula , we must have Formula  or

for all Formula. We can combine these Formula equations with the equation defining the distortion and calculate λ and the  Formula unknowns Formula. We can  find the optimum conditional distribution.

The above analysis is valid if  Formula  is unconstrained. The inequality condition Formula is covered by the Kuhn–Tucker conditions, which reduce to

Substituting the value of the derivative, we obtain the conditions for the minimum as

This characterization will enable us to check if a given Formula is a solution to the minimization problem. However, it is not easy to solve for the optimum output distribution from these equations. In the next section we provide an iterative algorithm for computing the rate distortion function. This algorithm is a special case of a general algorithm for finding the minimum relative entropy distance between two convex sets of probability densities.


Consider the following problem: Given two convex sets A and B in Rn as shown in following figure, we would like to find the minimum distance between them:


where d(a, b) is the Euclidean distance between a and b. An intuitively obvious algorithm to do this would be to take any point x ∈ A, and find the y ∈ B that is closest to it. Then fix this y and find the closest point in A. Repeating this process, it is clear that the distance decreases at each stage. Does it converge to the minimum distance between the two sets? Csiszar and Tusnady have shown that if the sets are convex and if the distance satisfies certain conditions, this alternating minimization algorithm will indeed converge to the minimum. In particular, if the sets are sets of probability distributions and the distance measure is the relative entropy, the algorithm does converge to the minimum relative entropy between the two sets of distributions.

To apply this algorithm to rate distortion, we have to rewrite the rate distortion function as a minimum of the relative entropy between two sets.


let  Formula, we have


Use result from c), we have 


If A is the set of all joint distributions with marginal p(x) that satisfy the distortion constraints and if B the set of product distributions Formula with arbitrary Formula, we can write Formula


We now apply the process of alternating minimization, which is called the Blahut–Arimoto algorithm in this case. We begin with a choice of λ and an initial output distribution Formula and calculate the Formula that minimizes the mutual information subject to the distortion constraint. We can use the method of Lagrange multipliers for this minimization to obtain


For this conditional distribution Formula, we calculate the output distribution  Formula that minimizes the mutual information, which is  Formula.  

We use this output distribution as the starting point of the next iteration. Each step in the iteration, minimizing over q(·|·) and then minimizing over r(·). Thus, there is a limit, and the limit has been shown to be R(D)

The value of D and R(D) depends on λ. Thus, choosing λ appropriately sweeps out the R(D) curve.


A similar procedure can be applied to the calculation of channel capacity. Again we rewrite the definition of channel capacity. It is a double maximization problem.







Comments (0)

You don't have permission to comment on this page.