• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

• You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!

View

# Scribe Note 5

last edited by 12 years, 10 months ago

# Q.2 Arithmetic Code

## (i)     Introduction

The descriptions and diagrams below illustrate the idea of Arithmetic code:

Let p(0) = p, p(1) = 1 - p,

 Length of input Set of all input Probability Function Distribution Function Intervals 1 {0, 1} 2 {00, 01, 10, 11} 3 {000, 001, 010, 011, 100, 101, 110, 111} 4 {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} (Just leave it blank, too trouble) (leave it blank)

In this diagram, each line represents corresponding length of input of arithmetic code. This diagram also illustrates the code width of a input.

Examples:

(a)     code width of 01 and 10 is p(1-p)

(b)     code width of 001, 010, 100 is

Furthermore, the sum of all code width is 1.

This diagram illustrate the distribution function of arithmetic code up to length of input = 3. Red line, purple line and blue line represent length of input equal to 1, 2 and 3 respectively. The indices at x-axis are sets of inputs, while the vertical distance is the code width.

Here comes to the problem, what are the codes?

If  is the input,  is the arithmetic binary-code, then

Example: Let's say p = 0.4, n = 3

(a)     Arithmetic code 0011: in binary, .

As input , so, the arithmetic code of 010 is 0011.

(b)     For input , we find out that ,

So, the arithmetic code for 001 is 0010.

## (ii)     Arithmetic Code is almost best code

Arithmetic code requires  bits, 1 bit is lost due to quantization, another 1 bit is lost due to the open interval at the end

Arithmetic code is almost best code, since Huffman achieves  bits and Huffman is best code.

## (iii)     Encoding

Denote  be the input, width of interval = ,

and start of interval (a) =

1. Evaluate the interval for the input with the above formulae.

2. Find a code that binary floating point value falling into the range of the interval.

## (iv)     Decoding

1. Evaluate the floating point value of code.

2. Binary search the floating point value on [0, 1) for n steps, if the floating point value is on the left, pick [a, (1-p)a+pb), otherwise pick [(1-p)a+pb, b).

## (v)     Advantages of Arithmetic Code

1. Arithmetic code is almost best code, with at most 2 bits more than the best code.

2. It is easy for Arithmetic code to handle non-binary cases by adding more parameters.

3. The design arithmetic code facilitates different parameters at different position of the code.

4. Coding procedure is incremental and can be used for any blocklength.

5. Arithmetic code is memoryless.

# Q.4 Differential Entropy h(x)

## (i)     Definitions

Natural definition of differential entropy

Joint differential entropy

Conditional differential entropy

Mutual information

Kullback-leibler Divergence (or relative different entropy)

## (ii)     Properties of Differential Entropy (Highlight)

1. h(X) can be positive, zero and even negative

Examples: f(x) = 0.5 on [0, 2] gives positive h(X), f(x) = 1 on [0, 1] gives zero h(X), f(x) = 2 on [0, 0.5] gives negative h(X).

2.

3. Chain rules for differential entropy:

4. Chain rules for mutual information:

5. Kullback-leibler divergence and mutual information of 2 random variables are non-negative

6. Data Processing Inequality:

## (iii)     Relation of Differential Entropy to Discrete Entropy

Show that:

Proof:

So,

Note: This equation is an informal proof for the property of h(p), can be positive, zero and negative. If  close to 0+, h(p) is highly negative.

## (iv)     Maximum value of differential Entropy

Maximize h(X) over all probability density function f(x) satisfying:

1.

2.

3. , i = 1, 2, where  is called moment constraints

Theorem: h(X) attains maximum when , which is also called Guassian distribution , through completing square.

Proof:

step 1: construct h(X) with Lagrange Multiplier

step 2: differential h(X) with respect to f(x)

step 3: put  to obtain , furthermore, perform first derivative test to show this is the maximum of h(X)