Scribe_Note_1_2


                      A simple code scheme   (Scribe: Siyao)

 

 

1. Overview

 

In this lecture, we are interested in designing a code (a pair of an encoder and a decoder) for a particular source. (Let's ignore noise in the following discussion, since this is then simply a compression problem.)

     

       (A few words about the code model: the encoder encodes the message m from the source into a codeword n, then sends the codeword to the decoder, which decodes it into m′. We care about efficiency (the output length of the encoder) and about errors, so we want both the rate $R = |n|/|m|$ and the error $\varepsilon = \Pr[m' \neq m]$ to be as small as possible. It's simple to see that there is a rate-error tradeoff. If we want no error, we can just let n = m, so that R = 1 and nothing is saved. If we want high efficiency, in the extreme case R = 0, no information is left after encoding, so ε must be very large.)

   

    We first analyze the source and show that it has a very good concentration property. Specifically, the source puts almost all of its probability on a "small" set S of events. Taking advantage of this property, we lose very little by considering only the events in S and ignoring all others. Note that if we just need to distinguish the elements of S, log |S| bits per element are enough. So we can design a simple code in the following way.

   

     The encoder is a one-to-one map from S to the set $\{0,1\}^{\log |S|}$, and the decoder is the inverse of the encoder (the encoder ignores any event not in S; those are exactly the error events). The output length of the encoder is log |S|, which is smaller than the message length, and the error $\varepsilon = \Pr[x \notin S]$ is very small by the definition of S.

     

     It's already a very good code. Actually, we can improve it with a small change to get a variable-length code. For a variable-length code, we consider the expected output length of the encoder. The idea is that instead of ignoring events not in S, we encode them as well. Since the probability of the events outside S is very small, letting the encoder's output for them be as long as their original length barely increases the expected output length. At the same time, the error ε drops to 0 once we encode all events rather than just those in S. So the new code is as follows:

 

     For events in S, the encoder maps one-to-one from S to the set $\{0,1\}^{\log |S|}$ and prepends an extra 0 to the encoding. For events not in S, the encoding is the input itself, with an extra 1 prepended. The extra bit distinguishes events in S from events not in S. It's simple to derive the decoder, so I skip the description.
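     To see the expected-length claim concretely (a back-of-the-envelope version; the precise bound is worked out in Section 4), write $\varepsilon_0 = \Pr[x \notin S]$:

                  $\mathbb{E}[\text{output length}] = (1-\varepsilon_0)(1 + \log|S|) + \varepsilon_0\,(1+n) \le 1 + \log|S| + \varepsilon_0\, n,$

     so as long as $\varepsilon_0 \ll 1/n$ (we will see it is in fact exponentially small), the overhead over $\log|S|$ is negligible.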

   

    We've almost finished the design of the code. For completeness, we should also care about the computational complexity of the encoder/decoder, i.e., whether they can run within limited time and space. More specifically, can we give efficient encoding/decoding algorithms? Fortunately, we can.

   

    After all this, it's time for the final question: is it possible for any other scheme to "substantially" improve on ours?

 

2. Preliminaries

 

In the following discussion, log is base 2.

 

Definition 1. A p-biased distribution X over $\{0,1\}^n$, denoted $X \sim \mathrm{Bern}(p)^n$, is a distribution such that for $x \in \{0,1\}^n$, $\Pr[X = x] = p^{|x|}(1-p)^{n-|x|}$, where |x| is the number of 1s in x. In other words, every bit is 1 with probability p, and 0 with probability 1 − p.

 

The source distributions we consider are $\mathrm{Bern}(p)^n$ over $\{0,1\}^n$, where p < 1/2. It's straightforward to show that the expectation of |x| is np and that the variance of |x| is np(1 − p).
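To spell out the "straightforward" computation: each bit $x_i$ is an independent Bernoulli(p) random variable, so by linearity of expectation and independence,

                  $\mathbb{E}[|x|] = \sum_{i=1}^{n} \Pr[x_i = 1] = np, \qquad \mathrm{Var}(|x|) = \sum_{i=1}^{n} \mathrm{Var}(x_i) = np(1-p).$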

 

Definition 2. T(k, n) is the type-class of all sequences $x \in \{0,1\}^n$ with |x| = k (that is, k ones and n − k zeroes).

The size of the type-class T(k, n) is $\binom{n}{k}$.

 

Definition 3. The entropy of a distribution X over $\{0,1\}^n$ is defined as follows:

                                                $H(X) = \sum_{x} \Pr[X = x] \log \frac{1}{\Pr[X = x]}$

For a single p-biased bit this specializes to the binary entropy function $H(p) = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p}$.

 

Entropy is a measure of randomness: the larger the entropy, the more random the distribution. From the definition, the uniform distribution has the maximal entropy. In the case n = 1, a bit that is 1 with probability 1/2 and 0 with probability 1/2 has the maximal entropy. Let's also look at the Taylor expansion of H(p + x) to verify that a uniformly random bit has maximal entropy.

                                             $H(p+x) = H(p) + H'(p)\,x + \frac{H''(p)}{2}\,x^2 + \frac{H'''(p)}{6}\,x^3 + O(x^4)$

   where

$H'(p) = \log\frac{1-p}{p}$

$H''(p) = -\frac{1}{p(1-p)\ln 2}$

$H'''(p) = \frac{1-2p}{p^2(1-p)^2\,\ln 2}$

When we consider p = 1/2, we have

$H'(1/2) = 0, \qquad H''(1/2) = -\frac{4}{\ln 2}, \qquad H'''(1/2) = 0.$

Hence, we can conclude

$H(1/2 + x) = 1 - \frac{2}{\ln 2}\,x^2 + O(x^4),$

that is,

$H(1/2 + x) \le 1$, with equality only at x = 0.


 

Lemma 4. $\binom{n}{k} \doteq 2^{nH(k/n)}$, where $A_n \doteq B_n$ denotes $\lim_{n\to\infty} \frac{1}{n}\log\frac{A_n}{B_n} = 0$.

 

Proof. By Stirling's approximation $n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n (1+o(1))$, we have

                  $\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \doteq \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi k}\,(k/e)^k \cdot \sqrt{2\pi(n-k)}\,((n-k)/e)^{n-k}}$

                  $= \sqrt{\frac{n}{2\pi k(n-k)}} \cdot \frac{n^n}{k^k\,(n-k)^{n-k}}$

                  $\doteq 2^{\,n\log n - k\log k - (n-k)\log(n-k)} = 2^{nH(k/n)},$

where the polynomial factor disappears under $\doteq$ because $\frac{1}{n}\log \mathrm{poly}(n) \to 0$.
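As a quick numerical sanity check of Lemma 4 (my own illustration, not part of the lecture; the parameters are arbitrary, and the program needs -lm to link), we can compare $\frac{1}{n}\log_2\binom{n}{k}$ with H(k/n) using lgamma:

       #include <stdio.h>
       #include <math.h>

       /* binary entropy H(q) in bits */
       static double H(double q) {
           if (q <= 0.0 || q >= 1.0) return 0.0;
           return -q * log2(q) - (1.0 - q) * log2(1.0 - q);
       }

       int main(void) {
           /* log2 C(n,k) = (lgamma(n+1) - lgamma(k+1) - lgamma(n-k+1)) / ln 2 */
           for (int n = 100; n <= 100000; n *= 10) {
               int k = n / 5;                               /* k/n = 0.2 */
               double log2C = (lgamma(n + 1.0) - lgamma(k + 1.0)
                               - lgamma(n - k + 1.0)) / log(2.0);
               printf("n = %6d   (1/n) log2 C(n,k) = %.6f   H(k/n) = %.6f\n",
                      n, log2C / n, H(0.2));
           }
           return 0;
       }

The gap shrinks like O((log n)/n), which is exactly the polynomial factor that $\doteq$ ignores.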

 

Definition 5. For $a, b \in (0,1)$, let $D(a\|b)$ be the binary relative entropy defined as $D(a\|b) = a\log\frac{a}{b} + (1-a)\log\frac{1-a}{1-b}$, with the convention $0\log 0 = 0$.

 

It's a measure of the (entropy) distance between two distributions. Considering the Taylor expansion of the function $D(a\|b)$ in a up to the second derivative, we can obtain $D(a\|b) \ge \frac{2}{\ln 2}(a-b)^2$ (see Pinsker's inequality).
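To fill in that Taylor-expansion step (my own derivation; the note only quotes the conclusion): fix b and differentiate in a,

                  $\frac{\partial}{\partial a} D(a\|b) = \log\frac{a}{b} - \log\frac{1-a}{1-b}, \qquad \frac{\partial^2}{\partial a^2} D(a\|b) = \frac{1}{a(1-a)\ln 2} \ge \frac{4}{\ln 2},$

since $a(1-a) \le 1/4$. Because $D(b\|b) = 0$ and the first derivative vanishes at a = b, Taylor's theorem with the Lagrange remainder gives, for some ξ between a and b,

                  $D(a\|b) = \frac{(a-b)^2}{2\,\xi(1-\xi)\ln 2} \ge \frac{2}{\ln 2}(a-b)^2.$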

     

3. Properties of the source

 

Each sequence $x \in T(k,n)$ occurs with probability

                            $\Pr[X = x] = p^k (1-p)^{n-k}.$

 

So the probability of observing type-class T(k, n) is

                           $\Pr[X \in T(k,n)] = \binom{n}{k}\, p^k (1-p)^{n-k}.$

 

By Lemma 4, we obtain

 

 

 $\Pr[X \in T(k,n)] = \binom{n}{k}\, p^k (1-p)^{n-k}$

 $= \binom{n}{k}\, 2^{\,k\log p + (n-k)\log(1-p)}$

 $\le 2^{nH(k/n)} \cdot 2^{\,k\log p + (n-k)\log(1-p)}$

 $= 2^{-nD(k/n\,\|\,p)}$

 

 

The inequality is because $\binom{n}{k} \le 2^{nH(k/n)}$, which holds exactly (not just in the $\doteq$ sense). Intuitively, $2^{-nD(k/n\|p)}$ tells us that if k/n is far from p, the probability of observing type-class T(k, n) will be very small. Concretely, let's bound the probability of observing an element of an ϵ-strongly atypical set, i.e., a T(k, n) with $|k/n - p| > \epsilon$, as follows:

                                     

$\Pr[X \in T(k,n)] \le 2^{-nD(k/n\,\|\,p)} \le 2^{-\frac{2}{\ln 2}n(k/n-p)^2} \le 2^{-\frac{2}{\ln 2}n\epsilon^2},$

where the middle step is Pinsker's inequality.

 

By the union bound, the probability of observing an element from any ϵ-strongly atypical set can be bounded by

                                    

$\Pr\!\left[\,\left|\tfrac{|X|}{n} - p\right| > \epsilon\,\right] \le (n+1)\cdot 2^{-\frac{2}{\ln 2}n\epsilon^2},$

since there are at most n + 1 type-classes in total.

 

In other words, the source has most of its probability concentrated in the ϵ-strongly typical sets, i.e., the type-classes T(k, n) with $|k/n - p| \le \epsilon$. Let S be the union of all ϵ-strongly typical sets. The size of S can be upper bounded by

                                $|S| = \sum_{k:\,|k/n - p| \le \epsilon} \binom{n}{k} \le c\,n\,2^{nH(p+\epsilon)},$

where c is a constant and we use p + ϵ ≤ 1/2 (the sum has at most 2nϵ + 1 terms, each at most $\binom{n}{\lfloor n(p+\epsilon)\rfloor} \le 2^{nH(p+\epsilon)}$). And |S| can be lower bounded by

                                 $|S| \ge \binom{n}{\lfloor n(p+\epsilon)\rfloor} \ge \frac{2^{nH(p+\epsilon)}}{n+1}.$

 

This tells us that the size of S is roughly $2^{nH(p+\epsilon)}$, and since H(p + ϵ) < 1, we have |S| ≪ 2^n. In a word, the source has most of its probability on a set S of size roughly $2^{nH(p+\epsilon)}$, specifically, the union of all ϵ-strongly typical type-classes.
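To see the concentration numerically (again my own sketch, with arbitrarily chosen n, p, ϵ; link with -lm), we can compute the exact atypical mass $\sum_{k:\,|k/n-p|>\epsilon}\binom{n}{k}p^k(1-p)^{n-k}$ in log-space and compare it with the union bound above:

       #include <stdio.h>
       #include <math.h>

       int main(void) {
           int n = 5000;
           double p = 0.1, eps = 0.05, tail = 0.0;
           for (int k = 0; k <= n; ++k) {
               if (fabs((double)k / n - p) <= eps) continue;   /* skip typical k */
               /* log Pr[T(k,n)] = log C(n,k) + k log p + (n-k) log(1-p) */
               double logpr = lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0)
                            + k * log(p) + (n - k) * log(1.0 - p);
               tail += exp(logpr);
           }
           printf("exact Pr[X not in S]        = %g\n", tail);
           printf("(n+1) 2^(-2 n eps^2 / ln 2) = %g\n",
                  (n + 1) * pow(2.0, -2.0 * n * eps * eps / log(2.0)));
           return 0;
       }

Both printed numbers are tiny; the Pinsker-based union bound is, of course, the looser of the two.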

 

4. Efficient encoding/decoding algorithms

       

     

    The encoding algorithm is as follows: on input $x \in \{0,1\}^n$,

                 1.   If $\left|\,|x|/n - p\,\right| > \epsilon$ (i.e., x ∉ S), output 1x.

                 2.   Else, let $a = \sum_{j=\lceil n(p-\epsilon)\rceil}^{|x|-1} \binom{n}{j}$ (the number of strings in S with fewer 1s than x) and b = 0.

                        From i = 1 to n, we do the following for each i:

                         1)  If $x_i = 0$, continue.

                         2)  Else, $a \leftarrow a + \binom{n-i}{|x|-b}$ and b++.

                        After that, convert a into a binary string a′ of length $\lceil\log|S|\rceil = nH(p+\epsilon) + O(\log n)$ and output 0a′.

         

          The decoding algorithm is as follows: on input 1x with $x \in \{0,1\}^n$, or 0a′ with $a' \in \{0,1\}^{\lceil\log|S|\rceil}$,

                   1.   If the first bit is 1, output $x_1 x_2 \cdots x_n$.

                   2.   Else, let a be the integer encoded by a′ and let $k = \lceil n(p-\epsilon)\rceil$; while $a \ge \binom{n}{k}$, do: $a \leftarrow a - \binom{n}{k}$, k++. (This finds the weight class of the encoded string.)

                        Then, let b = 0. From i = 1 to n, we do the following for each i (with the convention $\binom{m}{j} = 0$ for j > m):

                          1)   If $a \ge \binom{n-i}{k-b}$, then $a \leftarrow a - \binom{n-i}{k-b}$, b++, and output 1;

                          2)   Else, output 0 and continue.

 

         First we define an order on the elements of S: strings with fewer 1s are smaller than strings with more 1s, and strings with the same number of 1s are ordered by their numeric value. The encoding algorithm computes the rank of an element of S, i.e., the number of strings in S that are smaller than it. Elements not in S are encoded as themselves. The extra bit at the beginning of the encoding identifies whether the input belongs to S.

        If we calculate all the binomial coefficients $\binom{i}{j}$ in advance and store them in memory, the encoding and decoding algorithms take only O(n) time. If we compute them locally, we need at most O(n²) time using the following code, where c[i][j] = $\binom{i}{j}$ (assuming the array c is zero-initialized, so that c[i][j] = 0 for j > i):

       for (i = 0; i <= n; ++i) c[i][0] = 1;          /* base case: C(i,0) = 1 */
       for (i = 1; i <= n; ++i)
           for (j = 1; j <= i; ++j)
               c[i][j] = c[i-1][j] + c[i-1][j-1];     /* Pascal's rule */
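 Putting the pieces together, here is a compilable C sketch of the ranking step (the interface and the function names rank_in_S/unrank_in_S are mine, not fixed by the lecture; bits are stored in an int array with x[0] the first bit, and the table c[][] is filled as above):

       #include <stdio.h>

       #define N 30
       static long long c[N + 1][N + 1];        /* c[i][j] = C(i,j); 0 when j > i */

       static void fill_binomials(int n) {
           for (int i = 0; i <= n; ++i) c[i][0] = 1;
           for (int i = 1; i <= n; ++i)
               for (int j = 1; j <= i; ++j)
                   c[i][j] = c[i-1][j] + c[i-1][j-1];
       }

       /* Rank of x in S: strings ordered by weight, then by numeric value.
          kmin is the smallest weight that S contains, i.e. ceil(n(p - eps)). */
       static long long rank_in_S(const int *x, int n, int kmin) {
           int k = 0, b = 0;
           long long a = 0;
           for (int i = 0; i < n; ++i) k += x[i];        /* weight of x */
           for (int j = kmin; j < k; ++j) a += c[n][j];  /* all lighter strings */
           for (int i = 0; i < n; ++i)
               if (x[i]) { a += c[n-1-i][k-b]; ++b; }    /* smaller same-weight strings */
           return a;
       }

       /* Inverse of rank_in_S (assumes a is a valid rank). */
       static void unrank_in_S(long long a, int n, int kmin, int *x) {
           int k = kmin, b = 0;
           while (a >= c[n][k]) { a -= c[n][k]; ++k; }   /* find the weight class */
           for (int i = 0; i < n; ++i) {
               if (b < k && a >= c[n-1-i][k-b]) {        /* place a 1 here */
                   a -= c[n-1-i][k-b]; ++b; x[i] = 1;
               } else x[i] = 0;
           }
       }

       int main(void) {
           int n = 8, kmin = 1;                  /* toy S: strings of weight 1..8 */
           int x[8] = {0, 1, 1, 0, 0, 1, 0, 0}, y[8], ok = 1;
           fill_binomials(n);
           long long a = rank_in_S(x, n, kmin);
           unrank_in_S(a, n, kmin, y);
           for (int i = 0; i < n; ++i) ok &= (x[i] == y[i]);
           printf("rank = %lld, round-trip ok = %d\n", a, ok);
           return 0;
       }

 With the table precomputed, both functions run in O(n) time, matching the claim above.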

 

 The expected encoding length is bounded by

 

                              $\mathbb{E}[\text{length}] \le \Pr[X \in S]\,(1 + \lceil\log|S|\rceil) + \Pr[X \notin S]\,(1+n) \le 1 + nH(p+\epsilon) + O(\log n) + (n+1)^2\, 2^{-\frac{2}{\ln 2}n\epsilon^2} = nH(p+\epsilon) + o(n).$

 

5. Can we improve?

 

         Note that each element in the  ϵ-strongly typical set has "nearly equal" probability of being observed.  By symmetry, we should give them almost equal length.

  Since $\log|S| \approx nH(p+\epsilon)$ bits are necessary to distinguish $|S|$ nearly equiprobable elements, we can't substantially improve our code. Even if we relax the requirement and only ask for the error to be a constant, roughly nH(p) bits are still necessary.
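  One way to make this counting precise (a sketch I am adding; the lecture only states the conclusion): every $x \in S$ has $\Pr[x] = p^{|x|}(1-p)^{n-|x|} \le 2^{-n(H(p)-\delta)}$, where $\delta \to 0$ as $\epsilon \to 0$. A decoder can recover at most $2^{\ell+1}$ distinct inputs from codewords of length at most ℓ, so the probability mass decoded correctly is at most

                  $2^{\ell+1} \cdot 2^{-n(H(p)-\delta)} + \Pr[x \notin S].$

  For the error to be at most a constant ε, this must be at least 1 − ε, which forces $\ell \ge n(H(p)-\delta) + \log(1-\varepsilon-o(1)) - 1 = n(H(p)-\delta) - O(1)$: allowing constant error saves only O(1) bits.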

 

6.  What changes for general alphabets?

        Consider a general alphabet {0, 1, ..., m} instead of {0, 1}. Denote the outcome of one toss by X, with $\Pr[X = i] = p_i$ for 0 ≤ i ≤ m and $\sum_{i=0}^{m} p_i = 1$. The type class is now T(k_0, ..., k_m; n), the set of outcomes of n tosses in which symbol i occurs k_i times. The probability that a particular sequence in T(k_0, ..., k_m; n) occurs is $\prod_{i=0}^{m} p_i^{k_i}$, and T(k_0, ..., k_m; n) contains $\binom{n}{k_0,\dots,k_m} = \frac{n!}{k_0!\cdots k_m!}$ possible sequences. So we observe type class T(k_0, ..., k_m; n) with probability $\binom{n}{k_0,\dots,k_m}\prod_{i=0}^{m} p_i^{k_i}$.

        Next, we should argue that $\binom{n}{k_0,\dots,k_m} \doteq 2^{nH(k_0/n,\dots,k_m/n)}$ and that the probability of observing T(k_0, ..., k_m; n) is $\doteq 2^{-nD(k_0/n,\dots,k_m/n\,\|\,p_0,\dots,p_m)}$, where D is the generalized relative entropy. We then show that D(a‖b) satisfies $D(a\|b) \ge \frac{1}{2\ln 2}\|a-b\|_1^2$ by the Taylor expansion for multivariable functions.

         After that, we can show that the typical set is the union of the type classes T(k_0, ..., k_m; n) with $k_i/n \approx p_i$, and design a variable-length code with expected length roughly $nH(p_0,\dots,p_m)$.
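         For the record, here is the shape of those two generalized statements (a sketch, with the ϵ-details omitted):

                  $\binom{n}{k_0,\dots,k_m} = \frac{n!}{k_0!\cdots k_m!} \doteq 2^{nH(k_0/n,\dots,k_m/n)}, \qquad H(q_0,\dots,q_m) = \sum_{i=0}^{m} q_i \log\frac{1}{q_i},$

         and consequently

                  $\Pr[T(k_0,\dots,k_m;n)] = \binom{n}{k_0,\dots,k_m}\,2^{\sum_i k_i \log p_i} \doteq 2^{-nD(k_0/n,\dots,k_m/n\,\|\,p_0,\dots,p_m)}, \qquad D(\vec a\,\|\,\vec b) = \sum_{i=0}^{m} a_i \log\frac{a_i}{b_i}.$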

 

 

 

 

 

 

 

 




Comments (3)

sidjaggi said

at 2:44 am on Oct 11, 2011

Great job, Siyao! Corrected a few typos (mostly grammatical :) in your write-up.

Comments:
1. The way you define \doteq in class (and indeed the way I defined it the first time, though not the second time) is not the standard way in textbooks, where the definition is usually \lim_{n\rightarrow \infty} \log(A_n/B_n) \rightarrow 0. Sequences satisfying this definition will satisfy your definition, but not necessarily the other way round.
1. The way you define \doteq in class (and indeed the way I defined it the first time, though not the second time) is not the standard way in textbooks, where the definition is usually \lim_{n\rightarrow \infty} \log(A_n\B_n) \rightarrow 0. Sequences satisfying this definition will satisfy your definition, but not necessarily the other way round.

2. Pinsker's inequality -- the inequality between K-L divergence and the L_1 distance is originally due to Pinsker -- perhaps put in a link to the wiki site for Pinsker's inequality?

3. The Taylor series expansion of H(p+\epsilon) would be good to show, to relate it to H(p)

4. What changes in your discussions for general alphabets? (Multinomials coefficients, Taylor series expansions, etc.). Be good to have a short section at the end describing, at a high level these changes.

5. The order complexity of computing {n}\choose{k} -- your description is admirably succinct :), but not all readers have a background in algorithms. Would you mind specifying it in some more detail?

Thanks again for a sterling job!

Chin Ho Lee said

at 1:43 pm on Oct 18, 2011

I think the definition of \doteq is \lim_{n\rightarrow \infty} 1/n \log(A_n/B_n) \rightarrow 0 in Cover and Thomas. (there is an extra 1/n)

siyao said

at 12:16 pm on Oct 12, 2011

Thanks for your comments! I will fix them by this Friday.
