• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

• Dokkio Sidebar (from the makers of PBworks) is a Chrome extension that eliminates the need for endless browser tabs. You can search all your online stuff without any extra effort. And Sidebar was #1 on Product Hunt! Check out what people are saying by clicking here.

View

# Scribe Note 2-Feb

last edited by 13 years, 5 months ago

A digression

Claude Shannon is the father of information theory. He also made fundamental contributions to digital computer and circuit. Check out this video to know more about him.

Review

Last time, we proved the converse of the source coding theorem by ingenious use of several information-theoretic inequalities. The result is as follows. For any source  with alphabet  and any code with rate , we have

.

If , the block error probability is bounded below by

,

which is positive for  sufficiently large. Hence, information loss is inevitable when . What about ? The next result shows that data compression with negligible risk of information loss is possible at any rate above .

Achievability for source coding theorem

We first give an elementary proof of the theorem. This is a simplified version of section 3.2 of the textbook and does not give an explicit bound of the error probability. Then, we give another proof using the method of random coding.

Elementary proof

Let  and . By Theorem 3.1.2(3)1, . Since , there exists an injective map  from   to .The code is defined as follows:

(1) Encoding: If , we encode it as . If , we send an error message2.

(2) Decoding:Given , we decode it as   whenever this is defined. Else we declare a decoding error.

It is obvious that the probability of error is just . By Theorem 3.1.2 (2), . Hence the rate  is achievable.

Proof using random coding

Random coding is a fundamental tool of information theory. Although not computationally feasible, the random code has a relatively simple description that makes mathematical analysis possible. The same idea is used in the proof of the channel coding theorem. We begin by describing the code. Let  and .

(1) Encoding: A map   is chosen randomly3 from the set of all functions from  to . That is, for each source sequence , a binary sequence  is assigned. The assignment  is called the codebook, and is agreed a priori by both Alice (encoder) and Bob (decoder). Note that , once chosen, is a deterministic function.

(2) Decoding: Given a string , Bob examines the codebook to see if there exists a unique strongly typical4  such that . If so, he outputs . Else he declares a decoding error.

For fixed  and , a decoding error occurs if and only if one of the following happens: (i) There is no typical  that is mapped to . (ii) There are two or more typical  that are mapped to . We shall show that the probability of error is asymptotically negligible. In the following analysis, it is important to distinguish between two sources of randomness: The randomness of  and the randomness of . The notation is somewhat confusing because the same notation  is used for both sources of randomness. See footnote 5 for further explanation of my notation.

Our first result bounds the probability of error (i).

Result 1 (b)(i):  . More precisely, we have the bound , where .

Proof: The first statement follows directly from the law of large numbers. From the definition of ,  implies that

where  is the type of . By Pinsker's inequality, . Hence, by Theorem 11.2.1,

.

Result 2 (b)(ii): For each fixed , we have the bound .

Proof: For each , . From (10.168), . Using the union bound, we have

Hence

Note that  is fixed in result 2. But our real interest is in estimating the probability of error (ii). Our next result is a "randomized" version of result 2. Note that the randomness here is due to both  and .

Result 3 (b)(iii): Let  be the event . Then .

Proof: Using result 2 and indepedence, we have

Hence .

We are ready to prove that the random code has a small probability of error.

Result 4 (b)(iv):

Proof: Since , we have

In the last inequality, we used the fact that if  and , then (see footnote 6).

We have NOT finished because we still allow the randomness of . To finish off, we need to show that a particular  has small probability of error. Indeed, we can show that at least half of all possible  are "good".

Result 5 (b)(v)With probability , a randomly chosen  (and the associated code) has probabiliy of error at most .

Proof: Define . Then

.

By Markov's inequality, for any ,

.

Pick . Then

.

Hence, with probability , a randomly chosen  has probabiliy of error at most .

Since each  has the same probability of being chosen, we have proved that at least half coding schemes (according to our definition) have small probabilities of error. This completes our proof of the source coding theorem. We will consider the more general problem of channel coding next time.

Footnotes

1. Recall that  (see Page 59 of textbook).

2. The case  is trivial. Otherwise, from the proof of Theorem 3.1.2(3), we have . We can just pick any  not in the range of.

3.  Following standard usage, we assume the uniform distribution. Probabilistically, this is equivalent to choosing  elements from  randomly with replacement.

4. To get the exponential bound of result 1 using Pinsker's Inequality (Lemma 11.6.1), I have to use strong typicality.  is defined by

.

(See Page 357 of textbook.) Without loss of generality, we can assume .

5. I use  to denote the random variable and  to denote a particular value of the variable. If I need emphasize the randomness of , I use the notation . For example, in result 1, the randomness is due to . In result 2, the randomness is due to . We also assume that  and  are independent.

6. Consider the expression , where both  and  are . Note that

.

Clearly   and  are . Hence . Hence

#### MEI Yuchen said

at 1:30 am on Feb 8, 2009

I'm not clear about the proof of Result 5.
Could someone please explain the funcion g(f)? What is g(f)? Why g(f)?

The notes helps me a lot solving problem set 3.
Thank you!

#### Leonard Wong said

at 11:01 pm on Feb 8, 2009

g(f) is the probability of error if a partifular f is chosen as the encoding scheme.

#### Mak said

at 10:29 am on Feb 9, 2009

You wrote, "Last time, we proved the converse of the source coding theorem by ingenious use of several information-theoretic inequalities."

Where can I find the related info? It doesn't seem to be contained in 22 Jan's notes? I was not available on that day so I don't know...

#### Silas said

at 11:08 am on Feb 12, 2009

This scribe note is well-written. Steps are clear. Good job!

#### Tong said

at 2:53 pm on Feb 19, 2009

what' s the difference between Result 2 and 3?
Different events have the same probability?

#### Leonard Wong said

at 10:19 pm on Feb 19, 2009

In result 2, the sequence x_n is fixed and f is random. In result 3, both X_n and f are random. The statement does not imply that the two events have the same probability: (1) we only give the upper bounds; (2) the o(n) are not the same. It only says that the same asymptotic upper bound works for both probabilities.

#### Lingxiao XIA said

at 9:32 pm on Feb 22, 2009

result 4 doesnt make so much sense to me, the inequalities and equalities are hard to understand, for example, why 2^(-0.5xn) + 2^(-0.5xn) = 2^(-xn) (last equality) ? and also about the previous inequality, what is "k" exactly, and what is pinsker's inequality?

#### Leonard Wong said

at 11:28 pm on Feb 22, 2009

I added a footnote. Also, I discovered that I made a mistake in the proof. If my method works, there should be a 1/2 in the exponent. (Certainly this exponent can be improved.)

Pinsker's inequality is just Lemma 11.6.1 of the textbook. k is 1/2ln2.

#### Lingxiao XIA said

at 12:51 am on Feb 23, 2009

thanks a lot

#### sidjaggi said

at 1:59 am on Feb 23, 2009

good to see late night information theory :)

#### Mak said

at 6:11 pm on Mar 13, 2009

I don't understand the proof of Result 5 at all. You write:
g(f) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = f\}

#### Leonard Wong said

at 12:47 am on Mar 14, 2009

See footnote 5. \tilde{f} means that the codebook f is chosen at random. In words, Eg(\tilde{f}) is the average probability of error where the randomness is due to f.

#### Mak said

at 10:46 am on Mar 16, 2009

What I mean is, if you define g(f) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = f\} , then
g(\tilde{f}) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = \tilde{f}\} ??
Then Eg(\tilde{f}) = E{\Pr\{X^n \neq \hat{X}^n | \tilde{f} = \tilde{f}\}}?? Or is it not? Thanks again.

#### Leonard Wong said

at 11:14 pm on Mar 16, 2009

I see. The notation is bit confusing when we talk about g(\tilde{f}).

In words, g(\tilde{f}) means the probability of error if the codebook is random. So g(\tilde{f}) is a random variable while g(f) is just a real number. Can you suggest a better notation?