Scribe Note 2-Feb


A digression

Claude Shannon is the father of information theory. He also made fundamental contributions to digital computers and circuit design. Check out this video to learn more about him.

 

Review

Last time, we proved the converse of the source coding theorem by ingenious use of several information-theoretic inequalities. The result is as follows. For any source $X$ with alphabet $\mathcal{X}$ and any code with rate $R$, we have

$H(X) \le R + P_e^{(n)} \log |\mathcal{X}| + \frac{1}{n}$.

If $R < H(X)$, the block error probability is bounded below by

$P_e^{(n)} \ge \frac{H(X) - R - 1/n}{\log |\mathcal{X}|}$,

which is positive for $n$ sufficiently large. Hence, information loss is inevitable when $R < H(X)$. What about $R > H(X)$? The next result shows that data compression with negligible risk of information loss is possible at any rate above $H(X)$.
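To make the converse concrete, here is a quick numeric check of the lower bound above for a Bernoulli source. This is a minimal sketch; the parameter values (p = 0.11, R = 0.3, n = 1000) and the helper name binary_entropy are mine, chosen only for illustration.

```python
# Hypothetical illustration of the converse bound; parameters are not from the lecture.
from math import log2

def binary_entropy(p):
    """H(X) in bits for a Bernoulli(p) source."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, R, n = 0.11, 0.3, 1000                # source bias, code rate R < H(X), block length
H = binary_entropy(p)                    # H(X) is about 0.5 bits
lower_bound = (H - R - 1 / n) / log2(2)  # log|X| = 1 bit for a binary alphabet
print(f"H(X) = {H:.3f} bits, so P_e >= {lower_bound:.3f} for every code of rate {R}")
```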

 

Achievability for source coding theorem

We first give an elementary proof of the theorem. This is a simplified version of Section 3.2 of the textbook and does not give an explicit bound on the error probability. Then, we give another proof using the method of random coding.

 

Elementary proof

Let $R > H(X)$ and pick $\epsilon > 0$ such that $H(X) + \epsilon < R$. By Theorem 3.1.2(3) (see footnote 1), $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$. Since $2^{n(H(X)+\epsilon)} < 2^{nR}$, there exists an injective map $\phi$ from $A_\epsilon^{(n)}$ to $\{0,1\}^{nR}$. The code is defined as follows:

(1) Encoding: If $x^n \in A_\epsilon^{(n)}$, we encode it as $\phi(x^n)$. If $x^n \notin A_\epsilon^{(n)}$, we send an error message (see footnote 2).

(2) Decoding: Given $y \in \{0,1\}^{nR}$, we decode it as $\phi^{-1}(y)$ whenever this is defined. Else we declare a decoding error.

 

It is obvious that the probability of error is just $\Pr\{X^n \notin A_\epsilon^{(n)}\}$. By Theorem 3.1.2(2), $\Pr\{X^n \in A_\epsilon^{(n)}\} \to 1$ as $n \to \infty$. Hence the rate $R$ is achievable. $\blacksquare$
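The proof translates directly into a (grossly inefficient) program: enumerate the typical set, index it, and reserve one extra codeword as the error message. The sketch below does this for a small Bernoulli source; the helper names (is_typical, encode, decode) and all parameters are mine, not the textbook's, and enumeration is exponential in n, so n must stay tiny.

```python
# A minimal sketch of the typical-set code from the elementary proof.
from itertools import product
from math import log2

p, n, eps = 0.3, 12, 0.1
H = -p * log2(p) - (1 - p) * log2(1 - p)

def is_typical(x):
    """Weak typicality: |-(1/n) log p(x^n) - H(X)| <= eps."""
    logp = sum(log2(p) if b else log2(1 - p) for b in x)
    return abs(-logp / n - H) <= eps

# Injective map phi: index the typical sequences. Each index fits in roughly
# n*(H+eps) bits, consistent with |A_eps| <= 2^{n(H+eps)}.
typical = [x for x in product((0, 1), repeat=n) if is_typical(x)]
phi = {x: i for i, x in enumerate(typical)}
ERROR = len(typical)                 # one codeword outside the range of phi

def encode(x):
    return phi.get(x, ERROR)         # send the error message if x is atypical

def decode(y):
    return typical[y] if y < len(typical) else None   # None = decoding error

x = (1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0)
assert decode(encode(x)) in (x, None)
print(f"|A_eps| = {len(typical)} <= 2^(n(H+eps)) = {2 ** (n * (H + eps)):.0f}")
```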

 

Proof using random coding

Random coding is a fundamental tool of information theory. Although not computationally feasible, the random code has a relatively simple description that makes mathematical analysis possible. The same idea is used in the proof of the channel coding theorem. We begin by describing the code. Let $R > H(X)$ and fix $\epsilon > 0$.

(1) Encoding: A map $f: \mathcal{X}^n \to \{0,1\}^{nR}$ is chosen randomly (see footnote 3) from the set of all functions from $\mathcal{X}^n$ to $\{0,1\}^{nR}$. That is, to each source sequence $x^n$ a binary sequence $f(x^n)$ is assigned. The assignment $f$ is called the codebook, and is agreed a priori by both Alice (encoder) and Bob (decoder). Note that $f$, once chosen, is a deterministic function.

(2) Decoding: Given a string $y \in \{0,1\}^{nR}$, Bob examines the codebook to see if there exists a unique strongly typical (see footnote 4) $x^n \in A_\epsilon^{*(n)}$ such that $f(x^n) = y$. If so, he outputs $\hat{x}^n = x^n$. Else he declares a decoding error.
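In code, the random codebook is nothing but a table of independent uniform bin indices, as footnote 3 notes. Below is a minimal sketch of steps (1) and (2); the names and parameters are mine, and enumerating all of $\mathcal{X}^n$ is of course only feasible for tiny n.

```python
# Sketch of the random codebook and the typical-set decoder; illustrative only.
import random
from itertools import product

n, R = 10, 0.9                       # block length and rate (bits per symbol)
num_bins = 2 ** round(n * R)         # number of nR-bit codewords

random.seed(0)
# Choosing f uniformly from all functions X^n -> {0,1}^{nR} is the same as
# drawing each f(x^n) independently and uniformly (sampling with replacement).
codebook = {x: random.randrange(num_bins) for x in product((0, 1), repeat=n)}

def decode(y, typical_set):
    """Output the unique typical x^n with f(x^n) = y, else declare an error."""
    matches = [x for x in typical_set if codebook[x] == y]
    return matches[0] if len(matches) == 1 else None

# Usage: decode(codebook[x], typical_set) recovers x exactly when x is typical
# and no other typical sequence shares its bin.
```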

 

For fixed $f$ and $x^n$, a decoding error occurs if and only if one of the following happens: (i) there is no typical $\tilde{x}^n \in A_\epsilon^{*(n)}$ that is mapped to $f(x^n)$. (ii) There are two or more typical $\tilde{x}^n \in A_\epsilon^{*(n)}$ that are mapped to $f(x^n)$. We shall show that the probability of error is asymptotically negligible. In the following analysis, it is important to distinguish between two sources of randomness: the randomness of $X^n$ and the randomness of the codebook $\tilde{f}$. The notation is somewhat confusing because the same letter $f$ is used for both sources of randomness. See footnote 5 for further explanation of my notation.

 

Our first result bounds the probability of the error event (i).

Result 1 (b)(i): $\Pr\{X^n \notin A_\epsilon^{*(n)}\} \to 0$ as $n \to \infty$. More precisely, we have the bound $\Pr\{X^n \notin A_\epsilon^{*(n)}\} \le (n+1)^{|\mathcal{X}|}\, 2^{-nk\epsilon^2}$, where $k = \frac{1}{2 \ln 2}$.

Proof: The first statement follows directly from the law of large numbers. From the definition of $A_\epsilon^{*(n)}$ (see footnote 4), $x^n \notin A_\epsilon^{*(n)}$ implies that

$\sum_{a \in \mathcal{X}} |P_{x^n}(a) - P(a)| > \epsilon$,

where $P_{x^n}$ is the type of $x^n$. By Pinsker's inequality, $D(P_{x^n} \| P) \ge \frac{1}{2 \ln 2} \big( \sum_{a \in \mathcal{X}} |P_{x^n}(a) - P(a)| \big)^2 > k\epsilon^2$. Hence, by Theorem 11.2.1,

$\Pr\{X^n \notin A_\epsilon^{*(n)}\} \le (n+1)^{|\mathcal{X}|}\, 2^{-nk\epsilon^2}$. $\blacksquare$
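For a feel of the bound in Result 1, the snippet below evaluates $(n+1)^{|\mathcal{X}|}\, 2^{-nk\epsilon^2}$ for a binary alphabet and $\epsilon = 0.1$ (my illustrative choices): the polynomial factor dominates at first, but the exponential decay eventually wins.

```python
# Evaluating the Result 1 bound (n+1)^{|X|} * 2^{-n k eps^2} with k = 1/(2 ln 2).
# Alphabet size and eps are illustrative choices, not values from the lecture.
from math import log

k = 1 / (2 * log(2))                 # Pinsker constant, k = 1/(2 ln 2)
alphabet_size, eps = 2, 0.1
for n in (100, 1000, 10000, 100000):
    bound = (n + 1) ** alphabet_size * 2 ** (-n * k * eps ** 2)
    print(f"n = {n:>6}: bound = {bound:.3e}")   # > 1 (vacuous) until n is large
```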

 

Result 2 (b)(ii): For each fixed $x^n$, we have the bound $\Pr\{\exists\, \tilde{x}^n \in A_\epsilon^{*(n)},\ \tilde{x}^n \neq x^n,\ \tilde{f}(\tilde{x}^n) = \tilde{f}(x^n)\} \le 2^{-n(R - H(X) - \delta_\epsilon)}$, where $\delta_\epsilon \to 0$ as $\epsilon \to 0$.

Proof: For each $\tilde{x}^n \neq x^n$, $\Pr\{\tilde{f}(\tilde{x}^n) = \tilde{f}(x^n)\} = 2^{-nR}$. From (10.168), $|A_\epsilon^{*(n)}| \le 2^{n(H(X) + \delta_\epsilon)}$. Using the union bound, we have

$\Pr\{\exists\, \tilde{x}^n \in A_\epsilon^{*(n)},\ \tilde{x}^n \neq x^n,\ \tilde{f}(\tilde{x}^n) = \tilde{f}(x^n)\} \le |A_\epsilon^{*(n)}|\, 2^{-nR} \le 2^{n(H(X) + \delta_\epsilon)}\, 2^{-nR}.$

Hence the probability is at most $2^{-n(R - H(X) - \delta_\epsilon)}$. $\blacksquare$
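A quick Monte Carlo experiment matches Result 2: fix one sequence, draw a fresh codebook, and count how often some other "typical" sequence lands in the same bin. The set size $2^{nH}$ and all parameters are stand-ins of my own choosing (the $\delta_\epsilon$ correction is ignored), so this only illustrates the union bound.

```python
# Monte Carlo check of Result 2's union bound; parameters purely illustrative.
import random

n, H, R, trials = 12, 0.5, 0.8, 2000
set_size = round(2 ** (n * H))       # stand-in for |A*_eps|, about 2^{nH}
num_bins = 2 ** round(n * R)         # number of nR-bit codewords

random.seed(1)
collisions = 0
for _ in range(trials):
    y = random.randrange(num_bins)   # bin of the fixed x^n under a fresh codebook
    # every other typical sequence hits bin y independently with prob. 1/num_bins
    if any(random.randrange(num_bins) == y for _ in range(set_size - 1)):
        collisions += 1
print(f"empirical collision rate {collisions / trials:.4f}, "
      f"union bound {(set_size - 1) / num_bins:.4f}")
```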

 

Note that $x^n$ is fixed in Result 2, but our real interest is in estimating the probability of the error event (ii). Our next result is a "randomized" version of Result 2. Note that the randomness here is due to both $X^n$ and $\tilde{f}$.

 

Result 3 (b)(iii): Let $E$ be the event $\{\exists\, \tilde{x}^n \in A_\epsilon^{*(n)},\ \tilde{x}^n \neq X^n,\ \tilde{f}(\tilde{x}^n) = \tilde{f}(X^n)\}$. Then $\Pr(E) \le 2^{-n(R - H(X) - \delta_\epsilon)}$.

Proof: Using Result 2 and independence (by which $\Pr\{E \mid X^n = x^n\}$ equals the probability bounded in Result 2), we have

$\Pr(E) = \sum_{x^n} \Pr\{X^n = x^n\}\, \Pr\{E \mid X^n = x^n\} \le \sum_{x^n} \Pr\{X^n = x^n\}\, 2^{-n(R - H(X) - \delta_\epsilon)}.$

Hence $\Pr(E) \le 2^{-n(R - H(X) - \delta_\epsilon)}$. $\blacksquare$

 

We are ready to prove that the random code has a small probability of error.

 

Result 4 (b)(iv): $\Pr\{X^n \neq \hat{X}^n\} \le 2^{-\beta n + o(n)}$, where $\beta = \min(k\epsilon^2,\ R - H(X) - \delta_\epsilon)$ is positive when $\epsilon$ is small enough (recall $R > H(X)$).

Proof: Since $\{X^n \neq \hat{X}^n\} \subseteq \{X^n \notin A_\epsilon^{*(n)}\} \cup E$, we have

$\Pr\{X^n \neq \hat{X}^n\} \le \Pr\{X^n \notin A_\epsilon^{*(n)}\} + \Pr(E) \le 2^{-nk\epsilon^2 + o(n)} + 2^{-n(R - H(X) - \delta_\epsilon)} \le 2^{-\beta n + o(n)}.$

In the last inequality, we used the fact that if $a, b > 0$ and $f(n), g(n)$ are $o(n)$, then $2^{-an + f(n)} + 2^{-bn + g(n)} \le 2^{-\min(a,b)\, n + o(n)}$ (see footnote 6). $\blacksquare$

 

We have NOT finished, because we have so far allowed the randomness of $\tilde{f}$. To finish off, we need to show that some particular $f$ has a small probability of error. Indeed, we can show that at least half of all possible $f$ are "good".

Result 5 (b)(v): With probability at least $\frac{1}{2}$, a randomly chosen $f$ (and the associated code) has probability of error at most $2 \cdot 2^{-\beta n + o(n)}$.

Proof: Define $g(f) = \Pr\{X^n \neq \hat{X}^n \mid \tilde{f} = f\}$. Then

$E[g(\tilde{f})] = \sum_f \Pr\{\tilde{f} = f\}\, g(f) = \Pr\{X^n \neq \hat{X}^n\} \le 2^{-\beta n + o(n)}.$

By Markov's inequality, for any $t > 0$,

$\Pr\{g(\tilde{f}) \ge t\} \le \frac{E[g(\tilde{f})]}{t}.$

Pick $t = 2\, E[g(\tilde{f})]$. Then

$\Pr\{g(\tilde{f}) \ge 2\, E[g(\tilde{f})]\} \le \frac{1}{2}.$

Hence, with probability at least $\frac{1}{2}$, a randomly chosen $f$ has probability of error at most $2\, E[g(\tilde{f})] \le 2 \cdot 2^{-\beta n + o(n)}$. $\blacksquare$
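The averaging step is easy to see numerically. The snippet below uses synthetic "error probabilities" g(f) drawn from a toy exponential distribution of my own choosing (not real codes); Markov's inequality guarantees that at most half the codebooks exceed twice the mean.

```python
# Toy check of the Result 5 argument: Pr{g(f~) >= 2 E[g(f~)]} <= 1/2 by Markov.
import random

random.seed(2)
g = [random.expovariate(1000) for _ in range(10 ** 5)]  # synthetic g(f) values
m = sum(g) / len(g)                                     # plays the role of E[g(f~)]
good = sum(1 for v in g if v <= 2 * m)                  # "good" codebooks
print(f"mean error {m:.2e}; fraction good: {good / len(g):.3f} (always >= 0.5)")
```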

 

Since each $f$ has the same probability of being chosen, we have proved that at least half of all coding schemes (according to our definition) have small probabilities of error. This completes our proof of the source coding theorem. We will consider the more general problem of channel coding next time.

 

Footnotes

1. Recall that $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$ (see page 59 of the textbook).

2. The case $\Pr\{X^n \in A_\epsilon^{(n)}\} = 1$ is trivial. Otherwise, from the proof of Theorem 3.1.2(3), we have $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)} < 2^{nR}$. We can just pick any $y \in \{0,1\}^{nR}$ not in the range of $\phi$ as the error message.

3. Following standard usage, we assume the uniform distribution. Probabilistically, this is equivalent to choosing $|\mathcal{X}|^n$ elements of $\{0,1\}^{nR}$ randomly with replacement (one codeword $f(x^n)$ for each $x^n$, drawn independently and uniformly).

4. To get the exponential bound of Result 1 using Pinsker's inequality (Lemma 11.6.1), I have to use strong typicality. $A_\epsilon^{*(n)}$ is defined by

$A_\epsilon^{*(n)} = \left\{ x^n \in \mathcal{X}^n : \sum_{a \in \mathcal{X}} |P_{x^n}(a) - P(a)| \le \epsilon \right\}$,

where $P_{x^n}$ is the type (empirical distribution) of $x^n$. (See page 357 of the textbook.) Without loss of generality, we can assume $P(a) > 0$ for all $a \in \mathcal{X}$.

5. I use $X^n$ to denote the random variable and $x^n$ to denote a particular value of the variable. If I need to emphasize the randomness of the codebook, I use the notation $\tilde{f}$, while $f$ denotes a fixed codebook. For example, in Result 1 the randomness is due to $X^n$. In Result 2 the randomness is due to $\tilde{f}$. We also assume that $X^n$ and $\tilde{f}$ are independent.

6. Consider the expression $2^{-an + f(n)} + 2^{-bn + g(n)}$, where both $f(n)$ and $g(n)$ are $o(n)$. Note that

$2^{-an + f(n)} + 2^{-bn + g(n)} \le 2 \cdot 2^{-\min(a,b)\, n + \max(f(n), g(n))} = 2^{-\min(a,b)\, n + \max(f(n), g(n)) + 1}.$

Clearly $\max(f(n), g(n))$ and $\max(f(n), g(n)) + 1$ are $o(n)$. Hence

$2^{-an + f(n)} + 2^{-bn + g(n)} \le 2^{-\min(a,b)\, n + o(n)}.$

Comments (14)

MEI Yuchen said

at 1:30 am on Feb 8, 2009

I'm not clear about the proof of Result 5.
Could someone please explain the function g(f)? What is g(f)? Why g(f)?

The notes helped me a lot in solving problem set 3.
Thank you!

Leonard Wong said

at 11:01 pm on Feb 8, 2009

g(f) is the probability of error if a particular f is chosen as the encoding scheme.

Mak said

at 10:29 am on Feb 9, 2009

You wrote, "Last time, we proved the converse of the source coding theorem by ingenious use of several information-theoretic inequalities."

Where can I find the related info? It doesn't seem to be contained in 22 Jan's notes? I was not available on that day so I don't know...

Silas said

at 11:08 am on Feb 12, 2009

This scribe note is well-written. Steps are clear. Good job!

Tong said

at 2:53 pm on Feb 19, 2009

What's the difference between Result 2 and Result 3?
Different events have the same probability?

Leonard Wong said

at 10:19 pm on Feb 19, 2009

In result 2, the sequence x_n is fixed and f is random. In result 3, both X_n and f are random. The statement does not imply that the two events have the same probability: (1) we only give the upper bounds; (2) the o(n) are not the same. It only says that the same asymptotic upper bound works for both probabilities.

Lingxiao XIA said

at 9:32 pm on Feb 22, 2009

Result 4 doesn't make much sense to me; the inequalities and equalities are hard to understand. For example, why does 2^(-0.5xn) + 2^(-0.5xn) = 2^(-xn) (the last equality)? And also about the previous inequality: what is "k" exactly, and what is Pinsker's inequality?

Leonard Wong said

at 11:28 pm on Feb 22, 2009

I added a footnote. Also, I discovered that I made a mistake in the proof. If my method works, there should be a 1/2 in the exponent. (Certainly this exponent can be improved.)

Pinsker's inequality is just Lemma 11.6.1 of the textbook. k is 1/(2 ln 2).

Lingxiao XIA said

at 12:51 am on Feb 23, 2009

thanks a lot

sidjaggi said

at 1:59 am on Feb 23, 2009

good to see late night information theory :)

Mak said

at 6:11 pm on Mar 13, 2009

I don't understand the proof of Result 5 at all. You write:
g(f) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = f\}

What is your \tilde{f}? What about Eg(\tilde{f})? Thanks.

Leonard Wong said

at 12:47 am on Mar 14, 2009

See footnote 5. \tilde{f} means that the codebook f is chosen at random. In words, Eg(\tilde{f}) is the average probability of error where the randomness is due to f.

Mak said

at 10:46 am on Mar 16, 2009

What I mean is, if you define g(f) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = f\} , then
g(\tilde{f}) = \Pr\{X^n \neq \hat{X}^n | \tilde{f} = \tilde{f}\} ??
Then Eg(\tilde{f}) = E{\Pr\{X^n \neq \hat{X}^n | \tilde{f} = \tilde{f}\}}?? Or is it not? Thanks again.

Leonard Wong said

at 11:14 pm on Mar 16, 2009

I see. The notation is a bit confusing when we talk about g(\tilde{f}).

In words, g(\tilde{f}) means the probability of error if the codebook is random. So g(\tilde{f}) is a random variable while g(f) is just a real number. Can you suggest a better notation?
