Entropy: Definitions and Properties (Scribe by Zirui)
1. Overview
In the previous lecture we introduced a simple coding theorem, and we saw that many of its definitions and properties are tied to the entropy function $H(X)$. In this lecture, fundamental quantities and tools are introduced and rigorously defined, which is a prerequisite for the further development of information theory. Besides entropy, we introduce relative entropy, joint entropy, conditional entropy and mutual information. After defining entropy and mutual information, we establish chain rules, the nonnegativity of mutual information, the data-processing inequality, Jensen's inequality and Fano's inequality.
Keywords: entropy, relative entropy, mutual information, chain rules.
2. Definitions
2.1 Entropy
Def. 1 The entropy $H(X)$ of a discrete random variable $X$ with probability mass function $p(x)$ is defined by $H(X) = -\sum_{x} p(x) \log p(x)$, where the logarithm is taken to base 2, so entropy is measured in bits.
In the previous lecture, we saw that this entropy has many properties that agree with the intuitive notion of what a measure of information should be. More precisely, $nH(X)$ is the leading term in the exponent of the size of the strongly typical set. Also, entropy is a function only of the distribution of $X$ -- it describes the uncertainty of that random variable.
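For example, if $X \sim \mathrm{Bernoulli}(p)$, then $H(X) = -p \log p - (1-p) \log(1-p)$: this equals $1$ bit when $p = 1/2$ and tends to $0$ as $p \to 0$ or $p \to 1$, matching the intuition that a fair coin is maximally uncertain while a nearly deterministic one carries almost no information.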
2.2 Joint Entropy
Def. 2 The joint entropy $H(X, Y)$ of a pair of discrete random variables $(X, Y)$ with a joint distribution $p(x, y)$ is defined as $H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y)$.
Intuitively, the jointly $\epsilon$-strongly typical set could be defined as $T_\epsilon^{(n)}(X, Y) = \left\{ (x^n, y^n) : \left| \tfrac{1}{n} N(a, b \mid x^n, y^n) - p(a, b) \right| \le \epsilon \ \text{for all } (a, b) \right\}$, where $N(a, b \mid x^n, y^n)$ denotes the number of indices $i$ with $(x_i, y_i) = (a, b)$.
Claim 1 $\left| T_\epsilon^{(n)}(X, Y) \right| \approx 2^{nH(X, Y)}$, in the sense that $\tfrac{1}{n} \log \left| T_\epsilon^{(n)}(X, Y) \right| \to H(X, Y)$ as $n \to \infty$ and $\epsilon \to 0$.
The proof of this claim is the same as the proof in the previous lecture that $\left| T_\epsilon^{(n)}(X) \right| \approx 2^{nH(X)}$, and we give a similar argument below when proving the corresponding claim about conditional entropy.
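As a small worked example (this particular distribution is chosen only for illustration), take binary $X$ and $Y$ with $p(0,0) = 1/2$, $p(0,1) = 1/4$, $p(1,0) = p(1,1) = 1/8$. Then $H(X, Y) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + \tfrac{1}{8}\log 8 + \tfrac{1}{8}\log 8 = \tfrac{7}{4}$ bits.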
2.3 Conditional Entropy
Def. 3 If $(X, Y) \sim p(x, y)$, the conditional entropy $H(Y \mid X)$ is defined as $H(Y \mid X) = \sum_{x} p(x) H(Y \mid X = x) = -\sum_{x}\sum_{y} p(x, y) \log p(y \mid x)$.
Intuitively, given any strongly typical $x^n$, define $T_\epsilon^{(n)}(Y \mid x^n) = \left\{ y^n : (x^n, y^n) \in T_\epsilon^{(n)}(X, Y) \right\}$ as the strongly conditional typical set.
Claim 2 $\left| T_\epsilon^{(n)}(Y \mid x^n) \right| \approx 2^{nH(Y \mid X)}$ for every strongly typical $x^n$.
Proof: For simplicity, we only consider the case where $X$ and $Y$ each take two values, say 0 and 1. Assume that the marginal distribution of $X$ is $(p_X(0), p_X(1))$, so a strongly typical $x^n$ contains approximately $n p_X(0)$ zeros and $n p_X(1)$ ones. A sequence $y^n$ is strongly conditionally typical exactly when, among the positions where $x_i = a$, approximately $n p(a, b)$ of them have $y_i = b$. Then the number of strongly conditionally typical sequences should be
$\left| T_\epsilon^{(n)}(Y \mid x^n) \right| \approx \binom{n p_X(0)}{n p(0, 0)} \binom{n p_X(1)}{n p(1, 0)}$.
Using Stirling's formula and the proposition that $\binom{n}{k} = 2^{n\left(H_2(k/n) + o(1)\right)}$, where $H_2$ denotes the binary entropy function, we reach the following equation:
$\frac{1}{n} \log \left| T_\epsilon^{(n)}(Y \mid x^n) \right| \approx p_X(0)\, H_2\!\left(p(0 \mid 0)\right) + p_X(1)\, H_2\!\left(p(0 \mid 1)\right) = \sum_{x} p_X(x)\, H(Y \mid X = x)$.
From the definition we know that the last term above is just $H(Y \mid X)$, hence we have completed the proof.
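For the two-by-two example above, the marginal of $X$ is $(3/4, 1/4)$ and the conditional distributions of $Y$ are $p(y \mid X = 0) = (2/3, 1/3)$ and $p(y \mid X = 1) = (1/2, 1/2)$, so $H(Y \mid X) = \tfrac{3}{4} H_2(1/3) + \tfrac{1}{4} H_2(1/2) \approx 0.939$ bits. Note that $H(X) + H(Y \mid X) \approx 0.811 + 0.939 = 1.75 = H(X, Y)$ bits, anticipating the chain rule established in Section 3.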
2.4 Relative Entropy
Def. 4 The relative entropy or Kullback-Leibler distance between two probability mass functions $p(x)$ and $q(x)$ is defined as $D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$, with the conventions $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$.
Intuitively, the relative entropy is a measure of the distance between two distributions. However, it is not a true distance, because it is not symmetric. Later we will show that relative entropy is nonnegative.
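For instance, with $p = (1/2, 1/2)$ and $q = (1/4, 3/4)$ on $\{0, 1\}$, we get $D(p \| q) = \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log\tfrac{2}{3} \approx 0.208$ bits, while $D(q \| p) = \tfrac{1}{4}\log\tfrac{1}{2} + \tfrac{3}{4}\log\tfrac{3}{2} \approx 0.189$ bits, so in general $D(p \| q) \neq D(q \| p)$.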
2.5 Mutual Information
Def. 5 Consider two random variables $X$ and $Y$ with a joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$. The mutual information $I(X; Y)$ is the relative entropy between the joint distribution and the product distribution:
$I(X; Y) = D\big(p(x, y) \,\|\, p(x) p(y)\big) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}$.
Mutual information is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other. We will gain a deeper understanding after establishing the properties below.
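Two extreme cases illustrate this: if $Y = X$ with $X \sim \mathrm{Bernoulli}(1/2)$, then $I(X; Y) = H(X) = 1$ bit, since knowing $Y$ removes all uncertainty about $X$; if $X$ and $Y$ are independent, then $p(x, y) = p(x) p(y)$ and $I(X; Y) = 0$.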
3. Properties and Chain Rules
3.1 Information Theory Equalities
(a) Equalities of mutual information:
$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y)$, together with $I(X; Y) = I(Y; X)$ and $I(X; X) = H(X)$.
All the above equalities can be proved directly using the definition of mutual information and entropy.
Also, the relationship between mutual information and entropy can be expressed accurately in a Venn diagram. See Figure 1 below.
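As a quick numerical sanity check (a minimal Python sketch, not part of the lecture; the joint pmf below is the illustrative two-by-two example used earlier), the equalities can be verified directly:

```python
from math import log2

# Illustrative two-by-two joint pmf p(x, y) for binary X and Y
# (an assumed example distribution, not taken from the lecture).
p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}

def H(dist):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Marginal pmfs p(x) and p(y).
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in p_xy.items() if b == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)

# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), where p(y|x) = p(x,y)/p(x).
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)

# I(X;Y) = D( p(x,y) || p(x) p(y) ).
I_XY = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

# The equalities I(X;Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) hold numerically.
assert abs(I_XY - (H_Y - H_Y_given_X)) < 1e-9
assert abs(I_XY - (H_X + H_Y - H_XY)) < 1e-9
print(round(H_X, 3), round(H_Y, 3), round(H_XY, 3), round(H_Y_given_X, 3), round(I_XY, 3))
# -> 0.811 0.954 1.75 0.939 0.016
```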
(b) Chain Rule for Entropy
Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)$.
For the case of two random variables, the chain rule follows directly from the definitions, as written out below; for three or more random variables, mathematical induction completes the proof.
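In the two-random-variable case, writing $p(x, y) = p(x)\, p(y \mid x)$ gives
$H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y) = -\sum_{x}\sum_{y} p(x, y) \log p(x) - \sum_{x}\sum_{y} p(x, y) \log p(y \mid x) = H(X) + H(Y \mid X)$.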
(c) Chain Rule for Mutual Information
Define the conditional mutual information of $X$ and $Y$ given $Z$ as $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$. Then we have $I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)$.
Proof: $I(X_1, \ldots, X_n; Y) = H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) - \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1, Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)$, where the middle equality applies the chain rule for entropy to both terms.
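In particular, for two variables the chain rule reads $I(X_1, X_2; Y) = I(X_1; Y) + I(X_2; Y \mid X_1)$.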
3.2 Information Theory Inequalities
(a) $H(X) \ge 0$, $H(X, Y) \ge 0$, $H(Y \mid X) \ge 0$.
These three inequalities follow directly from the properties of the log function.
(b) $D(p \| q) \ge 0$, $I(X; Y) \ge 0$, $H(X \mid Y) \le H(X)$, $H(X) \le \log |\mathcal{X}|$.
The proofs of the above inequalities rely on an important lemma called Jensen's inequality.
Lemma 1 (Jensen's inequality) If $f$ is a convex function and $X$ is a random variable, then $\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$. Moreover, if $f$ is strictly convex, then equality implies that $X = \mathbb{E}[X]$ with probability 1, i.e. $X$ is a constant.
Proof: We prove this for discrete distributions by induction on the number of mass points. For a two-mass-point distribution, the inequality becomes $p_1 f(x_1) + p_2 f(x_2) \ge f(p_1 x_1 + p_2 x_2)$, which follows directly from the definition of convex functions. Suppose that the theorem is true for distributions with $k - 1$ mass points. Then writing $p_i' = p_i / (1 - p_k)$ for $i = 1, 2, \ldots, k - 1$, we have
$\sum_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 - p_k) \sum_{i=1}^{k-1} p_i' f(x_i) \ge p_k f(x_k) + (1 - p_k) f\!\left( \sum_{i=1}^{k-1} p_i' x_i \right) \ge f\!\left( p_k x_k + (1 - p_k) \sum_{i=1}^{k-1} p_i' x_i \right) = f\!\left( \sum_{i=1}^{k} p_i x_i \right)$,
hence the proof is completed.
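For example, applying the lemma to the strictly convex function $f(t) = t^2$ gives $\mathbb{E}[X^2] \ge (\mathbb{E}[X])^2$, i.e. $\mathrm{Var}(X) \ge 0$, with equality if and only if $X$ is constant.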
First we prove $D(p \| q) \ge 0$. Let $A = \{x : p(x) > 0\}$ be the support set of $p(x)$. Then
$-D(p \| q) = \sum_{x \in A} p(x) \log \frac{q(x)}{p(x)} \le \log \sum_{x \in A} p(x) \frac{q(x)}{p(x)} = \log \sum_{x \in A} q(x) \le \log 1 = 0$,
where the first inequality follows from Jensen's inequality applied to the concave function $\log$.
Secondly, we prove the nonnegativity of mutual information. Actually, the inequality is trivial once we see that $I(X; Y) = D\big(p(x, y) \,\|\, p(x) p(y)\big) \ge 0$.
Also, as $I(X; Y) = H(X) - H(X \mid Y)$ holds, and $I(X; Y) \ge 0$, we have $H(X \mid Y) \le H(X)$, i.e. conditioning reduces entropy.
The last inequality to be proved is $H(X) \le \log |\mathcal{X}|$. Let $u(x) = \frac{1}{|\mathcal{X}|}$ be the uniform probability mass function over $\mathcal{X}$, and let $p(x)$ be the probability mass function of $X$. Then
$D(p \| u) = \sum_{x} p(x) \log \frac{p(x)}{u(x)} = \log |\mathcal{X}| - H(X)$,
hence by the nonnegativity of relative entropy, the inequality $H(X) \le \log |\mathcal{X}|$ holds.
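For the two-by-two example used earlier, these bounds can be checked directly: $H(X \mid Y) = H(X, Y) - H(Y) \approx 1.75 - 0.954 = 0.796 \le 0.811 \approx H(X)$, and $H(X) \approx 0.811 \le \log 2 = 1$.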