Scribe Notes 2


Intuition behind "Entropy":

Joint Entropy: Total uncertainty of 2 or more random variables (r.v.)

Conditional Entropy: Given the knowledge of some r.v., how much we still don't know about others

Mutual Information:  Given the knowledge of some r.v., how much we do know about others

Divergence: A quantity used to measure the difference between two distributions

                  Notice: divergence is not a distance in the mathematical sense, because in general $D(p\|q) \neq D(q\|p)$.
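                  For example (a small illustrative check): take $p = \mathrm{Bern}(1/2)$ and $q = \mathrm{Bern}(1/4)$. Then, in bits, $D(p\|q) = \frac{1}{2}\log\frac{1/2}{1/4} + \frac{1}{2}\log\frac{1/2}{3/4} \approx 0.208$, while $D(q\|p) = \frac{1}{4}\log\frac{1/4}{1/2} + \frac{3}{4}\log\frac{3/4}{1/2} \approx 0.189$, so reversing the arguments changes the value.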

 

Formal Definitions:

Entropy:

  $H(X) = -\sum_{x} p(x)\log p(x)$

Joint Entropy:

  $H(X,Y) = -\sum_{x}\sum_{y} p(x,y)\log p(x,y)$

Mutual Information:

  $I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$

Conditional Entropy:

  $H(Y|X) = -\sum_{x}\sum_{y} p(x,y)\log p(y|x)$

Divergence:

  $D(p\|q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)}$
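A minimal Python sketch (the joint pmf below is just an illustrative choice, not from the lecture) that evaluates each of these quantities for two binary r.v.s:

import math
# Illustrative joint pmf p(x, y) for binary X and Y (values chosen arbitrarily).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
def H(pmf):
    # Entropy in bits: -sum p log2 p (terms with p = 0 contribute 0).
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)
# Marginal pmfs p(x) and p(y).
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
# Conditional entropy H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x).
H_Y_given_X = -sum(p * math.log2(p / px[x]) for (x, y), p in joint.items() if p > 0)
# Mutual information I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ].
I_XY = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)
# Divergence D(p||q) between pmfs on the same alphabet (assumes q > 0 wherever p > 0).
def D(p, q):
    return sum(pi * math.log2(pi / q[s]) for s, pi in p.items() if pi > 0)
print(H(px), H(py), H(joint), H_Y_given_X, I_XY, D(px, py))
print(abs(I_XY - (H(px) + H(py) - H(joint))) < 1e-12)  # I(X;Y) = H(X) + H(Y) - H(X,Y)

The last line checks numerically one of the identities proved below, I(X;Y) = H(X) + H(Y) - H(X,Y).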

 

Relationship between H and I (presented as a Venn diagram):

  [Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X) and I(X;Y); embedded slide not available.]

Copied from the textbook: Elements of Information Theory (2nd edition), by Cover and Thomas.

 

Some Equalities:

(1) 

      $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$

Pf: 

           $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$

      $= \sum_{x,y} p(x,y)\log\frac{p(x|y)}{p(x)}$

      $= -\sum_{x,y} p(x,y)\log p(x) + \sum_{x,y} p(x,y)\log p(x|y)$

      $= -\sum_{x} p(x)\log p(x) + \sum_{x,y} p(x,y)\log p(x|y)$

      $= H(X) - \left(-\sum_{x,y} p(x,y)\log p(x|y)\right)$

      $= H(X) - H(X|Y)$

     In a similar manner, we can easily prove the remaining equalities.

 

(2) Chain Rule

      $H(X,Y) = H(X) + H(Y|X)$

Pf(1):

           $H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y)$

      $= -\sum_{x,y} p(x,y)\log\big(p(x)\,p(y|x)\big)$

      $= -\sum_{x,y} p(x,y)\log p(x) - \sum_{x,y} p(x,y)\log p(y|x)$

      $= -\sum_{x} p(x)\log p(x) - \sum_{x,y} p(x,y)\log p(y|x)$

      $= H(X) + H(Y|X)$

Pf(2):

      $\log p(X,Y) = \log p(X) + \log p(Y|X)$      //Apply log function to both sides of $p(x,y) = p(x)\,p(y|x)$

      $E[\log p(X,Y)] = E[\log p(X)] + E[\log p(Y|X)]$  //Calculate the expectation of both sides

           $E[\log p(X,Y)] = \sum_{x,y} p(x,y)\log p(x,y) = -H(X,Y)$

      $E[\log p(X)] = \sum_{x} p(x)\log p(x) = -H(X)$

           $E[\log p(Y|X)] = \sum_{x,y} p(x,y)\log p(y|x) = -H(Y|X)$

      $-H(X,Y) = -H(X) - H(Y|X)$

      $H(X,Y) = H(X) + H(Y|X)$
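The chain rule can also be verified numerically from the definitions; a short Python sketch (the joint pmf is again an arbitrary illustrative choice):

import math
# Arbitrary illustrative joint pmf p(x, y) for binary X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {0: 0.5, 1: 0.5}  # marginal p(x) = sum over y of p(x, y) for the pmf above
H_XY = -sum(p * math.log2(p) for p in joint.values())                        # H(X,Y)
H_X = -sum(p * math.log2(p) for p in px.values())                            # H(X)
H_Y_given_X = -sum(p * math.log2(p / px[x]) for (x, _), p in joint.items())  # H(Y|X)
print(H_XY, H_X + H_Y_given_X)  # both equal: H(X,Y) = H(X) + H(Y|X) (up to rounding)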

 

      $I(X;Y,Z) = I(X;Y) + I(X;Z|Y)$      (chain rule for mutual information)

Pf:

           $I(X;Y,Z) = H(X) - H(X|Y,Z)$

      $= H(X) - H(X|Y) + H(X|Y) - H(X|Y,Z)$

      $= \big(H(X) - H(X|Y)\big) + \big(H(X|Y) - H(X|Y,Z)\big)$

      $= I(X;Y) + I(X;Z|Y)$

 

Some Inequalities:

(1)  Definition of convexity, strict convexity, concavity, strict concavity.

      Convexity: f(x) is said to be convex on (a,b) if

                     $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$     for all $x_1, x_2 \in (a,b)$ and $0 \le \lambda \le 1$

      Strict Convexity: f(x) is said to be strictly convex on (a,b) if

                             $f(\lambda x_1 + (1-\lambda)x_2) < \lambda f(x_1) + (1-\lambda)f(x_2)$     for all $x_1 \ne x_2 \in (a,b)$ and $0 < \lambda < 1$

      The definitions of concavity and strict concavity are similar to the two above; just reverse the inequality signs.

      Geometric meaning of convexity: the line segment between $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies above the curve of f(x) between $x_1$ and $x_2$.

      Generalizing the idea:  $E[f(X)] \ge f(E[X])$ for a convex function f (Jensen's inequality)

                                     For a strictly convex function, $E[f(X)] = f(E[X])$ if and only if X is constant, i.e. X takes a single value with probability 1.
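      For example: take the convex function $f(x) = x^2$ and let X be uniform on {0, 2}. Then $E[f(X)] = (0 + 4)/2 = 2$ while $f(E[X]) = f(1) = 1$, so $E[f(X)] \ge f(E[X])$ holds strictly; since f is strictly convex, equality would require X to be constant.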

(2) 

      $D(p\|q) \ge 0$

Pf:

           $-D(p\|q) = -\sum_{x} p(x)\log\frac{p(x)}{q(x)}$

      $= \sum_{x} p(x)\log\frac{q(x)}{p(x)}$

      $= E\left[\log\frac{q(X)}{p(X)}\right]$

      $\le \log E\left[\frac{q(X)}{p(X)}\right]$          // log is a concave function (Jensen's inequality)

      $= \log \sum_{x} p(x)\frac{q(x)}{p(x)}$

      $= \log \sum_{x} q(x)$

      $= \log 1 = 0$

      Therefore, $D(p\|q) \ge 0$, with "=" if and only if p and q are the same distribution.

      Moreover, since $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big)$, we also get $I(X;Y) \ge 0$, with "=" if and only if X and Y are independent.
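A small numerical check of the non-negativity of D, and of the asymmetry noted in the intuition section (a Python sketch; the pmfs are arbitrary test values):

import math
import random
def D(p, q):
    # KL divergence in bits; assumes q(x) > 0 wherever p(x) > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
def random_pmf(n):
    # A random pmf on an alphabet of size n, for testing only.
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]
random.seed(0)
for _ in range(5):
    p, q = random_pmf(4), random_pmf(4)
    assert D(p, q) >= 0 and D(q, p) >= 0           # non-negativity
    print(round(D(p, q), 4), round(D(q, p), 4))    # the two directions generally differ
print(D([0.3, 0.7], [0.3, 0.7]))                   # 0.0: equality exactly when p = q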

(3)

      $H(X|Y) \le H(X)$ (Conditioning reduces entropy)

Pf:

      $I(X;Y) = H(X) - H(X|Y)$

      Since $I(X;Y) \ge 0$, we directly draw the conclusion $H(X|Y) \le H(X)$, with "=" if and only if X and Y are independent.
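      For example, if X is a fair coin and Y = X (we observe X exactly), then $H(X|Y) = 0 \le H(X) = 1$ bit; if instead Y is independent of X, then $H(X|Y) = H(X)$ and equality holds.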

(4)

      Let $X \to Y \to Z$ be a Markov chain (i.e. given Y, X and Z are independent). Then $I(X;Y) \ge I(X;Z)$. (Data processing inequality)

Pf:

      $I(X;Y,Z) = I(X;Z) + I(X;Y|Z)$

                           $= I(X;Y) + I(X;Z|Y)$      (expanding $I(X;Y,Z)$ by the chain rule in both orders)

      By definition of the Markov chain, $I(X;Z|Y) = 0$.

      Also, $I(X;Y|Z) \ge 0$.

      So, $I(X;Y) \ge I(X;Z)$, as desired. "=" if and only if $I(X;Y|Z) = 0$, i.e. $X \to Z \to Y$ also forms a Markov chain.  Q.E.D

     How to understand this inequality?

          Consider X as the source of information; after some processing of X we obtain Y. Finally, we try to reconstruct X from Y, and the reconstruction we obtain is Z.

          The inequality simply tells us that X is at least as closely related to Y as it is to Z (since Y is obtained directly from X): further processing of Y can never increase the information it carries about X.
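          To make the inequality concrete, here is a small Python sketch (the binary symmetric channels and the crossover probability 0.1 are arbitrary illustrative choices): a uniform bit X is passed through one BSC to get Y, and Y through a second BSC to get Z, so $X \to Y \to Z$ is a Markov chain; the exact values of I(X;Y) and I(X;Z) are then computed from the joint distributions.

import math
def mutual_information(joint):
    # I in bits, computed from a dict {(a, b): probability}.
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)
def bsc(input_dist, eps):
    # Joint (input, output) pmf of a binary symmetric channel with crossover probability eps.
    joint = {}
    for x, px in input_dist.items():
        joint[(x, x)] = joint.get((x, x), 0) + px * (1 - eps)
        joint[(x, 1 - x)] = joint.get((x, 1 - x), 0) + px * eps
    return joint
# X uniform; Y = BSC(0.1) applied to X; Z = BSC(0.1) applied to Y.
joint_XY = bsc({0: 0.5, 1: 0.5}, 0.1)
joint_XZ = {}
for (x, y), pxy in joint_XY.items():
    for z, pz_given_y in ((y, 0.9), (1 - y, 0.1)):   # p(z|y) for the second BSC(0.1)
        joint_XZ[(x, z)] = joint_XZ.get((x, z), 0) + pxy * pz_given_y
print(mutual_information(joint_XY), mutual_information(joint_XZ))  # I(X;Y) >= I(X;Z)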

Comments (5)

Lingxiao XIA said

at 6:08 am on Feb 2, 2009

I don't really understand the Markov chain for random variables.

Is it a Markov chain for one r.v., called a distribution?

Silas said

at 9:12 am on Feb 2, 2009

Dear Lingxiao,

I think you should view a Markov chain as a joint distribution of 3 random variables. This is because the definition of a Markov chain is stated in terms of (conditional) independence of random variables, and independence can only be defined once the joint distribution is specified.

Best Regards,
Silas

sidjaggi said

at 10:42 am on Feb 2, 2009

Good answer, Silas.

Here's another question for everyone -- in everything we've done so far, we implicitly assume that it's a 1-Markov process -- i.e., if X <-> Y <-> Z, then Z and X are conditionally independent given Y.

How would you, first of all, define 2-Markov process? How about k-Markov processes? How would the formal definitions change? How about the data processing inequality?

Not that we have any use for such definitions in the near future (or perhaps at all in this course), but it's an interesting mental exercise to think about :)

Silas said

at 10:31 am on Feb 4, 2009

The proof for D(p||q) >= 0 in this scribe note is not rigorous enough, because q/p is undefined when p is zero. The proof of this theorem on p.28 of Thomas Cover's book is written in a more rigorous way.

By the way, I think this scribe note is well organised and is easy to read.

MEI Yuchen said

at 3:15 pm on Feb 4, 2009

Thank you for your positive review, Silas.

About the proof of non-negativity of KL divergence: the proof in the textbook is more rigorous than mine.
But we can view it from another angle: if p(x) = 0 for some x, the original definition of D(p||q) is not "well defined", since log(0) is undefined.
I think we have the convention that p*log(anything)=0 if p=0.
Again, thank you for your comment!
