Intuition behind "Entropy":
Joint Entropy: Total uncertainty of 2 or more random variables (r.v.)
Conditional Entropy: Given the knowledge of some r.v., how much we still don't know about others
Mutual Information: Given the knowledge of some r.v., how much that knowledge tells us about the others
Divergence: A special quantity used to indicate the difference between two distributions
Notice: divergence is not any kind of distance in the mathematical sense, because generally $D(p\|q) \neq D(q\|p)$.
Formal Definitions:
Entropy: $H(X) = -\sum_x p(x)\log p(x)$
Joint Entropy: $H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y)$
Mutual Information: $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$
Conditional Entropy: $H(Y|X) = -\sum_{x,y} p(x,y)\log p(y|x)$
Divergence: $D(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$ (with the convention $0\log\frac{0}{q} = 0$)
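To make the definitions concrete, here is a minimal numeric sketch (not part of the original notes): it evaluates each quantity on a small made-up joint distribution, using base-2 logarithms so everything is in bits. The distribution p_xy and the second distribution q are arbitrary choices for illustration.

```python
from math import log2

# A made-up joint pmf p(x, y) on {0,1} x {0,1}: X and Y are fair bits
# that agree with probability 3/4.
p_xy = {(0, 0): 3/8, (0, 1): 1/8,
        (1, 0): 1/8, (1, 1): 3/8}

# Marginal pmfs p(x) and p(y)
p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in (0, 1)}

def H(pmf):
    """Entropy H = -sum p log p; terms with p = 0 contribute 0 by convention."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Joint entropy H(X,Y), conditional entropy H(Y|X), mutual information I(X;Y)
H_xy = H(p_xy)
H_y_given_x = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)
I_xy = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

# Divergence D(p_x || q) against an arbitrary second distribution q
q = {0: 1/4, 1: 3/4}
D_pq = sum(p_x[v] * log2(p_x[v] / q[v]) for v in (0, 1) if p_x[v] > 0)

print(f"H(X) = {H(p_x):.4f}")          # 1.0000
print(f"H(X,Y) = {H_xy:.4f}")          # 1.8113
print(f"H(Y|X) = {H_y_given_x:.4f}")   # 0.8113
print(f"I(X;Y) = {I_xy:.4f}")          # 0.1887
print(f"D(p||q) = {D_pq:.4f}")         # 0.2075
```

Note that $H(X,Y) = H(X) + H(Y|X)$ and $I(X;Y) = H(X) + H(Y) - H(X,Y)$ already hold numerically here; these identities are proved below.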
Relationship between H and I (presented in a Venn diagram):
[Figure missing: a Venn diagram in which the two circles are H(X) and H(Y), their overlap is I(X;Y), the non-overlapping parts are H(X|Y) and H(Y|X), and the union is H(X,Y).]
Copied from the textbook: Elements of Information Theory (2nd edition).
Some Equalities:
(1) $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$
Pf:
$I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} = \sum_{x,y} p(x,y)\log\frac{p(x|y)}{p(x)} = -\sum_{x,y} p(x,y)\log p(x) + \sum_{x,y} p(x,y)\log p(x|y) = H(X) - H(X|Y)$
In a similar manner, we can easily prove the rest of the equation.
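As a quick numeric check (a made-up example, not from the original notes): let $X$ and $Y$ be fair bits that agree with probability $3/4$, i.e. $p(0,0) = p(1,1) = \frac{3}{8}$ and $p(0,1) = p(1,0) = \frac{1}{8}$. Working in bits,
$$H(X) = H(Y) = 1, \qquad H(X|Y) = \tfrac{1}{4}\log_2 4 + \tfrac{3}{4}\log_2\tfrac{4}{3} \approx 0.8113, \qquad H(X,Y) \approx 1.8113,$$
so $H(X) - H(X|Y) \approx 0.1887 \approx H(X) + H(Y) - H(X,Y)$, and by symmetry $H(Y) - H(Y|X)$ gives the same value.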
(2) Chain Rule: $H(X,Y) = H(X) + H(Y|X)$
Pf(1): Expand the definition directly:
$H(X,Y) = -\sum_{x,y} p(x,y)\log\big(p(x)\,p(y|x)\big) = -\sum_{x,y} p(x,y)\log p(x) - \sum_{x,y} p(x,y)\log p(y|x) = H(X) + H(Y|X)$
Pf(2): Start from the identity $p(X,Y) = p(X)\,p(Y|X)$.
$\log p(X,Y) = \log p(X) + \log p(Y|X)$ // Apply the log function to both sides
$E[\log p(X,Y)] = E[\log p(X)] + E[\log p(Y|X)]$, i.e. $-H(X,Y) = -H(X) - H(Y|X)$ // Calculate the expectation of both sides
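Continuing the made-up example above ($X$ and $Y$ fair bits agreeing with probability $3/4$): $H(X) + H(Y|X) = 1 + 0.8113 = 1.8113 = H(X,Y)$, as the chain rule requires.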
Some Inequalities:
(1) Definition of convexity, strict convexity, concavity, strict concavity.
Convexity: $f(x)$ is said to be convex on $(a,b)$ if for every $x_1, x_2 \in (a,b)$ and every $0 \le \lambda \le 1$, $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$.
Strict Convexity: $f(x)$ is said to be strictly convex on $(a,b)$ if the above inequality is strict whenever $x_1 \ne x_2$ and $0 < \lambda < 1$.
Definitions for concavity and strict concavity are similar to the above two; just reverse the inequality signs.
Geometric meaning of convexity: the line segment between $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies above the curve of $f(x)$ between $x_1$ and $x_2$.
Generalizing the idea (Jensen's inequality): for a convex function $f$ and a random variable $X$, $E[f(X)] \ge f(E[X])$.
For a strictly convex function, "=" holds if and only if $X$ is concentrated on only one $x$ in the domain (i.e. $X$ is constant with probability 1).
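A one-line illustration (made-up numbers): let $X$ be $1$ or $3$ each with probability $\frac{1}{2}$ and take the strictly convex function $f(x) = x^2$. Then $E[f(X)] = \frac{1+9}{2} = 5 > 4 = f(2) = f(E[X])$; equality would require $X$ to put all its mass on a single point.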
(2) $D(p\|q) \ge 0$
Pf:
$-D(p\|q) = \sum_x p(x)\log\frac{q(x)}{p(x)} \le \log\Big(\sum_x p(x)\cdot\frac{q(x)}{p(x)}\Big) = \log\Big(\sum_x q(x)\Big) = \log 1 = 0$ // log is a concave function, so Jensen's inequality applies
Therefore $D(p\|q) \ge 0$, and "=" holds if and only if p and q have the same distribution.
Moreover, since $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big)$, we also get $I(X;Y) \ge 0$, with "=" if and only if X and Y are independent.
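A quick numeric illustration (made-up distributions): for $p = (\frac{1}{2}, \frac{1}{2})$ and $q = (\frac{1}{4}, \frac{3}{4})$, $D(p\|q) = \frac{1}{2}\log_2 2 + \frac{1}{2}\log_2\frac{2}{3} \approx 0.2075 > 0$, while $D(q\|p) \approx 0.1887 > 0$. Both are non-negative, and the two values differ, which also illustrates the asymmetry noted in the intuition section.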
(3) $H(X|Y) \le H(X)$
(Conditioning reduces entropy)
Pf:
Since $I(X;Y) = H(X) - H(X|Y) \ge 0$, we directly draw the conclusion $H(X|Y) \le H(X)$. "=" holds if and only if X and Y are independent.
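Numerically, in the running made-up example ($X$ and $Y$ fair bits agreeing with probability $3/4$): $H(X|Y) \approx 0.8113 < 1 = H(X)$, i.e. observing $Y$ removes about $0.19$ bits of uncertainty about $X$.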
(4) (Data processing inequality)
Let $X \to Y \to Z$ be a Markov chain (i.e. given $Y$, $X$ and $Z$ are independent). Then $I(X;Y) \ge I(X;Z)$.
Pf:
By the definition of a Markov chain, $I(X;Z|Y) = 0$.
Also, by the chain rule for mutual information, $I(X;Y,Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y)$.
So $I(X;Y) = I(X;Z) + I(X;Y|Z) \ge I(X;Z)$, as we desired. "=" holds if and only if $I(X;Y|Z) = 0$, i.e. $X \to Z \to Y$ also forms a Markov chain. Q.E.D.
How should we understand this inequality?
Consider that X is the source of information; after some processing of X we get Y. Finally we want to reconstruct X from Y, and we get Z.
The inequality simply tells us that the relationship between X and Y is closer than that between X and Z (since Y is obtained directly from X, while Z is obtained only through Y).
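To see the inequality in action, here is a minimal simulation sketch (not from the original notes; the channel parameters are made up): X is a fair bit, Y is X passed through a binary symmetric channel with flip probability 0.1, and Z is Y passed through another with flip probability 0.2, so X -> Y -> Z is a Markov chain.

```python
from math import log2
from itertools import product

def flip(a, b, eps):
    """Transition probability of a binary symmetric channel BSC(eps)."""
    return eps if a != b else 1 - eps

p_x = {0: 0.5, 1: 0.5}  # X is a fair bit

# Joint pmfs implied by the chain p(x,y,z) = p(x) p(y|x) p(z|y)
p_xy = {(x, y): p_x[x] * flip(x, y, 0.1)
        for x, y in product((0, 1), repeat=2)}
p_xz = {(x, z): sum(p_x[x] * flip(x, y, 0.1) * flip(y, z, 0.2) for y in (0, 1))
        for x, z in product((0, 1), repeat=2)}

def mutual_info(joint):
    """I(A;B) = sum p(a,b) log [ p(a,b) / (p(a) p(b)) ], in bits."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

print(f"I(X;Y) = {mutual_info(p_xy):.4f}")  # 0.5310
print(f"I(X;Z) = {mutual_info(p_xz):.4f}")  # 0.1732 <= I(X;Y), as claimed
```

Making the second channel noisier only lowers I(X;Z) further; no processing of Y can raise it above I(X;Y).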
Comments (5)
Lingxiao XIA said
at 6:08 am on Feb 2, 2009
I don't really understand the Markov chain for random variables.
Is it a Markov chain for one r.v. called a distribution?
Silas said
at 9:12 am on Feb 2, 2009
Dear Lingxiao,
I think you should view the Markov chain as a joint distribution of 3 random variables. It's because the definition of a Markov chain relates to independence of random variables, so the joint distribution must first be specified.
Best Regards,
Silas
sidjaggi said
at 10:42 am on Feb 2, 2009
Good answer, Silas.
Here's another question for everyone -- in everything we've done so far, we implicitly assume that it's a 1-Markov process -- i.e., if X <-> Y <-> Z, then Z and X are conditionally independent given Y.
How would you, first of all, define 2-Markov process? How about k-Markov processes? How would the formal definitions change? How about the data processing inequality?
Not that we have any use for such definitions in the near future (or perhaps at all in this course), but an interesting mental exercise to think about :)
Silas said
at 10:31 am on Feb 4, 2009
The proof for D(p||q) >= 0 in this scribe note is not rigorous enough, because q/p is undefined when p is zero. The proof of this theorem on p. 28 of Thomas Cover's book is written in a more rigorous way.
By the way, I think this scribe note is well organised and is easy to read.
MEI Yuchen said
at 3:15 pm on Feb 4, 2009
Thank you for your positive review, Silas.
Regarding the proof of non-negativity of the KL divergence: the proof in the textbook is more rigorous than mine.
But we can view it from another angle: if p(x)=0 for some x, the original definition of D(p||q) is not "well defined" either, since log(0) is undefined.
I think we have the convention that p*log(anything) = 0 if p = 0.
Again, thank you for your comment!