Introduction to Information / "Worst-case compression" (Scribe: XIA Lingxiao)
Claude Elwood Shannon, the father of information theory, once said "Information is the resolution of uncertainty". Things that cannot be predicted, i.e., what we don't know about the world, usually draw most of our attention. In this class we shall not discuss the semantic meaning of information; rather, we formulate an operational definition. That is, we model information mathematically in terms of randomness, rather than pondering the philosophical implications of that randomness. More precisely, we examine the statistical and probabilistic properties of "information". Our basic example, one we shall return to many times, is: what happens when a (biased) coin is tossed many times? For instance, in a black-and-white picture, a dark pixel may be denoted by a 1 and a light pixel by a 0. Statistically, the probability of a 1 should be significantly higher than that of a 0; perhaps this observation can be used to compress the image (i.e., to transfer fewer bits than the number of pixels in the original image). If we are willing to tolerate some distortion in the image, we may be able to compress it even more. These examples correspond to two formulations of the data compression problem, both of which we shall examine in this first course in information theory.
1. Lossless source coding (Exact reconstruction of the source's data is required.)
2. Rate-distortion theory (Given a budget of maximal reconstruction distortion, the goal is to minimize the number of bits in the compressed data.)
Another problem of interest is that of transmitting information over a noisy channel. For example, suppose a sequence of bits is transmitted wirelessly. If the receiver decodes a bit to the same value that the transmitter had transmitted, then a 0 is said to have been added to that bit of the signal; otherwise a 1 is. Depending on how noisy the ether is, careful design of coding schemes might allow an entire message to be transmitted with low error probability, even though individual bits might have a significant probability of decoding error. The set of problems corresponding to this problem formulation is called
3. Channel coding (Transmission of information over a noisy channel)
These are but three examples of a vast array of possible problems relating to information compression, storage, and transmission that one could conceivably be interested in. In this course we examine the theory and applications surrounding these three problems, and leave other problem formulations for more advanced classes.
Some knowledge of basic probability theory is assumed in this class; this is summarized here.
Problem Set 1
A certain fair coin is tossed n times. Each time it is tossed, the result is independent of the previous toss. This is an example of an i.i.d. binary random variable.
Q1: (Worst-case "compression")
(a) If the coin described above is tossed n times, how many distinct length-n sequences of coin tosses are possible?
(b) Given N possible outcomes of a random variable, how many bits (elements from {0,1}) are required to assign a unique fixed-length bit-sequence to each possible outcome?
(⌈log2 N⌉, the ceiling of the binary logarithm of N)
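As a quick sanity check (this sketch is not from the lecture itself), the count ⌈log2 N⌉ can be computed and tested directly:

```python
import math

def min_code_length(num_outcomes: int) -> int:
    """Fewest bits needed to give each of N outcomes a distinct
    bit-sequence of one fixed length: ceil(log2(N))."""
    return math.ceil(math.log2(num_outcomes))

# n fair-coin tosses have 2**n outcomes, so n bits suffice (and are needed).
assert min_code_length(2 ** 10) == 10
# N = 6 outcomes (one roll of a die) need ceil(log2 6) = 3 bits.
assert min_code_length(6) == 3
```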
(c) Describe a method, based on a binary tree, to assign a bit-sequence to each outcome, and a corresponding method to reconstruct the coin-toss sequence from the bit-sequence.
Algorithm: label the two branches of the binary tree with the two coin outcomes:
  Coin toss     Bit sequence
  Heads         0
  Tails         1
Property: easy to construct/recover, one-to-one
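The branch-labelling scheme above can be sketched in a few lines (a minimal illustration, not code from the class; for a coin the "tree" is just a depth-1 branch per toss):

```python
# Label the tree's two branches: heads -> 0, tails -> 1.
ENCODE = {"H": "0", "T": "1"}
DECODE = {bit: toss for toss, bit in ENCODE.items()}

def encode(tosses: str) -> str:
    """Map a toss sequence like 'HTTH' to its bit-sequence '0110'
    by concatenating the branch labels, one per toss."""
    return "".join(ENCODE[t] for t in tosses)

def decode(bits: str) -> str:
    """Walk the tree back: invert encode(), e.g. '0110' -> 'HTTH'."""
    return "".join(DECODE[b] for b in bits)

assert encode("HTTH") == "0110"
assert decode(encode("HTTH")) == "HTTH"  # one-to-one, easy to recover
```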
(d) If the encoding alphabet is ternary (e.g., {0,1,2}) instead of binary, how does your answer to (b) change? How about your answer to (c), if ternary trees are allowed? What if you were tossing a six-sided die instead of a coin? How do your answers change? What is the time complexity of your assignment schemes?
Time complexity: O(n); more details in the next class.
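For part (d), the same counting argument generalizes: with an alphabet of size q, each of N outcomes needs a distinct string of ⌈log_q N⌉ symbols. A small sketch (my own illustration, assuming this generalization):

```python
import math

def min_digits(num_outcomes: int, alphabet_size: int) -> int:
    """Fewest symbols from a size-q alphabet needed to give each of N
    outcomes a distinct fixed-length string: ceil(log_q(N))."""
    return math.ceil(math.log(num_outcomes, alphabet_size))

# 10 coin tosses, ternary alphabet: ceil(10 * log3(2)) = 7 digits.
assert min_digits(2 ** 10, 3) == 7
# 4 rolls of a six-sided die, binary alphabet: ceil(4 * log2(6)) = 11 bits.
assert min_digits(6 ** 4, 2) == 11
```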
"I don't get the names of things, I just know how often they happen. " [Lingxiao: I don't know where I heard this, but seems like this quote belongs somewhere in this note]
Homework: check out Stirling's Approximation.
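For the homework, one way to get a feel for Stirling's approximation, n! ≈ √(2πn)·(n/e)^n, is to check it numerically; the relative error shrinks roughly like 1/(12n). A quick sketch:

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The approximation undershoots n! by a relative error of about 1/(12n).
for n in (5, 10, 20):
    rel_err = abs(stirling(n) - math.factorial(n)) / math.factorial(n)
    assert rel_err < 1 / (10 * n)  # loose bound; actual error is ~1/(12n)
```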
Comments (11)
Leonard Wong said
at 7:30 pm on Jan 11, 2009
I think the first sentence can be modified.
Original: There’re many ways of talking about information, but basically it’s more about new things that cannot be predicted.
In information theory, we try to model information mathematically. In doing so we neglect many features of information, such as meaning and language. Here, we focus on the statistical properties of information, and say that information (in information theory) is the decrease of uncertainty. Meaning and language are very basic and important. But we do not consider them here. Hence, I think that the word "basically" is not appropriate.
My recommendation: There are many ways of talking about information. Information theory attempts to model information mathematically by focusing on a particular aspect of information. Claude Shannon, the father of information theory, said that information is the resolution of uncertainty.
Lingxiao XIA said
at 9:26 pm on Jan 11, 2009
im amazed by how long you guys' comments can be, even when the thing is still unfinished, im dying trying to recall any new things to add when the thing was still three lines long, anyway it's a scribe note, which means im only writing down what is said in the class rather than creating new things, thanks for your comment though
sidjaggi said
at 11:28 pm on Jan 11, 2009
:)
Don't feel restricted into saying just what was talked about in class; if there are better ways of saying things, or even other things we didn't talk about that are relevant/interesting, feel free to dive in.
You're right, though, about "unfinished" business; the goal of giving you a week to edit it before other people commented was to allow you some time to think/collect your thoughts before people started breathing down your neck. Starting tomorrow, guys, feel free to jump in with comments, and Lingxia, over the next week you'll be the judge of which of those comments are relevant to include/edit, and in general how to handle them.
Lingxiao XIA said
at 11:38 pm on Jan 11, 2009
Lingxiao, but never mind
Lingxiao XIA said
at 12:20 am on Jan 12, 2009
"There are many ways of talking about information. Information theory attempts to model information mathematically by focusing on a particular aspect of information. Claude Shannon, the father of information theory, said that information is the resolution of uncertainty."
a little bit confused about "model information mathematically". doesn't information denote things that we already know? in my opinion, we dont have to model it anymore, we use it, to model the future, or uncertainty. so information theory should be about how to use what we already know to predict things, mathematically.
i'm sorry, i didnt mean to be so picky about choosing of words, just a little bit confused
Leonard Wong said
at 10:49 pm on Jan 13, 2009
Thanks a lot! Your notes helped me a lot to recall things mentioned in class.
The question is "What is (mathematical) modelling". I don't know a lot about the philosophy of science/math, but I can share a little about my understanding. My definition is "We model something mathematically by creating a mathematical system that intends to represent it in some ways." Once the mathematical system is established, it is independent of the phenomena that we wish to describe because it exists in the form of axioms, definitions and theorems. We derive consequences of the mathematical system, and wish to link them to the phenomena. This linkage is not automatic, and empirical experiments must be carried out to see whether the model's predictions are consistent with nature.
Consider probability theory. Probability is a branch of mathematics that models uncertainty. The (weak/strong) law of large numbers is a THEOREM derived from the axioms of probability. We know that the relative frequency of coin tosses tends to 1/2 as n goes to infinity. This is NOT a logical consequence of the law of large numbers, but only indicates that the binomial distribution is a reasonable approximation of the problem at hand. If the relative frequency does not stabilize, we can no longer use the binomial distribution to model coin tosses. But the theorem of the law of large numbers is STILL RIGHT in the mathematical system. See the introduction of William Feller's classic for a better discussion of the interplay of math and nature.
sidjaggi said
at 11:08 pm on Jan 13, 2009
Hey Lingxiao  for the relatively small amount of math you have on the page, it shouldn't be too hard to use the LaTeX plugin (use the insert plugin tab on the top right) to enter the math more neatly. For example, the equation F(x) = P(X <= x) becomes F(x) = \Pr(X \leq x) in the plugin.
Also, some conventions  a random variable will always be boldface and capitalized, a "normal" variable with be neither. With this notation, the equation becomes F(x) = \Pr(\mathbf{X} \leq x).
Hope that helped.
Mak said
at 2:14 am on Jan 16, 2009
I think the notes have mentioned all the main points discussed in the first lecture.
>> Similar rules also apply to the independence of multiple variables. It is interesting to note here that the independence
>> of each two of the multiple variables doesn’t lead to the independence of all variables.
I remember that a particular example that illustrates this fact has been mentioned in class. It would be great if you could include that.
sidjaggi said
at 7:13 pm on Jan 18, 2009
Lingxiao  edited your scribe notes. Do tell me if you agree with the reworking of the material (I realize not everything is as I said it in class, but perhaps this is a better way of saying it, and perhaps I SHOULD have said it like this :)
Lingxiao XIA said
at 9:23 pm on Jan 18, 2009
wow, sorry i didnt modify this article to add the example before you did, i was lazy this weekend....sorry again.. holiday time...is close...
this looks like a brand new article now, everything looks more professional, great success!
Mak said
at 6:56 pm on Jan 19, 2009
I just turned the 宋体 (SimSun) font into Times New Roman. I think it looks more comfortable :)