Some elementary combinatorics/probability theory leading to a simple coding theorem
A certain fair coin is tossed n times. Each time it is tossed, the result is independent of the previous toss. This is an example of an i.i.d. binary random variable.
Q1: (Worst-case "compression") (a) If the coin described above is tossed n times, how many distinct length-n sequences of coin tosses are possible?
(b) Given N possible outcomes of a random variable, how many bits (elements from {0,1}) are required to assign a unique length-N bit-sequence to each possible outcome?
(c) Describe a method based on binary trees, to assign a length-N bit-sequence to an outcome, based on a binary tree, and a corresponding method to reconstruct the coin-toss sequence from the bit-sequence.
(d) If the encoding alphabet is ternary (eg: {0,1,2}) instead of binary, how does your answer to (b) change? How about your answer to (c), if ternary trees are allowed? What if you were tossing a six-sided dice instead of a coin? How do your answers change? What is the time-complexity of your assignment schemes?
Key concepts in Question 1: Random variables, independence, input alphabet, output alphabet, encoder/encoding, decoder/decoding, trees, prefix-free codes, rounding-off, space complexity, computational complexity (notation)
A second coin is biased with probability p, i.e., it comes up Heads with probability p and Tails with probability 1-p. Each time it is tossed, the result is independent of the previous toss. This is an example of an i.i.d. binary random variable, also called a Bernoulli(p) random variable.
Q2: (Average-case compression) (a) How do your answers to Q1 differ in this case?
(b) What is the probability of observing a particular coin-toss sequence with Heads?
(c) What is the size of the type-class T(k,n) of all coin-toss sequences with exactly k Heads? What is therefore the probability of observing the type-class T(k,n)?
(d) Use Stirling's approximation to show that , where H(p) is the binary entropy function . Use this to simplify (c). In what sense are the approximations true? (Check out the link to this derivation of Stirling's approximation to find the answer, and try deriving it yourself to get some practice).
(e) For what value of k is your answer in (c) maximized?
(f) For any type-class T(k,n) that is -strongly atypical, i.e., for which , can you show that the probability of observing any element of T(k,n) is "small"? (A full proof is forthcoming in PS3)
(g) How many type classes are there? Use the union bound and your answer from (f) to obtain an upper bound of observing an element from an -strongly atypical set.
(h) (Achievability) Does the answer of (g) suggest a "better" encoding scheme for biased coins? What is the expected encoding length (average rate) of your scheme? How does this compare with your answers in Q1? (To obtain a better "feel" for your answer, plot the binary entropy function using some mathematical software, such as SCILAB, which is like MATLAB, but is freeware).
(i) (Converse) Is it possible for any other scheme to "substantially" improve on your scheme in (h)? Can your give a reason for your assertion? (Hint: Note that each element in the -strongly typical set has "nearly equal" probability of being observed.)
(j) (Errors) Suppose we relax our requirement, to say that we only require that an -fraction of the observed sequences must be reconstructed correctly the the decoder, and we don't care about the others. Do your answers in (h) and (i) change substantially? Justify your assertions.
(k) (Binary Source Coding Theorem) Summarize your conclusions from solving this question in the form of a Binary Source Coding Theorem.
(l) (Source Coding Theorem) Can you state (and prove?!) a general source coding theorem (for arbitrary finite input and output alphabets, with arbitrary distributions over them?) (Again, a full proof is forthcoming in PS3 -- just a guess will suffice here)
Key concepts in Question 2: Blocklength, Probability of an i.i.d. sequence, type class, size of type class, probability of a type class, number of type classes, Stirling's approximation, binary entropy function, asymptotic equality in exponent, strong typicality (properties), strong atypicality (properties), average rate, achievability, converse, errors.
Q3: (Computational complexity) Can you describe an algorithm, with running time that is polynomial in n, that implements the scheme of Q2(h)?
Comments/Discussion
Sid: This is where you can ask questions from each other/the instructor/tutor about this particular problem set. Please remember to enter your name in boldface before your comment, as I just did.
Any material that we don't cover in class, you should attempt at home.
Note that some text is underlined. This corresponds to exercises that you really should attempt at home.
Comments (27)
Poppin Ke said
at 3:07 pm on Jan 6, 2009
i want to ask the solution in (d) of Q1, the time complexity of the assignment scheme if a six-sided dice is tossed and binary encoding scheme is used?
if i use the truth table to map the outcome to bit-sequence, the time complexity is O(2^N), where N is length of bit sequence or O(N), where N is the length of outcome sequence. the reason is that we need to enumerate all the possible outcomes and assign the bit sequence to it given the number of possible outcomes.
for example,
if a coin is tossed three times and ternary encoding sequence is used, we get
OUTCOMES BIT-SEQUENCE
HHH 00
HHT 01
HTH 02
HTT 10
THH 11
THT 12
TTH 20
TTT 21
the time complexity is O(3^2) given that two bit is used
sidjaggi said
at 3:19 pm on Jan 6, 2009
Good question. Note that you say "if i use the truth table" -- can you think of a more efficient encoding scheme?
Hint: What if humans had six fingers instead of ten? How would we count then?
Anyone else wants to join in the discussion?
Poppin Ke said
at 4:08 pm on Jan 6, 2009
what you mean is to use senary numerical system to encode the outcome sequence in tossing a dice, but we just use 0 or 1 to represent the information in transmission, it seems senary representation cannot find any application in information engineering.
sidjaggi said
at 4:28 pm on Jan 6, 2009
well, (a) the question didn't specify applications :) (b) don't be too sure about not having applications. all sorts of alphabets can make sense, depending on context. for example, people are talking about making flash memories that have more than one level (so can store more than just a 0 or a 1), to increase storage densities. Another example -- passwords typically accept keyboard inputs from large alphabets -- it's not binary there -- to attack this one needs to think of other alphabets. Can you think of more examples of other alphabets that might be useful in other situations?
MEI Yuchen said
at 10:49 pm on Jan 6, 2009
For Q1(d), I think the principle behind is the transformation between number systems such as binary, ternary, senary, decimal and so on.
If we toss a coin to form a sequence, then the coin-toss sequence is equivalent to a binary number according to (c) part.
For example, HHHTTHTHTHTT is equivalent to 111001010100
If ternary system is allowed, then we can simply transform the above binary number into a ternary number, 1020221, which may be the answer.
Now suppose the coin is replaced by a dice and we still use ternary system. Our job becomes transforming a senary number into a ternary number.
For example, the dice-toss sequence is 13432651(in order to be consistent we can first decrease each digit by 1. [02321540], the length of the sequence, 8, should be given), then the ternary representation should be 20002010120. To recover the the dice-toss sequence, we first get 2321540 from transforming 20002010120 into senary. Then add a 0 in the front to make it a length-8 sequence and finally, increase each digit by 1.
I'm not quite clear about the concept of time-complexity, as well as the method used to compute it.
Thanks.
Cho Yiu Ng said
at 11:22 pm on Jan 6, 2009
Roughly speaking, time complexity refers to the relationship/scaling of the computation time with respect to certain parameters (e.g. the number of coin tosses). You may refer to the following link:
http://en.wikipedia.org/wiki/Big_O_notation
sidjaggi said
at 12:42 am on Jan 7, 2009
To add to Michael's comment, think of how many "basic" operations (eg, add, subtract, multiply, divide -- these are standard for most computer ALUs) are required to carry out the calculations. In particular, how many such operations are required to transform from one base to another? What is therefore the complexity of your proposed algorithm?
Poppin Ke said
at 11:42 am on Jan 7, 2009
For example, HHHTTHTHTHTT is the sequence, if the ternary system is used, what is the time complexity?
firstly, the time-complexity of converting from HHHTTHTHTHTT to 111001010100 is the sequence length, 12. then converting binary sequence to ternary sequence, in this case, we firstly convert binary sequence into a integer, M = 1*2^11 + 1*2^10 + 1*2^9 + 1*2^6 + 1*2^4 + 1*2^2, so the complexity is roughly 3*sequence length = 3*12 since we see three basic operations, and then convert integer into a ternary sequence, 1020221, M=1*3^6+2*3^6+2*3^2+2*3^1+2*3^0, the complexity is 3*ternary sequence length = 3*7. is that right? so overall, the time-complexity is O(N) where N is the outcome sequence length
Or we can think that the time-complexity of this conversion is constant.
sidjaggi said
at 1:51 pm on Jan 7, 2009
Good intuition, Poppin. Here's a question, though -- both the binary and ternary sequences are already integers, are they not, albeit in different bases? So perhaps you were thinking of converting into base 10 as an intermediate step? Is this necessary?
Also, about counting operations, I guess we have to be more careful than i'd said earlier -- a typical implementation of a computer allows FIXED size add/multiply/... in constant time. If the numbers are large, then typically it has to break it up into smaller steps. Therefore, computing 2^11, from your example above, could take as many as 11 steps (multiply by 2 in each step) -- however, can you think of a more efficient implementation? One that could compute 2^n with O(log^2(n)) multiplications? Or even more efficient?!
Here's a "simple" fact you might want to think about -- multiplying two numbers, each with n bits, in the "usual" implementation (the one you were probably taught in school) takes time O(n^2).
There are, however, much cleverer implementations for multiplying two n-bit numbers (for instance, by using a type of Fourier transform!) -- the fastest known algorithm has time complexity O(n ln(n) ln(ln(n))).
See http://en.wikipedia.org/wiki/Multiplication_algorithm if you're interested in the details (well beyond the scope of this class :)
Clearly time-complexity can be subtler than at first sight. We'll discuss this, at least briefly, during class. The reason I'm stressing time-complexity is that it's an important parameter in judging whether an algorithm is actually "practical" (there could be many others -- can you think of some?) -- however, it's often skipped over in many theory courses, especially information theory. And to be honest, most of the algorithms we'll cover in the later part of this course will have horrible time-complexity.
Mak said
at 4:32 pm on Jan 7, 2009
Hi,
Are we considered to have finished discussing Q1 in class? :-)
I'm thinking about part d). I also get the same idea as MEI Yuchen. Since his post is way up from here, let me rewrite the idea:
In part c), we have obtained for each outcome a bit-sequence. This sequence can of course be considered to be a binary number. So, if the encoding alphabet is {0, 1, 2}, we may get our desired sequence by performing base conversion on this binary number to obtain a base-3 number.
What I don't understand is: how is this related to a ternary tree? For the binary tree case, we move to the left subtree if we get a Head, and to the right if we get a Tail. If we have a ternary tree, what are we supposed to do with it? Can anyone share his ideas?
Thanks.
Cho Yiu Ng said
at 5:10 pm on Jan 7, 2009
1. Besides binary-to-ternary conversion, what are the other ways to code the coin tosses?
2. If you are given a ternary tree, how many branches are from a node? Then, in this case, what can a branch mean in the encoding process?
Mak said
at 6:30 pm on Jan 7, 2009
Thanks tutor.
But isn't what you said above just a rephrasing of my question without any new information? :-p
I believe binary-to-ternary conversion is the most straightforward way to get our desired ternary sequence, which is of optimal length (fewest characters). And the conversion does not require the whole binary number at every step. For example, result of the first two tosses alone is enough for us to compute the first ternary character in our desired sequence. Please tell me if I am thinking in a completely wrong direction. Maybe there are some ways to tackle this problem in which the use of ternary tree is obvious but I don't know. I need some hints... Thank you.
sidjaggi said
at 6:46 pm on Jan 7, 2009
Thanks for your comments, Mak.
To answer your first question from your first comment, yes, Q1 is (mostly :) done with -- we'll go on to Q2, with a minor detour on complexity issues.
To answer your question about ternary trees, when I was phrasing it, I was only thinking about the "end" result (i.e., properties of the tree such as -- how many leaves does it have, what's the average depth, and so on). You raise an interesting question about whether there's some intuition to be gained from directly trying to use the tree structure to come up with an encoding process, instead of going via the arithmetic approach. I don't know of any good answer :)
Your suggestion of doing it in an "online" manner -- i.e, using a few bits at a time to compute a partial output, is a good idea -- can you translate your arithmetic intuition into the required graphical intuition?
A very similar idea might be very useful in solving Q3, if you want to look ahead and try to solve it (if you've already tried solving Q2 :)
Btw, all you guys should try your hand at Q2. I admit it's non-trivial. But it's fun! :)
MEI Yuchen said
at 11:17 am on Jan 8, 2009
where can I find the method to directly convert a binary number to a ternary number?
I am not able to work it out by myself.
sidjaggi said
at 1:39 pm on Jan 8, 2009
How would you convert a decimal number into a ternary number?
There's really nothing special about the decimal system. One could argue that in fact it's a pretty inconvenient choice for a base...
Of course we're really getting way off the topic we're "supposed" to cover, but it's alright -- all knowledge is good knowledge, and mental exercise is always good :)
See you in class tomorrow, and remember that the venue is changed (see the Front Page for details).
Cho Yiu Ng said
at 4:48 pm on Jan 8, 2009
An hour ago, a student asked me about question 2d. If you're attempting this question, here's a warning question for you: what are the assumptions behind the Stirling approximations?
It's very important when you try to apply a theorem.
Mak said
at 6:12 pm on Jan 9, 2009
Hi. I'm responsible for scribing the notes for today's class. I would like know if I should put down the answers for Q2 discussed today. I ask this because I'm afraid that someone will think that this encourages copying of answers :-).
***************************************************************************
I wrote before about doing the base conversion in an "online" manner. I wrote:
"And the conversion does not require the whole binary number at every step. For example, result of the first two tosses alone is enough for us to compute the first ternary character in our desired sequence."
Now I see that the last sentence is incorrect. Consider n = 3:
HHH 000 00
HHT 001 01
HTH 010 02
HTT 011 10
THH 100 11
THT 101 12
TTH 110 20
TTT 111 21
We can conclude that:
if first two tosses are HH, then the first ternary character = 0
if first two tosses are TH, then the first ternary character = 1
if first two tosses are TT, then the first ternary character = 2
But if the first two tosses are HT, we don't know what the first ternary character is until we know the third toss.
sidjaggi said
at 6:34 pm on Jan 9, 2009
Good question, Yan Kei, about answers to the questions.
Let's set the expectation that each student will try to turn in his answers in his/her (well, actually, since currently this class comprises only of males, let's just stick with "his :) own words. Presumably these will be at least somewhat different than what the scribe has put down in his notes. In fact, if someone has something interesting to add to the scribe's notes, they're encouraged to write than in in the comments (at any time, even when the scribe is still editing).
The goal is to encourage learning. Discussion (between students, with instructor, tutor, ...) aids this. The wiki is a form of discussion.
Copying is defined as writing down without understanding. Do not copy. How will we judge copying? Only if it's blatantly obvious, else given the benefit of the doubt.
What's to stop you from copying "cleverly"? (a) Shame :) (b) You're here to learn (c) It's not that hard to learn. Ask, don't copy. Share email addresses/phone numbers/contact info with each other. Form study/discussion groups.
Thanks for correcting your previous comment as well -- the error had escaped me.
sidjaggi said
at 6:34 pm on Jan 9, 2009
Good question, Yan Kei, about answers to the questions.
Let's set the expectation that each student will try to turn in his answers in his/her (well, actually, since currently this class comprises only of males, let's just stick with "his :) own words. Presumably these will be at least somewhat different than what the scribe has put down in his notes. In fact, if someone has something interesting to add to the scribe's notes, they're encouraged to write than in in the comments (at any time, even when the scribe is still editing).
The goal is to encourage learning. Discussion (between students, with instructor, tutor, ...) aids this. The wiki is a form of discussion.
Copying is defined as writing down without understanding. Do not copy. How will we judge copying? Only if it's blatantly obvious, else given the benefit of the doubt.
What's to stop you from copying "cleverly"? (a) Shame :) (b) You're here to learn (c) It's not that hard to learn. Ask, don't copy. Share email addresses/phone numbers/contact info with each other. Form study/discussion groups.
Thanks for correcting your previous comment as well -- the error had escaped me.
Tong said
at 6:21 pm on Jan 15, 2009
Class on Jan 26th
postponed to Feb 5th
???(i think no class on Jan 26th and 30th)
I suggest that we postpone the Class Jan 23rd to Feb 5th
sidjaggi said
at 12:46 am on Jan 16, 2009
My mistake in Announcements 3. and 4. You're right -- there's no class on the 26th anyway. On the 30th, technically the university is open, but I'm guessing people will still be on vacation, hence postponing to the 5th. Also, I'll be around the week of the 2nd, so I'll be teaching the 2nd, 5th (postponed class), and 6th, but the next week is when Michael will substitute for me.
Apologies for the confusion -- I misread the calendar.
Lingxiao XIA said
at 1:55 am on Jan 16, 2009
hahaha, nice to know that not only i am still up, im having trouble with problem set 1, sidddddd, i think im gonna have trouble hand it in tmr
Lingxiao XIA said
at 2:02 am on Jan 16, 2009
plus, i don think there's no class on jan 30th also, the chinese new year break is from "24-31 Sat-Sat" on CUHK handbook. http://rgsntl.rgs.cuhk.edu.hk/aqs_prd_applx/Public/Handbook/document.aspx?id=2&lang=en, but i don mind an extra class on Feb 5
Lingxiao XIA said
at 2:02 am on Jan 16, 2009
plus, i don think there's class on jan 30th also, the chinese new year break is from "24-31 Sat-Sat" on CUHK handbook. http://rgsntl.rgs.cuhk.edu.hk/aqs_prd_applx/Public/Handbook/document.aspx?id=2&lang=en, but i don mind an extra class on Feb 5
Lingxiao XIA said
at 2:04 am on Jan 16, 2009
plus, i think there's no class on jan 30th either[sorry for my crappy english], the chinese new year break is from "24-31 Sat-Sat" on CUHK handbook. http://rgsntl.rgs.cuhk.edu.hk/aqs_prd_applx/Public/Handbook/document.aspx?id=2&lang=en, but i don mind an extra class on Feb 5
Cho Yiu Ng said
at 11:36 am on Jan 20, 2009
I'm now grading problem set 1. Up to this point, I have some suggestions for some of you. Please state the answers more clearly. Sometimes, I don't even know which part of the question you are answering. This is also helpful for you in your research and your future work. Presentation skills DO matter!
ZhanLei_Jacky said
at 1:29 pm on Sep 20, 2011
As for the problem 1. d), "What if you were tossing a six-sided dice instead of a coin? How do your answers change? What is the time-complexity of your assignment schemes?" is this assumption based on binary tree or ternary tree?
You don't have permission to comment on this page.