Complexity notation/Stirling's Approximation and Strong Typicality (Scribe: MAK Yan Kei)
I. Complexity of an algorithm
We use the following notation to measure the efficiency of an algorithm. Let the problem input have n bits. Let the number of steps that the algorithm takes to solve the problem be denoted $f(n)$. By a “step” here we mean some simple operation such as adding two fixed-length (say, 8-bit) integers. Adding two arbitrarily long (say, 1000-bit) integers, therefore, would not be a single step.
Usually, we only care about the order-of-magnitude behaviour of $f(n)$ when n is large. To talk about the asymptotic complexity of an algorithm, we introduce the following notations (see this page from Wikipedia):
1. $f(n) = O(g(n))$ (Big O): There exist constants $c > 0$ and $N$ such that $f(n) \le c\,g(n)$ for all $n \ge N$, i.e., f grows no faster than g.
2. $f(n) = \Omega(g(n))$ (Big Omega): There exist constants $c > 0$ and $N$ such that $f(n) \ge c\,g(n)$ for all $n \ge N$, i.e., f grows no slower than g.
3. $f(n) = \Theta(g(n))$ (Theta): $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$, i.e., f grows roughly at the same rate as g.
4. $f(n) = o(g(n))$ (Little O): For all $c > 0$, there exists $N$ such that $f(n) \le c\,g(n)$ for all $n \ge N$, i.e., f grows strictly slower than g.
5. $f(n) = \omega(g(n))$ (Little Omega): For all $c > 0$, there exists $N$ such that $f(n) \ge c\,g(n)$ for all $n \ge N$, i.e., f grows strictly faster than g.
Take the byte-wise addition of two n-bit numbers as an example. We need roughly $n/4$ steps (addition byte by byte takes $n/8$ steps, another $n/8$ steps for the "carry" bit). Let $f(n)$ be the number of steps required. Then $f(n) = O(n)$, $f(n) = \Omega(n)$, and $f(n) = \Theta(n)$. Look at the comments to the first scribe notes for more examples.
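As a concrete illustration (my addition, not part of the original lecture), here is a small Python sketch that counts steps in a byte-wise addition under the convention above, with one step per byte addition and one per carry update; the ratio of steps to n settles at 1/4, consistent with $f(n) = \Theta(n)$. The input sizes are arbitrary choices.

```python
import random

def add_bytewise(a_bytes, b_bytes):
    """Add two equal-length little-endian byte lists; return (sum_bytes, steps)."""
    steps = 0
    carry = 0
    out = []
    for x, y in zip(a_bytes, b_bytes):
        s = x + y        # one step: add two 8-bit integers
        steps += 1
        s += carry       # one step: add the carry from the previous byte
        steps += 1
        out.append(s & 0xFF)
        carry = s >> 8
    out.append(carry)
    return out, steps

for n in (64, 1024, 65536):                      # n = number of bits
    nbytes = n // 8
    a = [random.randrange(256) for _ in range(nbytes)]
    b = [random.randrange(256) for _ in range(nbytes)]
    _, steps = add_bytewise(a, b)
    print(n, steps, steps / n)                   # steps/n stays at 1/4
```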
The use of such notations is of course not restricted to complexity analysis. We can think of this notation as giving a way to compare the growth rate of two functions. Note that these estimates need not hold when n is small.
II. Average-case Vs. Worst-case Compression
In Lecture 1, we have derived an encoding algorithm with optimal worst-case behaviour (i.e. the number of bits in the longest encoded sequence is minimized). In what follows we aim to find an algorithm that works better on average.
Intuition: Consider a biased coin that gives Heads much more frequently than Tails. In this case, coin-toss sequences like TTTTTT are less likely to occur than those with many Heads, like HTHHHT. It seems common-sensical to assign more common coin-toss sequences to shorter encoded bit-sequences, so as to minimize the average number of bits. However, it is still not clear what the optimal relationship between the "commonness" of a coin-toss sequence and its encoding length should be.
Situation
We have a biased coin, which comes up Heads with probability p (0 < p < 1), and Tails with probability 1-p. The result of each toss is an example of a Bernoulli(p) random variable. If we toss it n times, the probability of observing a particular coin-toss sequence with k Heads ($0 \le k \le n$) is $p^k (1-p)^{n-k}$.
Now define the type-class T(k,n) as the set of all sequences of n tosses with exactly k Heads. Since all coin-toss sequences with k Heads are equally likely to occur, and there are $\binom{n}{k}$ of them in total, we conclude that the size of T(k,n) is $\binom{n}{k}$, and hence $\Pr[T(k,n)] = \binom{n}{k} p^k (1-p)^{n-k}$.
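For small n this can be checked directly. Below is a quick Python sketch (illustrative, not part of the original notes) that enumerates all $2^n$ coin-toss sequences, confirms that $|T(k,n)| = \binom{n}{k}$, and tabulates the type-class probabilities; the values n = 10 and p = 0.8 are arbitrary choices.

```python
from itertools import product
from math import comb
from collections import Counter

n, p = 10, 0.8
# Count, for each k, how many of the 2^n sequences contain exactly k Heads.
counts = Counter(seq.count("H") for seq in product("HT", repeat=n))
for k in range(n + 1):
    assert counts[k] == comb(n, k)              # |T(k,n)| = C(n,k)
    prob = comb(n, k) * p**k * (1 - p)**(n - k)  # Pr[T(k,n)]
    print(f"k={k:2d}  |T(k,n)|={counts[k]:4d}  Pr[T(k,n)]={prob:.4f}")
```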
Binary entropy function and Stirling’s approximation
We define the binary entropy function as follows: $H(x) = -x\log x - (1-x)\log(1-x)$ for $x \in (0,1)$ (all logarithms here and henceforth are binary unless otherwise specified).
We also need Stirling’s approximation for factorials: $n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$ (this approximation holds in the sense that the ratio of the left and right hand sides tends to 1 for large n).
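As a quick numerical illustration (my addition, not in the original notes), the following Python snippet compares $\ln(n!)$ with the logarithm of Stirling's approximation, working in log-space to avoid overflow; the ratio of the two sides tends to 1.

```python
import math

# Ratio n! / (sqrt(2*pi*n) * (n/e)^n), computed via logarithms to avoid overflow.
for n in (5, 10, 50, 100, 500, 5000):
    log_factorial = math.lgamma(n + 1)                                # ln(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    print(n, math.exp(log_factorial - log_stirling))                  # tends to 1
```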
With these, we deduce the approximation $\binom{n}{k} \approx 2^{nH(k/n)}$ as follows.
Note that the RHS
$$2^{nH(k/n)} = 2^{-k\log\frac{k}{n} - (n-k)\log\frac{n-k}{n}} = \left(\frac{n}{k}\right)^{k}\left(\frac{n}{n-k}\right)^{n-k}.$$
Now, by Stirling's approximation, the LHS
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \approx \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi k}\,(k/e)^k\,\sqrt{2\pi (n-k)}\,((n-k)/e)^{n-k}} = \sqrt{\frac{n}{2\pi k(n-k)}}\left(\frac{n}{k}\right)^{k}\left(\frac{n}{n-k}\right)^{n-k}.$$
Hence, $\binom{n}{k} \approx 2^{nH(k/n)}$, where the two sides differ only by the polynomial factor $\sqrt{\frac{n}{2\pi k(n-k)}}$. The approximation therefore holds to first order in the exponent, i.e., $\frac{1}{n}\log\binom{n}{k} \to H(k/n)$ for large n.
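A short numerical check (my addition, not in the original notes) of this first-order-in-the-exponent statement: $\frac{1}{n}\log_2\binom{n}{k}$ approaches $H(k/n)$ as n grows with k/n held fixed; the ratio k/n = 0.3 below is an arbitrary choice.

```python
import math

def H(x):
    """Binary entropy function, in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

for n in (10, 100, 1000, 10000):
    k = int(0.3 * n)
    lhs = math.log2(math.comb(n, k)) / n   # (1/n) * log2 C(n,k)
    print(n, lhs, H(k / n))                # the two columns converge as n grows
```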
When the probability is maximized
With the approximation above, we have:
$$\Pr[T(k,n)] = \binom{n}{k} p^k (1-p)^{n-k} \approx 2^{nH(k/n)}\, p^k (1-p)^{n-k} = 2^{\,n\left(H(k/n) + \frac{k}{n}\log p + \frac{n-k}{n}\log(1-p)\right)}.$$
Denote
$$D(q\,\|\,p) = q\log\frac{q}{p} + (1-q)\log\frac{1-q}{1-p},$$
the Kullback-Leibler divergence between Bernoulli(q) and Bernoulli(p); with $q = k/n$, the exponent above equals $-nD(k/n\,\|\,p)$.
Then, the above expression can be written simply as:
$$\Pr[T(k,n)] \approx 2^{-nD(k/n\,\|\,p)}.$$
Recall that $\Pr[T(k,n)]$ is a probability, so it is at most 1. Note that $2^{-nD(k/n\,\|\,p)} = 1$ if $D(k/n\,\|\,p) = 0$, and this happens if $k/n = p$, i.e. $k = np$ (we examine the properties of the $D(\cdot\,\|\,\cdot)$ function in future classes -- for now, suffice it to say that it equals 0 if and only if $k = np$). Hence, the probability is “maximized” if $k = np$ (assuming this is an integer; if not, we round to the nearest integer), which is the expected number of Heads. This is not surprising, since $k = np$ is theoretically the most likely outcome. Of course, $\Pr[T(np,n)]$ is not exactly equal to 1 (since our calculations involved order-of-magnitude estimates); indeed, it is very possible that we observe a slightly different result (e.g. a few more or fewer than $np$ Heads) in our experiment.
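The following Python sketch (illustrative, not from the notes) tabulates the exact type-class probabilities and the estimate $2^{-nD(k/n\|p)}$ for the arbitrary choices n = 100, p = 0.7; both peak at $k = np$, and the gap between the exact value and 1 at the peak reflects the polynomial factor we dropped.

```python
import math

def D(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in bits."""
    out = 0.0
    if q > 0:
        out += q * math.log2(q / p)
    if q < 1:
        out += (1 - q) * math.log2((1 - q) / (1 - p))
    return out

n, p = 100, 0.7
rows = []
for k in range(n + 1):
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)   # Pr[T(k,n)]
    estimate = 2 ** (-n * D(k / n, p))                  # 2^(-n D(k/n || p))
    rows.append((k, exact, estimate))

best = max(rows, key=lambda r: r[1])
print("most likely type-class: k =", best[0], " (n*p =", n * p, ")")
for k in (50, 60, 70, 80, 90):
    _, exact, estimate = rows[k]
    print(f"k={k}: exact={exact:.3e}  2^(-nD)={estimate:.3e}")
```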
Probability of $\epsilon$-strongly atypical sequences
We say that a type-class T(k,n) is $\epsilon$-strongly atypical if $\left|\frac{k}{n} - p\right| > \epsilon$.
To proceed, we will need a result that will be proved later (PS2):
Bound on $D(\cdot\|\cdot)$: Suppose $|q - p| > \epsilon$. Then $D(q\|p) > c\epsilon^2$ for some positive constant c independent of n (the precise constant is established in PS2; for Bernoulli distributions, Pinsker's inequality gives $c = 2\log_2 e$, for example).
From this, we get
$$\Pr[T(k,n)] \approx 2^{-nD(k/n\,\|\,p)} < 2^{-nc\epsilon^2}.$$
The last inequality holds since $D(k/n\,\|\,p) > c\epsilon^2$ for all $\epsilon$-strongly atypical type-classes.
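As a sanity check (my addition; the exact constant used in PS2 may differ), the snippet below numerically verifies a quadratic lower bound of this type for Bernoulli divergences, using the Pinsker-style constant $2\log_2 e$.

```python
import math

def D(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in bits."""
    out = 0.0
    if q > 0:
        out += q * math.log2(q / p)
    if q < 1:
        out += (1 - q) * math.log2((1 - q) / (1 - p))
    return out

p = 0.7
for q in (0.0, 0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 1.0):
    lower = 2 * (q - p) ** 2 * math.log2(math.e)   # Pinsker-style bound, in bits
    print(f"q={q:.1f}  D(q||p)={D(q, p):.4f}  2(q-p)^2*log2(e)={lower:.4f}")
    assert D(q, p) >= lower - 1e-12                # D dominates the quadratic bound
```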
Next, we use the union bound: for events $A_1, A_2, \ldots, A_m$,
$$\Pr\left[\bigcup_{i=1}^{m} A_i\right] \le \sum_{i=1}^{m} \Pr[A_i].$$
The type-classes range from T(0,n) to T(n,n), so there are n+1 of them in total. There are at least $2\epsilon n - 1$ integers (and at most $2\epsilon n + 1$) in the range $n(p-\epsilon) \le k \le n(p+\epsilon)$, so at most $(n+1) - (2\epsilon n - 1) = n(1 - 2\epsilon) + 2$ type-classes are $\epsilon$-strongly atypical. Hence, by the union bound, we get the following upper bound:
$$\Pr[\text{the observed sequence is } \epsilon\text{-strongly atypical}] \le \left(n(1-2\epsilon) + 2\right) \cdot 2^{-nc\epsilon^2},$$
which tends to 0 as $n \to \infty$.
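To see this decay in action, here is an illustrative Python sketch (not from the notes) that computes the exact probability of observing an $\epsilon$-strongly atypical sequence and a union-bound style estimate, for the arbitrary choices p = 0.7 and $\epsilon$ = 0.1; both fall off quickly as n grows.

```python
import math

def D(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in bits."""
    out = 0.0
    if q > 0:
        out += q * math.log2(q / p)
    if q < 1:
        out += (1 - q) * math.log2((1 - q) / (1 - p))
    return out

p, eps = 0.7, 0.1
for n in (50, 200, 1000):
    atypical = [k for k in range(n + 1) if abs(k / n - p) > eps]
    # Exact probability: sum of Pr[T(k,n)] over atypical type-classes.
    exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in atypical)
    # Union-bound style estimate: (#atypical classes) * max_k 2^(-n D(k/n||p)).
    bound = len(atypical) * max(2 ** (-n * D(k / n, p)) for k in atypical)
    print(f"n={n:5d}  exact={exact:.3e}  union-bound estimate={bound:.3e}")
```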
This should help us find an encoding scheme with better average-case behaviour, as we shall discuss in the next class.
Comments (12)
Mak said
at 2:05 am on Jan 16, 2009
Whew... it is really troublesome typing all those formulae...... Let me have some sleep now, will proofread again tomorrow morning. :- )
sidjaggi said
at 8:25 am on Jan 16, 2009
I did warn you to get an earlier start :)
Very good job for a first draft... let's see the comments as they drift in...
Lingxiao XIA said
at 9:53 am on Jan 16, 2009
that's really impressive, look at all the formulae...i feel sorry for you man
sidjaggi said
at 11:54 am on Jan 16, 2009
Heh -- and nobody believed me when I said that writing the first scribe notes would be easiest :)
sidjaggi said
at 8:43 pm on Jan 18, 2009
Hi Mak -- some comments.
1. Feel free to hyperlink to web resources that might help clarify terms/notation/etc (wikipedia is fine. see for example the edited version of Scribe notes 1).
2. You were right about the mistake in the definition of the K-L divergence. Look up the correct definition from a reference, and fix.
3. Your last statement is "note that what we write above is not very rigorous. Quite a number of the steps need more careful justification." In what senses are the steps not rigorous? If this statement (of non-rigour) is true, understanding the cases when the equations we derive are not correct gives more intuition -- on the other hand, if you can actually make them more rigorous, then you've made a valuable contribution. Either way, a bit more work would make the notes much more valuable...
sidjaggi said
at 8:44 pm on Jan 18, 2009
Oh, also, do try to make the formatting sort of consistent with the formatting of the previous scribe notes...
Silas said
at 4:06 pm on Jan 19, 2009
The statement "There are at most 2(\epsilon)n integers" should be replaced by "There are at least 2(\epsilon)n - 1 integers" when calculating an upper bound for atypical sequences because we need to calculate a lower bound for the number of typical sequences in order to find an upper bound for atypical sequences.
Mak said
at 5:53 pm on Jan 19, 2009
Hi,
By "not very rigorous", I just recall that you said in class that the computations are "not done very carefully".
I'm revising the notes now.
Mak said
at 6:50 pm on Jan 19, 2009
Hi Silas,
You're right. I meant to write "There are _approximately_ 2\epsilon n integers..." there, but....
Thanks for pointing out the mistake!
sidjaggi said
at 12:20 am on Jan 23, 2009
If you're done editing, Mak, and commenting, the rest of you guys, I'll edit now?
I'll wait for an hour or two for responses -- if none, I'll jump in.
(I'm traveling tomorrow morning, so either I edit tonight, or after Wed).
sidjaggi said
at 12:32 am on Feb 2, 2009
Alright Mak, done with my comments -- let me know if you disagree with anything.
sidjaggi said
at 12:34 am on Feb 2, 2009
excellent job, by the way :)