Information Theory Scribe Notes for Week 9
1. Review on code construction
In code design, we need to consider three things:
- the probability of decoding error;
- the compression rate or transmission rate;
- the complexity of encoding and decoding.
2. Information theory vs. coding theory
Information theory assumes the noise is random. For example, in the Binary Symmetric Channel (BSC), each transmitted bit is flipped with probability $p$ and passed through unchanged with probability $1-p$.
We used the typical set to encode and decode for channel coding.
If we transmit a codeword of all zeros with block size $n$ over a BSC($p$), a typical received sequence contains about $n(1-p)$ zeros. It is like a ball with a radius of $pn$, corresponding to about $pn$ ones in the received word.

On the shell of the ball (exactly $pn$ flips), we have $\binom{n}{pn}$ elements, while there are $\sum_{i=0}^{pn}\binom{n}{i}$ elements inside the ball.

Since $\binom{n}{pn} \approx \sum_{i=0}^{pn}\binom{n}{i} \approx 2^{nH(p)}$ for $p < 1/2$ (the largest term dominates the sum up to a polynomial factor), the error patterns on the shell of the ball account for essentially all of the error patterns in the whole ball.
The idea of increasing the transmission rate while keeping the error small is to pack as many balls of volume about $2^{nH(p)}$ as possible into the typical set of output sequences, of size about $2^n$, with no overlap between them. So we get an upper bound:

$$2^{nR} \cdot 2^{nH(p)} \le 2^{n}, \qquad \text{i.e.} \quad R \le 1 - H(p).$$
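To make the counting above concrete, here is a small numeric sketch (my own illustration; the values $n = 1000$ and $p = 0.11$ are arbitrary) comparing the shell size $\binom{n}{pn}$, the ball volume $\sum_{i \le pn}\binom{n}{i}$, and the estimate $2^{nH(p)}$, together with the resulting rate bound $1 - H(p)$.

```python
from math import comb, log2

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Arbitrary illustrative parameters (not from the lecture).
n, p = 1000, 0.11
r = int(p * n)  # ball radius = number of flipped positions

shell = comb(n, r)                              # sequences with exactly pn flips
ball = sum(comb(n, i) for i in range(r + 1))    # sequences with at most pn flips

print(f"log2(shell)      = {log2(shell):.1f}")
print(f"log2(ball)       = {log2(ball):.1f}")
print(f"n * H(p)         = {n * binary_entropy(p):.1f}")
print(f"rate bound 1-H(p) = {1 - binary_entropy(p):.3f}")
```

All three logarithms agree to within a few bits out of roughly 500, which is the sense in which the shell, the full ball, and $2^{nH(p)}$ are the same size to first order in the exponent.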
Now, we compare the two cases:
A. Information theory

The noise is random, following a Bernoulli($p$) distribution on each bit. A code can make the decoding error probability arbitrarily small for large $n$.
B. Coding theory

The noise is worst case, with no more than $pn$ bit flips in a block of size $n$. Here the code must decode with zero error for every such error pattern; the same packing argument gives the same upper bound on the rate as above.
3. Codes for binary channels
Here, we assume the number of errors in a block of $n$ bits is no more than $np$, where $p < 1/2$.
1) Repetition code
In order to decode correctly, we need to map each bit $b \in \{0,1\}$ to a word of $2pn+1$ repeated copies of $b$. So, in the worst case, even if $pn$ of those copies were flipped, we can still recover the original bit by taking a majority vote. If the source has $k$ bits, then we encode them into $k(2pn+1)$ bits. The rate is now $\frac{1}{2pn+1}$, which goes to zero for large $n$.
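A minimal sketch of this scheme, assuming we repeat each source bit $m = 2t+1$ times, where $t$ is the number of flips we want to tolerate among that bit's copies (in the worst case of the notes, $t = pn$); the function names and parameter values are mine.

```python
def rep_encode(bits, m):
    """Repeat each source bit m times (m odd)."""
    return [b for b in bits for _ in range(m)]

def rep_decode(code, m):
    """Majority vote over each block of m copies."""
    return [int(sum(code[i:i + m]) > m // 2) for i in range(0, len(code), m)]

# Example: tolerate t = 2 flips per bit, so m = 2*2 + 1 = 5.
m = 5
msg = [1, 0, 1, 1]
cw = rep_encode(msg, m)
cw[0] ^= 1
cw[2] ^= 1                      # flip 2 of the 5 copies of the first bit
assert rep_decode(cw, m) == msg
print("rate =", len(msg) / len(cw))   # 1/m, which vanishes as m grows
```

Tolerating a constant fraction of errors forces $m$ to grow linearly in the block length, which is exactly why the rate $1/m$ vanishes.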
2) Hamming code
Definition of symbols:
- $q$: the alphabet (character set), e.g. $\{0,1\}$
- $n$: the block size, e.g. $n = 7$
- $k = Rn$: the number of message bits, where $R$ is the rate, e.g. $k = 4$
- $d$: the minimum distance, e.g. $d = 3$
Given message bits $x_1 x_2 x_3 x_4$, we have the codeword $x_1 x_2 x_3 x_4 x_5 x_6 x_7$, where $x_5, x_6, x_7$ are parity bits. The relationship between the bits is shown in the graph: three overlapping circles, each containing one parity bit together with three of the message bits.
[Figure: Venn diagram of the (7,4) Hamming code with $x_1, \dots, x_7$ placed in the regions of three overlapping circles.]
We first fill in the message bits $x_1, x_2, x_3, x_4$, and then choose $x_5, x_6, x_7$ so that the number of ones in each circle is even, i.e. 0, 2, or 4. No matter which single bit is flipped afterwards, we can always identify it by checking each circle: the combination of circles whose parity check fails maps uniquely to the flipped bit.
In comparison, a repetition code would require 12 bits (three copies of each of the 4 message bits) to achieve the same single-error correction that the Hamming code achieves with 7 bits; the latter encodes many bits together when making the correction and so achieves better performance.
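Here is a sketch of the $(7,4)$ Hamming code as described above, assuming one standard assignment of bits to the three circles (the assignment in the lecture diagram may differ): each circle's parity is made even at encoding time, and the pattern of failed parities identifies the single flipped bit at decoding time.

```python
# Sketch of the (7,4) Hamming code. The three parity groups ("circles") below are
# one standard choice and may be labelled differently in the lecture diagram.
#   circle A: x1, x2, x3, x5   circle B: x1, x2, x4, x6   circle C: x1, x3, x4, x7
CIRCLES = [(1, 2, 3, 5), (1, 2, 4, 6), (1, 3, 4, 7)]

def encode(msg):
    """msg = [x1, x2, x3, x4]; returns [x1, ..., x7] with even parity in every circle."""
    x = [None] + list(msg) + [0, 0, 0]                       # 1-indexed for readability
    for circle in CIRCLES:
        x[circle[-1]] = sum(x[i] for i in circle[:-1]) % 2   # last position is the parity bit
    return x[1:]

def decode(word):
    """Correct at most one flipped bit, then return the 4 message bits."""
    x = [None] + list(word)
    failed = [c for c in CIRCLES if sum(x[i] for i in c) % 2 == 1]
    if failed:
        # The flipped position is the unique one inside every failed circle
        # and outside every circle whose parity still checks out.
        bad = set(range(1, 8))
        for c in CIRCLES:
            if c in failed:
                bad &= set(c)
            else:
                bad -= set(c)
        x[bad.pop()] ^= 1
    return x[1:5]

msg = [1, 0, 1, 1]
cw = encode(msg)
cw[6] ^= 1                     # flip one arbitrary bit (here x7)
assert decode(cw) == msg
```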
In this case, $d = 3$, which corresponds to a ball of radius $1$, i.e. the case $pn = 1$. For the general case, we can prove that the Hamming distance is a genuine measure of distance, since it satisfies the triangle inequality. Now, knowing that the ball radius is $pn$, we can get the Hamming bound by choosing non-overlapping balls around the $2^{nR}$ codewords:

$$2^{nR} \cdot \sum_{i=0}^{pn}\binom{n}{i} \le 2^{n}.$$

Since $\sum_{i=0}^{pn}\binom{n}{i} \approx 2^{nH(p)}$, the Hamming bound is

$$R \le 1 - H(p).$$
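As a quick check (my own worked example, not from the lecture), the $(7,4)$ Hamming code meets this bound with equality: $2^{4}\left(\binom{7}{0} + \binom{7}{1}\right) = 16 \cdot 8 = 128 = 2^{7}$, so it is a perfect code.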
However, we do not know how to achieve the largest such set of balls, i.e. how to choose as many centers as possible so that the distance between any two centers is greater than $2pn$.
3) Coding-theoretic bounds
Here comes a new way of packing in as many balls as possible. Instead of considering balls of radius $pn$, we use balls of radius $2pn$, and in each step we remove the space of one such ball entirely. Once a ball of radius $2pn$ is removed, the center of the next ball can be anywhere in the remaining space, which keeps the pairwise distance between centers at least $2pn + 1$. Therefore, the process of choosing balls can be repeated until all of the space has been removed. Since we use a larger ball radius, the number of centers we are guaranteed to get is at least $\frac{2^n}{\sum_{i=0}^{2pn}\binom{n}{i}} \approx 2^{n(1 - H(2p))}$, which is the Gilbert–Varshamov bound. The relationship between these bounds is:
[Plot: best rate $R$ versus error fraction $p$, showing the relationship between the bounds.]
**The plot is a little different from the one in lecture. I use this reference.
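To compare the bounds numerically, here is a short sketch (the sample values of $p$ are my own) evaluating the Hamming bound $1 - H(p)$ and the Gilbert–Varshamov bound $1 - H(2p)$:

```python
from math import log2

def H(p):
    """Binary entropy in bits (H(0) = 0 by convention)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

print(" p     Hamming 1-H(p)   GV 1-H(2p)")
for p in (0.01, 0.05, 0.10, 0.15, 0.20, 0.25):
    gv = 1 - H(2 * p) if 2 * p <= 0.5 else 0.0   # GV bound is vacuous past p = 1/4
    print(f"{p:.2f}   {1 - H(p):14.3f}   {gv:10.3f}")
```

The best achievable rate for worst-case errors lies somewhere between the two curves.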
An example of the Plotkin bound is: if $d > n/2$ (with $d = 2pn + 1$, this means $p \geq 1/4$) and $n$ is large, we have $R \to 0$.
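For reference (this is my addition, stating the standard form of the bound rather than anything from the lecture figure): for a binary code $C$ of length $n$ with minimum distance $d$ satisfying $2d > n$, the Plotkin bound gives

$$|C| \le 2\left\lfloor \frac{d}{2d - n} \right\rfloor,$$

which is a constant independent of $n$, so the rate $\frac{1}{n}\log_2|C|$ tends to $0$.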
4. Codes for large alphabet symmetric channels
For a binary code, if we write all $2^{nR}$ codewords as the rows of a table, we have a matrix with $n$ columns:
[Matrix: $2^{nR}$ rows (codewords) by $n$ columns (bit positions).]
Any pairwise distance between rows is larger than $2pn$, and to spread the codewords apart we can set half of the entries of the first column to ones and half to zeros, and so on for each column.
Extending to a large alphabet, we change $\{0,1\}$ into $\{0, 1, \dots, q-1\}$. The full size of the code space is $q^n$ and the volume of a ball of radius $pn$ is $\sum_{i=0}^{pn}\binom{n}{i}(q-1)^i$. Similarly, we have the codeword matrix:
[Matrix: $q^{nR}$ rows (codewords) by $n$ columns, with entries in $\{0, \dots, q-1\}$.]
Graph of the best rate $R$ versus the error probability $p$:
[Plot: best rate $R$ versus error probability $p$ for the large-alphabet channel, with curves (A) and (B).]
Here (A) means information theory and (B) means coding theory.
In (A), the error is uniform over the $q-1$ wrong symbols in each of the $pn$ error locations. The achievable rate, in $q$-ary symbols per channel use, is $1 - p\log_q(q-1) - \frac{H(p)}{\log_2 q}$, which becomes $1 - p$ for large $q$.
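A small sketch (the values of $p$ and $q$ are my own) evaluating this rate expression for increasing alphabet size $q$; it approaches $1 - p$, though slowly in $\log q$:

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def qsc_rate(p, q):
    """Capacity of the q-ary symmetric channel, in q-ary symbols per channel use."""
    return 1 - p * log2(q - 1) / log2(q) - H(p) / log2(q)

p = 0.1
for q in (2, 4, 16, 256, 2**16):
    print(f"q = {q:6d}   rate = {qsc_rate(p, q):.4f}")
print(f"limit 1 - p = {1 - p:.4f}")
```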