• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

• Dokkio Sidebar (from the makers of PBworks) is a Chrome extension that eliminates the need for endless browser tabs. You can search all your online stuff without any extra effort. And Sidebar was #1 on Product Hunt! Check out what people are saying by clicking here.

View

# Scribe Notes 4-2

last edited by 13 years, 5 months ago

# Processes with memory

## 1. Entropy rate

In previous lectures, we had covered basic notions of information theory, and the source coding and channel coding theorems. We assumed for simplicity that the channel sources are memoryless and i.i.d., which is not always reasonable. For example, in English, obviously the characters are not i.i.d., combinations like "THE" is very common. Also, in reality we often encounter sequences of random variables coming from different sources, and their distributions are more likely to be non-identical. Hence, we need to consider sources with memory and not necessarily identically distributed. Markov chains are used to model sources with memory, such as texts, speech, etc.

Definition (Stationary process). A stochastic process X = {Xi} is said to be stationary if

for any xi and every n, k, where indexes are added modulo n.

Note that being stationary does not imply independence. For example, take a collection of i.i.d. random variables {Xi} and define Zn = Xn+1 + Xn.

Definition (Stationary distribution). A probability distribution on the states of a stochastic process is stationary if it is the same at time n + 1 as at time n.

Definition (Markov chain/Markov process, time invariance). A stochastic process X = {Xi} is a Markov chain or a Markov process if for any n = 1, 2,...,

for all Xi. The Markov chain is said to be time-invariant if p(xn+1|xn) does not depend on n.

Let X = {Xi} be a time-invariant Markov process. From the above definition, we see that it is sufficient to determine the state Xn at time n if the initial state and the transition matrix P = (Pij) are given, where Pij = Pr(Xn+1 = j | Xn = i). Namely,

.

If v is a probability vector, then v is a stationary distribution if it satisfies v = vP. Hence, if the transition matrix P of X  has an eigenvalue 1, then it has a stationary distribution. Now, we may ask does a stationary distribution exist, and is it unique when exists? To answer this, we need to define two properties which a Markov chain can possess.

Definition (Irreducibility, aperiodicity). We say X is irreducible if it is possible to go from one state to any other states with positive probabilities in finite steps. It is said to be aperiodic if the greatest common divisor of the lengths of paths from one state to itself is 1.

Theorem (Existence and uniqueness of stationary distribution).  A Markov chain has a unique stationary distribution if and only if it is both irreducible and aperiodic.

We now define the entropy rate of a process, which measures how the entropy of the sequence of random variables grows with n.

Definition (Entropy rate). The entropy rate H(X) of a stochasic process X = {Xi} is defined as the following:

For a stationary stochastic process, its entropy rate has an alternative expression.

Theorem (Entropy rate of stationary processes). Let X= {Xi} be a stationary stochastic process. Then we have

For a stationary time-invariant Markov chain, we have from the above

.

## 2. Entropy and the second law of thermodynamics

The second law of thermodynamics states that the entropy of an isolated system can never decrease. In the context of information theory, there are situations where entropy decreases with time. Consider a Markov process with the following transition matrix:

Since the number of possible states decreases after several iterations, the entropy decreases if the initial distribution is (1, 0, 0, 0) or (1/4, 1/4, 1/4, 1/4).

In general, if there is a non-uniform stationary distribution, then entropy has to decrease with time for some initial distributions, and vice versa. Such situation does not happen in physics circumstance, as stationary distribution must be uniform for many physical systems.

## 3. Compressing a Markov source

To compress a Markov source, we may use Huffman code. We discuss two variants of this method.

Method 1: Compute the stationary distribution of the Markov process, and then use Huffman code with for compression. Here, n is chosen to be large.

Method 2:  Two schemes of compression are employed. If the last symbol is j, then we compress using Huffman code with distribution (p1j,p0j). The method works since the both encoder and decoder know the initial distribution, the first symbol indicating which scheme is used and transition probabilities in advance.

For further information about entropy rate, one may refer to Chapter 4 of the textbook.