Scribe Note 5_1


5.1 Entropy rate of a process

 

Key concepts: Non-i.i.d. processes, entropy rates, Markov sources, stationary distributions, existence and uniqueness of stationary distributions of Markov processes, convergence to the stationary distribution, the second law of thermodynamics, compression of non-i.i.d. sources.

 

In the previous discussions we looked at the entropy of a single random variable and the entropy of a finite number of random variables. Today we discuss the entropy of a stochastic process, which is a sequence of random variables. In particular, our focus will be on stationary (time-homogeneous/time-invariant) Markov chains with a finite state space. We begin by introducing the definitions of a stationary process, a Markov chain, and a stationary distribution. Then we discuss the relationship with the second law of thermodynamics, and finish with a brief description of how to compress such a process using Huffman coding.

 

 

A stochastic process $\{X_i\}$ is a sequence of random variables. It is said to be stationary if the distribution of any subset of the sequence of random variables is invariant under shifts of time. That is,

 

$\Pr[X_1 = x_1, \ldots, X_n = x_n] = \Pr[X_{1+t} = x_1, \ldots, X_{n+t} = x_n]$

 

for any $n$ and $t$, and ALL $x_1, \ldots, x_n$.

 

Let us recall the definition of a Markov chain. A Markov chain is a stochastic process with the property that "given the present, the future is independent of the past", that is,

 

$\Pr[X_{n+1} = x_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1] = \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]$.

 

In our discussion, we will be focusing on stationary (also called time-homogeneous/time-invariant) Markov chains, which satisfy the following additional constraint:

 

$\Pr[X_{n+1} = b \mid X_n = a] = \Pr[X_2 = b \mid X_1 = a]$

 for all $n$ and all states $a, b$. We also define the transition matrix $P$ of such a Markov chain by $P_{ab} = \Pr[X_2 = b \mid X_1 = a]$.

 

A stationary distribution is a probability distribution that is the same at any time $n$, that is,

 

$\Pr[X_n = a] = \Pr[X_1 = a]$

for all $n$ and all $a$. In particular, a distribution $\mu$ (viewed as a row vector) is stationary for a Markov chain with transition matrix $P$ if $\mu P = \mu$.
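
 

To make this concrete, here is a minimal sketch in Python (assuming NumPy is available; the two-state matrix P below is a made-up example, not one from the lecture) that finds a stationary distribution as a left eigenvector of $P$ with eigenvalue 1:

```python
import numpy as np

# Made-up two-state transition matrix (rows sum to 1); not from the lecture.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# A stationary distribution satisfies mu P = mu, i.e. mu is a left
# eigenvector of P with eigenvalue 1 (a right eigenvector of P.T).
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
mu = mu / mu.sum()                      # normalize to a probability vector

print(mu)                               # ~ [0.8, 0.2]
print(np.allclose(mu @ P, mu))          # True: mu P = mu
```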

 

Remark: A stationary distribution is NOT the same as a stationary process.

 

One might immediately ask whether a stationary distribution always exists and whether it is unique. For a Markov chain with a finite state space, one can show that at least one stationary distribution always exists. However, it may not be unique, as illustrated in the following example.

 

Example 1:  Consider a Markov chain whose transition matrix is the identity matrix $I$. Then every distribution is stationary, so the Markov chain does not have a unique stationary distribution.

 

Given the transition matrix of a Markov chain, we can always view it as a weighted directed graph $G$. In this graph, there is an edge from node $a$ to node $b$ with weight $P_{ab}$ if and only if $P_{ab} > 0$. We say a Markov chain is irreducible if its graph $G$ is strongly connected, that is, for every pair of nodes $a$ and $b$, there exists a directed path from $a$ to $b$. Moreover, we call a Markov chain aperiodic if, for every state, the greatest common divisor of the lengths of the paths from that state to itself is 1.

 

Proposition: If a finite stationary Markov chain with transition matrix $P$ is irreducible and aperiodic, then it has a unique stationary distribution $\mu$. Moreover, for any initial distribution $\mu_1$, $\mu_1 P^n \to \mu$ as $n \to \infty$.

 

While we are not going to give the proof here, let us consider what happens if either one of the properties is missing. Suppose a Markov chain is not irreducible. Then the example above shows that it is possible to have more than one stationary distribution. Now consider a Markov chain with transition matrix $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$; this chain is irreducible but not aperiodic. It is easy to see that if $\mu_1 = (1, 0)$, then $\mu_n = (1, 0)$ when $n$ is odd and $\mu_n = (0, 1)$ when $n$ is even. Hence $\mu_n$ does not tend towards the stationary distribution $(1/2, 1/2)$.
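
 

The two failure modes are easy to observe numerically. A rough sketch in Python (NumPy assumed; the "lazy" variant is a made-up modification, not from the notes): iterating $\mu_{n+1} = \mu_n P$ oscillates forever for the periodic matrix, while mixing in a small probability of staying put makes the chain aperiodic and restores convergence to $(1/2, 1/2)$.

```python
import numpy as np

P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])          # irreducible, but every return path has even length
P_lazy = 0.1 * np.eye(2) + 0.9 * P_periodic  # staying put with prob 0.1 makes it aperiodic

for P, name in [(P_periodic, "periodic"), (P_lazy, "lazy")]:
    mu = np.array([1.0, 0.0])                # start concentrated on state 0
    for n in range(1, 7):
        mu = mu @ P
        print(name, n, mu)
# "periodic" alternates between (0,1) and (1,0) forever;
# "lazy" converges to the stationary distribution (0.5, 0.5).
```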

 

We are now ready to define the entropy of a stochastic process. One attempt would be $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i)$, if the limit exists, i.e., the average of the individual entropies. However, this definition does not capture the dependency between the variables. Consider instead the following definition.

 

Definition (Entropy rate of a stochastic process): The entropy rate of a stochastic process is defined by $H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)$, if the limit exists.

 

At first sight, although this definition captures the dependency between the random variables, the right-hand side does not seem easy to compute compared with the first attempt. We now show that for a stationary Markov chain this is not the case. By the chain rule, one has

 

$\frac{1}{n} H(X_1, \ldots, X_n) = \frac{1}{n} \Big( H(X_1) + \sum_{i=2}^{n} H(X_i \mid X_{i-1}) \Big) = \frac{1}{n} H(X_1) + \frac{n-1}{n} H(X_2 \mid X_1) \longrightarrow H(X_2 \mid X_1),$

where the first equality uses the chain rule together with the Markov property, and the second uses stationarity: each term $H(X_i \mid X_{i-1})$ equals $H(X_2 \mid X_1)$.

Therefore, in order to compute the entropy rate of a Markov process, we can simply find a stationary distribution $\mu$ and compute $H(X_2 \mid X_1) = -\sum_{a,b} \mu_a P_{ab} \log P_{ab}$.
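
 

Combining the two steps, a short sketch in Python (NumPy assumed; the matrix P is the same made-up two-state example as above) computes the entropy rate $H(X_2 \mid X_1)$:

```python
import numpy as np

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return mu / mu.sum()

def entropy_rate(P):
    """H(X2 | X1) = -sum_{a,b} mu_a P_ab log2 P_ab, in bits per symbol."""
    mu = stationary(P)
    H = 0.0
    for a in range(P.shape[0]):
        for b in range(P.shape[1]):
            if P[a, b] > 0:                 # 0 log 0 = 0 by convention
                H -= mu[a] * P[a, b] * np.log2(P[a, b])
    return H

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(entropy_rate(P))   # ~0.57 bits/symbol, vs H(mu) = H(0.8, 0.2) ~ 0.72
```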

 

Let $\mu$ be the unique stationary distribution of a Markov chain, and let $\mu_n$ be the distribution of the $n$th random variable in the chain. We now show that $D(\mu_n \,\|\, \mu)$ decreases to $0$ as $n$ tends to infinity. More precisely, we show that $D(\mu_{n+1} \,\|\, \mu) \le D(\mu_n \,\|\, \mu)$ and $D(\mu \,\|\, \mu_{n+1}) \le D(\mu \,\|\, \mu_n)$.

 

Proof: We prove both inequalities by first showing a more general inequality. Recall that the chain rule for relative entropy says that given two joint probability distributions $p(x, y)$ and $q(x, y)$, one has

 

$D(p(x, y) \,\|\, q(x, y)) = D(p(x) \,\|\, q(x)) + D(p(y \mid x) \,\|\, q(y \mid x))$, and

$D(p(x, y) \,\|\, q(x, y)) = D(p(y) \,\|\, q(y)) + D(p(x \mid y) \,\|\, q(x \mid y))$.

 

Note that if $p(x_n, x_{n+1})$ and $q(x_n, x_{n+1})$ are two distributions of the same Markov chain, with marginals $\mu_n, \mu_{n+1}$ and $\mu'_n, \mu'_{n+1}$ respectively, then $p(x_{n+1} \mid x_n) = q(x_{n+1} \mid x_n) = P_{x_n x_{n+1}}$, where $P$ is the transition matrix of the chain. Hence $D(p(x_{n+1} \mid x_n) \,\|\, q(x_{n+1} \mid x_n)) = 0$. Previously we also showed that $D(p(x \mid y) \,\|\, q(x \mid y)) \ge 0$ for any $p$ and $q$. Therefore we have

 $D(\mu_{n+1} \,\|\, \mu'_{n+1}) \le D(\mu_n \,\|\, \mu'_n)$.

 

The first inequality then follows by replacing $\mu'_n$ and $\mu'_{n+1}$ by $\mu$ and $\mu$ respectively; since $\mu$ is a stationary distribution, $\mu P = \mu$. The second inequality follows similarly by replacing $\mu_n$ and $\mu_{n+1}$ by $\mu$.

 

We have just shown that the relative entropy between $\mu_n$ and the stationary distribution decreases with $n$. However, this does NOT tell us whether the entropy $H(\mu_n)$ always increases or always decreases. This can be seen easily from the fact that the entropy of a random variable is maximized when its distribution is uniform. If the stationary distribution is non-uniform and we choose the uniform distribution as our initial distribution, then the entropy must decrease. Likewise, if the stationary distribution is uniform, then its entropy is maximal and there is no way to start with a distribution that has a higher entropy; indeed, when $\mu$ is uniform on $\mathcal{X}$ we have $D(\mu_n \,\|\, \mu) = \log |\mathcal{X}| - H(\mu_n)$, so the monotone decrease of the relative entropy means $H(\mu_n)$ increases.
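
 

Both phenomena can be checked numerically. A sketch in Python (NumPy assumed, reusing the made-up two-state chain from above, whose stationary distribution $(0.8, 0.2)$ is non-uniform): the relative entropy falls at every step, while the entropy of a chain started uniformly falls as well.

```python
import numpy as np

def entropy(mu):
    return -sum(q * np.log2(q) for q in mu if q > 0)

def kl(p, q):
    return sum(a * np.log2(a / b) for a, b in zip(p, q) if a > 0)

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu_star = np.array([0.8, 0.2])       # non-uniform stationary distribution

mu = np.array([0.5, 0.5])            # uniform start: entropy is already maximal
for n in range(1, 6):
    mu = mu @ P
    print(n, "H(mu_n) =", entropy(mu), "D(mu_n || mu*) =", kl(mu, mu_star))
# D(mu_n || mu*) decreases monotonically, and here H(mu_n) decreases too,
# since the chain is drifting toward a non-uniform distribution.
```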

 

Therefore, one possible implicit assumption behind the second law of thermodynamics, which states that the entropy of an isolated system never decreases, is that the stationary distribution of the system is uniform.

 

We finish this section by considering how to compress a Markov source with Huffman coding. One way is to build a Huffman code for the stationary distribution of the Markov chain, since this is the limiting distribution of the process (assuming it is irreducible and aperiodic). Another way is to code according to the transition probabilities of the process: build one Huffman code per state and encode each symbol with the code of the state the chain is currently in. A sketch comparing the two schemes follows.
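
 

Here is a rough sketch in plain Python (the 3-state matrix, and the detail of the per-state scheme, are illustrative assumptions, not from the notes). It builds one Huffman code from the stationary distribution and, alternatively, one Huffman code per state from the transition rows, then compares expected code lengths.

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for a distribution
    given as a dict {symbol: probability}."""
    tie = itertools.count()                   # tie-breaker so dicts are never compared
    heap = [(p, next(tie), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)       # merge the two least likely subtrees;
        p2, _, d2 = heapq.heappop(heap)       # every symbol inside goes one bit deeper
        heapq.heappush(heap, (p1 + p2, next(tie),
                              {s: l + 1 for s, l in {**d1, **d2}.items()}))
    return heap[0][2]

# Made-up 3-state chain; by symmetry its stationary distribution is uniform.
P = {0: {0: 0.8, 1: 0.1, 2: 0.1},
     1: {0: 0.1, 1: 0.8, 2: 0.1},
     2: {0: 0.1, 1: 0.1, 2: 0.8}}
mu = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

# Scheme 1: one global code built from the stationary distribution.
L = huffman_lengths(mu)
rate1 = sum(mu[b] * L[b] for b in mu)         # each symbol is marginally ~ mu

# Scheme 2: one code per state, built from that state's transition row.
rate2 = 0.0
for a in P:
    L_a = huffman_lengths(P[a])
    rate2 += mu[a] * sum(P[a][b] * L_a[b] for b in P[a])

print(rate1, rate2)  # ~1.67 vs ~1.2 bits/symbol; the entropy rate is ~0.92
```

The per-state scheme gets closer to the entropy rate because it exploits the dependence on the previous symbol; coding blocks of several symbols at a time would close the remaining gap further.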

 

5.3 Weak typicality

 

Key concepts: Weak typicality, definition, comparison with strong typicality, properties (size, probability), weak law of large numbers

 

We first recall the definition of convergence in probability from mathematical analysis, then state the weak law of large numbers. After that we introduce weak typicality and finish the section by briefly comparing weak and strong typicality.

 

Definition (Convergence in probability): A sequence of random variables $X_1, X_2, \ldots$ is said to converge to $X$ in probability (denoted by $X_n \xrightarrow{p} X$) if for any $\epsilon > 0$ and $\delta > 0$, there exists an $N$ such that for all $n \ge N$, $\Pr[|X_n - X| > \epsilon] < \delta$.

 

Theorem (Weak law of large numbers): Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with expected value $\mu$. Then $\frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{p} \mu$.

 

Definition (ε-weakly typical set): Let $X_1, \ldots, X_n$ be $n$ i.i.d. random variables drawn according to $p(x)$. We define the set

$A_\epsilon^{(n)} = \left\{ (x_1, \ldots, x_n) : \left| -\frac{1}{n} \log p(x_1, \ldots, x_n) - H(X) \right| \le \epsilon \right\}$

to be its ε-weakly typical set.
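
 

As a quick illustration, here is a sketch in plain Python (the Bernoulli(0.3) source is a made-up example): a sequence is ε-weakly typical exactly when its per-symbol log-likelihood is within ε of $H(X)$.

```python
import math

p = {0: 0.7, 1: 0.3}                               # made-up i.i.d. source
H = -sum(q * math.log2(q) for q in p.values())     # H(X) ~ 0.881 bits

def is_weakly_typical(xs, eps):
    """(x1,...,xn) is eps-weakly typical iff
    |-(1/n) log2 p(x1,...,xn) - H(X)| <= eps."""
    neg_log_prob = -sum(math.log2(p[x]) for x in xs)  # product form, by independence
    return abs(neg_log_prob / len(xs) - H) <= eps

print(is_weakly_typical([0, 0, 1, 0, 1, 0, 0, 1, 0, 0], 0.1))  # 3 ones in 10: True
print(is_weakly_typical([1] * 10, 0.1))                        # all ones: False
```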

 

We now show that weakly typical sets have some of the good properties that we used for strongly typical sets.

 

Proposition: For a sequence of i.i.d. random variables $X_1, \ldots, X_n$ drawn according to $p(x)$, $-\frac{1}{n} \log p(X_1, \ldots, X_n) \xrightarrow{p} H(X)$.

 

Proof. $-\frac{1}{n} \log p(X_1, \ldots, X_n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i) \xrightarrow{p} E[-\log p(X)] = H(X),$

 

where the first equality follows from the independence of the variables, the convergence in probability follows from the weak law of large numbers, and the last equality follows from the definition of entropy.

 

Proposition: For all $\epsilon > 0$, there exists $N$ such that for all $n \ge N$, $\Pr[A_\epsilon^{(n)}] \ge 1 - \epsilon$.

 

Proof. By the previous proposition, $-\frac{1}{n} \log p(X_1, \ldots, X_n) \xrightarrow{p} H(X)$. Hence for any $\epsilon > 0$, choose $\delta = \epsilon$ in the definition of convergence in probability; then there exists $N$ such that for all $n \ge N$, $\Pr\!\left[\left| -\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X) \right| > \epsilon \right] < \epsilon$, or equivalently, $\Pr\!\left[\left| -\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X) \right| \le \epsilon \right] \ge 1 - \epsilon$. Therefore, $\Pr[A_\epsilon^{(n)}] \ge 1 - \epsilon$.

 

Proposition: $(1 - \epsilon)\, 2^{n(H(X) - \epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X) + \epsilon)}$ for $n$ sufficiently large.

 

Proof. For each $(x_1, \ldots, x_n)$ in $A_\epsilon^{(n)}$, $2^{-n(H(X)+\epsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}$. There are $|A_\epsilon^{(n)}|$ typical sequences. Hence one has $1 \ge \Pr[A_\epsilon^{(n)}] \ge |A_\epsilon^{(n)}| \cdot 2^{-n(H(X)+\epsilon)}$, and it follows that $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$. For the lower bound, the previous proposition gives $\Pr[A_\epsilon^{(n)}] \ge 1 - \epsilon$ for $n$ sufficiently large, so $1 - \epsilon \le \Pr[A_\epsilon^{(n)}] \le |A_\epsilon^{(n)}| \cdot 2^{-n(H(X)-\epsilon)}$, i.e., $|A_\epsilon^{(n)}| \ge (1 - \epsilon)\, 2^{n(H(X)-\epsilon)}$.
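
 

For small $n$ the bounds can be verified by brute force. A sketch in plain Python (reusing the made-up Bernoulli(0.3) source from above; `math.prod` requires Python 3.8+):

```python
import itertools
import math

p = {0: 0.7, 1: 0.3}                               # same made-up source as above
H = -sum(q * math.log2(q) for q in p.values())

def is_weakly_typical(xs, eps):
    return abs(-sum(math.log2(p[x]) for x in xs) / len(xs) - H) <= eps

n, eps = 12, 0.2
A = [xs for xs in itertools.product([0, 1], repeat=n) if is_weakly_typical(xs, eps)]

print(len(A), "<=", 2 ** (n * (H + eps)))          # 1573 <= ~8000: the size bound holds
prob_A = sum(math.prod(p[x] for x in xs) for xs in A)
print(prob_A)                                      # ~0.80 at n = 12; tends to 1 as n grows
```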

 

Recall that the ε-strongly typical set is defined to be $\left\{ (x_1, \ldots, x_n) : \left| \frac{N(a;\, x_1, \ldots, x_n)}{n} - p(a) \right| \le \epsilon \, p(a) \text{ for all } a \in \mathcal{X} \right\}$, where $N(a;\, x_1, \ldots, x_n)$ is the number of occurrences of $a$ in the sequence. Weak typicality may not be as "representative" a notion of typicality as strong typicality, since the weakly typical set includes not only the strongly typical type classes, but also other exponentially small type classes. However, this definition of strong typicality is not well-defined when the alphabet size is not finite. Hence sometimes people may still prefer to use weak typicality.

 

 
