Markov Chains

1. Introduction

Let $\{X_n, n = 0,1,2,\dots\}$ be a stochastic process that takes on a finite or countable number of possible values. If $X_n = i$, then the process is said to be in state $i$ at time $n$.

We suppose that whenever the process is in state $i$, there is a fixed probability $P_{ij}$ that it will next be in state $j$.

2. Chapman-Kolmogorov Equations

We now define the $n$-step transition probabilities $P_{ij}^n$ to be the probability that a process in state $i$ will be in state $j$ after $n$ additional transitions. That is,

$$P_{ij}^n = P\{X_{n+k} = j \mid X_k = i\}, \qquad n \ge 0,\ i,j \ge 0$$

The Chapman-Kolmogorov equations tell us that

$$P_{ij}^{n+m} = \sum_{k=0}^{\infty} P_{ik}^n P_{kj}^m \qquad \text{for all } n, m \ge 0,\ \text{all } i, j \tag{2.1}$$
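
In matrix form, Equation (2.1) says that the $n$-step transition matrix is simply the $n$-th power of the one-step matrix. Below is a minimal sketch for a finite chain; the two-state matrix is a made-up example, not from the text.

```python
import numpy as np

# Chapman-Kolmogorov in matrix form: the n-step transition matrix is P**n.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                 # hypothetical two-state transition matrix

def n_step(P, n):
    """Return the n-step transition probability matrix."""
    return np.linalg.matrix_power(P, n)

print(n_step(P, 4))                        # entry (i, j) = probability of j after 4 steps from i
# Chapman-Kolmogorov check: P^(3+4) = P^(3) P^(4)
assert np.allclose(n_step(P, 7), n_step(P, 3) @ n_step(P, 4))
```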

3. Classification of States

State $j$ is said to be accessible from state $i$ if $P_{ij}^n > 0$ for some $n \ge 0$. Two states $i$ and $j$ that are accessible to each other are said to communicate, and we write $i \leftrightarrow j$.

​ Obviously, the relation of communication has the following properties:

  • if $i \leftrightarrow j$, then $j \leftrightarrow i$
  • if $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$.

​ Two states that communicate are said to be in the same class. Any two classes of states are either identical or disjoint.

​ The concept of communication divides the state space into a number of separate classes. The Markov chain is said to be irreducible if there is only one class, that is, if all states communicate with each other. In other words, it is possible to move from any state to any other state.

For any state $i$, we let $f_i$ denote the probability that, starting in state $i$, the process will ever reenter state $i$. State $i$ is said to be recurrent if $f_i = 1$ and transient if $f_i < 1$.

Each time the process enters a transient state $i$, there will be a positive probability $1 - f_i$ that the process will never enter state $i$ again. Therefore, starting in state $i$, the probability that the process will be in state $i$ for exactly $n$ time periods equals $f_i^{n-1}(1 - f_i),\ n \ge 1$. In other words, starting in state $i$, the number of time periods that the process will be in state $i$ has a geometric distribution with finite mean $1/(1 - f_i)$.

A transient state will only be visited a finite number of times. It follows that in a finite-state Markov chain, not all states can be transient.

Corollary 3.1 If state $i$ is recurrent, and state $i$ communicates with state $j$, then state $j$ is recurrent. Similarly, if state $i$ is transient and communicates with state $j$, then state $j$ must also be transient.

4. Limiting Probabilities

An equivalent, more intuitive description of periodicity: the states in a recurrent class are periodic if they can be grouped into $d > 1$ groups so that all transitions from one group lead to the next group.

State $i$ is said to have period $d$ if $P_{ii}^n = 0$ whenever $n$ is not divisible by $d$, and $d$ is the largest integer with this property. A state with period 1 is said to be aperiodic. Periodicity is a class property; that is, if state $i$ has period $d$ and $i$ communicates with $j$, then $j$ also has period $d$.

​ If state i is recurrent, then it is said to be positive recurrent if, starting in i, the expected time until the process returns to state i is finite. Positive recurrent, aperiodic states are called ergodic.

Theorem 4.1 For an irreducible ergodic Markov chain, $\lim_{n\to\infty} P_{ij}^n$ exists and is independent of $i$. Furthermore, letting

$$\pi_j = \lim_{n\to\infty} P_{ij}^n, \qquad j \ge 0$$

then $\pi_j$ is the unique nonnegative solution of

$$\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}, \quad j \ge 0, \qquad \sum_{j=0}^{\infty} \pi_j = 1 \tag{4.1}$$

The first equation can be interpreted as follows: the probability of being in state $j$ is the sum, over all states $i$, of the probability of being in state $i$ weighted by the transition probability $P_{ij}$ from $i$ to $j$.

$\pi_j$ can also be interpreted as the long-run proportion of time that state $j$ is visited, as the total number of transitions $N \to \infty$.
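
For a finite chain, Equation (4.1) can be solved directly as a linear system. A minimal sketch (the 3-state matrix below is a made-up example, not from the text):

```python
import numpy as np

# Solve pi P = pi together with sum(pi) = 1, i.e. the system in Equation (4.1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])                # hypothetical transition matrix
n = P.shape[0]

A = np.vstack([P.T - np.eye(n), np.ones(n)])   # stack (P^T - I) pi = 0 and 1^T pi = 1
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)                                      # stationary probabilities pi_j
print(np.allclose(pi @ P, pi))                 # check pi P = pi
```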


The workers problem

Consider an organization whose workers are of $r$ distinct types. Suppose that a worker who is currently type $i$ will in the next period become type $j$ with probability $q_{ij}$ for $j = 1,\dots,r$, or will leave the organization with probability $1 - \sum_{j=1}^{r} q_{ij}$. In addition, suppose that new workers are hired each period, and that the numbers of types $1,\dots,r$ workers hired are independent Poisson random variables with means $\lambda_1,\dots,\lambda_r$. If we let $X_n = (X_n(1),\dots,X_n(r))$, where $X_n(i)$ is the number of type $i$ workers in the organization at the beginning of period $n$, then $\{X_n, n \ge 0\}$ is a Markov chain.

To compute its stationary probability distribution, suppose that the initial state is chosen so that the numbers of workers of the different types are independent Poisson random variables, with $\alpha_i$ being the mean number of type $i$ workers. Also, let $N_j,\ j = 1,\dots,r$, be the number of new type $j$ workers hired during the initial period. Now, fix $i$, and for $j = 1,\dots,r$, let $M_i(j)$ be the number of the $X_0(i)$ type $i$ workers who become type $j$ in the next period. Then

$$X_1(j) = N_j + \sum_{i=1}^{r} M_i(j), \qquad j = 1,\dots,r$$

are independent Poisson random variables with means

$$E[X_1(j)] = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{ij}, \qquad j = 1,\dots,r$$

Hence, if the $\alpha_j$ are chosen to satisfy the fixed-point equations

$$\alpha_j = \lambda_j + \sum_{i=1}^{r} \alpha_i q_{ij}, \qquad j = 1,\dots,r$$

then the stationary distribution of the Markov chain is the distribution that takes the numbers of workers of each type to be independent Poisson random variables with means $\alpha_1,\dots,\alpha_r$. That is,

$$\lim_{n\to\infty} P\{X_n = (k_1,\dots,k_r)\} = \prod_{i=1}^{r} e^{-\alpha_i} \frac{\alpha_i^{k_i}}{k_i!}$$
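
A quick simulation sketch of this chain (the parameters below are made up, not part of the original example), comparing the long-run empirical means with the fixed point $\alpha = (I - q^{\mathsf T})^{-1}\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)

q = np.array([[0.6, 0.2],                  # hypothetical type-transition probabilities q_ij
              [0.1, 0.7]])
lam = np.array([3.0, 2.0])                 # hypothetical Poisson hiring rates lambda_j
r = len(lam)

def step(x):
    """One period: hire new workers and move or remove the existing ones."""
    new = rng.poisson(lam)
    for i in range(r):
        dest = np.append(q[i], 1 - q[i].sum())       # become type 1..r, or leave
        new = new + rng.multinomial(x[i], dest)[:r]
    return new

x, samples = np.zeros(r, dtype=int), []
for n in range(20000):
    x = step(x)
    if n > 1000:                            # discard burn-in
        samples.append(x)

alpha = np.linalg.solve(np.eye(r) - q.T, lam)   # alpha_j = lambda_j + sum_i alpha_i q_ij
print("empirical means: ", np.mean(samples, axis=0))
print("theoretical alpha:", alpha)
```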


Mean Pattern Times in Markov Chain Generated Data

Consider an irreducible Markov chain $\{X_n, n \ge 0\}$ with transition probabilities $P_{ij}$ and stationary probabilities $\pi_j,\ j \ge 0$. Starting in state $r$, we are interested in determining the expected number of transitions until the pattern $i_1, i_2, \dots, i_k$ appears. That is, with

$$N(i_1,i_2,\dots,i_k) = \min\{n \ge k \mid X_{n-k+1} = i_1, \dots, X_n = i_k\}$$

we are interested in

$$E[N(i_1,i_2,\dots,i_k) \mid X_0 = r]$$

Let $\mu(i, i_1)$ be the mean number of transitions for the chain to enter state $i_1$, given that the initial state is $i$, $i \ge 0$. The quantities $\mu(i, i_1)$ can be determined as the solution of the following set of equations, obtained by conditioning on the first transition out of state $i$:

$$\mu(i, i_1) = 1 + \sum_{j \ne i_1} P_{ij}\,\mu(j, i_1), \qquad i \ge 0$$

For the Markov chain $\{X_n, n \ge 0\}$, associate a corresponding Markov chain, which we will refer to as the $k$-chain, whose state at any time is the sequence of the most recent $k$ states of the original chain. Let $\pi(j_1,\dots,j_k)$ be the stationary probabilities for the $k$-chain. Because $\pi(j_1,\dots,j_k)$ is the proportion of time that the original Markov chain $k$ units ago was in state $j_1$ and the following $k-1$ states, in sequence, were $j_2,\dots,j_k$, we can conclude that

$$\pi(j_1,\dots,j_k) = \pi_{j_1} \prod_{l=1}^{k-1} P_{j_l j_{l+1}}$$

Moreover, because the mean number of transitions between successive visits of the $k$-chain to the state $i_1, i_2, \dots, i_k$ is equal to the inverse of the stationary probability of that state, we have that

$$E[\text{number of transitions between visits to } i_1, i_2, \dots, i_k] = \frac{1}{\pi(i_1,\dots,i_k)}$$

Let $A(i_1,\dots,i_m)$ be the additional number of transitions needed until the pattern appears, given that the first $m$ transitions have taken the chain into states $X_1 = i_1, \dots, X_m = i_m$.

Now we consider whether the pattern has overlaps, where we say that the pattern $i_1, i_2, \dots, i_k$ has an overlap of size $j$, $j < k$, if the sequence of its final $j$ elements is the same as that of its first $j$ elements.

Case 1

Later


Proposition 4.1 Let $\{X_n, n \ge 1\}$ be an irreducible Markov chain with stationary probabilities $\pi_j,\ j \ge 0$, and let $r$ be a bounded function on the state space. Then

$$\lim_{N\to\infty} \frac{\sum_{n=1}^{N} r(X_n)}{N} = \sum_{j=0}^{\infty} r(j)\,\pi_j$$

Proof If we let $a_j(N)$ be the amount of time the Markov chain spends in state $j$ during periods $1,\dots,N$, then

$$\sum_{n=1}^{N} r(X_n) = \sum_{j=0}^{\infty} a_j(N)\, r(j)$$

Since $a_j(N)/N \to \pi_j$, the result follows upon dividing both sides by $N$ and letting $N \to \infty$.

5. Some Applications

The Gambler's Ruin Problem
A Model for Algorithmic Efficiency

6. Mean Time Spent in Transient States

For transient states $i$ and $j$, let $s_{ij}$ denote the expected number of time periods that the Markov chain is in state $j$, given that it starts in state $i$. Suppose there are $t$ transient states, labeled $1,\dots,t$. Let $\delta_{ij} = 1$ when $i = j$ and let it be 0 otherwise. Conditioning on the initial transition, we obtain

$$s_{ij} = \delta_{ij} + \sum_{k} P_{ik} s_{kj} = \delta_{ij} + \sum_{k=1}^{t} P_{ik} s_{kj}$$

where the final equality follows since it is impossible to go from a recurrent state to a transient state, implying that $s_{kj} = 0$ when $k$ is a recurrent state.

Letting $S$ denote the matrix of values $s_{ij},\ i,j = 1,\dots,t$, and $P_T$ the matrix of transition probabilities restricted to the transient states, the above equation can be written in matrix form as

$$S = I + P_T S \quad\Longleftrightarrow\quad S = (I - P_T)^{-1}$$
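
As an assumed illustration (not part of the text), the following sketch builds $P_T$ for a gambler's-ruin chain, where the transient states are the fortunes $1,\dots,N-1$, and computes $S = (I - P_T)^{-1}$:

```python
import numpy as np

p, N = 0.4, 7                              # win probability and target fortune (made up)
t = N - 1                                  # transient states are 1, ..., N-1
PT = np.zeros((t, t))
for i in range(1, N):
    if i + 1 < N:
        PT[i - 1, i] = p                   # fortune i -> i + 1
    if i - 1 > 0:
        PT[i - 1, i - 2] = 1 - p           # fortune i -> i - 1

S = np.linalg.inv(np.eye(t) - PT)          # S = (I - P_T)^{-1}
print(S[2, 4])                             # s_{3,5}: expected periods in state 5 starting from 3
```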

7. Branching Processes

Consider a population consisting of individuals able to produce offspring of the same kind. Suppose that each individual will have produced $j$ new offspring with probability $P_j,\ j \ge 0$, independently of the numbers produced by other individuals. We suppose that $P_j < 1$ for all $j \ge 0$. The size of the $n$th generation is denoted by $X_n,\ n = 0,1,\dots$. It follows that $\{X_n, n = 0,1,\dots\}$ is a Markov chain having as its state space the set of nonnegative integers.

Note that state 0 is a recurrent state, since clearly $P_{00} = 1$. Also, if $P_0 > 0$, all other states are transient, since $P_{i0} = P_0^i$, which implies that starting with $i$ individuals there is a positive probability that no later generation will ever consist of $i$ individuals. This leads to the important conclusion that, if $P_0 > 0$, the population will either die out or its size will converge to infinity.

Let $\mu = \sum_{j=0}^{\infty} j P_j$ denote the mean number of offspring of a single individual, and let $\sigma^2 = \sum_{j=0}^{\infty} (j - \mu)^2 P_j$ be the variance of the number of offspring produced by a single individual.

$X_n$ can be written as $X_n = \sum_{i=1}^{X_{n-1}} Z_i$, where $Z_i$ represents the number of offspring produced by the $i$th individual of the $(n-1)$st generation. By conditioning on $X_{n-1}$, we obtain

$$\begin{aligned} E[X_n] &= E\big[E[X_n \mid X_{n-1}]\big] \\ &= E\Big[E\Big[\sum_{i=1}^{X_{n-1}} Z_i \,\Big|\, X_{n-1}\Big]\Big] \\ &= E\Big[\sum_{i=1}^{X_{n-1}} E[Z_i \mid X_{n-1}]\Big] \\ &= E[\mu X_{n-1}] \\ &= \mu E[X_{n-1}] \end{aligned}$$

Since $E[X_0] = 1$, this yields $E[X_n] = \mu^n$.

Similarly, $\mathrm{Var}(X_n)$ may be obtained by using the conditional variance formula

$$\mathrm{Var}(X_n) = E[\mathrm{Var}(X_n \mid X_{n-1})] + \mathrm{Var}(E[X_n \mid X_{n-1}])$$

The law of total variance (also called the conditional variance formula) states that if $X$ and $Y$ are random variables on the same probability space, and the variance of $Y$ is finite, then

$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(E[Y \mid X])$$

Given $X_{n-1}$, $X_n$ is just the sum of $X_{n-1}$ independent random variables, each having the distribution $\{P_j, j \ge 0\}$. Hence,

$$E[X_n \mid X_{n-1}] = \mu X_{n-1}, \qquad \mathrm{Var}(X_n \mid X_{n-1}) = X_{n-1}\sigma^2$$

Therefore,

$$\mathrm{Var}(X_n) = \begin{cases} \sigma^2 \mu^{n-1} \dfrac{1 - \mu^n}{1 - \mu}, & \mu \ne 1 \\ n\sigma^2, & \mu = 1 \end{cases}$$

Let $\pi_0$ denote the probability that the population will die out (under the assumption that $X_0 = 1$):

$$\pi_0 = \lim_{n\to\infty} P(X_n = 0 \mid X_0 = 1)$$

Suppose that $X_1 = j$. Since all individuals reproduce independently, the probability that all $j$ of these family lines eventually die out is $\pi_0^j$; that is,

$$\lim_{n\to\infty} P(X_n = 0 \mid X_1 = j) = \pi_0^j$$

Thus the following equation holds:

$$\pi_0 = \sum_{j=0}^{\infty} \pi_0^j P_j$$

In fact, when $\mu > 1$, it can be shown that $\pi_0$ is the smallest positive number satisfying this equation.
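
A small sketch (the offspring distribution is a made-up example with $\mu > 1$) that computes $\pi_0$ by iterating the equation from $x = 0$; the iteration converges to the smallest nonnegative root:

```python
# Offspring distribution P_j, j = 0, 1, 2 (hypothetical), with mean mu = 1.25 > 1.
P = [0.25, 0.25, 0.5]

def extinction_probability(P, tol=1e-12, max_iter=10000):
    """Iterate x <- sum_j x**j P_j starting from 0."""
    x = 0.0
    for _ in range(max_iter):
        new_x = sum(p * x**j for j, p in enumerate(P))
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

print(extinction_probability(P))           # smallest root of x = 0.25 + 0.25 x + 0.5 x^2 (= 0.5)
```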

8. Time Reversible Markov Chains

Suppose that the chain is in steady state, and that starting at some time we trace the sequence of states going backward in time. That is, starting at time $n$, consider the sequence $X_n, X_{n-1}, \dots$. It turns out that this sequence of states is itself a Markov chain with transition probabilities $Q_{ij}$ defined by

$$\begin{aligned} Q_{ij} &= P(X_m = j \mid X_{m+1} = i) \\ &= \frac{P(X_m = j, X_{m+1} = i)}{P(X_{m+1} = i)} \\ &= \frac{P(X_{m+1} = i \mid X_m = j)\, P(X_m = j)}{P(X_{m+1} = i)} \\ &= \frac{P_{ji}\, P(X_m = j)}{P(X_{m+1} = i)} \\ &= \frac{P_{ji}\, \pi_j}{\pi_i} \end{aligned}$$

To prove that the reversed process is indeed a Markov chain, we must verify that

$$P\{X_m = j \mid X_{m+1} = i, X_{m+2} = i_2, \dots\} = P\{X_m = j \mid X_{m+1} = i\}$$

Since independence is a mutual relationship, the statement that the future states $X_{m+1}, X_{m+2}, \dots$, given the present state $X_m$, are independent of the past state $X_{m-1}$ is equivalent to the statement that the past states $X_{m-1}, X_{m-2}, \dots$, given the present state $X_m$, are independent of the future state $X_{m+1}$.

Thus, the reversed process is also a Markov chain, with transition probabilities given by $Q_{ij} = \dfrac{P_{ji}\pi_j}{\pi_i}$. If $Q_{ij} = P_{ij}$ for all $i, j$, the Markov chain is said to be time reversible, which means

$$P_{ij}\pi_i = P_{ji}\pi_j \tag{8.1}$$

This condition can be interpreted as follows: for all states $i$ and $j$, the rate at which the process goes from $i$ to $j$ equals the rate at which it goes from $j$ to $i$.

Theorem 8.1 A Markov chain with transition probabilities $P_{ij}$ and stationary probabilities $\pi_j$ is said to be time reversible if, for any two states $i$ and $j$, the equation $P_{ij}\pi_i = P_{ji}\pi_j$ holds.

If we can find nonnegative numbers, summing to one, that satisfy Equation (8.1), then the Markov chain is time reversible and these numbers are its stationary probabilities. Indeed, summing Equation (8.1) over $i$ yields

$$\sum_i P_{ij}\pi_i = \pi_j \sum_i P_{ji} = \pi_j, \qquad \sum_i \pi_i = 1$$

which is equivalent to Equation (4.1).
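
As a quick numerical aid, the sketch below checks the detailed-balance condition (8.1) for a finite chain; the weighted-graph random walk used as the example is an assumed illustration (random walks on weighted graphs are a standard reversible chain):

```python
import numpy as np

def is_time_reversible(P, pi, tol=1e-10):
    """Check pi_i P_ij = pi_j P_ji for all i, j."""
    balance = pi[:, None] * P              # entry (i, j) equals pi_i P_ij
    return np.allclose(balance, balance.T, atol=tol)

W = np.array([[0., 1., 2.],                # symmetric edge weights of a small graph
              [1., 0., 1.],
              [2., 1., 0.]])
P = W / W.sum(axis=1, keepdims=True)       # random-walk transition probabilities
pi = W.sum(axis=1) / W.sum()               # stationary probabilities proportional to vertex weight
print(is_time_reversible(P, pi))           # True
```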

Theorem 8.2 An ergodic Markov chain for which $P_{ij} = 0$ whenever $P_{ji} = 0$ is time reversible if and only if, starting in state $i$, any path back to $i$ has the same probability as the reversed path. That is, if

$$P_{i i_1} P_{i_1 i_2} \cdots P_{i_k i} = P_{i i_k} P_{i_k i_{k-1}} \cdots P_{i_1 i}$$

for all states $i, i_1, \dots, i_k$.

Proposition 8.1 Consider an irreducible Markov chain with transition probabilities $P_{ij}$. If we can find positive numbers $\pi_i,\ i \ge 0$, summing to one, and a transition probability matrix $Q = [Q_{ij}]$ such that

$$P_{ij}\pi_i = Q_{ji}\pi_j$$

then the $Q_{ij}$ are the transition probabilities of the reversed chain and the $\pi_i$ are the stationary probabilities both for the original and for the reversed chain.

9. Markov Chain Monte Carlo Methods

A Monte Carlo method usually follows this pattern:

  1. Define a domain of possible inputs
  2. Generate inputs randomly from a probability distribution over the domain
  3. Perform a deterministic computation on the inputs
  4. Aggregate the results

However, when the inputs are vectors, it can be difficult to generate a random vector having the specified probability mass function directly. In these cases, we can generate a sequence, not of independent random vectors, but of the successive states of a vector-valued Markov chain $X_1, X_2, \dots$ whose stationary probabilities are $P\{X = x_j\},\ j \ge 1$.

Let $b(j),\ j = 1,2,\dots$, be positive numbers whose sum $B = \sum_{j=1}^{\infty} b(j)$ is finite. The following, known as the Hastings-Metropolis algorithm, can be used to generate a time reversible Markov chain whose stationary probabilities are

$$\pi(j) = \frac{b(j)}{B}, \qquad j = 1,2,\dots$$

To begin, let $Q$ be any specified irreducible Markov transition probability matrix on the integers, with $q(i,j)$ representing the row $i$, column $j$ element of $Q$. Now define a Markov chain $\{X_n, n \ge 0\}$ as follows. When $X_n = i$, generate a random variable $Y$ such that $P\{Y = j\} = q(i,j),\ j = 1,2,\dots$. If $Y = j$, then set $X_{n+1}$ equal to $j$ with probability $\alpha(i,j)$, and set it equal to $i$ with probability $1 - \alpha(i,j)$. Under these conditions, it is easy to see that the sequence of states constitutes a Markov chain with transition probabilities $P_{ij}$ given by

$$\begin{aligned} P_{ij} &= q(i,j)\,\alpha(i,j), \qquad j \ne i \\ P_{ii} &= q(i,i) + \sum_{k \ne i} q(i,k)\,\big(1 - \alpha(i,k)\big) \end{aligned}$$

This Markov chain will be time reversible and have stationary probabilities $\pi(j)$ if

$$\pi(i) P_{ij} = \pi(j) P_{ji}, \qquad i \ne j$$

which is equivalent to

$$\pi(i)\, q(i,j)\, \alpha(i,j) = \pi(j)\, q(j,i)\, \alpha(j,i)$$

If we take $\pi(j) = b(j)/B$ and set

$$\alpha(i,j) = \min\left(\frac{\pi(j)\,q(j,i)}{\pi(i)\,q(i,j)},\ 1\right)$$

then this condition is satisfied. For if $\alpha(i,j) = \dfrac{\pi(j)\,q(j,i)}{\pi(i)\,q(i,j)}$, then $\alpha(j,i) = 1$; and if $\alpha(i,j) = 1$, then $\alpha(j,i) = \dfrac{\pi(i)\,q(i,j)}{\pi(j)\,q(j,i)}$.

Since $\pi(j) = b(j)/B$, we have

$$\alpha(i,j) = \min\left(\frac{b(j)\,q(j,i)}{b(i)\,q(i,j)},\ 1\right)$$

which shows that the value of B is not needed to define the Markov chain.
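
A minimal Hastings-Metropolis sketch on the integers $1,\dots,m$; the target weights $b(j)$ and the nearest-neighbor proposal chain are assumed examples, not from the text:

```python
import random

m = 10
b = [j * j for j in range(1, m + 1)]       # unnormalized target weights b(j)

def propose(i):
    """Symmetric proposal q(i, j): move to a uniformly chosen neighbor, wrapping around."""
    return (i + random.choice([-1, 1]) - 1) % m + 1

def step(i):
    j = propose(i)
    alpha = min(b[j - 1] / b[i - 1], 1.0)  # q is symmetric, so q(j,i)/q(i,j) = 1
    return j if random.random() < alpha else i

x, counts = 1, [0] * m
for _ in range(200000):
    x = step(x)
    counts[x - 1] += 1

print([round(c / sum(counts), 3) for c in counts])   # empirical frequencies
print([round(w / sum(b), 3) for w in b])             # target pi(j) = b(j)/B
```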

10. Markov Decision Processes

Consider a process with $M$ possible states in which, after every transition, an action must be chosen. The next state of the system is determined according to the transition probabilities $P_{ij}(a)$. Let $X_n$ denote the state of the process at time $n$, and let $a_n$ denote the action chosen at time $n$. Then

$$P\{X_{n+1} = j \mid X_0, a_0, X_1, a_1, \dots, X_n = i, a_n = a\} = P_{ij}(a)$$

By a policy, we mean a rule for choosing actions. We restrict attention to policies for which the action chosen at time $n$ depends only on $n$ and the state of the process at time $n$. On the other hand, we allow the policy to be "randomized"; that is, it may choose actions according to a probability distribution.

Under any given policy $\beta$, the sequence of states constitutes a Markov chain with transition probabilities $P_{ij}(\beta)$ defined by

$$P_{ij}(\beta) = \sum_a P_{ij}(a)\,\beta_i(a)$$

where $\beta_i(a)$ is the probability that policy $\beta$ chooses action $a$ when the state is $i$. Let us suppose that for every choice of a policy $\beta$, the resultant Markov chain is irreducible.

For any policy $\beta$, let $\pi_{ia}$ denote the stationary probability that the process will be in state $i$ and action $a$ will be chosen when policy $\beta$ is employed:

$$\pi_{ia} = \lim_{n\to\infty} P_\beta\{X_n = i, a_n = a\}$$

The vector $\pi = (\pi_{ia})$ must satisfy

  1. $\pi_{ia} \ge 0$ for all $i, a$
  2. $\sum_i \sum_a \pi_{ia} = 1$
  3. $\sum_a \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')$ for all $j$

It turns out that for any policy $\beta$, a vector $\pi = (\pi_{ia})$ satisfying these three conditions exists. The converse is also true: suppose that $\pi = (\pi_{ia})$ is a vector that satisfies the three conditions, and let the policy $\beta = (\beta_i(a))$ be

$$\beta_i(a) = \frac{\pi_{ia}}{\sum_{a'} \pi_{ia'}}$$

Letting $P_{ia}$ denote the steady-state probability of being in state $i$ and choosing action $a$ when this policy $\beta$ is used, we see that $(P_{ia})$ is the unique solution of

$$P_{ia} \ge 0, \qquad \sum_i \sum_a P_{ia} = 1, \qquad P_{ja} = \sum_i \sum_{a'} P_{ia'} P_{ij}(a')\, \frac{\pi_{ja}}{\sum_{a'} \pi_{ja'}}$$

Hence, to show that $P_{ia} = \pi_{ia}$, we need only show that

$$\pi_{ia} \ge 0, \qquad \sum_i \sum_a \pi_{ia} = 1, \qquad \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')\, \frac{\pi_{ja}}{\sum_{a'} \pi_{ja'}}$$

The third equation is equivalent to

$$\sum_a \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a')$$

Thus we have shown that a vector $\pi = (\pi_{ia})$ will satisfy the three conditions if and only if there exists a policy $\beta$ such that $\pi_{ia}$ is equal to the steady-state probability of being in state $i$ and choosing action $a$ when $\beta$ is used. In fact, that policy is defined by $\beta_i(a) = \pi_{ia} / \sum_{a'} \pi_{ia'}$.

The above conclusion is quite important in the determination of "optimal" policies. Suppose that a reward $R(i,a)$ is earned whenever action $a$ is chosen in state $i$. Let $R(X_i, a_i)$ denote the reward earned at time $i$. The expected average reward per unit time under policy $\beta$ can be expressed as

$$\lim_{n\to\infty} E_\beta\left[\frac{\sum_{i=1}^{n} R(X_i, a_i)}{n}\right] = \lim_{n\to\infty} E[R(X_n, a_n)]$$

The limiting expected reward at time $n$ equals

$$\lim_{n\to\infty} E[R(X_n, a_n)] = \sum_i \sum_a \pi_{ia} R(i,a)$$

Hence, the problem of determining the policy that maximizes the expected average reward is

$$\begin{aligned} \max \quad & \sum_i \sum_a \pi_{ia} R(i,a) \\ \text{subject to} \quad & \pi_{ia} \ge 0 \quad \text{for all } i, a, \\ & \sum_i \sum_a \pi_{ia} = 1, \\ & \sum_a \pi_{ja} = \sum_i \sum_{a'} \pi_{ia'} P_{ij}(a') \quad \text{for all } j \end{aligned}$$
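
This maximization is a linear program in the variables $\pi_{ia}$. A sketch using scipy.optimize.linprog; the two-state, two-action MDP below is an assumed example, not from the text:

```python
import numpy as np
from scipy.optimize import linprog

M, A = 2, 2                                    # number of states and actions (made up)
P = np.zeros((M, M, A))                        # P[i, j, a] = P_ij(a)
P[0, :, 0] = [0.9, 0.1]; P[0, :, 1] = [0.2, 0.8]
P[1, :, 0] = [0.5, 0.5]; P[1, :, 1] = [0.7, 0.3]
R = np.array([[1.0, 0.0],                      # R[i, a] = reward for action a in state i
              [2.0, 3.0]])

idx = lambda i, a: i * A + a                   # flatten (i, a) into a single variable index
c = -R.flatten()                               # linprog minimizes, so negate the reward

A_eq, b_eq = [np.ones(M * A)], [1.0]           # probabilities sum to one
for j in range(M):                             # balance equations for each state j
    row = np.zeros(M * A)
    for a in range(A):
        row[idx(j, a)] += 1.0
    for i in range(M):
        for a in range(A):
            row[idx(i, a)] -= P[i, j, a]
    A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
pi = res.x.reshape(M, A)
beta = pi / pi.sum(axis=1, keepdims=True)      # optimal (possibly randomized) policy beta_i(a)
print("max average reward:", -res.fun)
print(beta)
```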

11. Hidden Markov Chains

Let $\{X_n, n = 1,2,\dots\}$ be a Markov chain with transition probabilities $P_{ij}$ and initial state probabilities $p_i = P(X_1 = i),\ i \ge 0$. Suppose that there is a finite set $\mathcal{I}$ of signals, and that a signal from $\mathcal{I}$ is emitted each time the Markov chain enters a state. Further, suppose that when the Markov chain enters state $j$ then, independently of previous Markov chain states, the signal emitted is $s$ with probability $p(s \mid j)$, where $\sum_{s \in \mathcal{I}} p(s \mid j) = 1$.

A model of the preceding type, in which the sequence of signals $S_1, S_2, \dots$ is observed while the sequence of underlying Markov chain states $X_1, X_2, \dots$ is unobserved, is called a hidden Markov chain. That is, if $S_n$ represents the $n$th signal emitted, then

$$P(S_1 = s \mid X_1 = j) = p(s \mid j), \qquad P(S_n = s \mid S_1, X_1, \dots, S_{n-1}, X_{n-1}, X_n = j) = p(s \mid j)$$

Let $\mathbf{S}^n = (S_1, S_2, \dots, S_n)$ denote the random vector of the first $n$ signals. For a fixed sequence of signals $s_1, s_2, \dots, s_n$, let $\mathbf{s}^k = (s_1, \dots, s_k),\ k \le n$. To begin, we determine the conditional probability of the Markov chain state at time $n$ given that $\mathbf{S}^n = \mathbf{s}^n$. Let

$$F_n(j) = P(\mathbf{S}^n = \mathbf{s}^n, X_n = j)$$

Then

$$\begin{aligned} F_n(j) &= P(\mathbf{S}^{n-1} = \mathbf{s}^{n-1}, S_n = s_n, X_n = j) \\ &= \sum_i P(\mathbf{S}^{n-1} = \mathbf{s}^{n-1}, X_{n-1} = i, S_n = s_n, X_n = j) \\ &= \sum_i P(S_n = s_n, X_n = j \mid \mathbf{S}^{n-1} = \mathbf{s}^{n-1}, X_{n-1} = i)\, F_{n-1}(i) \\ &= \sum_i P(S_n = s_n, X_n = j \mid X_{n-1} = i)\, F_{n-1}(i) \\ &= \sum_i P_{ij}\, P(S_n = s_n \mid X_n = j, X_{n-1} = i)\, F_{n-1}(i) \\ &= \sum_i P_{ij}\, P(S_n = s_n \mid X_n = j)\, F_{n-1}(i) \\ &= p(s_n \mid j) \sum_i P_{ij}\, F_{n-1}(i) \end{aligned} \tag{11.1}$$

In case this step is unclear: in the preceding we used that

$$P(AB \mid C) = P(B \mid C)\, P(A \mid BC)$$

So

$$P(S_n = s_n, X_n = j \mid X_{n-1} = i) = P(X_n = j \mid X_{n-1} = i)\, P(S_n = s_n \mid X_n = j, X_{n-1} = i) = P_{ij}\, p(s_n \mid j)$$

Thus, starting from $F_1(i) = p_i\, p(s_1 \mid i)$, we can recursively determine $F_2(i), F_3(i), \dots$, and obtain $P\{\mathbf{S}^n = \mathbf{s}^n\} = \sum_j F_n(j)$.
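
A minimal sketch of the forward recursion (11.1); the two-state chain and the signal probabilities $p(s \mid j)$ below are made-up illustrations, not from the text:

```python
import numpy as np

P = np.array([[0.7, 0.3],                  # transition probabilities P_ij (hypothetical)
              [0.4, 0.6]])
p0 = np.array([0.5, 0.5])                  # initial state probabilities p_i
emit = np.array([[0.9, 0.1],               # emit[j, s] = p(s | j), signals s in {0, 1}
                 [0.2, 0.8]])

def forward(signals):
    """Return F_n(j) = P(S^n = s^n, X_n = j) for an observed signal sequence."""
    F = p0 * emit[:, signals[0]]           # F_1(j) = p_j p(s_1 | j)
    for s in signals[1:]:
        F = emit[:, s] * (F @ P)           # F_n(j) = p(s_n | j) * sum_i F_{n-1}(i) P_ij
    return F

F = forward([0, 1, 1, 0])
print(F, F.sum())                          # F_n(j) and P(S^n = s^n) = sum_j F_n(j)
```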

This computation of $P\{\mathbf{S}^n = \mathbf{s}^n\}$ by recursively determining the functions $F_k(i)$ is known as the forward approach. There is also a backward approach, which is based on the quantities $B_k(i)$ defined by

$$B_k(i) = P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid X_k = i\}$$

A recursive formula for $B_k(i)$ can be obtained by conditioning on $X_{k+1}$:

$$\begin{aligned} B_k(i) &= P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid X_k = i\} \\ &= \sum_j P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid X_k = i, X_{k+1} = j\}\, P_{ij} \\ &= \sum_j P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid X_{k+1} = j\}\, P_{ij} \\ &= \sum_j P(S_{k+1} = s_{k+1} \mid X_{k+1} = j)\, P\{S_{k+2} = s_{k+2}, \dots, S_n = s_n \mid S_{k+1} = s_{k+1}, X_{k+1} = j\}\, P_{ij} \\ &= \sum_j p(s_{k+1} \mid j)\, B_{k+1}(j)\, P_{ij} \end{aligned} \tag{11.2}$$

​ Starting with

$$B_{n-1}(i) = P\{S_n = s_n \mid X_{n-1} = i\} = \sum_j P_{ij}\, p(s_n \mid j)$$

we would then use Equation (11.2) to determine the function $B_{n-2}(i)$, and so on recursively down to $B_1(i)$. This would then yield $P\{\mathbf{S}^n = \mathbf{s}^n\}$ via

$$\begin{aligned} P(\mathbf{S}^n = \mathbf{s}^n) &= \sum_i P\{S_1 = s_1, \dots, S_n = s_n \mid X_1 = i\}\, p_i \\ &= \sum_i P\{S_1 = s_1 \mid X_1 = i\}\, P\{S_2 = s_2, \dots, S_n = s_n \mid S_1 = s_1, X_1 = i\}\, p_i \\ &= \sum_i p(s_1 \mid i)\, P\{S_2 = s_2, \dots, S_n = s_n \mid X_1 = i\}\, p_i \\ &= \sum_i p(s_1 \mid i)\, B_1(i)\, p_i \end{aligned}$$

Another approach to obtaining $P\{\mathbf{S}^n = \mathbf{s}^n\}$ is to combine the forward and backward approaches. Suppose that for some $k$ we have computed both functions $F_k(j)$ and $B_k(j)$. Because

$$\begin{aligned} P\{\mathbf{S}^n = \mathbf{s}^n, X_k = j\} &= P\{\mathbf{S}^k = \mathbf{s}^k, X_k = j\}\, P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid \mathbf{S}^k = \mathbf{s}^k, X_k = j\} \\ &= P\{\mathbf{S}^k = \mathbf{s}^k, X_k = j\}\, P\{S_{k+1} = s_{k+1}, \dots, S_n = s_n \mid X_k = j\} \\ &= F_k(j)\, B_k(j) \end{aligned}$$

we see that

$$P(\mathbf{S}^n = \mathbf{s}^n) = \sum_j F_k(j)\, B_k(j)$$

Now we may simultaneously compute the sequence of forward functions, starting with $F_1$, as well as the sequence of backward functions, starting at $B_{n-1}$. The parallel computations can then be stopped once we have computed both $F_k$ and $B_k$ at some $k$.

11.1 Predicting the States

Suppose that the first $n$ observed signals are $\mathbf{s}^n = (s_1, \dots, s_n)$, and that given this data we want to predict the first $n$ states of the Markov chain. The best predictor depends on what we are trying to accomplish. If our objective is to maximize the expected number of correctly predicted states, then for each $k = 1,\dots,n$ we compute $P(X_k = j \mid \mathbf{S}^n = \mathbf{s}^n)$ and take as the predictor of $X_k$ the value of $j$ that maximizes this quantity:

$$P(X_k = j \mid \mathbf{S}^n = \mathbf{s}^n) = \frac{P(\mathbf{S}^n = \mathbf{s}^n, X_k = j)}{P(\mathbf{S}^n = \mathbf{s}^n)} = \frac{F_k(j)\, B_k(j)}{\sum_i F_k(i)\, B_k(i)}$$

If instead we consider the sequence of states as a single entity, then our objective is to choose the sequence of states whose conditional probability, given the sequence of signals, is maximal. Letting $\mathbf{X}^k = (X_1, \dots, X_k)$ be the vector of the first $k$ states, the problem of interest is to find the sequence of states $i_1, \dots, i_n$ that maximizes $P\{\mathbf{X}^n = (i_1,\dots,i_n) \mid \mathbf{S}^n = \mathbf{s}^n\}$. Because

$$P\{\mathbf{X}^n = (i_1,\dots,i_n) \mid \mathbf{S}^n = \mathbf{s}^n\} = \frac{P\{\mathbf{X}^n = (i_1,\dots,i_n), \mathbf{S}^n = \mathbf{s}^n\}}{P(\mathbf{S}^n = \mathbf{s}^n)}$$

this is equivalent to finding the sequence $i_1, \dots, i_n$ that maximizes $P\{\mathbf{X}^n = (i_1,\dots,i_n), \mathbf{S}^n = \mathbf{s}^n\}$.

Let, for $k \le n$,

$$V_k(j) = \max_{i_1,\dots,i_{k-1}} P\{\mathbf{X}^{k-1} = (i_1,\dots,i_{k-1}), X_k = j, \mathbf{S}^k = \mathbf{s}^k\}$$

To solve for $V_k(j)$ recursively,

$$\begin{aligned} V_k(j) &= \max_i \max_{i_1,\dots,i_{k-2}} P\{\mathbf{X}^{k-2} = (i_1,\dots,i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}^{k-1}, X_k = j, S_k = s_k\} \\ &= \max_i \max_{i_1,\dots,i_{k-2}} P\{\mathbf{X}^{k-2} = (i_1,\dots,i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}^{k-1}\} \\ &\qquad \times P\{X_k = j, S_k = s_k \mid \mathbf{X}^{k-2} = (i_1,\dots,i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}^{k-1}\} \\ &= \max_i \max_{i_1,\dots,i_{k-2}} P\{\mathbf{X}^{k-2} = (i_1,\dots,i_{k-2}), X_{k-1} = i, \mathbf{S}^{k-1} = \mathbf{s}^{k-1}\}\, P\{X_k = j, S_k = s_k \mid X_{k-1} = i\} \\ &= \max_i P_{ij}\, p(s_k \mid j)\, V_{k-1}(i) \\ &= p(s_k \mid j) \max_i P_{ij}\, V_{k-1}(i) \end{aligned}$$

To obtain the maximizing sequence of states, we work in the reverse direction. Let $j_n$ be the value that maximizes $V_n(j)$. Also, for $k < n$, let $i_k(j)$ be a value of $i$ that maximizes $P_{ij}\, V_k(i)$. Then

$$\begin{aligned} \max_{i_1,\dots,i_n} P\{\mathbf{X}^n = (i_1,\dots,i_n), \mathbf{S}^n = \mathbf{s}^n\} &= V_n(j_n) \\ &= \max_{i_1,\dots,i_{n-1}} P\{\mathbf{X}^n = (i_1,\dots,i_{n-1}, j_n), \mathbf{S}^n = \mathbf{s}^n\} \\ &= p(s_n \mid j_n) \max_i P_{i, j_n}\, V_{n-1}(i) \\ &= p(s_n \mid j_n)\, P_{i_{n-1}(j_n),\, j_n}\, V_{n-1}\big(i_{n-1}(j_n)\big) \end{aligned}$$

Thus, $i_{n-1}(j_n)$ is the next-to-last state of the maximizing sequence. Continuing in this manner, the second-from-last state of the maximizing sequence is $i_{n-2}(i_{n-1}(j_n))$, and so on.

​ The preceding approach to finding the most likely sequence of states given a prescribed sequence of signals is known as the Viterbi Algorithm.
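
A short Viterbi sketch, reusing the assumed two-state example from the forward-recursion sketch above:

```python
import numpy as np

P = np.array([[0.7, 0.3],                  # transition probabilities P_ij (hypothetical)
              [0.4, 0.6]])
p0 = np.array([0.5, 0.5])                  # initial state probabilities p_i
emit = np.array([[0.9, 0.1],               # emit[j, s] = p(s | j)
                 [0.2, 0.8]])

def viterbi(signals):
    """Return the most likely hidden state sequence for the observed signals."""
    n, m = len(signals), len(p0)
    V = np.zeros((n, m))                   # V[k, j] corresponds to V_{k+1}(j)
    back = np.zeros((n, m), dtype=int)     # back[k, j] = argmax_i P_ij V_k(i)
    V[0] = p0 * emit[:, signals[0]]
    for k in range(1, n):
        for j in range(m):
            scores = V[k - 1] * P[:, j]
            back[k, j] = np.argmax(scores)
            V[k, j] = emit[j, signals[k]] * scores[back[k, j]]
    states = [int(np.argmax(V[-1]))]       # j_n, the maximizing final state
    for k in range(n - 1, 0, -1):          # trace the argmax pointers backward
        states.append(int(back[k, states[-1]]))
    return states[::-1]

print(viterbi([0, 1, 1, 0]))
```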
