HMM_understand

why it could be defined as below:

zn=[0,0,...,1,...,0]T,represent probability of length = K states, only when znk=1, it would be counted into p, random variable

p(z1|π)=k=1Kπkz1kp(zn|zn1,A)=j=1Kk=1KAj,kzn1,jznkp(xn|zn,ϕ)=j=1Kt=1any dimensionϕj(t)znjxn(t)

Define p constant matrix

p(zn|θ)[p(zn|θ)|zn=state 1p(zn|θ)|zn=state K]=[znp(zn|X,θ old )zn]state k:zn=[010]p(z1|θ)=πp(zn|zn1,θ)[p(zn|zn1,A)|zn=state i,zn1=state j]=Ap(xn|zn,θ)[p(xn|zn,ϕ)|xn=state i,zn=state j]=ϕ

where, for short: write p(zn)p(zn|θ),p(xn)p(xn|θ)

p(z1)=πp(zn)=Ap(zn1)p(xn)=ϕp(zn)

size(p(xn|ϕ)) = any dimension×K , ϕ=[ϕ1,ϕ2,...,ϕj,...ϕK]

ϕk is probability distribution sequence of k th state, any dimension×1

xnis probability distribution sequence, any dimension×1

znis probability distribution sequence, K×1

In P(z,theta), z is observation value;

In zn=Azn1 , z is random variable;

 

Thus

p(xn|zn,ϕ)=j=1Kt=1any dimensionϕj(t)znjxn(t)

Goal

p(x1,...,xn,z1,...,zn|π,A,ϕ)=p(x1,...,xn,z1,...,zn|θ)=p(xn|x1,...,xn1,z1,...,zn,θ)p(x1,...,xn1,z1,...,zn|θ)=[p(xn|zn,ϕ)]p(x1,...,xn1,z1,...,zn|θ)=[p(xn|zn,ϕ)]p(zn|x1,...,xn1,z1,...,zn1,θ)p(x1,...,xn1,z1,...,zn1|θ)=[p(xn|zn,ϕ)][p(zn|zn1,A)]p(x1,...,xn1,z1,...,zn1|θ)

thus we get

p(x1,...,xn,z1,...,zn|θ)=[p(xn|zn,ϕ)][p(zn|zn1,A)]p(x1,...,xn1,z1,...,zn1|θ)

So

p(X,Z|θ)p(x1,...,xN,z1,...,zN|θ)=[n=1Np(xn|zn,ϕ)][n=2Np(zn|zn1,A)]p(z1|π)

Here

p(xn|zn,ϕ)=j=1Kt=1any dimensionϕj(t)znjxn(t)p(zn|zn1,A)=j=1Kk=1KAk,jzn1,jznkp(z1|π)=k=1Kπkz1k

So

lnp(X,Z|θ)=n=1Nj=1Kt=1any dimensionznjxn(t)lnϕj(t)+n=2Nk=1Kj=1Kzn1,jzn,klnAkj+k=1Kz1klnπk=n=1NznT<lnϕ,xn>+n=2Nzn1TlnATzn+lnπTz1

if t is continuous

<lnϕ,xn>=+lnϕ(t)xn(t)dtϕ=[ϕ1(t)ϕ2(t)ϕK(t)]

if m is discrete

<lnϕ,xn>=lnϕTxnϕ=[ϕ11ϕ1mϕ1Mϕ21ϕ2mϕ2MϕK1ϕKmϕKM]TM×Kmatrixxn:M×1 matrix

Only xn is known, n{1,...,N}

 

 

for example X is random variable, then Yf(X) is another variable, then P(X=x)p(x)

then

P(Y=y)dyP(X=x)dx=p(x)dx,y=f(x)P(Y=y)=p(x)dxdf(x)E[Y]=+yP(Y=y)dy=+f(x)p(x)dx

期望最⼤化算法,或者EM算法

maxlnp(X|θ)=L(q,θ)+KL(qp)

where

L(q,θ)=Zq(Z)ln{p(X,Z|θ)q(Z)}KL(qp)=Zq(Z)ln{p(Z|X,θ)q(Z)}

Where

So

lnp(X|θ)L(q,θ)=Zq(Z)ln{p(X,Z|θ)q(Z)},q(Z)if q(Z)p(Z|X,θ),here KL(qp)=0lnp(X|θ)=L(q,θ)=Zp(Z|X,θ)ln{p(X,Z|θ)p(Z|X,θ)}

notice: X is fixed, Z,θ are variables,转而用q(Z),θ 替代;

调整法

lnp(X|θ)=L(q,θ)+KL(q|p)
  1. fix θ, change q(Z) [E步骤]
q(Z)p(Z|X,θ)lnp(X|θ)=L(q,θ)+KL(q|p)=const与q无关KL(q|p)↓⇐0,update L(q,θ)lnp(X|θ)==L(q,θ)θ old θ

 

  1. fix q(Z), change θ [M步骤]

ignore influence of θ in KL(q|p)=Zq(Z)ln{p(Z|X,θ)q(Z)}

L(q,θ)=Zq(Z)ln{p(X,Z|θ)q(Z)}=Zp(Z|X,θ old )ln{p(X,Z|θ)p(Z|X,θ old )}=Zp(Z|X,θ old )lnp(X,Z|θ)const

set

Q(θ,θ old )Zp(Z|X,θ old )lnp(X,Z|θ)

get

θargmaxθL(q,θ)=argmaxθQ(θ,θ old )update L(q,θ)

update

KL(q|p)↑=Zq(Z)ln{p(Z|X,θ)q(Z)}>0lnp(X|θ)↑=L(q,θ)+KL(q|p)

Conclusion

因为1,2中 L(q,θ) 均上升;

且1中lnp(X|θ)=L(q,θ)+0↓=const

2中lnp(X|θ)↑=L(q,θ)+KL(q|p)

 

 

consider Q(θold,θ)

Use to delete variables, z1p(z1,z2)=P(z2)

X is fix observation value;

Z is [probability distribution] random variable, lnp(X,Z|θ) is random variable

Zp(Z|X,θ old ) is [probability distribution] random variable of Z [probability distribution] random variable

Q(θ,θ old )Zp(Z|X,θ old )lnp(X,Z|θ)=Zp(Z|X,θ old ){n=1NznT<lnϕ,xn>+n=2Nzn1TlnATzn+lnπTz1}={znp(zn|X,θ old )n=1NznT<lnϕ,xn>}+{zn1,znp(zn1,zn|X,θ old )n=2Nzn1TlnATzn}+{z1p(z1|X,θ old )z1Tlnπ}={n=1N[znp(zn|X,θ old )znT]<lnϕ,xn>}+{n=2N[zn1,znzn1p(zn1,zn|X,θ old )znT]lnAT}+{[z1p(z1|X,θ old )z1T]lnπ}=n=1NγnT<lnϕ,xn>+n=2Ntr(ξnTlnA)+γ1Tlnπ

where

zn1,znp(zn1,zn|X,θ old )n=2Nzn1TlnATzn={p(zn1,zn|X,θ old )n=2N[zn1TlnATzn]}=n=2Ntr(p(zn1,zn|X,θ old )[zn1TlnATzn]T)=n=2Ntr([zn1p(zn1,zn|X,θ old )znT]lnA)=n=2N{[znp(zn1,zn|X,θ old )zn1T]lnA}

其中, 因为X,θold 分别为已知、固定 在 M步骤

γn[znp(zn|X,θ old )zn] 为常数,是期望E[zn|X,θ old ], 大小K×1

ξn[znp(zn1,zn|X,θ old )zn1T] 为常数,是期望E[znzn1T|X,θ old ], 大小K×K

 

下面求解

maxθQ(θ,θ old )Zp(Z|X,θ old )lnp(X,Z|θ)=n=1NγnT<lnϕ,xn>+n=2Ntr(ξnTlnA)+γ1Tlnπ

subject to

s.t.γnT1.=1tr(ξnT1.)=11.Tϕ=1.T1.TA=1.T1.Tπ=1

 

with Lagrange method:

L{n=1NγnT<lnϕ,xn>+n=2Ntr(ξnTlnA)+γ1Tlnπ}t=1any dimensionut{1.T<ϕ,δ(t)>1}k=1Kvk{1.TAσk1}w1{1.Tπ1}

where σk=[010], 1×K

if t is discrete, any dimenstion=M

δ(t)=[010], 1×M,

<lnϕ,xn>=lnϕTxn

<ϕ,δ(t)>=ϕTδ(t)

Qϕ={n=1N[γnxnT]}T1ϕt=1Mut1.δ(t)T={n=1N[xnγnT]}1ϕ[u1utuMu1utuMu1utuM]=0QA=[n=2Nξn]1Ak=1Kvk1.σkT=[n=2Nξn]1A[v1vkvKv1vkvKv1vkvK]=0Qπ=γ11.πw11.=0

So

{n=1N[xnγnT]}ijϕij=uji=1K{n=1N[xnγnT]}ijuj=i=1Kϕij=1uj=i=1K{n=1N[xnγnT]}ijϕij={n=1N[xnγnT]}iji=1K{n=1N[xnγnT]}ijϕ={n=1N[xnγnT]}1.1.1.T{n=1N[xnγnT]}ϕ={n=1N[xnγnT]}1.1.[n=1Nγn]T[n=2Nξn]ijAij=vji=1K[n=2Nξn]ijvj=i=1KAij=1vj=i=1K[n=2Nξn]ijAij=[n=2Nξn]iji=1K[n=2Nξn]ijA=[n=2Nξn]1.1.1.T[n=2Nξn][γ1]iπi=w1i=1K[γ1]iw1=i=1Kπi=1w1=i=1K[γ1]iπi=[γ1]ii=1K[γ1]iπ=γ11.1.1.Tγ1

To sum up

ϕ={n=1N[xnγnT]}1.1.[n=1Nγn]TA=[n=2Nξn]1.1.1.T[n=2Nξn]π=γ11.1.1.Tγ1

How to calculate γn,ξn

γn[znp(zn|X,θ old )zn] 为常数,是期望E[zn|X,θ old ], 大小K×1

ξn[zn1,znznp(zn1,zn|X,θ old )zn1T] 为常数,是期望E[znzn1T|X,θ old ], 大小K×K

 

similarly get:

γnp(zn|X,θ old )[p(zn|X,θ old )|zn=state 1p(zn|X,θ old )|zn=state K]ξnp(zn1,zn|X,θ old )[p(zn1,zn|X,θ old )|zn=state i,zn1=state j]the same size asznzn1T

so

γnp(zn|X,θ old )=p(zn,X|θ old )1.p(X|θ old )=p(zn,x1,...,xn|θ old )p(xn+1,...,xN|zn,x1,...,xn,θ old )1.p(X|θ old )=p(zn,x1,...,xn|θ old )p(xn+1,...,xN|zn,θ old )1.p(X|θ old )=αnβn1.p(X|θ old )ξnp(zn1,zn|X,θ old )=p(zn1,zn,X|θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn,xn,...,xN|zn1,x1,...,xn1,θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn,xn,...,xN|zn1,θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn|zn1,θ old )p(xn,...,xN|zn1,zn,θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn|zn1,θ old )p(xn,...,xN|zn,θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn|zn1,θ old )p(xn|zn,θ old )p(xn+1,...,xN|zn,xn,θ old )1.p(X|θ old )=p(zn1,x1,...,xn1|θ old )p(zn|zn1,θ old )p(xn|zn,θ old )p(xn+1,...,xN|zn,θ old )1.p(X|θ old )=[1.αn1T]A[[ϕTxn]βn1.T]1.p(X|θ old )=1p(X|θ old ){[[ϕTxn]βn]αn1T}A

其中p(X|θ old )=ZNp(zN,X|θ old )=αN=1.TαN is constant

p(X|θ old )=p(X|θ old )1.p(X|θ old )=1.Tp(zN,X|θ old )=1.TαN

So if αn,βn is obtained, we could get γn,ξn

αnp(zn,x1,...,xn|θ old )βnp(xn+1,...,xN|zn,θ old )

How to obtain αn,βn

for αn, initial value:

α1=p(z1,x1|θ old )=p(x1|z1,θ old )p(z1|θ old )=[ϕTx1]π

Recurrence relation

αnp(zn,x1,...,xn|θ old )=p(zn|θ old )p(xn|zn,θ old )p(x1,...,xn1|zn,xn,θ old )=p(zn|θ old )p(xn|zn,θ old )p(x1,...,xn1|zn,θ old )=p(xn|zn,θ old )p(zn,x1,...,xn1|θ old )=p(xn|zn,θ old ){p(zn1,zn,x1,...,xn1|θ old )1.}=p(xn|zn,θ old ){[p(zn1,zn|θ old )p(x1,...,xn1|zn1,zn,θ old )]1.}=p(xn|zn,θ old ){[(1.p(zn1|θ old )T)p(zn|zn1,θ old )(1.p(x1,...,xn1|zn1,θ old )T)]1.}=p(xn|zn,θ old ){[p(zn|zn1,θ old )(1.p(zn1,x1,...,xn1|θ old )T)]1.}=[ϕTxn]{[A(1.αn1T)]1.}=[ϕTxn][Aαn1]

 

xnTϕAαn1

for βn, last initial value:

βN=p(1|zN,θ old )=1

Recurrence relation

βnp(xn+1,...,xN|zn,θ old )={p(zn+1,xn+1,...,xN|zn,θ old )T1.}={[p(zn+1|zn,θ old )p(xn+1,...,xN|zn,zn+1,θ old )]T1.}={[p(zn+1|zn,θ old )(p(xn+1,...,xN|zn+1,θ old )1.T)]T1.}={[p(zn+1|zn,θ old )([p(xn+1|zn+1,θ old )p(xn+2,...,xN|zn+1,xn+1,θ old )]1.T)]T1.}={[p(zn+1|zn,θ old )([p(xn+1|zn+1,θ old )p(xn+2,...,xN|zn+1,θ old )]1.T)]T1.}={[A([[ϕTxn+1]βn+1]1.T)]T1.}=AT([ϕTxn+1]βn+1)

To sum up

α1=[ϕTx1]παn=[ϕTxn][Aαn1]βN=1.βn=AT([ϕTxn+1]βn+1)

Then

γn=αnβn1.p(X|θ old )=11.TαN[αnβn]ξn=1p(X|θ old ){[[ϕTxn]βn]αn1T}A=11.TαN{[[ϕTxn]βn]αn1T}A

because αN,β1 are so small

we divide p(X|θ old ) into 2 part

p(X|θ old )=p(x1,,xn|θ old )p(xn+1,,xN|x1,,xn,θ old )

define

α^n1p(x1,,xn|θ old )αn=p(zn|x1,,xn,θ old )={k=1np(xk|x1,,xk1,θ old )}αn={k=1nck}αnnotice1.Tα^n=znp(zn|x1,,xn,θ old )=1

the other way

cnα^n=p(xn|x1,,xn1,θ old )p(zn|x1,,xn,θ old )=p(zn,xn|x1,,xn1,θ old )=p(zn|x1,,xn1,θ old )p(xn|znθ old )=[p(zn,zn1|x1,,xn1,θ old )1.]p(xn|znθ old )=[[1.p(zn1|x1,,xn1,θ old )T]p(zn|zn1)1.]p(xn|znθ old )=[p(zn|zn1)p(zn1|x1,,xn1,θ old )]p(xn|znθ old )=[Aα^n1][ϕTxn]=[ϕTxn][Aα^n1]cn=1.Tcnαn=1.T{[ϕTxn][Aα^n1]}

so

α^n1cn[ϕTxn][Aα^n1]=11.T{[ϕTxn][Aα^n1]}[ϕTxn][Aα^n1]α^1=11.T{[ϕTxn]π}[ϕTxn]π

thesame

β^nβ^n+1βnβn+11p(xn+1,,xN|x1,,xn,θ old )1p(xn+2,,xN|x1,,xn+1,θ old )=βnβn+1p(x1,,xn|θ old )p(x1,,xn+1|θ old )=βnβn+11p(xn+1|x1,,xn,θ old )=βnβn+11cn+1

the

β^n1p(xn+1,,xN|x1,,xn,θ old )βn=1p(xn+1,,xN|x1,,xn,θ old )p(xn+1,...,xN|zn,θ old )=1cn+1AT([ϕTxn+1]β^n+1)β^N=1.

then

γn=[α^nβ^n]ξn=1cn{[[ϕTxn]β^n]α^n1T}A
αN

其中p(X|θ old )=ZNp(zN,X|θ old )=αN=1.TαN is constant

p(X|θ old )=p(X|θ old )1.p(X|θ old )=1.Tp(zN,X|θ old )=1.TαN
ϕ={xn[n=2Nγn]T}1.1.1.T{xn[n=2Nγn]T}A=[n=2Nξn]1.1.1.T[n=2Nξn]π=γ11.1.1.Tγ1