Gaussian Mixture Model (GMM)

Here we set following symbols X : l×N ( n= 1 ~ N)

X=[x1x2...xn...xN]

πk : 1×1 (k = 1~K)

μk : l×1 (k = 1~K)

Σk: l×l (k = 1~K)

ϕ: l×N

ϕ=[ϕk(xn)]=[ϕ(xn)ϕ(xN)] =[ϕk(X)T]

ϕk(x):

ϕk(x)=1(2π)l/2|Σk|1/2exp(12(xμk)TΣk1(xμk))lnϕk(x)=l2ln(2π)12ln|Σk|12(xμk)TΣk1(xμk)

Z: K×N

Z=[...zn...zN]=[zkT]

zkn: 1×1 (k = 1 ~ K)

zk=[0,1,...,1,...,0]T represents probability K states in N samples (N×1)

zn=[0,0,...,1,...,0]T represents probability of length = K states, (K×1)only when zk=1, it would be counted into p, random variable

p(z|π)=k=1Kπk1zkp(x|z,Σ,μ)=k=1Kϕk(x)1zk

Goal:

argmaxπ,Σ,μp(X|π,Σ,μ)

think about the Expectation–maximization algorithm

argmaxπ,Σ,μlnp(X|θ)=L(q,θ)+KL(qp)

where

L(q,θ)=Zq(Z)ln{p(X,Z|θ)q(Z)}KL(qp)=Zq(Z)ln{p(Z|X,θ)q(Z)}

suppose z1z2...zn...zN independent

Here p(X,Z|θ)

p(X,Z|θ)=p(X|Z,π,Σ,μ)p(Z|π,Σ,μ)=p(X|Z,Σ,μ)p(Z|π)=n=1Np(xn|Z,Σ,μ)n=1Np(zn|π)=n=1Np(xn|zn,Σ,μ)n=1Np(zn|π)

Hence

ln(p(X,Z|θ))=n=1Nlnp(xn|zn,Σ,μ)+n=1Nlnp(zn|π)=n=1Nk=1K(lnπk)zkn+n=1Nk=1K(lnϕk(xn))zkn=lnπTZones(N,1)+n=1Nk=1K[l2ln(2π)12ln|Σk|12(xnμk)TΣk1(xnμk)]zkn=(lnπT12ln|Σ|T)Zones(N,1)l2ln(2π)ones(1,K)Zones(N,1)12k=1K[tr((Xμk1T)TΣk1(Xμk1T)diag(zk))]=(lnπT12ln|Σ|T)Zones(N,1)l2ln(2π)N12k=1K[tr((Xμk1T)TΣk1(Xμk1T)diag(zk))]

notice: X is fixed, Z,θ are variables, we turn to replace it with q(Z),θ

EM method

lnp(X|θ)=L(q,θ)+KL(q|p)

[M step] fix q(Z), change θ

fix qqk=p(Z|X,θk)=p(Z|X,θ old )

θk+1argmaxθL(qk,θ)=argmaxθZqkln{p(X,Z|θ)qk}=argmaxθZqklnp(X,Z|θ)Zqkln(qk)=argmaxθZqklnp(X,Z|θ)=argmaxθQ(θ,θ old )
L(qk,θk+1)>L(qk,θk)=lnp(X|θk)

 

where θ old θk

Q(θ,θ old )Zqklnp(X,Z|θ)

[E step] fix θ, change q(Z)

fix θθk+1

lnp(X|θk+1)=L(q,θk+1)+KL(q|p(Z|X,θk+1))=const

when KL(q|p), wen have L(q,θk+1)

When KL(q|p) reaches the minimal value 0,then q have qk+1=p(Z|X,θ)

lnp(X|θk+1)==L(qk+1,θk+1)

update θ old ,to keep current θ value

θ old θk+1
qk+1argmaxqL(q,θk+1)argminqKL(q|p(Z|X,θk+1))=0=p(Z|X,θk+1)lnp(X|θk+1)L(q,θk+1)+KL(q|p(Z|X,θk+1))=L(qk,θk+1)+KL(qk|p(Z|X,θk+1))=L(qk+1,θk+1)+KL(qk+1|p(Z|X,θk+1))=L(qk+1,θk+1)
lnp(X|θk+1)=L(qk+1,θk+1)>L(qk,θk+1)

in all

lnp(X|θk+1)=L(qk+1,θk+1)>L(qk,θk+1)>lnp(X|θk)=L(qk,θk)

So

limklnp(X|θk)=limkL(qk,θk+1)=maxθlnp(X|θ)θmaxmaxθlnp(X|θ)=limkθk+1=limkargmaxθZqklnp(X,Z|θ)=limkargmaxθQ(θ,θk)

Calculation of [M step] fix q(Z), change θ

fix qqk=p(Z|X,θk)=p(Z|X,θ old )

where θ old θk

Q(θ,θ old )Zqklnp(X,Z|θ)

Here

θk+1argmaxθL(qk,θ)=argmaxθZqklnp(X,Z|θ)Zqkln(qk)=argmaxθZqklnp(X,Z|θ)=argmaxθQ(θ,θ old )

Now find close form of Q

Zp(Z|X,θ old ) is [probability distribution] random variable of Z [probability distribution] random variable

Here qk=p(Z|X,θk)=p(Z|X,θ old )

Q(θ,θ old )Zp(Z|X,θ old )lnp(X,Z|θ)=Zp(Z|X,θ old ){(lnπT12|Σ|T)Zones(N,1)l2ln(2π)ones(1,K)Zones(N,1)12k=1K[tr((Xμk1T)TΣk1(Xμk1T)diag(zk))]}={[n=1Nznp(zn|xn,θ old )znT](lnπ12|Σ|l2ln(2π)1.)}12{n=1Nk=1K[znp(zn|xn,θ old )(xnμk)TΣk1(xnμk)zkn]}=[γ1.]T(lnπ12|Σ|l2ln(2π)1.)12k=1Ktr(diag(γkT)(Xμk1T)TΣk1(Xμk1T))

Here

p(zn|xn,θold)=p(xn|zn,θold)p(zn|θold)znp(xn|zn,θold)p(zn|θold)

So, have 1.Tγ=ones(1,N)

p(zn=k|xn,θold)=ϕkold(xn)πkoldk=1Kϕkold(xn)πkoldγknγ=ϕold(X)(πkold1.T)(1.1.(πold)Tϕold(X))

maximize Q

maxθQ(θ,θ old )Zp(Z|X,θ old )lnp(X,Z|θ)=[γ1.]T(lnπ12|Σ|l2ln(2π)1.)12k=1Ktr(diag(γkT)(Xμk1T)TΣk1(Xμk1T))s.t.1.Tπ=1

define LQλ(1.Tπ1)

for π

Lπ=[γ1.]1.πλ1.=0

so

[γ1.]=λπ

Hence

λ=1.T[γ1.]=[1.Tγ]1.=ones(1,N)1.=Nπ=[γ1.]1.T[γ1.]=[γ1.]N

for μk

Lμk=122(1)Σk1(Xμk1T)diag(γkT)1.=0

so

(Xμk1T)diag(γkT)1.=0(Xμk1T)γk=0

Hence

μk=Xγk1.Tγk

for Σk

dL=[γkT1.]T(12)Σk1dΣk12tr(diag(γkT)(Xμk1T)TdΣk1(Xμk1T))=[γkT1.]T(12)Σk1dΣk12tr((Xμk1T)diag(γkT)(Xμk1T)TdΣk1)=(12)[γkT1.]TΣk1dΣk+12tr((Xμk1T)diag(γkT)(Xμk1T)TΣk1dΣkΣk1)=(12)[γkT1.]TΣk1dΣk+12tr(Σk1(Xμk1T)diag(γkT)(Xμk1T)TΣk1dΣk)

note:0=d(Σk1Σk)=(dΣk1)Σ+Σk1dΣk , so dΣk1=Σk1dΣkΣk1

So

LΣk=(12)[γkT1.]ΣkT+12(ΣkT(Xμk1T)diag(γkT)(Xμk1T)TΣkT)=0

So

[γkT1.]Il×l=ΣkT(Xμk1T)diag(γkT)(Xμk1T)T[γkT1.]Il×l=(Xμk1T)diag(γkT)(Xμk1T)TΣk1Σk=(Xμk1T)diag(γkT)(Xμk1T)T[γkT1.]

To sum up

For M step

π=[γ1.]1.T[γ1.]=[γ1.]Nμk=Xγk1.TγkΣk=(Xμk1T)diag(γkT)(Xμk1T)T1.Tγk=Xdiag(γk)XT1.Tγkμkμk

where, have 1.Tγ=ones(1,N)

For E step

p(zn=k|xn,θold)=ϕkold(xn)πkoldk=1Kϕkold(xn)πkoldγknγ=ϕold(X)(πold1.T)(1.1.(πold)Tϕold(X))

here γk is (N x 1) matrix

γ=[γkT]