The hidden Markov model (HMM) is a statistical learning model applicable to tagging problems. It describes the process by which a hidden Markov chain randomly generates an observation sequence, and it is a generative model.
1. Basic Concepts of Hidden Markov Models
Definition of the Hidden Markov Model
A hidden Markov model is a probabilistic model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable random sequence of states, and each state then generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the random sequence of observations, one generated by each state, is called the observation sequence. Each position in the sequence can be regarded as a time step.
State set: $Q=\{q_1,q_2,\ldots,q_N\}$, $|Q|=N$
Observation set: $V=\{v_1,v_2,\ldots,v_M\}$, $|V|=M$
State sequence: $I=(i_1,i_2,\ldots,i_t,\ldots,i_T)$, $i_t\in Q$ $(t=1,2,\ldots,T)$
Observation sequence: $O=(o_1,o_2,\ldots,o_t,\ldots,o_T)$, $o_t\in V$ $(t=1,2,\ldots,T)$
State transition probability matrix: $A=[a_{ij}]_{N\times N}$
where $a_{ij}=P(i_{t+1}=q_j \mid i_t=q_i)$ $(i=1,2,\ldots,N;\ j=1,2,\ldots,N)$ is the probability of moving to state $q_j$ at time $t+1$, given state $q_i$ at time $t$.
Observation probability matrix: $B=[b_j(k)]_{N\times M}$
where $b_j(k)=P(o_t=v_k \mid i_t=q_j)$ $(k=1,2,\ldots,M;\ j=1,2,\ldots,N)$ is the probability of generating observation $v_k$ at time $t$, given state $q_j$ at time $t$.
Initial state probability vector: $\pi=(\pi_i)$
where $\pi_i=P(i_1=q_i)$ $(i=1,2,\ldots,N)$ is the probability of being in state $q_i$ at time $t=1$.
Hidden Markov model: $\lambda=(A,B,\pi)$
Basic assumptions of the hidden Markov model:
Homogeneous Markov assumption: the state at any time $t$ depends only on the state at time $t-1$:
$$P(i_t \mid i_{t-1},o_{t-1},\ldots,i_1,o_1)=P(i_t \mid i_{t-1}) \quad (t=1,2,\ldots,T)$$
Observation independence assumption: the observation at any time $t$ depends only on the state at time $t$:
$$P(o_t \mid i_T,o_T,i_{T-1},o_{T-1},\ldots,i_{t+1},o_{t+1},i_t,i_{t-1},o_{t-1},\ldots,i_1,o_1)=P(o_t \mid i_t) \quad (t=1,2,\ldots,T)$$
Generation of the Observation Sequence
Algorithm for generating an observation sequence:
Input: hidden Markov model $\lambda=(A,B,\pi)$, observation sequence length $T$; Output: observation sequence $O=(o_1,o_2,\ldots,o_T)$.
1. Generate the initial state $i_1$ according to the initial state distribution $\pi$.
2. Set $t=1$.
3. Generate $o_t$ according to the observation probability distribution $b_{i_t}(k)$ of state $i_t$.
4. Generate the next state $i_{t+1}$ $(i_{t+1}=1,2,\ldots,N)$ according to the state transition probability distribution $a_{i_t i_{t+1}}$ of state $i_t$.
5. Set $t=t+1$; if $t<T$, go to step 3; otherwise, stop.
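The five steps above can be sketched in NumPy as follows; the 3-state, 2-symbol parameters `A`, `B`, `pi` below are made-up toy values for illustration, not taken from the text:

```python
import numpy as np

def hmm_sample(A, B, pi, T, rng):
    """Sample a state sequence and observation sequence of length T
    from an HMM lambda = (A, B, pi), following steps 1-5 above."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    N, M = B.shape
    states, obs = [], []
    i = rng.choice(N, p=pi)                   # step 1: draw i_1 from pi
    for _ in range(T):
        states.append(i)
        obs.append(rng.choice(M, p=B[i]))     # step 3: draw o_t from b_{i_t}(.)
        i = rng.choice(N, p=A[i])             # step 4: draw i_{t+1} from a_{i_t,.}
    return states, obs

# hypothetical toy model: 3 states, 2 observation symbols
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
states, obs = hmm_sample(A, B, pi, T=5, rng=np.random.default_rng(0))
```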
The Three Basic Problems of Hidden Markov Models
The hidden Markov model has three basic problems:
Probability computation: given $\lambda=(A,B,\pi)$ and $O=(o_1,o_2,\ldots,o_T)$, compute $P(O\mid\lambda)$.
Learning: given $O=(o_1,o_2,\ldots,o_T)$, compute $\lambda^*=\arg\max_{\lambda} P(O\mid\lambda)$.
Prediction (decoding): given $\lambda=(A,B,\pi)$ and $O=(o_1,o_2,\ldots,o_T)$, compute $I^*=\arg\max_{I} P(I\mid O,\lambda)$.
Forward probability: $\alpha_t(i)=P(o_1,o_2,\ldots,o_t,\, i_t=q_i \mid \lambda)$
is the probability, given the model $\lambda$, of observing the partial sequence $o_1,o_2,\ldots,o_t$ and being in state $q_i$ at time $t$.
2. Probability Computation Algorithms
Forward Algorithm
Recursive computation of the forward probability (writing $o_1^t$ for $o_1,\ldots,o_t$):
$$\begin{aligned} \alpha_t(i) &= P(o_1,o_2,\ldots,o_t,\, i_t=q_i \mid \lambda) = P(i_t=q_i, o_1^t) \\ &= \sum_{j=1}^N P(i_{t-1}=q_j, i_t=q_i, o_1^{t-1}, o_t) \\ &= \sum_{j=1}^N P(i_t=q_i, o_t \mid i_{t-1}=q_j, o_1^{t-1}) \cdot P(i_{t-1}=q_j, o_1^{t-1}) \\ &= \sum_{j=1}^N P(i_t=q_i, o_t \mid i_{t-1}=q_j) \cdot \alpha_{t-1}(j) \\ &= \sum_{j=1}^N P(o_t \mid i_t=q_i, i_{t-1}=q_j) \cdot P(i_t=q_i \mid i_{t-1}=q_j) \cdot \alpha_{t-1}(j) \\ &= \sum_{j=1}^N b_i(o_t) \cdot a_{ji} \cdot \alpha_{t-1}(j) \end{aligned}$$
Probability computation:
$$P(O \mid \lambda) = P(o_1^T \mid \lambda) = \sum_{i=1}^N P(o_1^T, i_T=q_i) = \sum_{i=1}^N \alpha_T(i)$$
The forward algorithm for computing the observation sequence probability:
Input: hidden Markov model $\lambda$, observation sequence $O$; Output: observation sequence probability $P(O\mid\lambda)$.
1. Initialization: $\alpha_1(i)=\pi_i\, b_i(o_1)$ $(i=1,2,\ldots,N)$
2. Recursion: for $t=1,2,\ldots,T-1$,
$$\alpha_{t+1}(i) = \left[\sum_{j=1}^N \alpha_t(j)\, a_{ji}\right] b_i(o_{t+1}) \quad (i=1,2,\ldots,N)$$
3. Termination: $P(O\mid\lambda)=\sum_{i=1}^N \alpha_T(i)$
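A minimal NumPy sketch of the forward algorithm; the toy parameters and the observation sequence `O=[0, 1, 0]` are illustrative assumptions, not from the text:

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: returns P(O | lambda) and the T x N table alpha."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization: alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # recursion: sum_j alpha_t(j) a_ji, times b_i(o_{t+1})
    return alpha[-1].sum(), alpha                   # termination: sum_i alpha_T(i)

# toy model (illustrative values, not from the text)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
prob, alpha = forward(A, B, pi, O=[0, 1, 0])
```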
Backward Algorithm
Backward probability: $\beta_t(i)=P(o_{t+1},o_{t+2},\ldots,o_T \mid i_t=q_i, \lambda)$
is the probability, given the model $\lambda$ and state $q_i$ at time $t$, of observing the partial sequence $o_{t+1},o_{t+2},\ldots,o_T$ from time $t+1$ to $T$.
Recursive computation of the backward probability:
$$\begin{aligned} \beta_t(i) &= P(o_{t+1},o_{t+2},\ldots,o_T \mid i_t=q_i, \lambda) = P(o_{t+1}^T \mid i_t=q_i) \\ &= \frac{P(o_{t+1}^T, i_t=q_i)}{P(i_t=q_i)} \\ &= \frac{\sum_{j=1}^N P(o_{t+1}^T, i_t=q_i, i_{t+1}=q_j)}{P(i_t=q_i)} \\ &= \sum_{j=1}^N \frac{P(o_{t+1}^T \mid i_t=q_i, i_{t+1}=q_j) \cdot P(i_t=q_i, i_{t+1}=q_j)}{P(i_t=q_i)} \\ &= \sum_{j=1}^N P(o_{t+1}^T \mid i_{t+1}=q_j) \cdot \frac{P(i_{t+1}=q_j \mid i_t=q_i) \cdot P(i_t=q_i)}{P(i_t=q_i)} \\ &= \sum_{j=1}^N P(o_{t+2}^T, o_{t+1} \mid i_{t+1}=q_j) \cdot a_{ij} \\ &= \sum_{j=1}^N P(o_{t+2}^T \mid i_{t+1}=q_j) \cdot P(o_{t+1} \mid i_{t+1}=q_j) \cdot a_{ij} \\ &= \sum_{j=1}^N \beta_{t+1}(j) \cdot b_j(o_{t+1}) \cdot a_{ij} \end{aligned}$$
Probability computation:
$$\begin{aligned} P(O \mid \lambda) &= P(o_1^T \mid \lambda) = \sum_{i=1}^N P(o_1^T, i_1=q_i) \\ &= \sum_{i=1}^N P(i_1=q_i) \cdot P(o_1 \mid i_1=q_i) \cdot P(o_2^T \mid i_1=q_i) \\ &= \sum_{i=1}^N \pi_i\, b_i(o_1)\, \beta_1(i) \end{aligned}$$
The backward algorithm for computing the observation sequence probability:
Input: hidden Markov model $\lambda$, observation sequence $O$; Output: observation sequence probability $P(O\mid\lambda)$.
1. Initialization: $\beta_T(i)=1$ $(i=1,2,\ldots,N)$
2. Recursion: for $t=T-1,T-2,\ldots,1$,
$$\beta_t(i) = \sum_{j=1}^N a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \quad (i=1,2,\ldots,N)$$
3. Termination: $P(O\mid\lambda)=\sum_{i=1}^N \pi_i\, b_i(o_1)\, \beta_1(i)$
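The backward algorithm can be sketched the same way; for any model its terminal value must agree with the forward algorithm's result. The toy parameters are again illustrative assumptions:

```python
import numpy as np

def backward(A, B, pi, O):
    """Backward algorithm: returns P(O | lambda) and the T x N table beta."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                           # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                   # recursion: t = T-1, ..., 1
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    return (pi * B[:, O[0]] * beta[0]).sum(), beta   # termination

# same toy model as above (illustrative values)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
prob, beta = backward(A, B, pi, O=[0, 1, 0])
```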
Expressing $P(O \mid \lambda)$ with forward and backward probabilities:
$$\begin{aligned} P(O \mid \lambda) &= P(o_1^T) = \sum_{i=1}^N \sum_{j=1}^N P(o_1^t, o_{t+1}^T, i_t=q_i, i_{t+1}=q_j) \\ &= \sum_{i=1}^N \sum_{j=1}^N P(o_1^t, i_t=q_i, i_{t+1}=q_j)\, P(o_{t+1}^T \mid i_{t+1}=q_j) \\ &= \sum_{i=1}^N \sum_{j=1}^N P(o_1^t, i_t=q_i)\, P(i_{t+1}=q_j \mid i_t=q_i)\, P(o_{t+1}^T \mid i_{t+1}=q_j) \\ &= \sum_{i=1}^N \sum_{j=1}^N P(o_1^t, i_t=q_i)\, P(i_{t+1}=q_j \mid i_t=q_i)\, P(o_{t+1} \mid i_{t+1}=q_j)\, P(o_{t+2}^T \mid i_{t+1}=q_j) \\ &= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \qquad t=1,2,\cdots,T-1 \end{aligned}$$
Computation of Some Probabilities and Expectations
Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$:
$$\begin{aligned} \gamma_t(i) &= P(i_t=q_i \mid O, \lambda) = \frac{P(i_t=q_i, O \mid \lambda)}{P(O \mid \lambda)} \\ &= \frac{P(i_t=q_i, O \mid \lambda)}{\sum_{j=1}^N P(i_t=q_j, O \mid \lambda)} \\ &= \frac{P(o_1^t, i_t=q_i)\, P(o_{t+1}^T \mid i_t=q_i)}{\sum_{j=1}^N P(o_1^t, i_t=q_j)\, P(o_{t+1}^T \mid i_t=q_j)} \\ &= \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^N \alpha_t(j)\, \beta_t(j)} \end{aligned}$$
Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:
$$\begin{aligned} \xi_t(i,j) &= P(i_t=q_i, i_{t+1}=q_j \mid O, \lambda) = \frac{P(i_t=q_i, i_{t+1}=q_j, O \mid \lambda)}{P(O \mid \lambda)} \\ &= \frac{P(i_t=q_i, i_{t+1}=q_j, O \mid \lambda)}{\sum_{i=1}^N \sum_{j=1}^N P(i_t=q_i, i_{t+1}=q_j, O \mid \lambda)} \\ &= \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)} \end{aligned}$$
Expected number of times state $i$ occurs under observation $O$: $\sum_{t=1}^{T} \gamma_t(i) = \sum_{t=1}^{T} P(i_t=q_i \mid O, \lambda)$
Expected number of transitions out of state $i$ under observation $O$: $\sum_{t=1}^{T-1} \gamma_t(i) = \sum_{t=1}^{T-1} P(i_t=q_i \mid O, \lambda)$
Expected number of transitions from state $i$ to state $j$ under observation $O$: $\sum_{t=1}^{T-1} \xi_t(i,j) = \sum_{t=1}^{T-1} P(i_t=q_i, i_{t+1}=q_j \mid O, \lambda)$
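Given the forward and backward tables, $\gamma$ and $\xi$ are the normalized products above. A sketch, with illustrative toy parameters (not from the text):

```python
import numpy as np

def forward_backward(A, B, pi, O):
    """Compute the full alpha and beta tables (each of shape T x N)."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(O), len(pi)
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def gamma_xi(A, B, pi, O):
    """gamma[t, i] = P(i_t=q_i | O, lambda);
    xi[t, i, j] = P(i_t=q_i, i_{t+1}=q_j | O, lambda)."""
    A, B = np.asarray(A), np.asarray(B)
    alpha, beta = forward_backward(A, B, pi, O)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi_t(i, j) is proportional to alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j)
    xi = alpha[:-1, :, None] * A[None, :, :] * (B[:, O[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    return gamma, xi

# toy model (illustrative values, not from the text)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
gamma, xi = gamma_xi(A, B, pi, O=[0, 1, 0])
```

As a sanity check, each row of $\gamma$ sums to 1, each $\xi_t$ sums to 1, and marginalizing $\xi_t$ over $j$ recovers $\gamma_t$.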
3. Learning Algorithm
The Baum-Welch Algorithm
Treating the observation sequence as observed data $O$ and the state sequence as hidden data $I$, the hidden Markov model is a probabilistic model with latent variables:
$$P(O \mid \lambda) = \sum_I P(O \mid I, \lambda)\, P(I \mid \lambda)$$
Complete data: $(O, I) = (o_1, o_2, \cdots, o_T, i_1, i_2, \cdots, i_T)$
Log-likelihood of the complete data: $\log P(O, I \mid \lambda)$
The $Q(\lambda, \overline{\lambda})$ function:
$$\begin{aligned} Q(\lambda, \overline{\lambda}) &= E_I\left[\log P(O, I \mid \lambda) \mid O, \overline{\lambda}\right] \\ &= \sum_I \log P(O, I \mid \lambda)\, P(I \mid O, \overline{\lambda}) \\ &= \sum_I \log P(O, I \mid \lambda)\, \frac{P(O, I \mid \overline{\lambda})}{P(O \mid \overline{\lambda})} \end{aligned}$$
where $\overline{\lambda}$ is the current estimate of the hidden Markov model parameters and $\lambda$ is the model parameter to be maximized.
Since $P(O \mid \overline{\lambda})$ is a constant factor in the maximization of $Q(\lambda, \overline{\lambda})$,
and since $P(O, I \mid \lambda) = \pi_{i_1} b_{i_1}(o_1)\, a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)$,
maximizing $Q(\lambda, \overline{\lambda})$ over $\lambda$ reduces to
$$\begin{aligned} \lambda &= \arg\max_{\lambda} Q(\lambda, \overline{\lambda}) \Leftrightarrow \arg\max_{\lambda} \sum_I \log P(O, I \mid \lambda)\, P(O, I \mid \overline{\lambda}) \\ &= \arg\max_{\lambda} \left[ \sum_I \log \pi_{i_1}\, P(O, I \mid \overline{\lambda}) + \sum_I \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P(O, I \mid \overline{\lambda}) + \sum_I \left( \sum_{t=1}^{T} \log b_{i_t}(o_t) \right) P(O, I \mid \overline{\lambda}) \right] \end{aligned}$$
Maximize each of the three terms separately.
First term:
$$\begin{aligned} & \max \sum_I \log \pi_{i_1}\, P(O, I \mid \overline{\lambda}) = \max \sum_{i=1}^N \log \pi_i\, P(O, i_1=i \mid \overline{\lambda}) \\ & \text{s.t.} \quad \sum_{i=1}^N \pi_i = 1 \end{aligned}$$
Form the Lagrangian, take its partial derivative, and set the result to zero:
$$\frac{\partial}{\partial \pi_i} \left[ \sum_{i=1}^N \log \pi_i\, P(O, i_1=i \mid \overline{\lambda}) + \gamma \left( \sum_{i=1}^N \pi_i - 1 \right) \right] = 0$$
This gives
$$\begin{aligned} & P(O, i_1=i \mid \overline{\lambda}) + \gamma \pi_i = 0 \\ & \sum_{i=1}^N \left[ P(O, i_1=i \mid \overline{\lambda}) + \gamma \pi_i \right] = 0 \\ & \sum_{i=1}^N P(O, i_1=i \mid \overline{\lambda}) + \gamma \sum_{i=1}^N \pi_i = 0 \\ & P(O \mid \overline{\lambda}) + \gamma = 0 \\ & \gamma = -P(O \mid \overline{\lambda}) \end{aligned}$$
Substituting back into $P(O, i_1=i \mid \overline{\lambda}) + \gamma \pi_i = 0$ gives
$$\pi_i = \frac{P(O, i_1=i \mid \overline{\lambda})}{P(O \mid \overline{\lambda})} = \gamma_1(i)$$
Second term:
$$\begin{aligned} & \max \sum_I \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P(O, I \mid \overline{\lambda}) = \max \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} \log a_{ij}\, P(O, i_t=i, i_{t+1}=j \mid \overline{\lambda}) \\ & \text{s.t.} \quad \sum_{j=1}^N a_{ij} = 1 \end{aligned}$$
Similarly,
$$a_{ij} = \frac{\sum_{t=1}^{T-1} P(O, i_t=i, i_{t+1}=j \mid \overline{\lambda})}{\sum_{t=1}^{T-1} P(O, i_t=i \mid \overline{\lambda})} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
Third term:
$$\begin{aligned} & \max \sum_I \left( \sum_{t=1}^{T} \log b_{i_t}(o_t) \right) P(O, I \mid \overline{\lambda}) = \max \sum_{j=1}^N \sum_{t=1}^{T} \log b_j(o_t)\, P(O, i_t=j \mid \overline{\lambda}) \\ & \text{s.t.} \quad \sum_{k=1}^M b_j(k) = 1 \end{aligned}$$
which gives
$$b_j(k) = \frac{\sum_{t=1}^{T} P(O, i_t=j \mid \overline{\lambda})\, I(o_t=v_k)}{\sum_{t=1}^{T} P(O, i_t=j \mid \overline{\lambda})} = \frac{\sum_{t=1, o_t=v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
The Baum-Welch algorithm:
Input: observed data $O=(o_1, o_2, \cdots, o_T)$; Output: hidden Markov model parameters.
1. Initialization: for $n=0$, choose $a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)}$ to obtain the model $\lambda^{(0)} = \left( a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)} \right)$.
2. Recursion: for $n=1,2,\cdots$,
$$\begin{aligned} & a_{ij}^{(n+1)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \\ & b_j(k)^{(n+1)} = \frac{\sum_{t=1, o_t=v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \\ & \pi_i^{(n+1)} = \frac{P(O, i_1=i \mid \overline{\lambda})}{P(O \mid \overline{\lambda})} \end{aligned}$$
where the quantities on the right-hand side are computed from the observed data $O=(o_1, o_2, \cdots, o_T)$ and the current model $\lambda^{(n)} = \left( A^{(n)}, B^{(n)}, \pi^{(n)} \right)$.
3. Termination: obtain the model parameters $\lambda^{(n+1)} = \left( A^{(n+1)}, B^{(n+1)}, \pi^{(n+1)} \right)$.
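The full iteration can be sketched as follows. This is one possible implementation of the updates above (random row-stochastic initialization, a fixed number of iterations, no scaling for long sequences), not a production routine:

```python
import numpy as np

def baum_welch(O, N, M, n_iter=100, seed=0):
    """Estimate (A, B, pi) from a single observation sequence O of symbols
    in {0, ..., M-1}, with N hidden states, by iterating the updates above."""
    rng = np.random.default_rng(seed)
    O = np.asarray(O)
    T = len(O)
    # step 1: random row-stochastic initial model lambda^(0)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)
    pi = rng.random(N); pi /= pi.sum()
    for _ in range(n_iter):
        # E step: forward/backward tables under the current model
        alpha, beta = np.zeros((T, N)), np.ones((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = alpha[:-1, :, None] * A[None, :, :] * (B[:, O[1:]].T * beta[1:])[:, None, :]
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        # M step: the three update formulas
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[O == k].sum(axis=0) for k in range(M)], axis=1)
        B /= gamma.sum(axis=0)[:, None]
    return A, B, pi

# fit a 2-state, 2-symbol model to a made-up observation sequence
A_hat, B_hat, pi_hat = baum_welch([0, 1, 0, 0, 1, 1, 0, 1, 0, 0], N=2, M=2)
```

Each M step produces valid probability distributions: rows of the estimated $A$ and $B$ and the vector $\pi$ all sum to 1.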
4. Prediction Algorithm
The Viterbi Algorithm
Define $\delta_t(i)$ as the maximum probability over all single paths $(i_1, i_2, \cdots, i_t)$ ending in state $i$ at time $t$:
$$\delta_t(i) = \max_{i_1, i_2, \cdots, i_{t-1}} P(i_t=i, i_{t-1}, \cdots, i_1, o_t, \cdots, o_1 \mid \lambda) \qquad i=1,2,\cdots,N$$
This gives the recursion:
$$\begin{aligned} \delta_{t+1}(i) &= \max_{i_1, i_2, \cdots, i_t} P(i_{t+1}=i, i_t, \cdots, i_1, o_{t+1}, \cdots, o_1 \mid \lambda) \\ &= \max_{1 \le j \le N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_{t+1}=i, i_t=j, i_{t-1}, \cdots, i_1, o_{t+1}, o_t, \cdots, o_1 \mid \lambda) \right] \\ &= \max_{1 \le j \le N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_{t+1}=i, i_t=j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda)\, P(o_{t+1} \mid i_{t+1}=i, \lambda) \right] \\ &= \max_{1 \le j \le N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_t=j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda)\, P(i_{t+1}=i \mid i_t=j, \lambda)\, P(o_{t+1} \mid i_{t+1}=i, \lambda) \right] \\ &= \max_{1 \le j \le N} \left[ \delta_t(j)\, a_{ji} \right] b_i(o_{t+1}) \qquad i=1,2,\cdots,N \end{aligned}$$
Define $\psi_t(i)$ as the $(t-1)$-th node of the maximum-probability path among all single paths ending in state $i$ at time $t$:
$$\psi_t(i) = \arg\max_{1 \le j \le N} \left[ \delta_{t-1}(j)\, a_{ji} \right] \qquad i=1,2,\cdots,N$$
The Viterbi algorithm:
Input: model $\lambda=(A,B,\pi)$ and observation sequence $O=(o_1, o_2, \cdots, o_T)$; Output: the optimal path $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$.
1. Initialization:
$$\delta_1(i) = \pi_i\, b_i(o_1), \qquad \psi_1(i) = 0 \qquad i=1,2,\cdots,N$$
2. Recursion: for $t=2,3,\cdots,T$,
$$\begin{aligned} & \delta_t(i) = \max_{1 \le j \le N} \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(o_t) \qquad i=1,2,\cdots,N \\ & \psi_t(i) = \arg\max_{1 \le j \le N} \left[ \delta_{t-1}(j)\, a_{ji} \right] \qquad i=1,2,\cdots,N \end{aligned}$$
3. Termination:
$$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad i_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$$
4. Backtracking along the optimal path: for $t=T-1, T-2, \cdots, 1$,
$$i_t^* = \psi_{t+1}(i_{t+1}^*)$$
This yields the optimal path $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$.
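A minimal NumPy sketch of the Viterbi algorithm, using 0-based state indices; the toy parameters and observation sequence are illustrative assumptions, not from the text:

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi algorithm: returns the optimal path I* (0-based states) and P*."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                 # initialization: delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                      # recursion
        trans = delta[t - 1][:, None] * A      # trans[j, i] = delta_{t-1}(j) a_ji
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]           # termination: i_T*
    for t in range(T - 1, 0, -1):              # backtracking
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

# toy model (illustrative values, not from the text)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
path, p_star = viterbi(A, B, pi, O=[0, 1, 0])
```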
5. Summary
1. A hidden Markov model is a probabilistic model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state randomly generates an observation, producing the observation sequence.
A hidden Markov model is determined by the initial state probability vector $\pi$, the state transition probability matrix $A$, and the observation probability matrix $B$; it can therefore be written as $\lambda=(A,B,\pi)$.
The hidden Markov model is a generative model. It represents the joint distribution of the state sequence and the observation sequence, but the state sequence is hidden and unobservable.
Hidden Markov models can be used for tagging, in which case the states correspond to tags. The tagging problem is to predict the tag sequence corresponding to a given observation sequence.
2. The probability computation problem: given the model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,\ldots,o_T)$, compute the probability $P(O\mid\lambda)$ of the observation sequence $O$ under the model. The forward-backward algorithm performs this computation efficiently by recursively evaluating the forward and backward probabilities.
3. The learning problem: given an observation sequence $O=(o_1,o_2,\ldots,o_T)$, estimate the parameters of the model $\lambda=(A,B,\pi)$ so that the observation probability $P(O\mid\lambda)$ is maximized, i.e., estimate the parameters by maximum likelihood. The Baum-Welch algorithm, an instance of the EM algorithm, trains hidden Markov models efficiently; it is an unsupervised learning algorithm.
4. The prediction problem: given the model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,\ldots,o_T)$, find the state sequence $I=(i_1,i_2,\ldots,i_T)$ that maximizes the conditional probability $P(I\mid O)$. The Viterbi algorithm applies dynamic programming to efficiently find the optimal path, i.e., the state sequence of maximum probability.