
[Statistical Learning Methods] Chapter 10: The Hidden Markov Model

Date: 2018-12-10 09:31:44


The hidden Markov model (HMM) is a statistical learning model applicable to tagging problems. It describes the process by which a hidden Markov chain randomly generates a sequence of observations, and it belongs to the class of generative models.

1. Basic Concepts of the Hidden Markov Model

Definition of the hidden Markov model

The hidden Markov model is a probabilistic model of time series. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state in turn generates an observation, producing a random sequence of observations. The sequence of states randomly generated by the hidden Markov chain is called the state sequence; the resulting random sequence of observations is called the observation sequence. Each position in a sequence can be regarded as a time step.

Set of states: $Q = \{q_1, q_2, \ldots, q_N\}$, with $|Q| = N$.

Set of observations: $V = \{v_1, v_2, \ldots, v_M\}$, with $|V| = M$.

State sequence: $I = (i_1, i_2, \ldots, i_t, \ldots, i_T)$, where $i_t \in Q$ for $t = 1, 2, \ldots, T$.

Observation sequence: $O = (o_1, o_2, \ldots, o_t, \ldots, o_T)$, where $o_t \in V$ for $t = 1, 2, \ldots, T$.

State transition probability matrix: $A = [a_{ij}]_{N \times N}$, where
$$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, N$$
is the probability of moving to state $q_j$ at time $t+1$ given that the chain is in state $q_i$ at time $t$.

Observation probability matrix: $B = [b_j(k)]_{N \times M}$, where
$$b_j(k) = P(o_t = v_k \mid i_t = q_j), \quad k = 1, 2, \ldots, M, \quad j = 1, 2, \ldots, N$$
is the probability of generating observation $v_k$ at time $t$ given that the chain is in state $q_j$ at time $t$.

Initial state probability vector: $\pi = (\pi_i)$, where
$$\pi_i = P(i_1 = q_i), \quad i = 1, 2, \ldots, N$$
is the probability of being in state $q_i$ at time $t = 1$.

The hidden Markov model is therefore determined by the triple $\lambda = (A, B, \pi)$.

Basic assumptions of the hidden Markov model:

1. Homogeneous Markov assumption: the state at any time $t$ depends only on the state at time $t-1$:
$$P(i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(i_t \mid i_{t-1}), \quad t = 1, 2, \ldots, T$$
2. Observation independence assumption: the observation at any time $t$ depends only on the state at time $t$:
$$P(o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, \ldots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(o_t \mid i_t), \quad t = 1, 2, \ldots, T$$

Generation of an observation sequence

Algorithm for generating an observation sequence:

Input: hidden Markov model $\lambda = (A, B, \pi)$; length of the observation sequence $T$.
Output: observation sequence $O = (o_1, o_2, \ldots, o_T)$.

1. Draw the initial state $i_1$ according to the initial state distribution $\pi$.
2. Set $t = 1$.
3. Generate $o_t$ according to the observation probability distribution $b_{i_t}(k)$ of state $i_t$.
4. Generate the next state $i_{t+1}$ ($i_{t+1} = 1, 2, \ldots, N$) according to the state transition probability distribution $a_{i_t i_{t+1}}$ of state $i_t$.
5. Set $t = t + 1$; if $t < T$, return to step 3; otherwise, terminate.
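The generation procedure above can be sketched in Python. This is a minimal sketch, not from the book; the function name `hmm_sample` and the list-of-lists parameter layout are my own choices. The parameters are the book's box-and-ball example (3 boxes as states, observations 0 = red, 1 = white).

```python
import random

def hmm_sample(A, B, pi, T, seed=0):
    """Sample a state sequence and an observation sequence of length T
    from an HMM lambda = (A, B, pi), following the generation algorithm."""
    rng = random.Random(seed)
    # Step 1: draw the initial state i_1 from pi
    i = rng.choices(range(len(pi)), weights=pi)[0]
    states, obs = [], []
    for _ in range(T):
        states.append(i)
        # Step 3: draw o_t from the observation distribution b_i(.) of state i
        obs.append(rng.choices(range(len(B[i])), weights=B[i])[0])
        # Step 4: draw i_{t+1} from the transition distribution a_{i,.}
        i = rng.choices(range(len(A[i])), weights=A[i])[0]
    return states, obs

# Box-and-ball parameters (states are 0-based here)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
states, obs = hmm_sample(A, B, pi, T=5)
```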

The three basic problems of hidden Markov models

The hidden Markov model has three basic problems:

1. Probability computation: given $\lambda = (A, B, \pi)$ and $O = (o_1, o_2, \ldots, o_T)$, compute $P(O \mid \lambda)$.
2. Learning: given $O = (o_1, o_2, \ldots, o_T)$, estimate $\lambda^* = \arg\max_\lambda P(O \mid \lambda)$.
3. Prediction (decoding): given $\lambda = (A, B, \pi)$ and $O = (o_1, o_2, \ldots, o_T)$, find $I^* = \arg\max_I P(I \mid O, \lambda)$.

2. Probability Computation Algorithms

The forward algorithm

Forward probability: $\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$, the probability, given the model $\lambda$, that the partial observation sequence up to time $t$ is $o_1, o_2, \ldots, o_t$ and that the state at time $t$ is $q_i$.

Recursive computation of the forward probability (writing $o_1^t$ for $o_1, \ldots, o_t$, with the dependence on $\lambda$ left implicit):
$$\begin{aligned}
\alpha_t(i) &= P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda) = P(i_t = q_i, o_1^t) \\
&= \sum_{j=1}^{N} P(i_{t-1} = q_j, i_t = q_i, o_1^{t-1}, o_t) \\
&= \sum_{j=1}^{N} P(i_t = q_i, o_t \mid i_{t-1} = q_j, o_1^{t-1}) \cdot P(i_{t-1} = q_j, o_1^{t-1}) \\
&= \sum_{j=1}^{N} P(i_t = q_i, o_t \mid i_{t-1} = q_j) \cdot \alpha_{t-1}(j) \\
&= \sum_{j=1}^{N} P(o_t \mid i_t = q_i, i_{t-1} = q_j) \cdot P(i_t = q_i \mid i_{t-1} = q_j) \cdot \alpha_{t-1}(j) \\
&= \sum_{j=1}^{N} b_i(o_t) \cdot a_{ji} \cdot \alpha_{t-1}(j)
\end{aligned}$$

Probability computation:
$$P(O \mid \lambda) = P(o_1^T \mid \lambda) = \sum_{i=1}^{N} P(o_1^T, i_T = q_i) = \sum_{i=1}^{N} \alpha_T(i)$$

The forward algorithm for computing the observation sequence probability:

Input: hidden Markov model $\lambda$; observation sequence $O$.
Output: observation sequence probability $P(O \mid \lambda)$.

1. Initialization: $\alpha_1(i) = \pi_i b_i(o_1)$, $i = 1, 2, \ldots, N$.
2. Recursion: for $t = 1, 2, \ldots, T-1$,
$$\alpha_{t+1}(i) = \left[ \sum_{j=1}^{N} \alpha_t(j) a_{ji} \right] b_i(o_{t+1}), \quad i = 1, 2, \ldots, N$$
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.
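A minimal Python sketch of the forward algorithm (the function name and data layout are my own). It is checked against the book's box-and-ball example, where $O$ = (red, white, red) gives $P(O \mid \lambda) \approx 0.130218$:

```python
def forward(A, B, pi, obs):
    """Forward algorithm: returns (alpha, P(O | lambda)).
    alpha[t][i] holds the forward probability at (0-based) time t, state i."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # initialization
    for o in obs[1:]:                                    # recursion
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][o]
                      for i in range(N)])
    return alpha, sum(alpha[-1])                         # termination

# Box-and-ball example: O = (red, white, red) encoded as (0, 1, 0)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
alpha, p = forward(A, B, pi, [0, 1, 0])   # p ≈ 0.130218
```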

The backward algorithm

Backward probability: $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$, the probability, given the model $\lambda$ and conditional on the state at time $t$ being $q_i$, that the partial observation sequence from time $t+1$ to time $T$ is $o_{t+1}, o_{t+2}, \ldots, o_T$.

Recursive computation of the backward probability:
$$\begin{aligned}
\beta_t(i) &= P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda) = P(o_{t+1}^T \mid i_t = q_i) \\
&= \frac{P(o_{t+1}^T, i_t = q_i)}{P(i_t = q_i)} \\
&= \frac{\sum_{j=1}^{N} P(o_{t+1}^T, i_t = q_i, i_{t+1} = q_j)}{P(i_t = q_i)} \\
&= \sum_{j=1}^{N} \frac{P(o_{t+1}^T \mid i_t = q_i, i_{t+1} = q_j) \cdot P(i_t = q_i, i_{t+1} = q_j)}{P(i_t = q_i)} \\
&= \sum_{j=1}^{N} P(o_{t+1}^T \mid i_{t+1} = q_j) \cdot \frac{P(i_{t+1} = q_j \mid i_t = q_i) \cdot P(i_t = q_i)}{P(i_t = q_i)} \\
&= \sum_{j=1}^{N} P(o_{t+2}^T, o_{t+1} \mid i_{t+1} = q_j) \cdot a_{ij} \\
&= \sum_{j=1}^{N} P(o_{t+2}^T \mid i_{t+1} = q_j) \cdot P(o_{t+1} \mid i_{t+1} = q_j) \cdot a_{ij} \\
&= \sum_{j=1}^{N} \beta_{t+1}(j) \cdot b_j(o_{t+1}) \cdot a_{ij}
\end{aligned}$$

Probability computation:
$$P(O \mid \lambda) = P(o_1^T \mid \lambda) = \sum_{i=1}^{N} P(o_1^T, i_1 = q_i) = \sum_{i=1}^{N} P(i_1 = q_i) \cdot P(o_1 \mid i_1 = q_i) \cdot P(o_2^T \mid i_1 = q_i) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$$

The backward algorithm for computing the observation sequence probability:

Input: hidden Markov model $\lambda$; observation sequence $O$.
Output: observation sequence probability $P(O \mid \lambda)$.

1. Initialization: $\beta_T(i) = 1$, $i = 1, 2, \ldots, N$.
2. Recursion: for $t = T-1, T-2, \ldots, 1$,
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad i = 1, 2, \ldots, N$$
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$.
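The backward recursion can be sketched the same way (again, the function name and layout are my own). On the same box-and-ball example it must reproduce the same probability, $0.130218$, as the forward algorithm:

```python
def backward(A, B, pi, obs):
    """Backward algorithm: returns (beta, P(O | lambda))."""
    N, T = len(pi), len(obs)
    beta = [[1.0] * N]                       # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):           # recursion, t = T-1 down to 1
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j]
                            for j in range(N))
                        for i in range(N)])
    # termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    p = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return beta, p

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
beta, p = backward(A, B, pi, [0, 1, 0])   # p ≈ 0.130218
```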

Expressing $P(O \mid \lambda)$ in terms of the forward and backward probabilities:

$$\begin{aligned}
P(O \mid \lambda) &= P(o_1^T) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} P(o_1^t, o_{t+1}^T, i_t = q_i, i_{t+1} = q_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} P(o_1^t, i_t = q_i, i_{t+1} = q_j) P(o_{t+1}^T \mid i_{t+1} = q_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} P(o_1^t, i_t = q_i) P(i_{t+1} = q_j \mid i_t = q_i) P(o_{t+1}^T \mid i_{t+1} = q_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} P(o_1^t, i_t = q_i) P(i_{t+1} = q_j \mid i_t = q_i) P(o_{t+1} \mid i_{t+1} = q_j) P(o_{t+2}^T \mid i_{t+1} = q_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad t = 1, 2, \cdots, T-1
\end{aligned}$$

Some probabilities and expected values

Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$:
$$\begin{aligned}
\gamma_t(i) &= P(i_t = q_i \mid O, \lambda) = \frac{P(i_t = q_i, O \mid \lambda)}{P(O \mid \lambda)} \\
&= \frac{P(i_t = q_i, O \mid \lambda)}{\sum_{j=1}^{N} P(i_t = q_j, O \mid \lambda)} \\
&= \frac{P(o_1^t, i_t = q_i) P(o_{t+1}^T \mid i_t = q_i)}{\sum_{j=1}^{N} P(o_1^t, i_t = q_j) P(o_{t+1}^T \mid i_t = q_j)} \\
&= \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}
\end{aligned}$$

Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:

$$\begin{aligned}
\xi_t(i, j) &= P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda) = \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)} \\
&= \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)} \\
&= \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}
\end{aligned}$$

Expected number of times state $i$ occurs under observation $O$: $\sum_{t=1}^{T} \gamma_t(i) = \sum_{t=1}^{T} P(i_t = q_i \mid O, \lambda)$.

Expected number of transitions out of state $i$ under observation $O$: $\sum_{t=1}^{T-1} \gamma_t(i) = \sum_{t=1}^{T-1} P(i_t = q_i \mid O, \lambda)$.

Expected number of transitions from state $i$ to state $j$ under observation $O$: $\sum_{t=1}^{T-1} \xi_t(i, j) = \sum_{t=1}^{T-1} P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda)$.
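Combining the forward and backward passes yields $\gamma_t(i)$ and $\xi_t(i,j)$ directly. A minimal sketch follows (the function name `gamma_xi` and nested-list layout are my own; the parameters are the box-and-ball example again). Since both quantities are posterior distributions over states, each must sum to 1 over its state indices:

```python
def gamma_xi(A, B, pi, obs):
    """Compute gamma[t][i] = P(i_t = q_i | O, lambda) and
    xi[t][i][j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda)."""
    N, T = len(pi), len(obs)
    # forward pass
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][o]
                      for i in range(N)])
    # backward pass
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j]
                            for j in range(N))
                        for i in range(N)])
    p = sum(alpha[-1])                      # P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
gamma, xi = gamma_xi(A, B, pi, [0, 1, 0])
```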

3. Learning Algorithms

The Baum-Welch algorithm

Treat the observation sequence as the observed data $O$ and the state sequence as the hidden data $I$; the hidden Markov model is then a probabilistic model with hidden variables:

$$P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$$

The complete data are $(O, I) = (o_1, o_2, \cdots, o_T, i_1, i_2, \cdots, i_T)$.

The complete-data log-likelihood is $\log P(O, I \mid \lambda)$.

The $Q(\lambda, \overline{\lambda})$ function:

$$\begin{aligned}
Q(\lambda, \overline{\lambda}) &= E_I \left[ \log P(O, I \mid \lambda) \mid O, \overline{\lambda} \right] \\
&= \sum_{I} \log P(O, I \mid \lambda) \, P(I \mid O, \overline{\lambda}) \\
&= \sum_{I} \log P(O, I \mid \lambda) \, \frac{P(O, I \mid \overline{\lambda})}{P(O \mid \overline{\lambda})}
\end{aligned}$$

Here $\overline{\lambda}$ is the current estimate of the hidden-Markov-model parameters, and $\lambda$ is the parameter to be maximized.

Since $P(O \mid \overline{\lambda})$ is a constant factor for the purpose of maximizing $Q(\lambda, \overline{\lambda})$,

and since
$$P(O, I \mid \lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T),$$

maximizing $Q(\lambda, \overline{\lambda})$ over $\lambda$ reduces to
$$\begin{aligned}
\lambda &= \arg\max_\lambda Q(\lambda, \overline{\lambda}) \Leftrightarrow \arg\max_\lambda \sum_{I} \log P(O, I \mid \lambda) \, P(O, I \mid \overline{\lambda}) \\
&= \arg\max_\lambda \left[ \sum_{I} \log \pi_{i_1} P(O, I \mid \overline{\lambda}) + \sum_{I} \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P(O, I \mid \overline{\lambda}) + \sum_{I} \left( \sum_{t=1}^{T} \log b_{i_t}(o_t) \right) P(O, I \mid \overline{\lambda}) \right]
\end{aligned}$$

Each of the three terms is maximized separately:

$$\begin{aligned}
& \max \sum_{I} \log \pi_{i_1} P(O, I \mid \overline{\lambda}) = \max \sum_{i=1}^{N} \log \pi_i \, P(O, i_1 = i \mid \overline{\lambda}) \\
& \text{s.t.} \quad \sum_{i=1}^{N} \pi_i = 1
\end{aligned}$$

Form the Lagrangian, take its partial derivative, and set the result to zero:
$$\frac{\partial}{\partial \pi_i} \left[ \sum_{i=1}^{N} \log \pi_i \, P(O, i_1 = i \mid \overline{\lambda}) + \gamma \left( \sum_{i=1}^{N} \pi_i - 1 \right) \right] = 0$$

This gives
$$\begin{aligned}
& P(O, i_1 = i \mid \overline{\lambda}) + \gamma \pi_i = 0 \\
& \sum_{i=1}^{N} \left[ P(O, i_1 = i \mid \overline{\lambda}) + \gamma \pi_i \right] = 0 \\
& \sum_{i=1}^{N} P(O, i_1 = i \mid \overline{\lambda}) + \gamma \sum_{i=1}^{N} \pi_i = 0 \\
& P(O \mid \overline{\lambda}) + \gamma = 0 \\
& \gamma = -P(O \mid \overline{\lambda})
\end{aligned}$$

Substituting back into $P(O, i_1 = i \mid \overline{\lambda}) + \gamma \pi_i = 0$ yields
$$\pi_i = \frac{P(O, i_1 = i \mid \overline{\lambda})}{P(O \mid \overline{\lambda})} = \gamma_1(i)$$

$$\begin{aligned}
& \max \sum_{I} \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P(O, I \mid \overline{\lambda}) = \max \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{ij} \, P(O, i_t = i, i_{t+1} = j \mid \overline{\lambda}) \\
& \text{s.t.} \quad \sum_{j=1}^{N} a_{ij} = 1
\end{aligned}$$

By the same Lagrange-multiplier argument,
$$a_{ij} = \frac{\sum_{t=1}^{T-1} P(O, i_t = i, i_{t+1} = j \mid \overline{\lambda})}{\sum_{t=1}^{T-1} P(O, i_t = i \mid \overline{\lambda})} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\begin{aligned}
& \max \sum_{I} \left( \sum_{t=1}^{T} \log b_{i_t}(o_t) \right) P(O, I \mid \overline{\lambda}) = \max \sum_{j=1}^{N} \sum_{t=1}^{T} \log b_j(o_t) \, P(O, i_t = j \mid \overline{\lambda}) \\
& \text{s.t.} \quad \sum_{k=1}^{M} b_j(k) = 1
\end{aligned}$$

Similarly,
$$b_j(k) = \frac{\sum_{t=1}^{T} P(O, i_t = j \mid \overline{\lambda}) \, I(o_t = v_k)}{\sum_{t=1}^{T} P(O, i_t = j \mid \overline{\lambda})} = \frac{\sum_{t=1, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

The Baum-Welch algorithm:

Input: observed data $O = (o_1, o_2, \cdots, o_T)$.
Output: hidden-Markov-model parameters.

1. Initialization: for $n = 0$, choose initial values $a_{ij}^{(0)}$, $b_j(k)^{(0)}$, $\pi_i^{(0)}$, obtaining the model $\lambda^{(0)} = \left( a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)} \right)$.
2. Recursion: for $n = 1, 2, \cdots$,
$$\begin{aligned}
a_{ij}^{(n+1)} &= \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \\
b_j(k)^{(n+1)} &= \frac{\sum_{t=1, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \\
\pi_i^{(n+1)} &= \frac{P(O, i_1 = i \mid \lambda^{(n)})}{P(O \mid \lambda^{(n)})} = \gamma_1(i)
\end{aligned}$$
where the quantities on the right-hand side are computed from the observed data $O = (o_1, o_2, \cdots, o_T)$ and the current model $\lambda^{(n)} = \left( A^{(n)}, B^{(n)}, \pi^{(n)} \right)$.
3. Termination: output the model $\lambda^{(n+1)} = \left( A^{(n+1)}, B^{(n+1)}, \pi^{(n+1)} \right)$.
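One E-step plus M-step of Baum-Welch can be sketched as follows, a minimal single update on the box-and-ball parameters (the function name `baum_welch_step` and the layout are my own; $\gamma$ and $\xi$ are computed from the forward and backward probabilities exactly as in the formulas above). The updated parameters must remain valid probability distributions:

```python
def baum_welch_step(A, B, pi, obs, M):
    """One EM update of Baum-Welch: the E-step computes gamma/xi from the
    current model; the M-step re-estimates (A, B, pi) by the update formulas."""
    N, T = len(pi), len(obs)
    # E-step: forward and backward probabilities
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][o]
                      for i in range(N)])
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j]
                            for j in range(N))
                        for i in range(N)])
    p = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: re-estimation formulas
    pi_new = gamma[0][:]
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return A_new, B_new, pi_new

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
A2, B2, pi2 = baum_welch_step(A, B, pi, [0, 1, 0], M=2)
```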

4. Prediction Algorithm

The Viterbi algorithm

Define $\delta_t(i)$ as the maximum probability over all single paths $(i_1, i_2, \cdots, i_t)$ that end in state $i$ at time $t$:

$$\delta_t(i) = \max_{i_1, i_2, \cdots, i_{t-1}} P(i_t = i, i_{t-1}, \cdots, i_1, o_t, \cdots, o_1 \mid \lambda), \quad i = 1, 2, \cdots, N$$

This yields the recurrence
$$\begin{aligned}
\delta_{t+1}(i) &= \max_{i_1, i_2, \cdots, i_t} P(i_{t+1} = i, i_t, \cdots, i_1, o_{t+1}, \cdots, o_1 \mid \lambda) \\
&= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_{t+1} = i, i_t = j, i_{t-1}, \cdots, i_1, o_{t+1}, o_t, \cdots, o_1 \mid \lambda) \right] \\
&= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_{t+1} = i, i_t = j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda) \, P(o_{t+1} \mid i_{t+1} = i, \lambda) \right] \\
&= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P(i_t = j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda) \, P(i_{t+1} = i \mid i_t = j, \lambda) \, P(o_{t+1} \mid i_{t+1} = i, \lambda) \right] \\
&= \max_{1 \leq j \leq N} \left[ \delta_t(j) a_{ji} \right] b_i(o_{t+1}), \quad i = 1, 2, \cdots, N
\end{aligned}$$

Define $\psi_t(i)$ as the $(t-1)$-th node of the maximum-probability single path among all paths $(i_1, i_2, \cdots, i_t)$ ending in state $i$ at time $t$:
$$\psi_t(i) = \arg\max_{1 \leq j \leq N} \left[ \delta_{t-1}(j) a_{ji} \right], \quad i = 1, 2, \cdots, N$$

The Viterbi algorithm:

Input: model $\lambda = (A, B, \pi)$ and observed data $O = (o_1, o_2, \cdots, o_T)$.
Output: optimal path $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$.

1. Initialization:
$$\delta_1(i) = \pi_i b_i(o_1), \quad \psi_1(i) = 0, \quad i = 1, 2, \cdots, N$$
2. Recursion: for $t = 2, 3, \cdots, T$,
$$\begin{aligned}
\delta_t(i) &= \max_{1 \leq j \leq N} \left[ \delta_{t-1}(j) a_{ji} \right] b_i(o_t), \quad i = 1, 2, \cdots, N \\
\psi_t(i) &= \arg\max_{1 \leq j \leq N} \left[ \delta_{t-1}(j) a_{ji} \right], \quad i = 1, 2, \cdots, N
\end{aligned}$$
3. Termination:
$$P^* = \max_{1 \leq i \leq N} \delta_T(i), \quad i_T^* = \arg\max_{1 \leq i \leq N} \delta_T(i)$$
4. Backtracking along the optimal path: for $t = T-1, T-2, \cdots, 1$,
$$i_t^* = \psi_{t+1}(i_{t+1}^*)$$

This yields the optimal path $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$.
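A minimal Python sketch of the Viterbi algorithm (function name and layout my own). On the box-and-ball model with $O$ = (red, white, red), the book obtains the optimal path $(3, 3, 3)$ with $P^* = 0.0147$; the 0-based sketch below accordingly returns `[2, 2, 2]`:

```python
def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most probable state path and its probability."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # initialization
    psi = []
    for o in obs[1:]:                                  # recursion
        d, back = [], []
        for i in range(N):
            j_best = max(range(N), key=lambda j: delta[j] * A[j][i])
            back.append(j_best)
            d.append(delta[j_best] * A[j_best][i] * B[i][o])
        delta = d
        psi.append(back)
    # termination and backtracking
    i_best = max(range(N), key=lambda i: delta[i])
    path = [i_best]
    for back in reversed(psi):
        path.insert(0, back[path[0]])
    return path, delta[i_best]

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
path, p = viterbi(A, B, pi, [0, 1, 0])   # path [2, 2, 2], p ≈ 0.0147
```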

5. Summary

1. The hidden Markov model is a probabilistic model of time series. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then randomly generates an observation, producing the observation sequence.

The hidden Markov model is determined by the initial state probability vector $\pi$, the state transition probability matrix $A$, and the observation probability matrix $B$. It can therefore be written as $\lambda = (A, B, \pi)$.

The hidden Markov model is a generative model: it represents the joint distribution of the state sequence and the observation sequence, but the state sequence is hidden and unobservable.

The hidden Markov model can be used for tagging, in which case the states correspond to tags. The tagging problem is to predict the tag sequence corresponding to a given observation sequence.

2. The probability computation problem: given the model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \ldots, o_T)$, compute the probability $P(O \mid \lambda)$ of the observation sequence $O$ under the model $\lambda$. The forward-backward algorithm computes the forward and backward probabilities recursively and thereby evaluates $P(O \mid \lambda)$ efficiently.

3. The learning problem: given an observation sequence $O = (o_1, o_2, \ldots, o_T)$, estimate the parameters of the model $\lambda = (A, B, \pi)$ so as to maximize the observation probability $P(O \mid \lambda)$, i.e., estimate the parameters by maximum likelihood. The Baum-Welch algorithm, an instance of the EM algorithm, trains the hidden Markov model efficiently; it is an unsupervised learning algorithm.

4. The prediction problem: given the model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \ldots, o_T)$, find the state sequence $I = (i_1, i_2, \ldots, i_T)$ that maximizes the conditional probability $P(I \mid O)$. The Viterbi algorithm applies dynamic programming to find the optimal path, i.e., the state sequence of maximum probability, efficiently.
