HMM Blog Series Summary
This post combines three problems in one (each problem was also published as a separate post, so readers interested in only one topic need not wade through the others): the probability computation problem, the learning problem, the decoding problem, plus an application to part-of-speech tagging.
The Probability Computation Problem (Evaluation)
Given a model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_{1},o_{2},\cdots,o_{T})$, compute the probability $P(O\mid\lambda)$ of observing the sequence $O$ under the model $\lambda$.
Direct Computation
$$P(O\mid\lambda)=\sum_{I}P(O,I\mid\lambda)=\sum_{I}P(O\mid I,\lambda)\cdot P(I\mid\lambda)$$

The sum runs over all $N^T$ possible state sequences $I$, so direct computation costs $O(TN^{T})$ time and is infeasible for any realistic $T$; the forward and backward algorithms below reduce this to $O(N^{2}T)$.
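For short sequences, the direct sum can still be enumerated as a sanity check. The sketch below uses the example data defined in the implementation section later in this post:

```python
from itertools import product

# Example HMM parameters (same data as in the implementation section)
Pi = [0.25, 0.25, 0.25, 0.25]
A = [[0, 1, 0, 0],
     [0.4, 0, 0.6, 0],
     [0, 0.4, 0, 0.6],
     [0, 0, 0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
O = [1, 1, 0, 0, 1]

N, T = len(A), len(O)
p = 0.0
# Enumerate all N^T hidden state sequences I = (i_1, ..., i_T)
for I in product(range(N), repeat=T):
    # P(O, I | lambda) = pi_{i_1} b_{i_1}(o_1) * prod_t a_{i_{t-1} i_t} b_{i_t}(o_t)
    prob = Pi[I[0]] * B[I[0]][O[0]]
    for t in range(1, T):
        prob *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
    p += prob
print(p)  # ≈ 0.024612384
```

With $N=4$ and $T=5$ this is only $4^5 = 1024$ terms, but the count grows exponentially with $T$.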
Forward Algorithm
Define the forward probability as
$$\alpha_t(i)=P(o_1,o_2,\dots,o_t,\,i_t=q_i\mid\lambda)$$
Then the quantity we want is
$$P(O\mid\lambda)=P(o_1,o_2,\dots,o_T\mid\lambda)=\sum_{i=1}^{N}P(o_1,o_2,\dots,o_T,\,i_T=q_i\mid\lambda)=\sum_{i=1}^{N}\alpha_T(i)$$
The algorithm proceeds as follows:
(1) Initialization
$$\alpha_1(i)=P(o_1,i_1=q_i\mid\lambda)=P(i_1=q_i\mid\lambda)\cdot P(o_1\mid i_1=q_i,\lambda)=\pi_i\,b_i(o_1)$$
(2) Recursion
$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,o_2,\dots,o_t,o_{t+1},\,i_{t+1}=q_i\mid\lambda) \\ &= P(o_1,o_2,\dots,o_t,\,i_{t+1}=q_i\mid\lambda)\cdot P(o_{t+1}\mid o_1,o_2,\dots,o_t,\,i_{t+1}=q_i,\lambda) \\ &= \Big[\sum_{j=1}^{N} P(o_1,o_2,\dots,o_t,\,i_t=q_j,\,i_{t+1}=q_i\mid\lambda)\Big]\cdot P(o_{t+1}\mid i_{t+1}=q_i) \\ &= \Big[\sum_{j=1}^{N}P(o_1,o_2,\dots,o_t,\,i_t=q_j\mid\lambda)\cdot P(i_{t+1}=q_i\mid o_1,o_2,\dots,o_t,\,i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^{N}\alpha_t(j)\cdot P(i_{t+1}=q_i\mid i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^{N}\alpha_t(j)\cdot a_{ji}\Big]\cdot b_i(o_{t+1}) \end{aligned}$$
(3) Termination
$$P(O\mid\lambda)=\sum_{i=1}^{N}\alpha_T(i)$$
Implementation
Define the observation data and the parameters of $\lambda$ ($\pi, A, B$):
```python
import numpy as np

O = [1, 1, 0, 0, 1]
Pi = [0.25, 0.25, 0.25, 0.25]
A = [[0, 1, 0, 0],
     [0.4, 0, 0.6, 0],
     [0, 0.4, 0, 0.6],
     [0, 0, 0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
```
Define the model:
```python
class ForwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = np.zeros(shape=(self.N,), dtype=float)
        for i in range(self.N):
            alpha[i] = self.Pi[i] * self.B[i][O[0]]
        # Recursion over o_2, ..., o_T (o_1 was already consumed by
        # the initialization step, so it must be skipped here)
        for observation in O[1:]:
            temp = np.zeros_like(alpha)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += alpha[j] * self.A[j][i]
                temp[i] *= self.B[i][observation]
            alpha = temp
        # Termination: P(O|lambda) = sum_i alpha_T(i)
        return np.sum(alpha)
```
Prediction:
```python
model = ForwardModel(Pi, A, B)
model.predict(O)
```
0.024612384
Readers can verify this value by hand.
Backward Algorithm
Define the backward probability as
$$\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right)$$
Then
$$\begin{aligned} P(O\mid\lambda) &= P(o_1,o_2,\dots,o_T\mid\lambda) \\ &=\sum_{i=1}^{N}P(o_1,o_2,\dots,o_T,\,i_1=q_i\mid\lambda) \\ &=\sum_{i=1}^{N}P(o_1\mid o_2,\dots,o_T,\,i_1=q_i,\lambda)\cdot P(o_2,\dots,o_T,\,i_1=q_i\mid\lambda) \\ &=\sum_{i=1}^{N}P(o_1\mid i_1=q_i,\lambda)\cdot P(o_2,\dots,o_T\mid i_1=q_i,\lambda)\cdot P(i_1=q_i\mid\lambda) \\ &=\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i \end{aligned}$$
The algorithm proceeds as follows:
(1) Initialization
$$\beta_T(i) = 1$$
(2) Recursion
$$\begin{aligned} \beta_{t}(i) &= P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T},\,i_{t+1}=q_j \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,\,i_{t}=q_{i}, \lambda\right)\cdot P(i_{t+1}=q_j\mid i_t=q_i,\lambda) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,\,i_{t}=q_{i}, \lambda\right)\cdot a_{ij} \end{aligned}$$
The next step is the crucial one. By the conditional-independence structure of the graphical model, once $i_{t+1}=q_j$ is given, the observations $o_{t+1},\dots,o_T$ no longer depend on $i_t$, so the conditioning on $i_t=q_i$ can be dropped:
$$\begin{aligned} \beta_t(i) &= \sum_{j=1}^{N}P(o_{t+1},\dots,o_T\mid i_{t+1}=q_j,\lambda)\cdot a_{ij} \\ &=\sum_{j=1}^{N}P(o_{t+2},\dots,o_T\mid i_{t+1}=q_j,\lambda)\cdot P(o_{t+1}\mid o_{t+2},\dots,o_T,\,i_{t+1}=q_j,\lambda) \cdot a_{ij} \\ &= \sum_{j=1}^{N}\beta_{t+1}(j)\cdot b_j(o_{t+1})\cdot a_{ij} \end{aligned}$$
(3) Termination
$$P(O\mid\lambda) =\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i$$
Implementation
The data and parameter definitions are identical to those above and are omitted here.
Define the model:
```python
class BackwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # Initialization: beta_T(i) = 1
        beta = np.ones(shape=(self.N,))
        # Recursion over o_T, ..., o_2 (o_1 is consumed by the
        # termination step, so it must be excluded here)
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += self.A[i][j] * self.B[j][o] * beta[j]
            beta = temp
        # Termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
        res = 0.0
        for i in range(self.N):
            res += self.B[i][O[0]] * beta[i] * self.Pi[i]
        return res
```
Prediction:
```python
model = BackwardModel(Pi, A, B)
model.predict(O)
```
0.024612384
As expected, this matches the value obtained with the forward algorithm.
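As a side note, the nested loops in both models can be collapsed into NumPy matrix products, which is both shorter and faster. A minimal vectorized sketch with the same example data:

```python
import numpy as np

Pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0, 1, 0, 0],
              [0.4, 0, 0.6, 0],
              [0, 0.4, 0, 0.6],
              [0, 0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
O = [1, 1, 0, 0, 1]

# Forward: alpha_{t+1} = (alpha_t @ A) * B[:, o_{t+1}]
alpha = Pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]
p_forward = alpha.sum()

# Backward: beta_t = A @ (B[:, o_{t+1}] * beta_{t+1})
beta = np.ones(len(A))
for o in reversed(O[1:]):
    beta = A @ (B[:, o] * beta)
p_backward = (Pi * B[:, O[0]] * beta).sum()

print(p_forward, p_backward)  # both ≈ 0.024612384
```

For long sequences the repeated products underflow in floating point; in practice one rescales $\alpha_t$ and $\beta_t$ at each step or works in log space.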