HMM Forward-Backward Algorithm: Understanding and Implementation (Python)
HMM Viterbi Algorithm: Understanding and Implementation (Python)
Basic elements
$N$ states
State sequence $S = s_1, s_2, \ldots$
Observation sequence $O = O_1, O_2, \ldots$
Model $\lambda = (A, B, \pi)$
State transition probabilities $A = \{a_{ij}\}$
Emission probabilities $B = \{b_{ik}\}$
Initial probability distribution $\pi = \{\pi_i\}$
Observation sequence generation process
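The generation process is: draw an initial state from $\pi$, emit an observation from that state's row of $B$, transition to the next state via $A$, and repeat. It can be sketched as follows (a minimal illustration; the function name `hmm_sample` is not from the original post):

```python
import random

def hmm_sample(A, B, pi, T):
    """Sample a state sequence and an observation sequence of
    length T from an HMM with parameters (A, B, pi)."""
    states, obs = [], []
    # draw the initial state from the initial distribution pi
    s = random.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(s)
        # emit an observation from state s's emission distribution
        o = random.choices(range(len(B[s])), weights=B[s])[0]
        obs.append(o)
        # move to the next state according to row s of A
        s = random.choices(range(len(A[s])), weights=A[s])[0]
    return states, obs
```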
The three fundamental HMM problems
Evaluation: given an observation sequence $O = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, compute $P(O|\lambda)$, the probability of the observation sequence.
Decoding: given an observation sequence $O = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, find the corresponding state sequence $S$.
Learning: given an observation sequence $O = O_1 O_2 \ldots O_T$, find the model parameters $\lambda = (A, B, \pi)$ that maximize $P(O|\lambda)$.
Evaluation problem
Given a model $\lambda$ and an observation sequence $O$, how do we compute $P(O|\lambda)$?
Brute force: enumerate every possible state sequence $S$.
For a given state sequence, the probability of the observations is
$$P(O|S,\lambda) = \prod_{t=1}^{T} P(O_t|s_t,\lambda) = \prod_{t=1}^{T} b_{s_t}(O_t)$$
The probability of the state sequence itself is
$$P(S|\lambda) = P(s_1) \prod_{t=2}^{T} P(s_t|s_{t-1}) = \pi_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t}$$
Their joint probability is
$$P(O, S|\lambda) = P(S|\lambda)\,P(O|S,\lambda) = \pi_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(O_t)$$
Summing over all state sequences:
$$P(O|\lambda) = \sum_{S} \pi_{s_1} b_{s_1}(O_1) \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(O_t)$$
$O$ may be produced by any state sequence, so the contributions of all of them must be added up.
What is the problem with this approach? The time complexity is as high as $O(2T \cdot N^T)$: each sequence takes about $2T$ multiplications, and there are $N^T$ sequences.
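For comparison, the exponential brute-force computation can be written directly from the summation above (the function name `hmm_brute_force` is illustrative, not from the original post):

```python
from itertools import product

def hmm_brute_force(A, B, pi, O):
    """Enumerate all N**T state sequences and sum their joint
    probabilities P(O, S | lambda) -- exponential time, for
    illustration only."""
    N, T = len(pi), len(O)
    total = 0.0
    for S in product(range(N), repeat=T):
        # pi_{s_1} * b_{s_1}(O_1)
        p = pi[S[0]] * B[S[0]][O[0]]
        # product of a_{s_{t-1} s_t} * b_{s_t}(O_t) for t = 2..T
        for t in range(1, T):
            p *= A[S[t-1]][S[t]] * B[S[t]][O[t]]
        total += p
    return total
```

On the small example used later in this post (N = 3, T = 4), this enumerates only 81 sequences, but the count grows as $N^T$.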
Forward algorithm
Define the forward variable $\alpha_i(t)$ as the probability of observing $O_1, O_2, \ldots, O_t$ and being in state $i$ at time $t$:
$$\alpha_i(t) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda)$$
When $t = 1$, the output is $O_1$. Assuming there are three states, $O_1$ may be emitted by any of them:
$$P(O_1|\lambda) = \pi_1 b_1(O_1) + \pi_2 b_2(O_1) + \pi_3 b_3(O_1) = \alpha_1(1) + \alpha_2(1) + \alpha_3(1)$$
When $t = 2$, the output is $O_1 O_2$. $O_2$ may be emitted by any state, and the state emitting $O_2$ may be reached by a transition from any state at $t = 1$. Suppose $O_2$ is emitted by state 1:
$$\begin{aligned} P(O_1 O_2, s_2 = q_1 | \lambda) &= \pi_1 b_1(O_1) a_{11} b_1(O_2) + \pi_2 b_2(O_1) a_{21} b_1(O_2) + \pi_3 b_3(O_1) a_{31} b_1(O_2) \\ &= \alpha_1(1) a_{11} b_1(O_2) + \alpha_2(1) a_{21} b_1(O_2) + \alpha_3(1) a_{31} b_1(O_2) = \alpha_1(2) \end{aligned}$$
Similarly for $\alpha_2(2)$ and $\alpha_3(2)$:
$$\alpha_2(2) = P(O_1 O_2, s_2 = q_2 | \lambda) = \alpha_1(1) a_{12} b_2(O_2) + \alpha_2(1) a_{22} b_2(O_2) + \alpha_3(1) a_{32} b_2(O_2)$$
$$\alpha_3(2) = P(O_1 O_2, s_2 = q_3 | \lambda) = \alpha_1(1) a_{13} b_3(O_2) + \alpha_2(1) a_{23} b_3(O_2) + \alpha_3(1) a_{33} b_3(O_2)$$
Therefore
$$P(O_1 O_2 | \lambda) = P(O_1 O_2, s_2 = q_1 | \lambda) + P(O_1 O_2, s_2 = q_2 | \lambda) + P(O_1 O_2, s_2 = q_3 | \lambda) = \alpha_1(2) + \alpha_2(2) + \alpha_3(2)$$
The forward algorithm therefore proceeds as follows:
step 1: initialization, $\alpha_i(1) = \pi_i \, b_i(O_1)$
step 2: recursion, $\alpha_i(t) = \left( \sum_{j=1}^{N} \alpha_j(t-1)\, a_{ji} \right) b_i(O_t)$
step 3: termination, $P(O|\lambda) = \sum_{i=1}^{N} \alpha_i(T)$
Is this cheaper than brute force? At the current time step there are $N$ states, and each of them may be reached from any of the $N$ states at the previous step, so a single time step costs $O(N^2)$ and the total time complexity is $O(TN^2)$.
Code implementation
Example: balls are drawn from three bags {1, 2, 3}, producing 4 observations O = {red, white, red, white}. Given the model parameters $\lambda = (A, B, \pi)$ below, compute the probability of the sequence O.
# states   1    2    3
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
pi = [0.2, 0.4, 0.4]
#      red  white
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
step 1: initialization, $\alpha_i(1) = \pi_i \, b_i(O_1)$
step 2: recursion, $\alpha_i(t) = \left( \sum_{j=1}^{N} \alpha_j(t-1)\, a_{ji} \right) b_i(O_t)$
step 3: termination, $P(O|\lambda) = \sum_{i=1}^{N} \alpha_i(T)$
def hmm_forward(A, B, pi, O):
    T = len(O)
    N = len(A[0])
    # alpha[i][t]: probability of O_1..O_t with state i at time t
    alpha = [[0] * T for _ in range(N)]
    # step 1: initialization
    for i in range(N):
        alpha[i][0] = pi[i] * B[i][O[0]]
    # step 2: recursion over time
    for t in range(1, T):
        for i in range(N):
            temp = 0
            for j in range(N):
                temp += alpha[j][t-1] * A[j][i]
            alpha[i][t] = temp * B[i][O[t]]
    # step 3: termination, sum over states at the final time step
    proba = 0
    for i in range(N):
        proba += alpha[i][-1]
    return proba, alpha
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1]
hmm_forward(A, B, pi, O)
Result: $P(O|\lambda) \approx 0.06009$
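For reference, the same forward pass can be written in vectorized form with NumPy (the name `hmm_forward_np` is illustrative, not from the original post); each time step becomes a single matrix-vector product:

```python
import numpy as np

def hmm_forward_np(A, B, pi, O):
    """Vectorized forward pass: alpha is a length-N vector updated
    with one matrix-vector product per time step."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    alpha = pi * B[:, O[0]]            # step 1: initialization
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]  # step 2: recursion
    return alpha.sum()                 # step 3: termination
```

It returns the same value as the loop-based `hmm_forward` above.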
Backward algorithm
Define the backward variable $\beta_i(t)$ as the probability of observing $O_{t+1}, O_{t+2}, \ldots, O_T$ given state $i$ at time $t$:
$$\beta_i(t) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = i, \lambda)$$
When $t = T$, there are no observations after time $T$, so $\beta_i(T) = 1$.
When $t = T-1$, the remaining observation is $O_T$, which may be emitted by any state:
$$\beta_i(T-1) = P(O_T | s_{T-1} = i, \lambda) = a_{i1} b_1(O_T) \beta_1(T) + a_{i2} b_2(O_T) \beta_2(T) + a_{i3} b_3(O_T) \beta_3(T)$$
When $t = 1$, the remaining observations are $O_2, O_3, \ldots, O_T$:
$$\beta_1(1) = P(O_2, \ldots, O_T | s_1 = 1, \lambda) = a_{11} b_1(O_2) \beta_1(2) + a_{12} b_2(O_2) \beta_2(2) + a_{13} b_3(O_2) \beta_3(2)$$
$$\beta_2(1) = P(O_2, \ldots, O_T | s_1 = 2, \lambda) = a_{21} b_1(O_2) \beta_1(2) + a_{22} b_2(O_2) \beta_2(2) + a_{23} b_3(O_2) \beta_3(2)$$
$$\beta_3(1) = P(O_2, \ldots, O_T | s_1 = 3, \lambda) = a_{31} b_1(O_2) \beta_1(2) + a_{32} b_2(O_2) \beta_2(2) + a_{33} b_3(O_2) \beta_3(2)$$
Since $\beta_i(1)$ is conditioned on starting in state $i$, weighting it by the probability of starting in state $i$ and emitting $O_1$ gives the full observation probability:
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1) \beta_i(1)$$
The backward algorithm therefore proceeds as follows:
step 1: initialization, $\beta_i(T) = 1$
step 2: recursion, $\beta_i(t) = \sum_{j=1}^{N} a_{ij} b_j(O_{t+1}) \beta_j(t+1)$
step 3: termination, $P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1) \beta_i(1)$
Code implementation
Using the same example as above:
def hmm_backward(A, B, pi, O):
    T = len(O)
    N = len(A[0])
    # beta[i][t]: probability of O_{t+1}..O_T given state i at time t
    beta = [[0] * T for _ in range(N)]
    # step 1: initialization, beta_i(T) = 1
    for i in range(N):
        beta[i][-1] = 1
    # step 2: recursion, backwards in time
    for t in reversed(range(T-1)):
        for i in range(N):
            for j in range(N):
                beta[i][t] += A[i][j] * B[j][O[t+1]] * beta[j][t+1]
    # step 3: termination
    proba = 0
    for i in range(N):
        proba += pi[i] * B[i][O[0]] * beta[i][0]
    return proba, beta
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1]
hmm_backward(A, B, pi, O)
Result: the same probability as the forward pass, $P(O|\lambda) \approx 0.06009$
Forward-backward algorithm
Recall the forward and backward variables:
$\alpha_i(t)$: the probability of observing $O_1, O_2, \ldots, O_t$ and being in state $i$ at time $t$
$\beta_i(t)$: the probability of observing $O_{t+1}, O_{t+2}, \ldots, O_T$ given state $i$ at time $t$
$$\begin{aligned} P(O, s_t = i | \lambda) &= P(O_1, \ldots, O_T, s_t = i | \lambda) \\ &= P(O_1, \ldots, O_t, s_t = i, O_{t+1}, \ldots, O_T | \lambda) \\ &= P(O_1, \ldots, O_t, s_t = i | \lambda) \, P(O_{t+1}, \ldots, O_T | O_1, \ldots, O_t, s_t = i, \lambda) \\ &= P(O_1, \ldots, O_t, s_t = i | \lambda) \, P(O_{t+1}, \ldots, O_T | s_t = i, \lambda) \\ &= \alpha_i(t) \, \beta_i(t) \end{aligned}$$
(The second-to-last step uses the Markov property: given $s_t = i$, the future observations are independent of the past ones.) This is the joint probability of the full observation sequence and of being in state $i$ at time $t$.
The forward and backward variables together give the posterior distribution over hidden states. Let $\gamma_i(t) = P(s_t = i | O, \lambda)$ denote the probability of being in hidden state $i$ at time $t$:
$$P(s_t = i, O | \lambda) = \alpha_i(t) \beta_i(t)$$
$$\gamma_i(t) = P(s_t = i | O, \lambda) = \frac{P(s_t = i, O | \lambda)}{P(O|\lambda)} = \frac{\alpha_i(t)\beta_i(t)}{P(O|\lambda)} = \frac{\alpha_i(t)\beta_i(t)}{\sum_{i=1}^{N} \alpha_i(t)\beta_i(t)}$$
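Putting the two passes together, $\gamma_i(t)$ can be computed directly from the formulas above. A self-contained sketch (the function name `hmm_gamma` is illustrative, not from the original post):

```python
def hmm_gamma(A, B, pi, O):
    """Posterior state probabilities gamma[i][t] = P(s_t = i | O, lambda),
    computed from the forward and backward variables."""
    N, T = len(pi), len(O)
    # forward variables alpha[i][t]
    alpha = [[0.0] * T for _ in range(N)]
    for i in range(N):
        alpha[i][0] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for i in range(N):
            alpha[i][t] = sum(alpha[j][t-1] * A[j][i] for j in range(N)) * B[i][O[t]]
    # backward variables beta[i][t]
    beta = [[0.0] * T for _ in range(N)]
    for i in range(N):
        beta[i][T-1] = 1.0
    for t in reversed(range(T-1)):
        for i in range(N):
            beta[i][t] = sum(A[i][j] * B[j][O[t+1]] * beta[j][t+1] for j in range(N))
    # gamma[i][t]: alpha * beta, normalized over states at each time t
    gamma = [[0.0] * T for _ in range(N)]
    for t in range(T):
        norm = sum(alpha[i][t] * beta[i][t] for i in range(N))
        for i in range(N):
            gamma[i][t] = alpha[i][t] * beta[i][t] / norm
    return gamma
```

At every time step the entries of $\gamma$ sum to 1 over the states, since they form the posterior distribution $P(s_t = i \mid O, \lambda)$.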