NLP (2): Handling Variable-Length Sentences with LSTM
References:
https://zhuanlan.zhihu.com/p/59772104
https://blog.csdn.net/kejizuiqianfang/article/details/100835528
https://www.cnblogs.com/picassooo/p/13577527.html
https://www.jianshu.com/p/043083d114d4
https://blog.csdn.net/yangyang_yangqi/article/details/84585998
I. nn.LSTM Parameters
import torch
import torch.nn as nn

# Build the model: input_size (input features), hidden_size (output features), num_layers
inputs = torch.randn(5, 3, 10)  # (seq_len, batch_size, input_size)
rnn = nn.LSTM(10, 20, 2)        # (input_size, hidden_size, num_layers)

# num_directions = 1 because this LSTM is unidirectional
h0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch_size, hidden_size)
c0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch_size, hidden_size)

# Outputs: output, (h_n, c_n)
output, (hn, cn) = rnn(inputs, (h0, c0))
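As a quick check of the call above, the shapes of the three returned tensors can be printed. This is a minimal sketch that reuses the same sizes as the snippet; the fixed seed is only there to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inputs = torch.randn(5, 3, 10)  # (seq_len, batch_size, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch_size, hidden_size)
c0 = torch.randn(2, 3, 20)

output, (hn, cn) = rnn(inputs, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20]): one hidden_size vector per time step
print(hn.shape)      # torch.Size([2, 3, 20]): one final hidden state per layer
print(cn.shape)      # torch.Size([2, 3, 20]): one final cell state per layer
```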
II. Handling Variable-Length Sentences in LSTM
import torch
from torch import nn
import torch.nn.utils.rnn as rnn_utils
from torch.utils.data import DataLoader, Dataset

x1 = [
    torch.tensor([[6, 6], [6, 6], [6, 6]]).float(),
    torch.tensor([[7, 7]]).float(),
]
y = [
    torch.tensor([1]),
    torch.tensor([0]),
]


class MyData(Dataset):
    def __init__(self, data_seq, y):
        self.data_seq = data_seq
        self.y = y

    def __len__(self):
        return len(self.data_seq)

    def __getitem__(self, idx):
        return self.data_seq[idx], self.y[idx]


def collate_fn(data_tuple):
    # pack_padded_sequence expects the batch sorted by length, longest first
    data_tuple.sort(key=lambda x: len(x[0]), reverse=True)
    data = [sq[0] for sq in data_tuple]
    label = [sq[1] for sq in data_tuple]
    data_length = [len(q) for q in data]
    # Pad every sequence with zeros up to the longest one in the batch
    data = rnn_utils.pad_sequence(data, batch_first=True, padding_value=0.0)
    label = rnn_utils.pad_sequence(label, batch_first=True, padding_value=0)
    return data, label, data_length


if __name__ == '__main__':
    learning_rate = 0.001
    dataset = MyData(x1, y)
    data_loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)

    # Python 3: use next(iterator); the iterator has no .next() method
    batch_x, batch_y, batch_x_len = next(iter(data_loader))
    print(batch_x)
    print(batch_x.shape)
    print(batch_x_len)
    print(batch_y)
    print(batch_y.shape)

    # Pack the padded batch so the LSTM skips the padding positions
    batch_x_pack = rnn_utils.pack_padded_sequence(batch_x, batch_x_len, batch_first=True)

    net = nn.LSTM(input_size=2, hidden_size=10, num_layers=4, batch_first=True)
    criteria = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

    print(batch_x_pack)
    out, (h1, c1) = net(batch_x_pack)
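The output of an LSTM fed a packed batch is itself a PackedSequence. The sketch below shows one way to turn it back into a padded tensor with pad_packed_sequence and to pick out each sentence's last valid time step. It rebuilds the same two toy sentences directly, without the DataLoader; the names seqs, last_valid, and idx are illustrative and do not come from the code above.

```python
import torch
from torch import nn
import torch.nn.utils.rnn as rnn_utils

torch.manual_seed(0)

# Two toy sentences of length 3 and 1, already sorted longest first
seqs = [torch.full((3, 2), 6.0), torch.full((1, 2), 7.0)]
lengths = [len(s) for s in seqs]

padded = rnn_utils.pad_sequence(seqs, batch_first=True)  # (2, 3, 2)
packed = rnn_utils.pack_padded_sequence(padded, lengths, batch_first=True)

lstm = nn.LSTM(input_size=2, hidden_size=10, num_layers=4, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack: out is (batch, max_len, hidden_size), zero-filled past each true length
out, out_lengths = rnn_utils.pad_packed_sequence(packed_out, batch_first=True)

# Last valid time step of each sentence, taken at index length - 1
idx = out_lengths - 1
last_valid = out[torch.arange(len(seqs)), idx]  # (batch, hidden_size)

# With packing, the last layer's h_n already holds these vectors
print(torch.allclose(last_valid, h_n[-1]))
```

Indexing the padded output with out[:, -1, :] would instead return zeros for the shorter sentence, which is why either h_n[-1] or the length-based index is used.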
III. Siamese LSTM
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

x1 = [
    torch.tensor([[7, 7]]).float(),
    torch.tensor([[6, 6], [6, 6], [6, 6]]).float(),
]
x2 = [
    torch.tensor([[6, 3]]).float(),
    torch.tensor([[6, 3], [3, 6], [6, 6]]).float(),
]
y = [
    torch.tensor([1]),
    torch.tensor([0]),
]


class MyData(Dataset):
    def __init__(self, data1, data2, y):
        self.data1 = data1
        self.data2 = data2
        self.y = y

    def __len__(self):
        return len(self.data1)

    def __getitem__(self, idx):
        return self.data1[idx], self.data2[idx], self.y[idx]


class SiameseLSTM(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        # One LSTM shared by both branches
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=10, num_layers=4, batch_first=True)
        self.fc = nn.Linear(10, 1)

    def forward(self, data1, data2):
        out1, (h1, c1) = self.lstm(data1)
        out2, (h2, c2) = self.lstm(data2)
        # Last time step of each branch; valid here because batch_size=1 means no padding
        pre1 = out1[:, -1, :]
        pre2 = out2[:, -1, :]
        dis = torch.abs(pre1 - pre2)
        out = self.fc(dis)
        return out


if __name__ == '__main__':
    learning_rate = 0.001
    dataset = MyData(x1, x2, y)
    # batch_size=1 sidesteps padding: every batch holds a single sentence pair
    data_loader = DataLoader(dataset, batch_size=1, shuffle=True)
    net = SiameseLSTM(2)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

    for epoch in range(100):
        for batch_id, (data1, data2, label) in enumerate(data_loader):
            logits = net(data1, data2)
            print(logits)
            print(label)
            loss = criterion(logits, label.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            print(loss)
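The Siamese model above avoids padding by using batch_size=1. One possible way to combine it with the packing approach from section II, so that several sentence pairs of different lengths fit in one batch, is sketched below. The class name PackedSiameseLSTM, its encode helper, and the explicit length arguments are assumptions made for illustration, not part of the original code; enforce_sorted=False is used so the two branches do not have to be sorted the same way.

```python
import torch
from torch import nn
import torch.nn.utils.rnn as rnn_utils


class PackedSiameseLSTM(nn.Module):
    """Hypothetical variant of SiameseLSTM that accepts padded batches plus lengths."""

    def __init__(self, input_size, hidden_size=10, num_layers=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def encode(self, padded, lengths):
        # enforce_sorted=False lets PyTorch sort internally and restore batch order
        packed = rnn_utils.pack_padded_sequence(
            padded, lengths, batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        return h_n[-1]  # last layer's state at each sentence's true last step

    def forward(self, x1, len1, x2, len2):
        dis = torch.abs(self.encode(x1, len1) - self.encode(x2, len2))
        return self.fc(dis)  # one logit per sentence pair


torch.manual_seed(0)
x1 = [torch.tensor([[7., 7.]]), torch.tensor([[6., 6.], [6., 6.], [6., 6.]])]
x2 = [torch.tensor([[6., 3.]]), torch.tensor([[6., 3.], [3., 6.], [6., 6.]])]
len1 = [len(s) for s in x1]
len2 = [len(s) for s in x2]
x1_pad = rnn_utils.pad_sequence(x1, batch_first=True)  # (2, 3, 2)
x2_pad = rnn_utils.pad_sequence(x2, batch_first=True)  # (2, 3, 2)

net = PackedSiameseLSTM(input_size=2)
with torch.no_grad():
    logits = net(x1_pad, len1, x2_pad, len2)
print(logits.shape)  # torch.Size([2, 1])
```

Because the encoder reads h_n from a packed batch, the zero padding added to the shorter sentence does not influence its encoding.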
IV. Parameter Details
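1. The constructor parameters are:
input_size: the number of features in the input, usually embedding_dim (the dimension of the word vectors)
hidden_size: the dimension of the LSTM hidden state
num_layers: the number of stacked recurrent layers
bias: whether to use bias terms, default=True
batch_first: take note of this one. Input data is usually shaped (batch_size, seq_length, embedding_dim), but batch_first defaults to False, so either swap the batch_size and seq_length dimensions before feeding the data to the LSTM or construct the LSTM with batch_first=True
dropout: defaults to 0, meaning no dropout
bidirectional: defaults to False, meaning the LSTM is not bidirectional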
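The two options for batch-first data mentioned under batch_first can be compared directly. The sketch below copies the weights from a default (time-first) LSTM into a batch_first=True one and checks that both produce the same output; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(3, 5, 10)  # (batch_size, seq_length, embedding_dim)

lstm_tf = nn.LSTM(10, 20)                    # default: expects (seq, batch, feature)
lstm_bf = nn.LSTM(10, 20, batch_first=True)  # expects (batch, seq, feature)
lstm_bf.load_state_dict(lstm_tf.state_dict())  # same weights in both modules

# Option 1: swap the first two dimensions before calling the default LSTM
out_tf, (h_tf, _) = lstm_tf(x.transpose(0, 1).contiguous())
# Option 2: pass the batch-first tensor to the batch_first=True LSTM
out_bf, (h_bf, _) = lstm_bf(x)

print(out_tf.shape)  # torch.Size([5, 3, 20])
print(out_bf.shape)  # torch.Size([3, 5, 20])
print(torch.allclose(out_tf.transpose(0, 1), out_bf, atol=1e-6))
print(h_tf.shape == h_bf.shape)  # batch_first does not change the layout of h_n
```

Note that batch_first only affects the input and output tensors; h_n and c_n keep the shape (num_layers * num_directions, batch, hidden_size) either way.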
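2. The inputs are input, (h_0, c_0):
input: a tensor of shape [seq_length, batch_size, input_size] (with the default batch_first=False)
h_0: a tensor of shape [num_layers * num_directions, batch, hidden_size] holding the initial hidden state for every sentence in the current batch. num_layers is the number of LSTM layers; num_directions is 2 if bidirectional = True and 1 otherwise (a single direction)
c_0: same shape as h_0; it holds the initial cell state for every sentence in the current batch. If h_0 and c_0 are not provided, both default to zeros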
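The zero default for h_0 and c_0 can be confirmed with a short comparison. This sketch uses a two-layer bidirectional LSTM so that the num_layers * num_directions factor in the shape is visible; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_layers, num_directions = 2, 2  # bidirectional=True gives two directions
batch_size, hidden_size = 3, 20

lstm = nn.LSTM(10, hidden_size, num_layers=num_layers, bidirectional=True)
x = torch.randn(5, batch_size, 10)  # (seq_length, batch_size, input_size)

# Explicit all-zero initial states with the documented shape
h0 = torch.zeros(num_layers * num_directions, batch_size, hidden_size)
c0 = torch.zeros_like(h0)

out_default, (h_default, _) = lstm(x)            # states omitted
out_explicit, (h_explicit, _) = lstm(x, (h0, c0))  # states passed in

print(h0.shape)  # torch.Size([4, 3, 20])
print(torch.allclose(out_default, out_explicit))
```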
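3. The outputs are output, (h_n, c_n):
output.shape = [seq_length, batch_size, num_directions * hidden_size]
It holds the output features (h_t) of the last LSTM layer at every time step t.
h_n.shape = [num_directions * num_layers, batch, hidden_size]
c_n.shape = h_n.shape
h_n holds the hidden state after the last word of each sentence, and c_n holds the cell state after the last word, so neither depends on the sentence length seq_length.
For a unidirectional LSTM fed unpadded input, output[-1] equals h_n[-1], the last layer's slice of h_n, because output[-1] holds exactly the hidden state after the last word of each of the batch_size sentences. Since h_n also stores the final state of every lower layer, it equals output[-1] as a whole only when num_layers = 1. Note that in an LSTM the hidden state is the output; the cell state is what stays internal and carries information along the sequence. This relationship between output and h_n is the main point of this post.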
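The relationship between output and h_n can be checked numerically. The sketch below does so for a two-layer unidirectional LSTM and then for a bidirectional one, where, per the PyTorch documentation, h_n is laid out as (num_layers, num_directions) flattened along the first dimension and the output concatenates the forward and backward features. Variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 3, 10)  # (seq_length, batch_size, input_size)

# Unidirectional, two layers: output[-1] matches the last layer's slice of h_n
uni = nn.LSTM(10, 20, num_layers=2)
out, (h_n, c_n) = uni(x)
print(torch.allclose(out[-1], h_n[-1]))

# Bidirectional, two layers: h_n is ordered [l0_fwd, l0_bwd, l1_fwd, l1_bwd]
bi = nn.LSTM(10, 20, num_layers=2, bidirectional=True)
out_bi, (h_bi, _) = bi(x)
fwd_final = h_bi[-2]  # last layer, forward direction
bwd_final = h_bi[-1]  # last layer, backward direction

# The forward direction finishes at the last time step ...
print(torch.allclose(out_bi[-1, :, :20], fwd_final))
# ... while the backward direction finishes at the first time step
print(torch.allclose(out_bi[0, :, 20:], bwd_final))
```

In the bidirectional case output[-1] therefore matches h_n only in its forward half; the backward direction's final state sits at output[0].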