RNN Principles and Reimplementation
RNN figures
Single-layer RNN
The memory cell depends on the past and is independent of the future
Bidirectional RNN
Can see both the past and the future
Advantages
- Can handle variable-length sequences
This is something DNNs and CNNs cannot do: a DNN's input feature dimension is fixed, and a CNN's kernel depends not only on kernel_size but also on the number of input channels. The reason an RNN can handle variable-length sequences is that the weights w1, w2, w3 are the same at every time step: whether they connect the input, the memory cell, or the history to the current state, the weights are shared across time. Remove this weight sharing and the RNN can no longer handle variable-length sequences (see the sketch after this list).
- Model size is independent of sequence length
- Computation grows linearly with sequence length
- Takes historical information into account
- Convenient for streaming output
Output can be produced as soon as each step is computed
- Weights are time-invariant
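A minimal sketch of the weight-sharing point above (the sizes 4 and 3 are just illustrative): the same nn.RNN module, and therefore the same set of weights, runs on sequences of different lengths without any change.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
# the same module (same weights) accepts sequences of different lengths
out_a, h_a = rnn(torch.randn(1, 5, 4))   # seq_len = 5
out_b, h_b = rnn(torch.randn(1, 9, 4))   # seq_len = 9
print(out_a.shape, out_b.shape)          # torch.Size([1, 5, 3]) torch.Size([1, 9, 3])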
Disadvantages
- Sequential computation is slow
- Cannot capture very long-range history
Application scenarios
- AI poetry generation
A one-to-many task
- Text sentiment classification
A many-to-one task
- Part-of-speech tagging
A many-to-many task: identify the part of speech of each word
- Machine translation
A many-to-many (seq2seq) task
- Speech recognition / synthesis
- Language modeling
API
RNN — PyTorch 1.12 documentation
Formula
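From the PyTorch documentation, each time step computes (with the default tanh nonlinearity):
h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)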
Parameters for initializing an RNN
- input_size
The dimension of the input features
- hidden_size
The size of the hidden state h_t at each time step
- num_layers
The number of RNN layers; multiple layers can be stacked
- nonlinearity
The activation function; the default is tanh, relu can also be used
- bias
Whether to use bias terms
- batch_first
Determines the input/output layout: if True the format is (batch, seq, feature); if False it is (seq, batch, feature). This also explains why Mu Li permutes the dimensions in his course code.
- dropout
- bidirectional
If True, a bidirectional RNN is constructed; the output feature size becomes 2 × hidden_size, because the two directions each contribute a hidden state
Input arguments
- input
If batch_first is True, the input must have shape (batch_size, seq_len, input_size)
If batch_first is False, the input must have shape (seq_len, batch_size, input_size)
The above is the batched case; without a batch the shape is (seq_len, input_size)
- h_0
The batched initial state must have shape (D * num_layers, batch_size, hidden_size)
Without a batch it is (D * num_layers, hidden_size)
Output arguments
- output
If batch_first is True, the output shape is (batch_size, seq_len, D * hidden_size)
If batch_first is False, the output shape is (seq_len, batch_size, D * hidden_size)
Without a batch it is (seq_len, D * hidden_size)
- h_n
The state at the last time step, with shape (D * num_layers, batch_size, hidden_size)
Without a batch it is (D * num_layers, hidden_size)
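A small shape check of the conventions above, reusing the imports from the earlier sketch (the sizes 4, 3, batch of 2 and sequence length of 5 are just illustrative):
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1, batch_first=True)
x = torch.randn(2, 5, 4)         # (batch_size, seq_len, input_size) because batch_first=True
h_0 = torch.zeros(1, 2, 3)       # (D * num_layers, batch_size, hidden_size)
out, h_n = rnn(x, h_0)
print(out.shape)                 # torch.Size([2, 5, 3])  -> (batch_size, seq_len, D * hidden_size)
print(h_n.shape)                 # torch.Size([1, 2, 3])  -> (D * num_layers, batch_size, hidden_size)
x_unbatched = torch.randn(5, 4)  # (seq_len, input_size)
out_u, h_n_u = rnn(x_unbatched)
print(out_u.shape, h_n_u.shape)  # torch.Size([5, 3]) torch.Size([1, 3])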
Variables
- RNN.weight_ih_l[k]: input-hidden weights of the k-th layer, shape (hidden_size, input_size) for k = 0
- RNN.weight_hh_l[k]: hidden-hidden weights of the k-th layer, shape (hidden_size, hidden_size)
- RNN.bias_ih_l[k]: input-hidden bias of the k-th layer, shape (hidden_size)
- RNN.bias_hh_l[k]: hidden-hidden bias of the k-th layer, shape (hidden_size)
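These names and shapes can be verified by printing the module's parameters (a quick sketch with the same illustrative sizes as above):
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1)
for name, param in rnn.named_parameters():
    print(name, tuple(param.shape))
# weight_ih_l0 (3, 4)
# weight_hh_l0 (3, 3)
# bias_ih_l0 (3,)
# bias_hh_l0 (3,)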
Implementing a single-layer, unidirectional RNN with the API
| Parameter | Value |
|---|---|
| input_size | 4 |
| hidden_size | 3 |
| num_layers | 1 |
| batch_size | 1 |
| seqLength | 2 |
| D | 1 |
| batch_first | True |
import torch
import torch.nn as nn
single_rnn = nn.RNN(4,3,1,batch_first=True) # input_size * hidden_size * num_layers
input_x = torch.randn(1,2,4) # batch_size * seqLength * input_size
output,h_n = single_rnn(input_x)
output # batch_size * seqLength * (D * hidden_size)
#tensor([[[ 0.6475, -0.3831, -0.0288],
# [ 0.4701, -0.7224, -0.1448]]], grad_fn=<TransposeBackward1>)
h_n # (D * num_layers) * batch_size * hidden_size
# tensor([[[ 0.4701, -0.7224, -0.1448]]], grad_fn=<StackBackward0>)
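As the printed tensors show, for a single-layer unidirectional RNN the final state h_n is simply output at the last time step; a quick check:
print(torch.allclose(output[:, -1, :], h_n[0]))   # True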
Implementing a single-layer, bidirectional RNN with the API
| Parameter | Value |
|---|---|
| input_size | 4 |
| hidden_size | 3 |
| num_layers | 1 |
| batch_size | 1 |
| seqLength | 2 |
| D | 2 |
| batch_first | True |
| bidirectional | True |
bidirectional_rnn = nn.RNN(4,3,1,batch_first=True,bidirectional=True) # input_size * hidden_size * num_layers
bi_output,bi_h_n = bidirectional_rnn(input_x)
bi_output # batch_size * seqLength * (D * hidden_size)
#tensor([[[ 0.2967, 0.0377, -0.3754, 0.1055, -0.8097, 0.5296],
# [ 0.6000, -0.3587, -0.2720, -0.2106, -0.3747, 0.4135]]],grad_fn=<TransposeBackward1>)
bi_h_n # (D * num_layers) * batch_size * hidden_size
#tensor([[[ 0.6000, -0.3587, -0.2720]],
# [[ 0.1055, -0.8097, 0.5296]]], grad_fn=<StackBackward0>)
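The relation between bi_h_n and bi_output can be read off the printed values: the forward final state is the last time step of the first hidden_size output features, and the backward final state is the first time step of the last hidden_size output features.
print(torch.allclose(bi_h_n[0], bi_output[:, -1, :3]))   # True (forward direction)
print(torch.allclose(bi_h_n[1], bi_output[:, 0, 3:]))    # True (backward direction)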
Reimplementing the RNN
Single layer, unidirectional
def rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev):
    batch_size, T, input_size = input.shape
    h_dim = weight_ih.shape[0]                 # hidden-state dimension
    h_out = torch.zeros(batch_size, T, h_dim)  # initialize the output (state) tensor
    for t in range(T):
        # current input at time t; unsqueeze from batch_size * input_size to batch_size * input_size * 1 for bmm
        x = input[:, t, :].unsqueeze(2)
        # give the weights a batch dimension because x is batched
        w_ih_batch = weight_ih.unsqueeze(0).tile(batch_size, 1, 1)  # batch_size * h_dim * input_size
        w_hh_batch = weight_hh.unsqueeze(0).tile(batch_size, 1, 1)  # batch_size * h_dim * h_dim
        w_times_x = torch.bmm(w_ih_batch, x).squeeze(-1)            # batch_size * h_dim
        # previous hidden state
        w_times_h = torch.bmm(w_hh_batch, h_prev.unsqueeze(2)).squeeze(-1)  # batch_size * h_dim
        h_prev = torch.tanh(w_times_x + bias_ih + w_times_h + bias_hh)
        h_out[:, t, :] = h_prev
    return h_out, h_prev.unsqueeze(0)
batch_size,T = 2,3
input_size,hidden_size = 2,3
input = torch.randn(batch_size,T,input_size)
h_prev = torch.randn(batch_size,hidden_size)
weight_ih = torch.randn(hidden_size,input_size)
weight_hh = torch.randn(hidden_size,hidden_size)
bias_ih = torch.randn(hidden_size)
bias_hh = torch.randn(hidden_size)
r = rnn_forward(input,weight_ih,weight_hh,bias_ih,bias_hh,h_prev)
r
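A verification sketch (not part of the original notes): reuse the weights of an nn.RNN instance so rnn_forward can be compared against the API directly; both checks should print True.
rnn = nn.RNN(input_size, hidden_size, batch_first=True)
api_output, api_h_n = rnn(input, h_prev.unsqueeze(0))  # the API expects h_0 of shape (1, batch_size, hidden_size)
custom_output, custom_h_n = rnn_forward(input, rnn.weight_ih_l0, rnn.weight_hh_l0,
                                        rnn.bias_ih_l0, rnn.bias_hh_l0, h_prev)
print(torch.allclose(api_output, custom_output))       # should print True
print(torch.allclose(api_h_n, custom_h_n))             # should print True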
- c[:,1,:]
Indexing a single time step like this drops that dimension and leaves a 2-D tensor (e.g. batch_size * input_size), which is why x is unsqueezed before the bmm above
- Usage of tile
With a single argument
x = torch.tensor([[1, 2], [3, 4]])
print(x.tile((2, )))
>>> tensor([[1, 2, 1, 2],
[3, 4, 3, 4]])
# With one argument, the elements of the last dimension are repeated: [1, 2] becomes [1, 2, 1, 2] and [3, 4] becomes [3, 4, 3, 4]
# Here the last dimension holds [1, 2] and [3, 4], whose elements are 1, 2, 3 and 4
With two arguments
x = torch.tensor([[1, 2], [3, 4]])
print(x.tile((2, 2)))
>>> tensor([[1, 2, 1, 2],
[3, 4, 3, 4],
[1, 2, 1, 2],
[3, 4, 3, 4]])
# With two arguments, the elements of the last dimension are first repeated twice, giving [1, 2, 1, 2] and [3, 4, 3, 4]; then the elements of the dimension above it are repeated twice, giving
# [[1, 2, 1, 2],
#  [3, 4, 3, 4],
#  [1, 2, 1, 2],
#  [3, 4, 3, 4]]
With three arguments
x = torch.randn(2, 2)
print(x)
>>> tensor([[ 1.1165, -0.5559],
[-0.6341, 0.5215]])
print(x.tile((2, 2, 2)))
>>> tensor([[[ 1.1165, -0.5559, 1.1165, -0.5559],
[-0.6341, 0.5215, -0.6341, 0.5215],
[ 1.1165, -0.5559, 1.1165, -0.5559],
[-0.6341, 0.5215, -0.6341, 0.5215]],
[[ 1.1165, -0.5559, 1.1165, -0.5559],
[-0.6341, 0.5215, -0.6341, 0.5215],
[ 1.1165, -0.5559, 1.1165, -0.5559],
[-0.6341, 0.5215, -0.6341, 0.5215]]])
# With three arguments, the elements of the last dimension are first repeated twice, giving [1.1165, -0.5559, 1.1165, -0.5559]; then the elements of the dimension above it are repeated twice, giving
# [[ 1.1165, -0.5559,  1.1165, -0.5559],
#  [-0.6341,  0.5215, -0.6341,  0.5215],
#  [ 1.1165, -0.5559,  1.1165, -0.5559],
#  [-0.6341,  0.5215, -0.6341,  0.5215]]
# Finally the third-from-last dimension would be repeated; since the tensor has no third dimension, a new one is created and the result above is copied twice along it
Bidirectional RNN
# A hand-written bidirectional RNN
def bidirectional_rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev,
                              weight_ih_reverse, weight_hh_reverse, bias_ih_reverse, bias_hh_reverse,
                              h_prev_reverse):
    batch_size, T, input_size = input.shape
    # hidden-state dimension
    h_dim = weight_ih.shape[0]
    # output: note that the bidirectional feature size is twice h_dim
    h_out = torch.zeros(batch_size, T, h_dim * 2)
    # forward layer; indexing [0] keeps only the output
    forward_output = rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev)[0]
    # backward layer: run on the time-reversed input with the reverse weights
    backward_output = rnn_forward(torch.flip(input, (1,)), weight_ih_reverse, weight_hh_reverse,
                                  bias_ih_reverse, bias_hh_reverse, h_prev_reverse)[0]
    # fill the two halves of the feature dimension; the backward half is flipped back into original time order
    h_out[:, :, :h_dim] = forward_output
    h_out[:, :, h_dim:] = torch.flip(backward_output, (1,))
    # final states: forward direction at its last step, backward direction at its last step (original t = 0)
    h_n = torch.stack([forward_output[:, -1, :], backward_output[:, -1, :]], dim=0)  # 2 * batch_size * h_dim
    return h_out, h_n
# Bidirectional with the API
# Verify the correctness of bidirectional_rnn_forward
bi_rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
h_prev = torch.zeros(2, batch_size, hidden_size)  # the 2 is for the two directions: h_prev[0] is forward, h_prev[1] is backward
bi_rnn_output, bi_state_final = bi_rnn(input, h_prev)
print(bi_rnn_output)
print(bi_state_final)
- torch.flip
Reverses the tensor along whichever dims you pass in; here dim 1 (the time axis) is flipped so the sequence is processed back to front
# Inspect the network's parameters
for k, v in bi_rnn.named_parameters():
    print(k, v)
# Feed the API's bidirectional weights directly into the hand-written implementation
custom_output, custom_h_n = bidirectional_rnn_forward(input, bi_rnn.weight_ih_l0, bi_rnn.weight_hh_l0,
                                                      bi_rnn.bias_ih_l0, bi_rnn.bias_hh_l0, h_prev[0],
                                                      bi_rnn.weight_ih_l0_reverse, bi_rnn.weight_hh_l0_reverse,
                                                      bi_rnn.bias_ih_l0_reverse, bi_rnn.bias_hh_l0_reverse,
                                                      h_prev[1])
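To make the comparison explicit (an added check, not in the original notes), the captured results can be compared against the API output; with the corrected implementation above, both should print True.
print(torch.allclose(custom_output, bi_rnn_output))   # should print True
print(torch.allclose(custom_h_n, bi_state_final))     # should print True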