RNN: Principles and Reimplementation

RNN structure diagrams

Single-layer RNN

The memory cell depends only on the past, not on the future.

[figure: single-layer RNN]

Bidirectional RNN

Can see both the past and the future.

[figure: bidirectional RNN]

Advantages

  • Can handle variable-length sequences

This is something DNNs and CNNs cannot do: a DNN's input feature size is fixed, and a CNN's kernel depends not only on kernel_size but also on the number of input channels. The reason an RNN can handle variable-length sequences is that the weights w1, w2, w3 are identical at every time step. Whether a weight connects the input, the memory cell, or historical information to the current information, it is the same at all time steps. It is precisely this weight sharing across time that lets an RNN process variable-length sequences; remove the sharing and the ability is lost (see the sketch after this list).

  • Model size is independent of sequence length

  • Computation grows linearly with sequence length

  • Takes historical information into account

  • Well suited to streaming output

An output can be emitted as soon as each time step is computed.

  • Weights are time-invariant (shared across time steps)
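As a small sketch of the variable-length point above, the same nn.RNN module (sizes here are arbitrary) accepts sequences of different lengths, precisely because the same weights are applied at every time step:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)  # arbitrary sizes
short_seq = torch.randn(1, 2, 4)  # sequence of length 2
long_seq = torch.randn(1, 7, 4)   # sequence of length 7
out_short, _ = rnn(short_seq)
out_long, _ = rnn(long_seq)
print(out_short.shape, out_long.shape)  # torch.Size([1, 2, 3]) torch.Size([1, 7, 3])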

Disadvantages

  • Sequential computation is slow

  • Cannot capture very long-range history

Applications

  • AI poetry generation

A one-to-many task.

[figure: one-to-many RNN]

  • Text sentiment classification

A many-to-one task.

[figure: many-to-one RNN]

  • Lexical analysis (part-of-speech tagging)

A many-to-many task: identify the part of speech of each word.

[figure: many-to-many RNN]

  • Machine translation

A many-to-many (seq2seq) task.

[figure: seq2seq machine translation]

  • Speech recognition/synthesis

  • Language modeling

API

RNN — PyTorch 1.12 documentation

Formula

h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh)

where h_t is the hidden state at time t and x_t is the input at time t.
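As a quick sanity check of this update rule, a single step of nn.RNN can be reproduced by hand (a sketch with arbitrary sizes; h_0 defaults to zeros):

import torch
import torch.nn as nn

rnn = nn.RNN(4, 3, batch_first=True)
x = torch.randn(1, 1, 4)  # one batch element, one time step
out, h_1 = rnn(x)         # h_0 defaults to zeros
manual = torch.tanh(x[0, 0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                    + torch.zeros(3) @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
print(torch.allclose(out[0, 0], manual, atol=1e-6))  # True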

Constructor parameters

  • input_size

The dimensionality of the input features.

  • hidden_size

The size of the hidden state h_t at each time step.

  • num_layers

The number of RNN layers; multiple layers can be stacked.

  • nonlinearity

The activation function; tanh by default, relu can also be used.

  • bias

Whether to use bias terms.

  • batch_first

Determines the layout of the inputs and outputs: if True, the layout is (batch, seq, feature); if False, it is (seq, batch, feature). This also explains why Mu Li transposes dimensions in his course videos (see the sketch after this list).

  • dropout

If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last.

  • bidirectional

If True, constructs a bidirectional RNN; the output feature size becomes 2 × hidden_size, because there are two hidden states, one per direction.
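To illustrate batch_first, a small sketch (arbitrary sizes): the same data can be fed in either layout by transposing the first two dimensions:

import torch
import torch.nn as nn

rnn_bf = nn.RNN(4, 3, batch_first=True)  # expects (batch, seq, feature)
rnn_sf = nn.RNN(4, 3)                    # default: expects (seq, batch, feature)
x = torch.randn(5, 2, 4)                 # (batch=5, seq=2, feature=4)
out_bf, _ = rnn_bf(x)
out_sf, _ = rnn_sf(x.transpose(0, 1))    # swap batch and seq for the default layout
print(out_bf.shape, out_sf.shape)  # torch.Size([5, 2, 3]) torch.Size([2, 5, 3])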

Inputs


  • input

If batch_first=True, the input shape must be (batch_size, seq_len, input_size); if batch_first=False, it must be (seq_len, batch_size, input_size). These are the batched shapes; without a batch dimension the shape is (seq_len, input_size).

  • h_0

The initial hidden state. With a batch, the required shape is (D * num_layers, batch_size, hidden_size); without a batch it is (D * num_layers, hidden_size). D is 2 for a bidirectional RNN and 1 otherwise.

Outputs

  • output

If batch_first=True, the output shape is (batch_size, seq_len, D * hidden_size); if batch_first=False, it is (seq_len, batch_size, D * hidden_size). Without a batch dimension it is (seq_len, D * hidden_size).

  • h_n

The hidden state at the final time step, with shape (D * num_layers, batch_size, hidden_size); without a batch dimension, (D * num_layers, hidden_size).
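Putting these shape rules together, a small check (arbitrary sizes, with bidirectional=True and num_layers=2 so that both D and num_layers matter):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=2, batch_first=True, bidirectional=True)
x = torch.randn(5, 7, 4)  # (batch_size, seq_len, input_size)
output, h_n = rnn(x)
print(output.shape)  # torch.Size([5, 7, 6]) == (batch_size, seq_len, D * hidden_size), D = 2
print(h_n.shape)     # torch.Size([4, 5, 3]) == (D * num_layers, batch_size, hidden_size)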

Variables

  • RNN.weight_ih_l[k]: the input-hidden weights of layer k, shape (hidden_size, input_size)
  • RNN.weight_hh_l[k]: the hidden-hidden weights of layer k, shape (hidden_size, hidden_size)
  • RNN.bias_ih_l[k]: the input-hidden bias of layer k, shape (hidden_size)
  • RNN.bias_hh_l[k]: the hidden-hidden bias of layer k, shape (hidden_size)
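These variables can be listed straight off a module, for instance:

import torch.nn as nn

rnn = nn.RNN(4, 3, num_layers=1)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (3, 4)
# weight_hh_l0 (3, 3)
# bias_ih_l0 (3,)
# bias_hh_l0 (3,)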

Implementing a single-layer, unidirectional RNN with the API

Parameter     Value
input_size    4
hidden_size   3
num_layers    1
batch_size    1
seqLength     2
D             1
batch_first   True
import torch
import torch.nn as nn

single_rnn = nn.RNN(4, 3, 1, batch_first=True)  # input_size, hidden_size, num_layers

input_x = torch.randn(1, 2, 4)  # batch_size * seqLength * input_size
output, h_n = single_rnn(input_x)
output  # batch_size * seqLength * (D * hidden_size)

# tensor([[[ 0.6475, -0.3831, -0.0288],
#          [ 0.4701, -0.7224, -0.1448]]], grad_fn=<TransposeBackward1>)
h_n  # (D * num_layers) * batch_size * hidden_size

# tensor([[[ 0.4701, -0.7224, -0.1448]]], grad_fn=<StackBackward0>)
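Note that for a single-layer, unidirectional RNN, h_n is just the output at the last time step; continuing the snippet above:

print(torch.allclose(h_n[0], output[:, -1, :]))  # True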

Implementing a single-layer, bidirectional RNN with the API

Parameter      Value
input_size     4
hidden_size    3
num_layers     1
batch_size     1
seqLength      2
D              2
batch_first    True
bidirectional  True
bidirectional_rnn = nn.RNN(4, 3, 1, batch_first=True, bidirectional=True)  # input_size, hidden_size, num_layers

bi_output, bi_h_n = bidirectional_rnn(input_x)

bi_output  # batch_size * seqLength * (D * hidden_size)
# tensor([[[ 0.2967,  0.0377, -0.3754,  0.1055, -0.8097,  0.5296],
#          [ 0.6000, -0.3587, -0.2720, -0.2106, -0.3747,  0.4135]]], grad_fn=<TransposeBackward1>)

bi_h_n  # (D * num_layers) * batch_size * hidden_size
# tensor([[[ 0.6000, -0.3587, -0.2720]],
#         [[ 0.1055, -0.8097,  0.5296]]], grad_fn=<StackBackward0>)
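The printed tensors show where bi_h_n comes from: the forward direction finishes at the last time step, the backward direction finishes at the first. Continuing the snippet above:

print(torch.allclose(bi_h_n[0], bi_output[:, -1, :3]))  # forward final state: last step, first half
print(torch.allclose(bi_h_n[1], bi_output[:, 0, 3:]))   # backward final state: first step, second half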

Reimplementing the RNN

Single layer, unidirectional

def rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev):
    batch_size, T, input_size = input.shape
    h_dim = weight_ih.shape[0]  # hidden dimension
    h_out = torch.zeros(batch_size, T, h_dim)  # initialize the output (state) matrix

    for t in range(T):
        # x = input[:,t,:]  # current input: batch_size * input_size
        x = input[:, t, :].unsqueeze(2)  # expand for bmm: batch_size * input_size * 1

        # expand the weights with a batch dimension, because x carries a batch
        w_ih_batch = weight_ih.unsqueeze(0).tile(batch_size, 1, 1)  # batch_size * h_dim * input_size
        w_hh_batch = weight_hh.unsqueeze(0).tile(batch_size, 1, 1)  # batch_size * h_dim * h_dim

        w_time_x = torch.bmm(w_ih_batch, x).squeeze(-1)  # batch_size * h_dim
        # previous hidden state, also expanded for bmm
        w_time_h = torch.bmm(w_hh_batch, h_prev.unsqueeze(2)).squeeze(-1)  # batch_size * h_dim

        h_prev = torch.tanh(w_time_x + bias_ih + w_time_h + bias_hh)

        h_out[:, t, :] = h_prev
    return h_out, h_prev.unsqueeze(0)
batch_size,T = 2,3
input_size,hidden_size = 2,3
input = torch.randn(batch_size,T,input_size)
h_prev = torch.randn(batch_size,hidden_size)

weight_ih = torch.randn(hidden_size,input_size)
weight_hh = torch.randn(hidden_size,hidden_size)
bias_ih = torch.randn(hidden_size)
bias_hh = torch.randn(hidden_size)

r = rnn_forward(input,weight_ih,weight_hh,bias_ih,bias_hh,h_prev)
r

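Running rnn_forward with random weights only shows that it executes; to check that it actually matches the API, one can copy the weights out of an nn.RNN instance, just as the bidirectional section below does (a sketch reusing input, h_prev, input_size, and hidden_size from above):

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
api_output, api_h_n = rnn(input, h_prev.unsqueeze(0))
my_output, my_h_n = rnn_forward(input, rnn.weight_ih_l0, rnn.weight_hh_l0,
                                rnn.bias_ih_l0, rnn.bias_hh_l0, h_prev)
print(torch.allclose(api_output, my_output, atol=1e-6))  # True
print(torch.allclose(api_h_n, my_h_n, atol=1e-6))        # True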

  • c[:,1,:]

Integer indexing of a single time step, e.g. c[:,1,:], drops that dimension and yields a 2-D tensor; that is why the code above needs unsqueeze before bmm.
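For instance, with a hypothetical tensor c of shape (batch, seq, feature):

import torch

c = torch.randn(2, 3, 4)
print(c[:, 1, :].shape)               # torch.Size([2, 4]): integer indexing drops the dim
print(c[:, 1, :].unsqueeze(2).shape)  # torch.Size([2, 4, 1]): restored for bmm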

  • Usage of tile

With a single argument

x = torch.tensor([[1, 2], [3, 4]])
print(x.tile((2, )))
>>> tensor([[1, 2, 1, 2],
            [3, 4, 3, 4]])
# With a single argument, the elements of the last dimension are repeated:
# 1,2 becomes 1,2,1,2 and 3,4 becomes 3,4,3,4

With two arguments

x = torch.tensor([[1, 2], [3, 4]])

print(x.tile((2, 2)))
>>> tensor([[1, 2, 1, 2],
            [3, 4, 3, 4],
            [1, 2, 1, 2],
            [3, 4, 3, 4]])
# With two arguments, the elements of the last dimension are repeated twice first,
# giving [1,2,1,2] and [3,4,3,4]; then the elements of the second-to-last dimension
# are repeated twice, producing the 4x4 result above.

With three arguments

x = torch.randn(2, 2)
print(x)
>>> tensor([[ 1.1165, -0.5559],
            [-0.6341,  0.5215]])

print(x.tile((2, 2, 2)))
>>> tensor([[[ 1.1165, -0.5559,  1.1165, -0.5559],
             [-0.6341,  0.5215, -0.6341,  0.5215],
             [ 1.1165, -0.5559,  1.1165, -0.5559],
             [-0.6341,  0.5215, -0.6341,  0.5215]],

            [[ 1.1165, -0.5559,  1.1165, -0.5559],
             [-0.6341,  0.5215, -0.6341,  0.5215],
             [ 1.1165, -0.5559,  1.1165, -0.5559],
             [-0.6341,  0.5215, -0.6341,  0.5215]]])

# With three arguments, the last dimension's elements are repeated twice first,
# e.g. [1.1165, -0.5559, 1.1165, -0.5559]; then the second-to-last dimension's
# elements are repeated twice, giving a single 4x4 block. Finally, since a 2-D
# tensor has no third-from-last dimension, a new dimension is created and the
# 4x4 block is repeated twice along it.

Bidirectional RNN

# A handwritten bidirectional RNN forward pass
def bidirectional_rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev,
                              weight_ih_reverse, weight_hh_reverse, bias_ih_reverse, bias_hh_reverse,
                              h_prev_reverse):
    batch_size, T, input_size = input.shape
    # hidden dimension
    h_dim = weight_ih.shape[0]
    # output: note the doubled feature size for the two directions
    h_out = torch.zeros(batch_size, T, h_dim * 2)

    # forward layer; [0] keeps only the output, discarding the final state
    forward_output = rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev)[0]

    # backward layer: the time-reversed sequence goes through the reverse-direction weights
    backward_output = rnn_forward(torch.flip(input, (1,)), weight_ih_reverse, weight_hh_reverse,
                                  bias_ih_reverse, bias_hh_reverse, h_prev_reverse)[0]

    # fill the two halves of the feature dimension; the backward output must be flipped
    # back along time so that position t holds the backward state computed at step t
    h_out[:, :, :h_dim] = forward_output
    h_out[:, :, h_dim:] = torch.flip(backward_output, (1,))

    # the final state of each direction: the forward pass ends at t = T-1, while the
    # backward pass ends at t = 0, which is the last step it actually computed
    h_n = torch.stack((forward_output[:, -1, :], backward_output[:, -1, :]), dim=0)
    return h_out, h_n
# Verify bidirectional_rnn_forward against the API implementation
bi_rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
h_prev = torch.zeros(2, batch_size, hidden_size)  # 2 directions: h_prev[0] is forward, h_prev[1] is backward
bi_rnn_output, bi_state_final = bi_rnn(input, h_prev)
print(bi_rnn_output)
print(bi_state_final)


  • torch.flip

torch.flip reverses a tensor along whichever dims you pass; here dim 1 (the time axis) is reversed.
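A minimal illustration of flipping along dim 1:

import torch

x = torch.arange(6).reshape(2, 3)
print(torch.flip(x, (1,)))
# tensor([[2, 1, 0],
#         [5, 4, 3]])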

# Inspect the network's parameters
for k, v in bi_rnn.named_parameters():
    print(k, v)


# Plug the API's bidirectional weights directly into our own implementation
bidirectional_rnn_forward(input, bi_rnn.weight_ih_l0, bi_rnn.weight_hh_l0,
                          bi_rnn.bias_ih_l0, bi_rnn.bias_hh_l0, h_prev[0],
                          bi_rnn.weight_ih_l0_reverse, bi_rnn.weight_hh_l0_reverse,
                          bi_rnn.bias_ih_l0_reverse, bi_rnn.bias_hh_l0_reverse,
                          h_prev[1])

