Sequence Models - Recurrent Neural Networks

Examples of sequence data:

  • Speech recognition
  • Music generation
  • Sentiment classification
  • DNA sequence analysis
  • Machine translation
  • Video activity recognition
  • Named entity recognition

Recurrent Neural Network Model

Why not a standard network?

  • Inputs, outputs can be different lengths in different examples.
  • Doesn't share features learned across different positions of the text.

Weakness of RNN

A unidirectional RNN only uses the earlier information in the sequence when making a prediction.

(use Bidirectional RNN instead)

Forward Propagation

(figure: RNN)

$$
\begin{aligned}
a^{\langle t \rangle} &= g_1(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a) = g_1(W_a [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_a) \\
\hat{y}^{\langle t \rangle} &= g_2(W_{ya} a^{\langle t \rangle} + b_y)
\end{aligned}
$$

  • The activation $g_1$ will most often be a tanh in an RNN.
  • $g_2$ will often be:
    • binary classification problem: sigmoid
    • k-way classification problem: softmax
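
For concreteness, here is a minimal NumPy sketch of one forward time step following the equations above. The layer sizes, the random weights, and the choice of softmax for $g_2$ are assumptions made only for illustration.

```python
import numpy as np

def rnn_cell_forward(xt, a_prev, Waa, Wax, Wya, ba, by):
    """One RNN time step: a<t> = tanh(Waa a<t-1> + Wax x<t> + ba), y<t> = softmax(Wya a<t> + by)."""
    a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)        # g1 = tanh
    z = Wya @ a_next + by
    y_pred = np.exp(z) / np.sum(np.exp(z), axis=0)        # g2 = softmax (k-way classification)
    return a_next, y_pred

# Example with assumed sizes: n_x = 3 input features, n_a = 5 hidden units, n_y = 2 classes
n_x, n_a, n_y = 3, 5, 2
rng = np.random.default_rng(0)
xt, a_prev = rng.standard_normal((n_x, 1)), np.zeros((n_a, 1))
Waa, Wax = rng.standard_normal((n_a, n_a)), rng.standard_normal((n_a, n_x))
Wya, ba, by = rng.standard_normal((n_y, n_a)), np.zeros((n_a, 1)), np.zeros((n_y, 1))
a_next, y_pred = rnn_cell_forward(xt, a_prev, Waa, Wax, Wya, ba, by)
```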

Backpropagation Through Time

(figure: backpropagation through time)

$$
\begin{aligned}
\mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}) &= -y^{\langle t \rangle} \log \hat{y}^{\langle t \rangle} - (1 - y^{\langle t \rangle}) \log(1 - \hat{y}^{\langle t \rangle}) \\
\mathcal{L}(\hat{y}, y) &= \sum_{t=1}^{T_y} \mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})
\end{aligned}
$$
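
A small sketch of how this loss would be computed, assuming a binary label at each time step and hand-picked prediction values (none of these numbers come from a real model):

```python
import numpy as np

def rnn_loss(y_hat, y):
    """Total loss: sum over time steps of the per-step binary cross-entropy
    L<t> = -y<t> log(y_hat<t>) - (1 - y<t>) log(1 - y_hat<t>)."""
    per_step = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
    return per_step.sum()

# Toy example: 4 time steps of binary labels and hypothetical predictions
y     = np.array([1, 0, 0, 1])
y_hat = np.array([0.9, 0.2, 0.1, 0.7])
print(rnn_loss(y_hat, y))   # small loss because the predictions mostly agree with the labels
```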

Different Types of RNN

  • many-to-one architecture:

    Sentiment Classification

  • one-to-many architecture

    Music Generation

  • many-to-many architecture:

    Machine Translation: input and output can be different lengths (encoder, decoder)

Language Model and Sequence Generation

  • Language modelling

    gives the probability of a sentence: $P(\text{sentence}) = {}?$

    basic job: estimate the probability of a sequence of words, $P(y^{\langle 1 \rangle}, \dots, y^{\langle T_y \rangle})$

  • Training set: a large corpus of English text.

    • add <EOS> at the end of each sentence.
    • replace unknown words with <UNK>.
  • Training with RNN model

    At each time step, feed in the previous true word, i.e. replace $x^{\langle t \rangle}$ with $y^{\langle t-1 \rangle}$. For a three-word sentence:

    $$P(y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, y^{\langle 3 \rangle}) = P(y^{\langle 1 \rangle}) \, P(y^{\langle 2 \rangle} \mid y^{\langle 1 \rangle}) \, P(y^{\langle 3 \rangle} \mid y^{\langle 1 \rangle}, y^{\langle 2 \rangle})$$
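
To make the chain-rule factorization concrete, here is a toy sketch with an assumed four-word vocabulary and hand-picked per-step distributions (none of these numbers come from a real model):

```python
import numpy as np

vocab = ["<EOS>", "cats", "average", "sleep"]   # assumed toy vocabulary

def sentence_probability(step_probs, sentence):
    """P(y<1>,...,y<Ty>) = prod_t P(y<t> | y<1>,...,y<t-1>).
    step_probs[t] is the model's softmax distribution over the vocabulary at step t."""
    p = 1.0
    for t, word in enumerate(sentence):
        p *= step_probs[t][vocab.index(word)]
    return p

# Hypothetical per-step distributions an RNN language model might output
step_probs = [
    np.array([0.01, 0.60, 0.09, 0.30]),   # P(y<1>)
    np.array([0.05, 0.05, 0.10, 0.80]),   # P(y<2> | y<1>)
    np.array([0.70, 0.10, 0.10, 0.10]),   # P(y<3> | y<1>, y<2>)
]
print(sentence_probability(step_probs, ["cats", "sleep", "<EOS>"]))   # 0.6 * 0.8 * 0.7
```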

Sampling novel sequences

(figure: sampling novel sequences)
  • Sampling a sequence from a trained RNN

    Generate the sentence word by word.

  • Character-level language model

    Vocabulary = [a, b, c, …]
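
A minimal sampling sketch is shown below. The `predict_step` function is a toy stand-in for one forward pass of a trained model (here it just returns a random distribution), so the generated text is nonsense, but the loop itself mirrors the word-by-word sampling procedure described above.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<EOS>", "cats", "average", "fifteen", "hours", "of", "sleep", "a", "day"]  # assumed

def predict_step(prev_word, a):
    """Toy stand-in for the trained RNN: returns a distribution over the vocabulary and a state."""
    probs = rng.random(len(vocab))
    return probs / probs.sum(), a

def sample_sequence(max_len=20):
    a, prev_word, sentence = None, None, []
    for _ in range(max_len):
        probs, a = predict_step(prev_word, a)
        idx = rng.choice(len(vocab), p=probs)   # sample the next word from the distribution
        if vocab[idx] == "<EOS>":               # stop when the end-of-sentence token appears
            break
        sentence.append(vocab[idx])
        prev_word = vocab[idx]                  # feed the sampled word back in as the next input
    return " ".join(sentence)

print(sample_sequence())
```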

Vanishing gradients with RNNs

Basic RNNs are not very good at capturing long-range dependencies.

Exploding gradients can also occur during backpropagation (addressed by using gradient clipping).
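
A minimal sketch of gradient clipping, assuming the gradients are kept in a dictionary of NumPy arrays and clipped element-wise to a fixed range:

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Clip every gradient array element-wise into [-max_value, max_value] (in place)."""
    for grad in gradients.values():
        np.clip(grad, -max_value, max_value, out=grad)
    return gradients

# Hypothetical gradient dictionary for the RNN parameters
grads = {"dWaa": np.array([[12.0, -0.3], [0.4, -9.0]]), "dba": np.array([[7.5], [-0.1]])}
grads = clip_gradients(grads, max_value=5.0)   # overly large entries become ±5.0
```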

Gated Recurrent Unit (GRU)

GRU (simplified)

(figure: GRU)

$c$ = memory cell, and in the (simplified) GRU $c^{\langle t \rangle} = a^{\langle t \rangle}$.

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh(W_c [c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c) \\
\Gamma_u &= \sigma(W_u [c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u) \in [0, 1] \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle} \quad \text{(element-wise)}
\end{aligned}
$$

$\tilde{c}^{\langle t \rangle}$ is a candidate for replacing $c^{\langle t \rangle}$.

Think of $\Gamma_u$ as being either 0 or 1 most of the time.

If $\Gamma_u \approx 0$, then $c^{\langle t \rangle}$ is maintained pretty much exactly, even across many time steps.

  • addresses the vanishing gradient problem
  • can learn even very long-range dependencies
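
A minimal NumPy sketch of one simplified-GRU step following the equations above; the sizes and random weights are assumptions for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step_simplified(c_prev, xt, Wc, Wu, bc, bu):
    """One step of the simplified GRU; the weights act on the stacked vector [c<t-1>, x<t>]."""
    concat = np.vstack([c_prev, xt])
    c_tilde = np.tanh(Wc @ concat + bc)                    # candidate memory cell
    gamma_u = sigmoid(Wu @ concat + bu)                    # update gate, values in (0, 1)
    c_next = gamma_u * c_tilde + (1 - gamma_u) * c_prev    # element-wise blend of old and new
    return c_next                                          # in the simplified GRU, a<t> = c<t>

# Tiny example with assumed sizes: n_c = 4 memory units, n_x = 3 input features
n_c, n_x = 4, 3
rng = np.random.default_rng(0)
c_prev, xt = np.zeros((n_c, 1)), rng.standard_normal((n_x, 1))
Wc, Wu = rng.standard_normal((n_c, n_c + n_x)), rng.standard_normal((n_c, n_c + n_x))
c_next = gru_step_simplified(c_prev, xt, Wc, Wu, np.zeros((n_c, 1)), np.zeros((n_c, 1)))
```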

Full GRU

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh(W_c [\Gamma_r * c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c) \\
\Gamma_u &= \sigma(W_u [c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u) \in [0, 1] \\
\Gamma_r &= \sigma(W_r [c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_r) \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle} \quad \text{(element-wise)} \\
a^{\langle t \rangle} &= c^{\langle t \rangle}
\end{aligned}
$$

$\Gamma_r$ stands for relevance: how relevant $c^{\langle t-1 \rangle}$ is for computing the next candidate $\tilde{c}^{\langle t \rangle}$.

Long Short Term Memory (LSTM)

(figure: LSTM)

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh(W_c [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c) \\
\Gamma_u &= \sigma(W_u [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u) \in [0, 1] \quad \text{(update)} \\
\Gamma_f &= \sigma(W_f [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f) \quad \text{(forget)} \\
\Gamma_o &= \sigma(W_o [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o) \quad \text{(output)} \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle} \quad \text{(element-wise)} \\
a^{\langle t \rangle} &= \Gamma_o * \tanh(c^{\langle t \rangle})
\end{aligned}
$$

Peephole connection (element-wise): $c^{\langle t-1 \rangle}$ also feeds into the gate computations, and it does so element-wise, so the fifth element of $c^{\langle t-1 \rangle}$ affects only the fifth element of the corresponding gate.
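
For comparison with the GRU, here is a minimal NumPy sketch of one LSTM step implementing the equations above (without peephole connections); the weight shapes follow the same stacking convention as the GRU sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, xt, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    """One LSTM time step following the equations above (no peephole connections)."""
    concat = np.vstack([a_prev, xt])        # stacked [a<t-1>, x<t>]
    c_tilde = np.tanh(Wc @ concat + bc)     # candidate cell value
    gamma_u = sigmoid(Wu @ concat + bu)     # update gate
    gamma_f = sigmoid(Wf @ concat + bf)     # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)     # output gate
    c_next = gamma_u * c_tilde + gamma_f * c_prev   # element-wise mix of new and old memory
    a_next = gamma_o * np.tanh(c_next)
    return a_next, c_next
```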

Bidirectional RNN

The network computes a forward activation $\overrightarrow{a}^{\langle t \rangle}$ and a backward activation $\overleftarrow{a}^{\langle t \rangle}$; together they form an acyclic graph over the sequence.

$$\hat{y}^{\langle t \rangle} = g(W_y [\overrightarrow{a}^{\langle t \rangle}, \overleftarrow{a}^{\langle t \rangle}] + b_y)$$

A BRNN with LSTM blocks would be a pretty reasonable first thing to try (see the sketch below).
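
A minimal Keras sketch of that suggestion; the sequence length, feature size, number of units, and number of output classes are all assumed values for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

Tx, n_x, n_a, n_y = 30, 50, 64, 10           # assumed sizes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(Tx, n_x)),
    layers.Bidirectional(layers.LSTM(n_a, return_sequences=True)),    # forward + backward pass
    layers.TimeDistributed(layers.Dense(n_y, activation='softmax')),  # y_hat<t> at every step
])
model.summary()
```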

Deep RNNs

(figure: deep RNN)
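
A deep RNN stacks several recurrent layers, each feeding its per-time-step activations to the layer above. A minimal Keras sketch, where the layer count and sizes are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

Tx, n_x = 30, 50                             # assumed sequence length / feature size
model = tf.keras.Sequential([
    tf.keras.Input(shape=(Tx, n_x)),
    layers.LSTM(64, return_sequences=True),  # layer 1: passes a<t> up for every t
    layers.LSTM(64, return_sequences=True),  # layer 2
    layers.LSTM(64),                         # layer 3: keeps only the last activation
    layers.Dense(10, activation='softmax'),
])
model.summary()
```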

Homework: Improvise a Jazz Solo with an LSTM Network

You would like to create a jazz music piece specially for a friend's birthday. However, you don't know how to play any instruments, or how to compose music. Fortunately, you know deep learning and will solve this problem using an LSTM network!

You will train a network to generate novel jazz solos in a style representative of a body of performed work. 😎🎷

Something came over me when I saw it... Aye...

Exercise 1 - djmodel

(figure: music generation)
n_values = 90                                    # number of music values
reshaper = Reshape((1, n_values))                # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True)       # Used in Step 2.C
densor = Dense(n_values, activation='softmax')   # Used in Step 2.D
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: djmodel

def djmodel(Tx, LSTM_cell, densor, reshaper):
    """
    Implement the djmodel composed of Tx LSTM cells where each cell is responsible
    for learning the following note based on the previous note and context.
    Each cell has the following schema:
            [X_{t}, a_{t-1}, c0_{t-1}] -> RESHAPE() -> LSTM() -> DENSE()
    Arguments:
        Tx -- length of the sequences in the corpus
        LSTM_cell -- LSTM layer instance
        densor -- Dense layer instance
        reshaper -- Reshape layer instance

    Returns:
        model -- a keras instance model with inputs [X, a0, c0]
    """
    # Get the shape of input values
    n_values = densor.units
    # Get the number of the hidden state vector
    n_a = LSTM_cell.units

    # Define the input layer and specify the shape
    X = Input(shape=(Tx, n_values))

    # Define the initial hidden state a0 and initial cell state c0
    # using `Input`
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0

    ### START CODE HERE ###
    # Step 1: Create empty list to append the outputs while you iterate (≈1 line)
    outputs = []

    # Step 2: Loop over tx
    for t in range(Tx):
        # Step 2.A: select the "t"th time step vector from X.
        x = X[:, t, :]
        # Step 2.B: Use reshaper to reshape x to be (1, n_values) (≈1 line)
        x = reshaper(x)
        # Step 2.C: Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.D: Apply densor to the hidden state output of LSTM_Cell
        out = densor(a)
        # Step 2.E: add the output to "outputs"
        outputs.append(out)

    # Step 3: Create model instance
    model = Model(inputs=[X, a0, c0], outputs=outputs)
    ### END CODE HERE ###

    return model

We will use:

  • optimizer: Adam optimizer
  • Loss function: categorical cross-entropy (for multi-class classification)
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit([X, a0, c0], list(Y), epochs=100, verbose=0)

Exercise 2 - music_inference_model

(figure: music generation at inference time)
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: music_inference_model

def music_inference_model(LSTM_cell, densor, Ty=100):
    """
    Uses the trained "LSTM_cell" and "densor" from model() to generate a sequence of values.

    Arguments:
        LSTM_cell -- the trained "LSTM_cell" from model(), Keras layer object
        densor -- the trained "densor" from model(), Keras layer object
        Ty -- integer, number of time steps to generate

    Returns:
        inference_model -- Keras model instance
    """
    # Get the shape of input values
    n_values = densor.units
    # Get the number of the hidden state vector
    n_a = LSTM_cell.units

    # Define the input of your model with a shape
    x0 = Input(shape=(1, n_values))

    # Define s0, initial hidden state for the decoder LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    ### START CODE HERE ###
    # Step 1: Create an empty list of "outputs" to later store your predicted values (≈1 line)
    outputs = []

    # Step 2: Loop over Ty and generate a value at every time step
    for t in range(Ty):
        # Step 2.A: Perform one step of LSTM_cell. Use "x", not "x0" (≈1 line)
        a, _, c = LSTM_cell(x, initial_state=[a, c])

        # Step 2.B: Apply Dense layer to the hidden state output of the LSTM_cell (≈1 line)
        out = densor(a)

        # Step 2.C: Append the prediction "out" to "outputs". out.shape = (None, 90) (≈1 line)
        outputs.append(out)

        # Step 2.D:
        # Select the next value according to "out",
        # Set "x" to be the one-hot representation of the selected value
        # See instructions above.
        x = tf.math.argmax(out, axis=-1)
        x = tf.one_hot(x, depth=n_values)
        # Step 2.E:
        # Use RepeatVector(1) to convert x into a tensor with shape=(None, 1, 90)
        x = RepeatVector(1)(x)

    # Step 3: Create model instance with the correct "inputs" and "outputs" (≈1 line)
    inference_model = Model(inputs=[x0, a0, c0], outputs=outputs)

    ### END CODE HERE ###

    return inference_model

Exercise 3 - predict_and_sample

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: predict_and_sample

def predict_and_sample(inference_model, x_initializer=x_initializer, a_initializer=a_initializer,
                       c_initializer=c_initializer):
    """
    Predicts the next value of values using the inference model.

    Arguments:
        inference_model -- Keras model instance for inference time
        x_initializer -- numpy array of shape (1, 1, 90), one-hot vector initializing the values generation
        a_initializer -- numpy array of shape (1, n_a), initializing the hidden state of the LSTM_cell
        c_initializer -- numpy array of shape (1, n_a), initializing the cell state of the LSTM_cell

    Returns:
        results -- numpy-array of shape (Ty, 90), matrix of one-hot vectors representing the values generated
        indices -- numpy-array of shape (Ty, 1), matrix of indices representing the values generated
    """
    n_values = x_initializer.shape[2]

    ### START CODE HERE ###
    # Step 1: Use your inference model to predict an output sequence given x_initializer, a_initializer and c_initializer.
    pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
    # Step 2: Convert "pred" into an np.array() of indices with the maximum probabilities
    indices = np.argmax(pred, axis=-1)
    # Step 3: Convert indices to one-hot vectors, the shape of the results should be (Ty, n_values)
    results = to_categorical(indices, num_classes=x_initializer.shape[-1])
    ### END CODE HERE ###

    return results, indices
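
For reference, the function would typically be called along these lines. The zero-initialized arrays and the hidden-state size `n_a = 64` are assumptions taken from the docstring above and the assignment setup, and `inference_model` is assumed to come from `music_inference_model` in Exercise 2:

```python
import numpy as np

# Assumed initializers with the shapes given in the docstring above
n_a = 64
x_initializer = np.zeros((1, 1, 90))
a_initializer = np.zeros((1, n_a))
c_initializer = np.zeros((1, n_a))

results, indices = predict_and_sample(inference_model, x_initializer, a_initializer, c_initializer)
print(results.shape, indices.shape)
```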
