Sequence Models - Recurrent Neural Networks
Examples of sequence data:
- Speech recognition
- Music generation
- Sentiment classification
- DNA sequence analysis
- Machine translation
- Video activity recognition
- Named entity recognition
Recurrent Neural Network Model
Why not a standard network?
- Inputs, outputs can be different lengths in different examples.
- Doesn't share features learned across different positions of text.
Weakness of RNNs
- a basic (unidirectional) RNN only uses the earlier information in the sequence (use a Bidirectional RNN instead)
Forward Propagation
- the activation \(g_1\) will often be \(\tanh\) in an RNN
- \(g_2\) depends on the output:
- binary classification problem: \(\text{sigmoid}\)
- \(k\)-way classification problem: \(\text{softmax}\)
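For reference, the forward-propagation equations at time step \(t\) (course notation) are:
\[a^{<t>} = g_1(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)\]
\[\hat{y}^{<t>} = g_2(W_{ya} a^{<t>} + b_y)\]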
Backpropagation Through Time
Different Types of RNN
- many-to-one architecture: Sentiment Classification
- one-to-many architecture: Music Generation
- many-to-many architecture: Machine Translation, where input and output can be different lengths (encoder, decoder)
Language Model and Sequence Generation
- Language modelling: gives the probability of a sentence, \(P(\text{sentence})\); its basic job is to estimate the probability of a sequence \(P(y^{<1>}, \dots, y^{<T_y>})\)
- Training set: a large corpus of English text
- add \(\text{<EOS>}\) at the end of each sentence
- replace unknown words with \(\text{<UNK>}\)
- Training with an RNN model
replace \(x^{<t>}\) with \(y^{<t-1>}\): at each time step, the input is the previous word of the sentence.
\[P(y^{<1>}, y^{<2>}, y^{<3>}) = P(y^{<1>}) P(y^{<2>} | y^{<1>}) P(y^{<3>} | y^{<1>}, y^{<2>}) \]
Sampling novel sequences
- Sampling a sequence from a trained RNN
generate the sentence word by word, sampling each word from the distribution \(\hat{y}^{<t>}\) predicted by the network (a sketch follows this list)
- Character-level language model
\(\text{Vocabulary} = [a, b, c, \dots]\)
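A minimal numpy sketch of word-by-word sampling; predict_step is a hypothetical wrapper around the trained RNN that returns the softmax distribution over the vocabulary for the next word:
import numpy as np

def sample_sequence(predict_step, vocab_size, eos_index, max_len=50, seed=0):
    # predict_step(prev_index, state) -> (probs, state) is an assumed wrapper around
    # the trained RNN: it consumes the previously sampled token plus the hidden state
    # and returns a probability distribution over the vocabulary for the next token.
    rng = np.random.default_rng(seed)
    indices, prev, state = [], None, None      # prev=None plays the role of the zero input x^<1> = 0
    for _ in range(max_len):
        probs, state = predict_step(prev, state)
        prev = rng.choice(vocab_size, p=probs)  # sample (don't argmax) to get novel sequences
        indices.append(prev)
        if prev == eos_index:                   # stop once <EOS> is sampled
            break
    return indices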
Vanishing Gradients with RNNs
Basic RNNs are not very good at capturing long-range dependencies.
Exploding gradients can also occur during backpropagation (addressed by gradient clipping).
Gated Recurrent Unit (GRU)
GRU (simplified)
\(c = \text{memory cell}\) and \(c^{<t>} = a^{<t>}\)
\(\tilde{c}^{<t>}\) is a candidate for replacing \(c^{<t>}\)
Think of \(\Gamma_u\) as being either \(0\) or \(1\) most of the time.
If \(\Gamma_u \approx 0\), then \(c^{<t>}\) is maintained almost exactly, even across many time steps.
- addresses the vanishing gradient problem
- can learn even very long-range dependencies
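The simplified GRU equations (course notation; \(*\) is element-wise multiplication):
\[\tilde{c}^{<t>} = \tanh(W_c [c^{<t-1>}, x^{<t>}] + b_c)\]
\[\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)\]
\[c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}\]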
Full GRU
\(\Gamma_r\) is a relevance gate: how relevant is \(c^{<t-1>}\) for computing the candidate \(\tilde{c}^{<t>}\)
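The full GRU adds \(\Gamma_r\) to the candidate computation:
\[\Gamma_r = \sigma(W_r [c^{<t-1>}, x^{<t>}] + b_r)\]
\[\tilde{c}^{<t>} = \tanh(W_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)\]
\[\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)\]
\[c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}, \quad a^{<t>} = c^{<t>}\]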
Long Short Term Memory (LSTM)
peephole connection (element-wise): the gate computations also take \(c^{<t-1>}\) as input, and the fifth element of \(c^{<t-1>}\) affects only the fifth element of the corresponding gate.
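For reference, the LSTM equations (course notation, without peepholes):
\[\tilde{c}^{<t>} = \tanh(W_c [a^{<t-1>}, x^{<t>}] + b_c)\]
\[\Gamma_u = \sigma(W_u [a^{<t-1>}, x^{<t>}] + b_u) \quad \text{(update gate)}\]
\[\Gamma_f = \sigma(W_f [a^{<t-1>}, x^{<t>}] + b_f) \quad \text{(forget gate)}\]
\[\Gamma_o = \sigma(W_o [a^{<t-1>}, x^{<t>}] + b_o) \quad \text{(output gate)}\]
\[c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}\]
\[a^{<t>} = \Gamma_o * \tanh(c^{<t>})\]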
Bidirectional RNN
\(\overrightarrow{a}^{<t>}\) is the forward activation, \(\overleftarrow{a}^{<t>}\) the backward activation (computed from the end of the sequence)
The network forms an acyclic computation graph.
A BRNN with LSTM blocks would be a pretty reasonable first thing to try.
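The prediction at time \(t\) combines both directions:
\[\hat{y}^{<t>} = g(W_y [\overrightarrow{a}^{<t>}, \overleftarrow{a}^{<t>}] + b_y)\]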
Deep RNNs
Homework: Improvise a Jazz Solo with an LSTM Network
You would like to create a jazz music piece specially for a friend's birthday. However, you don't know how to play any instruments, or how to compose music. Fortunately, you know deep learning and will solve this problem using an LSTM network!
You will train a network to generate novel jazz solos in a style representative of a body of performed work. 😎🎷
Something came over me when I heard it... Aye...
Exercise 1 - djmodel
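The cells below assume the notebook's usual setup. A minimal sketch of the imports and constants they rely on (the import paths follow the standard tf.keras layout, and n_a = 64 is an assumption based on the assignment; adjust to your notebook):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, LSTM, Reshape, RepeatVector
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

n_a = 64  # dimension of the LSTM hidden/cell state (assumed value)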
n_values = 90 # number of music values
reshaper = Reshape((1, n_values)) # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True) # Used in Step 2.C
densor = Dense(n_values, activation='softmax') # Used in Step 2.D
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: djmodel
def djmodel(Tx, LSTM_cell, densor, reshaper):
    """
    Implement the djmodel composed of Tx LSTM cells where each cell is responsible
    for learning the following note based on the previous note and context.
    Each cell has the following schema:
        [X_{t}, a_{t-1}, c0_{t-1}] -> RESHAPE() -> LSTM() -> DENSE()

    Arguments:
    Tx -- length of the sequences in the corpus
    LSTM_cell -- LSTM layer instance
    densor -- Dense layer instance
    reshaper -- Reshape layer instance

    Returns:
    model -- a keras instance model with inputs [X, a0, c0]
    """
    # Get the shape of input values
    n_values = densor.units
    # Get the dimension of the hidden state vector
    n_a = LSTM_cell.units

    # Define the input layer and specify the shape
    X = Input(shape=(Tx, n_values))

    # Define the initial hidden state a0 and initial cell state c0 using `Input`
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0

    ### START CODE HERE ###
    # Step 1: Create empty list to append the outputs while you iterate (≈1 line)
    outputs = []

    # Step 2: Loop over Tx
    for t in range(Tx):
        # Step 2.A: select the "t"th time step vector from X
        x = X[:, t, :]
        # Step 2.B: Use reshaper to reshape x to be (1, n_values) (≈1 line)
        x = reshaper(x)
        # Step 2.C: Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.D: Apply densor to the hidden state output of LSTM_cell
        out = densor(a)
        # Step 2.E: add the output to "outputs"
        outputs.append(out)

    # Step 3: Create model instance
    model = Model(inputs=[X, a0, c0], outputs=outputs)
    ### END CODE HERE ###

    return model
We will use:
- Optimizer: Adam
- Loss function: categorical cross-entropy (for multi-class classification)
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit([X, a0, c0], list(Y), epochs=100, verbose = 0)
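For context, the fit() call above uses zero initial states and the dataset shapes from the assignment; a sketch (m = 60 is an assumption based on the provided corpus):
m = 60                       # number of training examples (assumed)
a0 = np.zeros((m, n_a))      # initial hidden state for every example
c0 = np.zeros((m, n_a))      # initial cell state for every example
# X has shape (m, Tx, n_values); Y is X shifted one step in time,
# reordered into a list of Tx arrays so each time-step output gets its own target.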
Exercise 2 - music_inference_model
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: music_inference_model
def music_inference_model(LSTM_cell, densor, Ty=100):
    """
    Uses the trained "LSTM_cell" and "densor" from model() to generate a sequence of values.

    Arguments:
    LSTM_cell -- the trained "LSTM_cell" from model(), Keras layer object
    densor -- the trained "densor" from model(), Keras layer object
    Ty -- integer, number of time steps to generate

    Returns:
    inference_model -- Keras model instance
    """
    # Get the shape of input values
    n_values = densor.units
    # Get the dimension of the hidden state vector
    n_a = LSTM_cell.units

    # Define the input of your model with a shape
    x0 = Input(shape=(1, n_values))

    # Define s0, initial hidden state for the decoder LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    ### START CODE HERE ###
    # Step 1: Create an empty list of "outputs" to later store your predicted values (≈1 line)
    outputs = []

    # Step 2: Loop over Ty and generate a value at every time step
    for t in range(Ty):
        # Step 2.A: Perform one step of LSTM_cell. Use "x", not "x0" (≈1 line)
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.B: Apply Dense layer to the hidden state output of the LSTM_cell (≈1 line)
        out = densor(a)
        # Step 2.C: Append the prediction "out" to "outputs". out.shape = (None, 90) (≈1 line)
        outputs.append(out)
        # Step 2.D: Select the next value according to "out" and
        #           set "x" to be the one-hot representation of the selected value
        x = tf.math.argmax(out, axis=-1)
        x = tf.one_hot(x, depth=n_values)
        # Step 2.E: Use RepeatVector(1) to convert x into a tensor with shape=(None, 1, 90)
        x = RepeatVector(1)(x)

    # Step 3: Create model instance with the correct "inputs" and "outputs" (≈1 line)
    inference_model = Model(inputs=[x0, a0, c0], outputs=outputs)
    ### END CODE HERE ###

    return inference_model
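The x_initializer, a_initializer, and c_initializer used in the next exercise are just zero arrays of the matching shapes; a sketch (Ty = 50 is an example value):
inference_model = music_inference_model(LSTM_cell, densor, Ty=50)
x_initializer = np.zeros((1, 1, n_values))   # no note selected yet
a_initializer = np.zeros((1, n_a))           # zero initial hidden state
c_initializer = np.zeros((1, n_a))           # zero initial cell state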
Exercise 3 - predict_and_sample
# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: predict_and_sample
def predict_and_sample(inference_model, x_initializer=x_initializer, a_initializer=a_initializer,
                       c_initializer=c_initializer):
    """
    Predicts the next value of values using the inference model.

    Arguments:
    inference_model -- Keras model instance for inference time
    x_initializer -- numpy array of shape (1, 1, 90), one-hot vector initializing the values generation
    a_initializer -- numpy array of shape (1, n_a), initializing the hidden state of the LSTM_cell
    c_initializer -- numpy array of shape (1, n_a), initializing the cell state of the LSTM_cell

    Returns:
    results -- numpy array of shape (Ty, 90), matrix of one-hot vectors representing the values generated
    indices -- numpy array of shape (Ty, 1), matrix of indices representing the values generated
    """
    n_values = x_initializer.shape[2]

    ### START CODE HERE ###
    # Step 1: Use your inference model to predict an output sequence given x_initializer, a_initializer and c_initializer.
    pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
    # Step 2: Convert "pred" into an np.array() of indices with the maximum probabilities
    indices = np.argmax(pred, axis=-1)
    # Step 3: Convert indices to one-hot vectors; the shape of the results should be (Ty, n_values)
    results = to_categorical(indices, num_classes=x_initializer.shape[-1])
    ### END CODE HERE ###

    return results, indices
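A usage sketch (the sampling length Ty was fixed when the inference model was built; shapes follow the docstring above):
results, indices = predict_and_sample(inference_model, x_initializer, a_initializer, c_initializer)
print("shape of results:", results.shape)   # (Ty, 90)
print("shape of indices:", indices.shape)   # (Ty, 1)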