吴恩达《深度学习》-课后测验-第五门课序列模型(Sequence Models)-Week 1: Recurrent Neural Networks(第一周测验：循环神经网络)

Week 1 Quiz: Recurrent Neural Networks(第一周测验：循环神经网络)

\1. Suppose your training examples are sentences (sequences of words). Which of the following refers to the jth word in the ith training example?( 假设你的训练样本是句子(单词序列)，下面哪个选项指的是第𝑖个训练样本中的第𝑗个词?)

【】 $𝑥^{(𝑖)<𝑗>} $

【】 $𝑥^{<𝑖>(𝑗) }$

【】 $𝑥^{(𝑗)<𝑖>}$

【】 $𝑥^{<𝑗>(𝑖)}$

答案

【★】 𝑥(𝑖)<𝑗> (注：首先获取第𝑖个训练样本(用括号表示)，然后到𝑗列获取单词(用括尖括号表示)。)

\2. Consider this RNN: This specific type of architecture is appropriate when: (看一下下面的这个循环神经网络：在下面的条件中，满足上图中的网络结构的参数是)

【】$𝑇_𝑥 = 𝑇_𝑦 $

【】 $𝑇_𝑥 < 𝑇_𝑦 $

【】$𝑇_𝑥 > 𝑇_𝑦$

【】$ 𝑇_𝑥 = 1 $

答案

【★】𝑇𝑥 = 𝑇𝑦

\3. To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply). (上图中每一个输入都与输出相匹配。这些任务中的哪一个会使用多对一的 RNN 体系结构？)

【】 Speech recognition (input an audio clip and output a transcript) (语音识别（输入语音，输出文本）。)

【】 Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment) (情感分类（输入一段文字，输出 0 或 1 表示正面或者负面的情绪）。)

【】Image classification (input an image and output a label) (图像分类（输入一张图片，输出对应的标签）。)

【】 Gender recognition from speech (input an audio clip and output a label indicating the speaker’s gender) (人声性别识别（输入语音，输出说话人的性别）。)

答案

【★】 Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment) (情感分类（输入一段文字，输出 0 或 1 表示正面或者负面的情绪）。)

【★】 Gender recognition from speech (input an audio clip and output a label indicating the speaker’s gender) (人声性别识别（输入语音，输出说话人的性别）。)

\4. You are training this RNN language model.

At the $𝑡^{th}$ time step, what is the RNN doing? Choose the best answer.( 假设你现在正在训练下面这个 RNN 的语言模型：在𝑡时，这个 RNN 在做什么？)

【】 Estimating 𝑃($𝑦^{<1>}, 𝑦^{<2>},… , 𝑦^{<𝑡−1>}$)( 计算 𝑃($𝑦^{<1>}, 𝑦^{<2>},… , 𝑦^{<𝑡−1>}$) )

【】 Estimating 𝑃($𝑦^{<𝑡>}$) (计算 𝑃($𝑦^{<𝑡>}$) )

【】Estimating $𝑃(𝑦 ∣ 𝑦^{<1>}, 𝑦^{<2>},… , 𝑦^{<𝑡>})$ ( 计算 $𝑃(𝑦 ∣ 𝑦^{<1>}, 𝑦^{<2>},… , 𝑦^{<𝑡>})$)

【】Estimating $𝑃(𝑦 ∣ 𝑦^{<1>}, 𝑦^{<2>}, … , 𝑦^{<𝑡−1>})$( 计算 $𝑃(𝑦 ∣ 𝑦^{<1>}, 𝑦^{<2>}, … , 𝑦^{<𝑡−1>})$)

答案

Yes,in a language model we try to predict the next step based on the knowledge of all prior steps

【★】Estimating 𝑃(𝑦 ∣ 𝑦 <1>, 𝑦 <2>,… , 𝑦 <𝑡>) ( 计算 𝑃(𝑦 ∣ 𝑦 <1>, 𝑦 <2>, … , 𝑦 <𝑡-1>) )

\5. You have finished training a language model RNN and are using it to sample random sentences, as follows: What are you doing at each time step 𝑡?( 你已经完成了一个语言模型 RNN 的训练，并用它来对句子进行随机取样，如下图：在每个时间步𝑡都在做什么？)

【】Use the probabilities output by the RNN to pick the highest probability word for that time-step as $𝑦^{<𝑡>}$. (ii) Then pass the ground-truth word from the training set to the next timestep. ((1)使用 RNN 输出的概率，选择该时间步的最高概率单词作为$𝑦^{<𝑡>}$，(2)然后将训练集中的正确的单词传递到下一个时间步。)

【】Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $𝑦^{<𝑡>}$. (ii) Then pass the ground-truth word from the training set to the next timestep. ((1)使用由 RNN 输出的概率将该时间步的所选单词进行随机采样作为 $𝑦^{<𝑡>}$，(2)然后将训练集中的实际单词传递到下一个时间步。)

【】Use the probabilities output by the RNN to pick the highest probability word for that time-step as $𝑦^{<𝑡>}$. (ii) Then pass this selected word to the next time-step. ((1)使用由 RNN 输出的概率来选择该时间步的最高概率词作为 $𝑦^{<𝑡>}$，(2)然后将该选择的词传递给下一个时间步。)

【】 Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $𝑦^{<𝑡>}$. (ii) Then pass this selected word to the next time-step. ((1)使用 RNN 该时间步输出的概率对单词随机抽样的结果作为$𝑦^{<𝑡>}$，(2)然后将此选定单词传递给下一个时间步。)

答案

【★】 (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as 𝑦 <𝑡>. (ii) Then pass this selected word to the next time-step. ((1)使用 RNN 该时间步输出的概率对单词随机抽样的结果作为𝑦 <𝑡>，(2)然后将此选定单词传递给下一个时间步。)

\6. You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?

【】 Vanishing gradient problem. (梯度消失。)

【】Exploding gradient problem. (梯度爆炸。)

【】 ReLU activation function g(.) used to compute g(z), where z is too large. (ReLU 函数作为激活函数𝑔(. )，在计算𝑔(𝑧)时，𝑧的数值过大了。)

【】 Sigmoid activation function g(.) used to compute g(z), where z is too large. (Sigmoid 函数作为激活函数𝑔(. )，在计算𝑔(𝑧)时，𝑧的数值过大了。)

答案

【★】Exploding gradient problem. (梯度爆炸。)

7.Suppose you are training a LSTM. You have a 10000 word vocabulary, and are using an LSTM with 100-dimensional activations a. What is the dimension of Γu at each time step?( 假设你正在训练一个 LSTM 网络，你有一个 10,000 词的词汇表，并且使用一个激活值维度为 100 的 LSTM 块，在每一个时间步中，𝛤𝑢的维度是多少？)

【】 1

【】 100

【】300

【】 10000

答案

【★ 】 100

(注: 𝛤𝑢的向量维度等于 LSTM 中隐藏单元的数量。)

\8. Here’re the update equations for the GRU. Alice proposes to simplify the GRU by always removing the $𝛤_𝑢$. I.e., setting $𝛤_𝑢$ = 1. Betty proposes to simplify the GRU by removing the $𝛤_𝑟$ . I. e., setting $𝛤_𝑟$＝1 always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?( 这里有一些 GRU 的更新方程：爱丽丝建议通过移除$𝛤_𝑢$来简化 GRU，即设置$𝛤_𝑢$ = 1。贝蒂提出通过移除$𝛤_𝑟$来简化 GRU，即设置$𝛤_𝑟$＝1。哪种模型更容易在梯度不消失问题的情况下训练，即使在很长的输入序列上也可以进行训练？)

【】 Alice’s model (removing $𝛤_𝑢$), because if $𝛤_𝑟$ ≈ 0 for a timestep, the gradient can propagate back through that timestep without much decay. (爱丽丝的模型（即移除 𝛤𝑢），因为对于一个时间步而言，如果𝛤𝑟 ≈ 0，梯度可以通过时间步反向传播而不会衰减。)

【】Alice’s model (removing 𝛤𝑢), because if 𝛤𝑟 ≈ 1 for a timestep, the gradient can propagate back through that timestep without much decay. (爱丽丝的模型（即移除 𝛤𝑢），因为对于一个时间步而言，如果𝛤𝑟 ≈ 1，梯度可以通过时间步反向传播而不会衰减。)

【】 Betty’s model (removing 𝛤𝑟 ), because if 𝛤u ≈ 0 for a timestep, the gradient can propagate back through that timestep without much decay. (贝蒂的模型（即移除𝛤𝑟），因为对于一个时间步而言，如果𝛤u ≈ 0，梯度可以通过时间步反向传播而不会衰减。)

【】Betty’s model (removing 𝛤𝑟 ), because if 𝛤u ≈ 1 for a timestep, the gradient can propagate back through that timestep without much decay. (贝蒂的模型（即移除𝛤𝑟），因为对于一个时间步而言，如果𝛤u ≈ 1，梯度可以通过时间步反向传播而不会衰减。)

答案

【★ 】 Betty’s model (removing 𝛤𝑟 ), because if 𝛤u ≈ 0 for a timestep, the gradient can propagate back through that timestep without much decay. (贝蒂的模型（即移除𝛤𝑟），因为对于一个时间步而言，如果𝛤u ≈ 0，梯度可以通过时间步反向传播而不会衰减。)

(注：要使信号反向传播而不消失，我们需要𝑐 <𝑡>高度依赖于𝑐 <𝑡−1>。)

\9. Here are the equations for the GRU and the LSTM: From these, we can see that the Update Gate and Forget Gate in the LSTM play a role similar to _ and __ in the GRU. What should go in the the blanks?( 这里有一些 GRU 和 LSTM 的方程: 从这些我们可以看到，在 LSTM 中的更新门和遗忘门在 GRU 中扮演类似___与___的角色，空白处应该填什么？)

【】𝛤𝑢 与 1 − 𝛤𝑢 (𝛤𝑢 与 1 − 𝛤𝑢 )

【】𝛤𝑢与 𝛤𝑟 (𝛤𝑢与 𝛤𝑟 )

【】1 − 𝛤𝑢 与𝛤𝑢 (1 − 𝛤𝑢 与𝛤𝑢 )

【】𝛤𝑟 与𝛤𝑢 (𝛤𝑟 与𝛤𝑢 )

答案

【★】𝛤𝑢 与 1 − 𝛤𝑢 (𝛤𝑢 与 1 − 𝛤𝑢 )

\10. You have a pet dog whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence as $𝑥^{<1>},… , 𝑥^{<365>}$. You’ve also collected data on your dog’s mood, which you represent as $𝑦^{<1>}, … , 𝑦^{<365>}$. You’d like to build a model to map from x→y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?( 你有一只宠物狗，它的心情很大程度上取决于当前和过去几天的天气。你已经收集了过去 365 天的天气数据 $𝑥^{<1>},… , 𝑥^{<365>}$，这些数据是一个序列，你还收集了你的狗心情的数据$𝑦^{<1>}, … , 𝑦^{<365>}$，你想建立一个模型来从𝑥到𝑦进行映射，你应该使用单向 RNN 还是双向 RNN 来解决这个问题？)

【】Bidirectional RNN, because this allows the prediction of mood on day 𝑡 to take into account more information. (双向 RNN，因为在𝑡日的情绪预测中可以考虑到更多的信息。)

【】 Bidirectional RNN, because this allows backpropagation to compute more accurate gradients.( 双向 RNN，因为这允许反向传播计算中有更精确的梯度。)

【】 Unidirectional RNN, because the value of $𝑦^{<𝑡>}$ depends only on $𝑥^{<1>},… , 𝑥^{<𝑡>}$ ,but not on $𝑥^{<𝑡+1>},… , 𝑥^{<365>}$ (单向 RNN，因为$𝑦^{<𝑡>}$的值仅依赖于$𝑥^{<1>},… , 𝑥^{<𝑡>}$，而不依赖于 $𝑥^{<𝑡+1>},… , 𝑥^{<365>}$。)

【】Unidirectional RNN, because the value of $𝑦^{<𝑡>}$ depends only on $𝑥^{<𝑡>}$, and not other days’ weather.( 单向 RNN，因为$𝑦^{<𝑡>}$的值只取决于$𝑥^{<𝑡>}$，而不是其他天的天气。)

答案

【★】 Unidirectional RNN, because the value of 𝑦 <𝑡> depends only on 𝑥 <1>,… , 𝑥 <𝑡> ,but not on 𝑥 <𝑡+1>,… , 𝑥 <365> (单向 RNN，因为𝑦 <𝑡>的值仅依赖于𝑥 <1>,… , 𝑥 <𝑡>，而不依赖于 𝑥 <𝑡+1>,… , 𝑥 <365>。)

꧁༺ʚ凤⭐尘ɞ༻꧂

吴恩达《深度学习》-课后测验-第五门课序列模型(Sequence Models)-Week 1: Recurrent Neural Networks(第一周测验：循环神经网络)

Week 1 Quiz: Recurrent Neural Networks(第一周测验：循环神经网络)

Week 1 Code Assignments：

✧Course 5 -序列模型(Sequence Models)- 第一周测验 - 循环神经网络

✦assignment 1：Building your Recurrent Neural Network - Step by Step)

✦assignment 2：Character level language model - Dinosaurus land)

✦assignment 3：Improvise a Jazz Solo with an LSTM Network)

公告

꧁༺ʚ凤⭐尘ɞ༻꧂

吴恩达《深度学习》-课后测验-第五门课 序列模型(Sequence Models)-Week 1: Recurrent Neural Networks(第一周测验：循环神经网络)

Week 1 Quiz: Recurrent Neural Networks(第一周测验：循环神经网络)

Week 1 Code Assignments：

✧Course 5 -序列模型(Sequence Models)- 第一周测验 - 循环神经网络

✦assignment 1：Building your Recurrent Neural Network - Step by Step)

✦assignment 2：Character level language model - Dinosaurus land)

✦assignment 3：Improvise a Jazz Solo with an LSTM Network)

公告

吴恩达《深度学习》-课后测验-第五门课序列模型(Sequence Models)-Week 1: Recurrent Neural Networks(第一周测验：循环神经网络)