凤⭐尘

Andrew Ng's Deep Learning - Quizzes - Course 2 (Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization) - Week 3 - Hyperparameter tuning, Batch Normalization, Programming Frameworks

Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks

1. If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

【 】False

【 】True

Answer

False

Note: Try random values rather than a grid search, because you don't know in advance which hyperparameters are more important than others. To take an extreme example, suppose one hyperparameter is the learning rate alpha and the other is the epsilon in the denominator of the Adam update (typically \(10^{-8}\), there to prevent division by zero). Your choice of alpha matters a lot and your choice of epsilon hardly matters.
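A minimal sketch of the note above, assuming a budget of 25 trials over alpha and epsilon. The search ranges and the helper names `grid_trials` and `random_trials` are illustrative assumptions, not part of the quiz. With the same budget, a 5 x 5 grid only ever tries 5 distinct values of alpha, while random sampling tries a new alpha on every trial.

```python
import random

def grid_trials(alphas, epsilons):
    """Grid search: every combination of the listed values."""
    return [(a, e) for a in alphas for e in epsilons]

def random_trials(n, rng):
    """Random search: n independent draws, each sampled on a log scale."""
    trials = []
    for _ in range(n):
        alpha = 10 ** rng.uniform(-4, 0)      # assumed range: 1e-4 to 1
        epsilon = 10 ** rng.uniform(-10, -6)  # assumed range: 1e-10 to 1e-6
        trials.append((alpha, epsilon))
    return trials

rng = random.Random(0)
grid = grid_trials([1e-4, 1e-3, 1e-2, 1e-1, 1.0],
                   [1e-10, 1e-9, 1e-8, 1e-7, 1e-6])
rand = random_trials(25, rng)

# Same number of trials, very different coverage of the important axis.
distinct_alpha_grid = len({a for a, _ in grid})
distinct_alpha_rand = len({a for a, _ in rand})
print(len(grid), distinct_alpha_grid)
print(len(rand), distinct_alpha_rand)
```

If epsilon turns out not to matter, the grid has effectively spent 25 training runs to learn about 5 learning rates.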

 

2. Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?

【 】False

【 】True

Answer

False

Note: We've seen in lecture that some hyperparameters, such as the learning rate, are more critical than others.

 

3. During hyperparameter search, whether you try to babysit one model ("Panda" strategy) or train a lot of models in parallel ("Caviar" strategy) is largely determined by:

【 】Whether you use batch or mini-batch optimization

【 】The presence of local minima (and saddle points) in your neural network

【 】The amount of computational power you can access

【 】The number of hyperparameters you have to tune

Answer

【 】The amount of computational power you can access

 

4. If you think \(\beta\) (hyperparameter for momentum) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for beta?

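    import numpy as np

    r = np.random.rand()        # r is uniform on [0, 1)
    beta = 1 - 10 ** (-r - 1)   # 1 - beta is log-uniform on (0.01, 0.1]

Explanation

Since \(\beta \in [0.9, 0.99]\), we have \(1 - \beta \in [0.01, 0.1] = [10^{-2}, 10^{-1}]\). Sampling \(r\) uniformly from \([0, 1]\) gives \(10^{-1-r} \in [10^{-2}, 10^{-1}]\), so \(\beta = 1 - 10^{-1-r}\).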
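To see what the log scale buys, here is a small sketch comparing the formula above with naive uniform sampling of beta. It uses the standard `random` module instead of NumPy to stay dependency-free; the sample count, the seed, and the helper names are assumptions for illustration. The split point \(1 - 10^{-1.5} \approx 0.968\) is the middle of the range on the log scale of \(1 - \beta\).

```python
import random

def sample_beta_log(rng):
    """Sample beta so that 1 - beta is log-uniform on (0.01, 0.1]."""
    r = rng.random()            # r in [0, 1)
    return 1 - 10 ** (-r - 1)

def sample_beta_uniform(rng):
    """Naive alternative: beta uniform on [0.9, 0.99]."""
    return rng.uniform(0.9, 0.99)

rng = random.Random(0)
n = 10_000
split = 1 - 10 ** -1.5          # about 0.968

log_samples = [sample_beta_log(rng) for _ in range(n)]
uni_samples = [sample_beta_uniform(rng) for _ in range(n)]

# Fraction of samples in the upper part of the range, close to 0.99.
frac_log = sum(b > split for b in log_samples) / n
frac_uni = sum(b > split for b in uni_samples) / n
print(f"beta > {split:.3f}: log-scale {frac_log:.2f}, uniform {frac_uni:.2f}")
```

The log-scale sampler puts about half of its samples above the split, while the uniform sampler puts only about a quarter there, even though that is where the averaging window \(1/(1-\beta)\) changes fastest.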

 

5. Finding good hyperparameter values is very time-consuming. So typically you should do it once at the start of the project, and try to find very good hyperparameters so that you don't ever have to revisit tuning them again. True or False?

【 】False

【 】True

Answer

False

Note: Minor changes in your model could require you to find good hyperparameters again from scratch.

 

6. In batch normalization as presented in the videos, if you apply it on the \(l\)-th layer of your neural network, what are you normalizing?

【 】\(z^{[l]}\)

 

7. In the normalization formula \(z_{norm}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}\), why do we use epsilon?

【 】To avoid division by zero
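A minimal sketch of the normalization step for a single hidden unit, written in plain Python. The function name `normalize`, the default `eps=1e-8`, and the sample mini-batch are assumptions for illustration. If every value in the mini-batch is identical, the variance is exactly zero, and only epsilon keeps the denominator nonzero.

```python
import math

def normalize(z, eps=1e-8):
    """Normalize one unit's pre-activations z over a mini-batch."""
    m = len(z)
    mu = sum(z) / m
    var = sum((zi - mu) ** 2 for zi in z) / m
    return [(zi - mu) / math.sqrt(var + eps) for zi in z]

constant_batch = [2.0, 2.0, 2.0, 2.0]   # zero variance across the batch

# With eps > 0 the result is finite.
print(normalize(constant_batch))

# With eps = 0 the denominator is sqrt(0) and Python raises an error.
try:
    normalize(constant_batch, eps=0.0)
except ZeroDivisionError:
    print("eps = 0 divides by zero when the variance is zero")
```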

 

8. Which of the following statements about γ and β in Batch Norm are true? (Only the correct options are listed.)

【 】They can be learned using Adam, gradient descent with momentum, or RMSprop, not just with gradient descent.

【 】They set the mean and variance of the linear variable \(z^{[l]}\) of a given layer.

Answer

Both are correct.
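The second statement can be checked numerically. The sketch below assumes the usual scale-and-shift step \(\tilde{z}^{(i)} = \gamma z_{norm}^{(i)} + \beta\) applied after normalization; the function name `batch_norm` and the sample values are illustrative assumptions. After the transform, the mini-batch has mean close to β and standard deviation close to γ, whatever the statistics of the original \(z\) were.

```python
import math
from statistics import fmean, pstdev

def batch_norm(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then scale by gamma and shift by beta."""
    mu = fmean(z)
    var = fmean([(zi - mu) ** 2 for zi in z])
    z_norm = [(zi - mu) / math.sqrt(var + eps) for zi in z]
    return [gamma * zn + beta for zn in z_norm]

z = [0.5, -1.2, 3.3, 2.0, -0.7]
z_tilde = batch_norm(z, gamma=2.0, beta=5.0)

# The output statistics are controlled by beta (mean) and gamma (std).
print(fmean(z_tilde), pstdev(z_tilde))
```

Because γ and β are ordinary parameters in this computation, any gradient-based optimizer can update them, which is the point of the first statement.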

 

9. After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:

【 】Perform the needed normalizations, using \(\mu\) and \(\sigma^2\) estimated by an exponentially weighted average across the mini-batches seen during training.
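A rough sketch of this idea for a single hidden unit. The decay rate of 0.9, the initial values of the running statistics, the toy mini-batches, and all function names are assumptions for illustration. During training, the running mean and variance are updated from each mini-batch; at test time a single example is normalized with those stored values, since one example has no meaningful batch statistics of its own.

```python
import math

def ewa(running, value, decay=0.9):
    """One step of an exponentially weighted average."""
    return decay * running + (1 - decay) * value

# Training: track running statistics across mini-batches.
running_mu, running_var = 0.0, 1.0      # assumed initial values
mini_batches = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 2.0, 4.0]]
for z in mini_batches:
    mu = sum(z) / len(z)
    var = sum((zi - mu) ** 2 for zi in z) / len(z)
    running_mu = ewa(running_mu, mu)
    running_var = ewa(running_var, var)

def batch_norm_test_time(z_new, mu, var, gamma, beta, eps=1e-8):
    """Normalize one example with stored statistics, then scale and shift."""
    z_norm = (z_new - mu) / math.sqrt(var + eps)
    return gamma * z_norm + beta

out = batch_norm_test_time(2.5, running_mu, running_var, gamma=1.0, beta=0.0)
print(running_mu, running_var, out)
```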

 

10. Which of these statements about deep learning programming frameworks are true? (Check all that apply)

【 】A programming framework allows you to code up deep learning algorithms with typically fewer lines of code than a lower-level language such as Python.

【 】Even if a project is currently open source, good governance of the project helps ensure that it remains open even in the long term, rather than becoming closed or modified to benefit only one company.

【 】Deep learning programming frameworks require cloud-based machines to run.

Answer

The first two are correct.

 



Week 3 Code Assignments:

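[For Andrew Ng's course] Installing TensorFlow 1.8 with Anaconda (Python 3.7) on Windows 10

✧Course 2 - Improving Deep Neural Networks - Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks

assignment 3: TensorFlow Tutorial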

posted @ 2020-01-02 10:33  凤☆尘