Deep Learning - Training-Related Hyperparameters

Parameter Description

| Parameter | Default (common values) | Range | Synopsis/Recommendation |
| --- | --- | --- | --- |
| Number of Epochs | 20 | Depends on scenario | Number of times the whole dataset is passed forward and backward through the network. |
| Batch Size | 32 (32, 64, 128, 256) | Depends on scenario and hardware | Number of input images (and corresponding labels) that are transferred to device memory at once and then processed simultaneously. Default values are chosen such that a network with up to 100 classes fits onto a device with 8 GB memory. If trained on GPU, set as high as permitted by memory. See also the additional information below. |
| Learning Rate (λ) | 0.001 (0.01, 0.001, 0.0001) | 0 < λ < 1 | Determines how strongly the gradient affects the update of the loss function arguments; also called step size. Values that are too large can make the algorithm diverge; very small values take unnecessarily many steps (compare the figure Progress of Top-1 Error for Different Values of Learning Rate). You can configure the learning rate to be adapted (decreased) after a certain number of epochs. See also Finding a Value for the Learning Rate. |
| Momentum (μ) | 0.9 (0.5-0.9) | 0 ≤ μ < 1 | Fraction of the previous update step (vector) that is added to the current step. This parameter can help to attenuate the fluctuation of the loss function. |
| Weight Prior (α) | 0 | 0 ≤ α < 1 | Regularization parameter penalizing large weights, used to prevent overfitting. Start with a low value (e.g., 0.00001) and increase if overfitting occurs. |
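
To make the roles of λ, μ, and α concrete, here is a minimal sketch of one gradient-descent update with momentum and an L2 weight prior, plus a simple step schedule for decreasing the learning rate. The update rule is one common convention and is an assumption, not taken from any particular framework; the function names (`sgd_momentum_step`, `step_decay`, `minimize_quadratic`) and the toy loss are hypothetical and exist only for illustration.

```python
def sgd_momentum_step(w, v, grad, lr=0.001, momentum=0.9, weight_prior=0.0):
    """One update step for a list of weights.

    Assumed formulation (one common convention):
        g = grad + weight_prior * w   # the L2 penalty adds weight_prior * w
        v = momentum * v - lr * g     # keep a fraction of the previous step
        w = w + v
    """
    new_w, new_v = [], []
    for wi, vi, gi in zip(w, v, grad):
        g = gi + weight_prior * wi
        vi = momentum * vi - lr * g
        new_v.append(vi)
        new_w.append(wi + vi)
    return new_w, new_v


def step_decay(base_lr, epoch, drop_every=10, factor=0.1):
    """Multiply the learning rate by `factor` every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)


def minimize_quadratic(lr, momentum=0.9, epochs=20):
    """Minimize the toy loss L(w) = 0.5 * w^2 (gradient = w), starting at w = 1."""
    w, v = [1.0], [0.0]
    for _ in range(epochs):
        w, v = sgd_momentum_step(w, v, grad=w, lr=lr, momentum=momentum)
    return w[0]


# Without momentum, each step multiplies w by (1 - lr) on this toy loss.
too_large = minimize_quadratic(lr=3.0, momentum=0.0)     # |w| doubles each step
reasonable = minimize_quadratic(lr=0.5, momentum=0.0)    # w halves each step
too_small = minimize_quadratic(lr=0.0001, momentum=0.0)  # w barely moves
```

On this toy problem the three calls show the behavior described in the table: a learning rate that is too large diverges, a moderate one converges quickly, and a very small one is still close to the starting point after 20 epochs.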

Methods

  • Manual Search
  • Grid Search
  • Random Search
  • Bayesian Optimization
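
As a rough illustration of the second and third methods, the sketch below runs a grid search and a random search over the learning rate and momentum. The `validation_error` function is a hypothetical stand-in for "train the network and measure the validation error"; its shape and its minimum at learning rate 0.001 and momentum 0.9 are invented for the example. The candidate values come from the common values in the parameter table. Manual search and Bayesian optimization are not covered here.

```python
import itertools
import math
import random


def validation_error(params):
    """Hypothetical objective; a real one would train and evaluate a model."""
    lr_term = (math.log10(params["learning_rate"]) + 3) ** 2
    mom_term = (params["momentum"] - 0.9) ** 2
    return lr_term + mom_term


def grid_search(grid, objective):
    """Evaluate every combination of the listed values and return the best."""
    names = list(grid)
    candidates = (
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    )
    return min(candidates, key=objective)


def random_search(sample, objective, n_trials, seed=0):
    """Evaluate n_trials randomly drawn configurations and return the best."""
    rng = random.Random(seed)
    return min((sample(rng) for _ in range(n_trials)), key=objective)


def sample_params(rng):
    """Draw one configuration; the learning rate is sampled on a log scale."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),  # roughly 1e-4 to 1e-2
        "momentum": rng.uniform(0.5, 0.9),
    }


grid = {"learning_rate": [0.01, 0.001, 0.0001], "momentum": [0.5, 0.7, 0.9]}
best_grid = grid_search(grid, validation_error)
best_random = random_search(sample_params, validation_error, n_trials=20)
```

Grid search costs one training run per combination (9 here), so it grows quickly with the number of parameters. Random search fixes the budget up front (`n_trials`) and samples the learning rate logarithmically, since its useful values span several orders of magnitude.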

posted @ 2022-02-12 15:55  郑大峰