| Parameter | Default (alternatives) | Range | Description |
|---|---|---|---|
| Number of Epochs | 20 | depends on scenario | Number of times the whole dataset is passed forward and backward through the network. |
| Batch Size | 32 (32, 64, 128, 256) | depends on scenario and hardware | Number of input images (and corresponding labels) that are transferred to device memory at once and then processed simultaneously. The default is chosen such that a network with up to 100 classes fits onto a device with 8 GB of memory. If training on a GPU, set it as high as memory permits. See also the additional information below. |
| Learning Rate (λ) | 0.001 (0.01, 0.001, 0.0001) | 0 < λ < 1 | Determines the weight of the gradient in the update of the loss function arguments; also called step size. Values that are too large may cause the algorithm to diverge; very small values take unnecessarily many steps (compare the figure Progress of Top-1 Error for Different Values of Learning Rate). The learning rate can be configured to adapt (decrease) after a certain number of epochs. See also Finding a Value for the Learning Rate. |
| Momentum (μ) | 0.9 (0.5–0.9) | 0 ≤ μ < 1 | Fraction of the previous update step (vector) that is added to the current step. This parameter can help attenuate fluctuations of the loss function. |
| Weight Prior (α) | 0 | 0 ≤ α < 1 | Regularization parameter penalizing large weights, used to prevent overfitting. Start with a low value (e.g., 0.00001) and increase it if overfitting occurs. |
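To make the relationship between epochs and batch size concrete, the following sketch shows how one epoch decomposes into batches. The function name and the use of index ranges are illustrative assumptions, not part of any particular framework:

```python
def iterate_minibatches(num_samples, batch_size):
    """Yield index ranges that together cover the dataset exactly once.

    One full pass over all yielded batches corresponds to one epoch.
    The last batch may be smaller if batch_size does not divide num_samples.
    """
    for start in range(0, num_samples, batch_size):
        yield range(start, min(start + batch_size, num_samples))
```

For example, a dataset of 1000 images with a batch size of 32 yields 32 batches (31 of size 32 and one of size 8) per epoch; with 20 epochs, every image is processed 20 times.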
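The learning rate, momentum, and weight prior interact in a single SGD update step. A minimal sketch of such a step is shown below; this is a generic SGD-with-momentum formulation for illustration, not the toolkit's actual implementation, and the function and parameter names are assumptions:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9, weight_prior=0.0):
    """One SGD update on a single weight with momentum and L2 regularization.

    velocity_new = momentum * velocity - lr * (grad + weight_prior * w)
    w_new        = w + velocity_new

    lr scales the gradient contribution (step size), momentum carries a
    fraction of the previous update into the current one, and weight_prior
    adds a penalty proportional to the weight itself, pulling large
    weights toward zero.
    """
    velocity = momentum * velocity - lr * (grad + weight_prior * w)
    return w + velocity, velocity
```

With momentum 0.9, most of the previous step is reused, which smooths out fluctuations of the loss; with weight prior 0, no regularization is applied.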
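The table notes that the learning rate can be configured to decrease after a certain number of epochs. One common form of such a schedule is a step decay, sketched below under the assumption that the rate is multiplied by a fixed factor at regular epoch intervals; the function name and defaults are illustrative:

```python
def stepped_learning_rate(initial_lr, epoch, drop_every=10, factor=0.1):
    """Return the learning rate for a given epoch under step decay.

    The rate is multiplied by `factor` once every `drop_every` epochs,
    so with the defaults it shrinks by a factor of 10 every 10 epochs.
    """
    return initial_lr * (factor ** (epoch // drop_every))
```

For example, an initial rate of 0.001 stays at 0.001 for epochs 0 to 9 and drops to roughly 0.0001 from epoch 10 onward.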