PaperReading20200223

CanChen ggchen@mail.ustc.edu.cn


 

AdaBatch

  • Motivation: Existing stochastic gradient descent methods use a fixed batch size. A small batch size with a small learning rate leads to fast convergence, while a large batch size offers more parallelism. This paper proposes AdaBatch, which changes the batch size during training.
  • Method: Under some approximations, increasing the batch size is equivalent to decreasing the learning rate. Using this relationship, the authors run several experiments and show that progressively increasing the batch size keeps test accuracy within 1% of the baseline while providing more parallelism (see the sketch after this list).
  • Contribution: The paper offers practical engineering guidance on batch-size scheduling that can be very helpful.
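
A minimal sketch of the progressive batch-size growth idea in PyTorch. The model, dataset, starting batch size, growth schedule, and learning rate below are illustrative assumptions, not the paper's exact setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model stand in for a real training setup.
data = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

batch_size = 32  # hypothetical starting batch size
for epoch in range(8):
    # Double the batch size every two epochs instead of decaying the
    # learning rate, following the idea that the two are roughly equivalent.
    if epoch > 0 and epoch % 2 == 0:
        batch_size *= 2
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```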
 

“You might also like this model”

  • Motivation: Existing network performance prediction methods focus on a fixed dataset, while different datasets have different characteristics.
  • Method: This paper proposes a recommendation system for unseen datasets, which consists of three parts: a network encoder, a dataset similarity extractor, and a network performance predictor. To obtain a network encoding representation, the paper treats a network architecture as a sentence and proposes a sentence prediction task and a sentence perplexity task (see the sketch after this list).
  • Contribution: Compared with previous work, the paper takes dataset similarity into consideration.
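
A minimal sketch of encoding a network architecture as a "sentence" of layer tokens, in the spirit of the paper's network encoder. The layer vocabulary, example architecture, and LSTM encoder below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Hypothetical layer vocabulary and an example architecture "sentence".
vocab = {"<pad>": 0, "conv3x3": 1, "conv5x5": 2, "maxpool": 3, "fc": 4, "relu": 5}
arch_sentence = ["conv3x3", "relu", "maxpool", "conv5x5", "relu", "fc"]
tokens = torch.tensor([[vocab[t] for t in arch_sentence]])  # shape (1, seq_len)

class ArchEncoder(nn.Module):
    """Embeds layer tokens and summarizes the sequence with an LSTM."""
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]  # final hidden state as the architecture embedding

encoder = ArchEncoder(len(vocab))
embedding = encoder(tokens)
print(embedding.shape)  # torch.Size([1, 32])
```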
posted @ 2020-02-23 22:33  Klaus-Chen