Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge - LZ_Jaja

Link of the Paper: https://arxiv.org/abs/1609.06647

A Related Paper: Show and Tell: A Neural Image Caption Generator (Link of the Paper: https://arxiv.org/abs/1411.4555)

Main Points ( Improvements Over the CVPR2015 Model ):

Image Model Improvement: GoogLeNet ( 22 layers ) -> Batch Normalization Model.
Image Model Fine Tuning: fine-tuning of the image model must be carried out after the LSTM parameters have settled on a good language model.
Scheduled Sampling: training moves gradually from a fully guided scheme that feeds the true previous word to a less guided scheme that mostly feeds the model-generated word instead.
Ensembling
Beam Size Reduction: the best beam size turned out to be small: 3.
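
To make the Scheduled Sampling and Beam Size points more concrete, here is a minimal Python sketch. It is not the paper's code: the names choose_previous_token, beam_search, step_log_probs and toy_model are hypothetical, the toy probability table is made up purely for illustration, and the schedule that controls sampling_prob over the course of training is not shown.

```python
import math
import random


def choose_previous_token(true_token, model_token, sampling_prob, rng=random):
    """Scheduled sampling choice for one time step (illustrative only).

    With probability `sampling_prob` the model's own prediction is fed as the
    previous word; otherwise the ground-truth word is fed.
    """
    return model_token if rng.random() < sampling_prob else true_token


def beam_search(step_log_probs, start, end, beam_size=3, max_len=20):
    """Generic beam search over token sequences.

    `step_log_probs(prefix)` is an assumed callback that returns a dict
    {next_token: log_prob} for a prefix given as a tuple of tokens.
    """
    beams = [((start,), 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, score in beams:
            for token, log_prob in step_log_probs(seq).items():
                candidates.append((seq + (token,), score + log_prob))
        if not candidates:
            break
        # Keep only the `beam_size` best-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len was reached
    return max(finished, key=lambda c: c[1])


def toy_model(prefix):
    """Made-up next-word distribution standing in for the LSTM decoder."""
    table = {
        ("<s>",): {"a": math.log(0.6), "b": math.log(0.4)},
        ("<s>", "a"): {"</s>": math.log(0.55), "c": math.log(0.45)},
        ("<s>", "b"): {"</s>": math.log(0.9), "c": math.log(0.1)},
    }
    return table.get(prefix, {"</s>": 0.0})


greedy_seq, greedy_score = beam_search(toy_model, "<s>", "</s>", beam_size=1)
beam3_seq, beam3_score = beam_search(toy_model, "<s>", "</s>", beam_size=3)
```

In this toy table, greedy decoding (beam size 1) commits to the locally best first word, while beam size 3 keeps the alternative alive long enough to find a higher-scoring full sequence. The sketch only illustrates the mechanics; the paper's choice of beam size 3 came from evaluating caption quality on the real model, not from an argument like this one.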

posted on 2018-08-14 18:21 LZ_Jaja
