-ResNet is used to extract image features by taking the output of a specified layer. The words of the description are mapped to word vectors through an embedding, and these vectors are fed to an LSTM. The LSTM output passes through a fully connected layer to predict the next word, and the cross-entropy loss is computed against the actual next word of the description.

posted on 2020-10-24 08:08  黑暗尽头的超音速炬火