Model produces NaN when training on GPU

When training on a GPU, the error "Model diverged with loss = NaN" is often caused by a softmax that receives a symbol (token ID) outside the vocabulary, that is, an ID greater than or equal to vocab_size when IDs are zero-based.
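One way to confirm this cause is to scan the input data for out-of-range IDs before training. The following is only a minimal sketch in plain Python: the function name `find_out_of_range_ids` and the sample batch are hypothetical, and it assumes zero-based token IDs, so valid values lie in the range 0 to vocab_size - 1.

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return the sorted unique IDs in `batch` that fall outside [0, vocab_size).

    `batch` is assumed to be an iterable of sequences of integer token IDs.
    """
    bad = set()
    for sequence in batch:
        for token_id in sequence:
            if token_id < 0 or token_id >= vocab_size:
                bad.add(token_id)
    return sorted(bad)


# Made-up example batch: with vocab_size = 10 the valid IDs are 0..9,
# so the ID 10 in the second sequence is out of range.
labels = [[1, 5, 9], [3, 10, 2]]
bad_ids = find_out_of_range_ids(labels, vocab_size=10)
if bad_ids:
    print("out-of-range IDs:", bad_ids)
```

If the check reports any IDs, the usual fixes are to increase vocab_size to match the data or to map unknown symbols to a reserved ID before they reach the model.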

   

posted @ 2019-01-31 22:01  simple_wxl