Blog archive for October 11, 2021 - initial_h

October 11, 2021

Decoupling Value and Policy for Generalization in Reinforcement Learning

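Summary: **Published:** 2021 (ICML 2021) **Key points:** This paper argues that when training policy gradient (PG) algorithms, especially on tasks with image inputs, the mainstream practice is to represent the policy and the value function with a single shared network rather than separate ones. This leads to policy overfitting, because learning the value function requires more information than learning the policy; if a single ...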

posted @ 2021-10-11 11:36 initial_h Views (194) Comments (0) Recommendations (0)

initial_h

https://github.com/initial-h
