Post archive for July 10, 2023 - initial_h

July 10, 2023

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

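Summary: **Published:** 2021 (NeurIPS 2021) **Key points:** The theory shows that samples with a higher hindsight TD error, that are more on-policy, and that have more accurate target Q values should receive higher sampling weights (The theory suggests that data with highe Read more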
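To make the idea of a sampling weight concrete, here is a minimal sketch of weighted replay sampling. It is not the paper's formula: combining the three factors as a plain product is an assumption made for illustration, and the names `sampling_probabilities`, `sample_indices`, `on_policy_scores`, and `target_accuracy_scores` are hypothetical. Each score is assumed to be a non-negative number computed elsewhere.

```python
import random


def sampling_probabilities(td_errors, on_policy_scores, target_accuracy_scores):
    """Turn three per-sample scores into sampling probabilities.

    Assumption (not from the paper): the weight of a sample is the product
    |TD error| * on-policy score * target-accuracy score, then normalized.
    """
    raw = [
        abs(td) * on_policy * accuracy
        for td, on_policy, accuracy in zip(
            td_errors, on_policy_scores, target_accuracy_scores
        )
    ]
    if not raw:
        return []
    total = sum(raw)
    if total == 0:
        # Every weight is zero: fall back to uniform sampling.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def sample_indices(probabilities, batch_size, rng=None):
    """Draw buffer indices with replacement, proportional to the probabilities."""
    rng = rng or random.Random()
    return rng.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


# Three hypothetical transitions: the second has the largest TD error,
# the third is scored as less on-policy.
probs = sampling_probabilities(
    td_errors=[1.0, -2.0, 1.0],
    on_policy_scores=[1.0, 1.0, 0.5],
    target_accuracy_scores=[1.0, 1.0, 1.0],
)
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

With these scores the second transition is drawn most often and the third least often, which matches the direction of the claim in the summary.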

posted @ 2023-07-10 12:53 initial_h

initial_h

https://github.com/initial-h
