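Summary: **Published:** 2021 (NeurIPS 2021) **Key points:** The theory suggests that samples with a higher hindsight TD error, samples that are more on-policy, and samples whose target Q values are more accurate should all receive higher sampling weights (The theory suggests that data with highe…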
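As a rough illustration of the key point, the Python sketch below turns the three criteria into sampling probabilities for a replay buffer. It is not the paper's algorithm. The function names `sampling_probs` and `sample_batch`, the multiplicative way of combining the factors, and the inputs `on_policy_score` and `target_accuracy` (assumed to be precomputed per-sample scores in (0, 1]) are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sampling_probs(hindsight_td_error, on_policy_score, target_accuracy, eps=1e-6):
    """Hypothetical priority: larger |hindsight TD error|, more on-policy,
    and more accurate target Q value all increase a sample's weight.

    Assumes on_policy_score and target_accuracy are already in (0, 1];
    how to estimate them is outside the scope of this sketch.
    """
    td = np.abs(np.asarray(hindsight_td_error, dtype=float))
    on = np.asarray(on_policy_score, dtype=float)
    acc = np.asarray(target_accuracy, dtype=float)
    if np.any(on <= 0) or np.any(acc <= 0):
        raise ValueError("scores must be strictly positive")
    # eps keeps samples with zero TD error from never being drawn.
    priority = (td + eps) * on * acc
    return priority / priority.sum()


def sample_batch(probs, batch_size, rng):
    """Draw buffer indices in proportion to the computed probabilities."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = sampling_probs(
        hindsight_td_error=[1.0, 2.0, 2.0, 2.0],
        on_policy_score=[1.0, 1.0, 0.5, 1.0],
        target_accuracy=[1.0, 1.0, 1.0, 0.5],
    )
    print(p)
    print(sample_batch(p, batch_size=8, rng=rng))
```

Here the second sample scores well on all three criteria and gets the largest probability, while the third and fourth are penalized for being less on-policy or having a less accurate target. The product form is one possible design: a single very low factor suppresses a sample almost entirely, whereas a weighted sum would let the factors trade off against each other.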
posted @ 2023-07-10 12:53 initial_h