Post archive for November 2, 2018: [Reinforcement Learning] Policy Gradient... - Poll的笔记

November 2, 2018

[Reinforcement Learning] Policy Gradient Methods

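Summary: The previous post covered how to approximate the value function or the action-value function: $$ V_{\theta}(s)\approx V^{\pi}(s) \\ Q_{\theta}(s, a)\approx Q^{\pi}(s, a) $$ Once we have approximated the value function or the action-value function with machine learning methods, we can then, through some policy ...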

posted @ 2018-11-02 09:52 Poll的笔记 Views(7153) Comments(3) Recommended(1)

