Post archive for April 7, 2019 - Ruidongch

April 7, 2019

Literature notes: Policy Gradient Methods for Reinforcement Learning with Function Approximation

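Summary: This paper addresses function approximation in policy gradient methods. It first states the objective of policy gradient methods, which is to maximize the cumulative return $\rho(\pi)$ under policy $\pi$: \[\rho(\pi) = E \left\{ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \,|\, …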
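The discounted sum inside the expectation can be illustrated with a small sketch. This is only an illustration under stated assumptions: the infinite sum is cut off at a finite reward list, a single trajectory stands in for the expectation, and the name `discounted_return` and the sample rewards are hypothetical, not taken from the post or the paper.

```python
def discounted_return(rewards, gamma):
    """Compute sum_{t=1}^{T} gamma**(t-1) * r_t for a finite reward list.

    rewards[0] plays the role of r_1, so it is weighted by gamma**0 = 1.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical rewards from one sampled trajectory.
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, 0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

Averaging this quantity over many trajectories sampled under $\pi$ would give a Monte Carlo estimate of $\rho(\pi)$, again assuming the truncation horizon is long enough for the chosen $\gamma$.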

posted @ 2019-04-07 08:12 Ruidongch
