Post archive for April 23, 2019 - Ruidongch

April 23, 2019

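Summary: These notes build on a reading of the paper Policy Gradient Methods for Reinforcement Learning with Function Approximation. First, make explicit the optimization objective $\rho(\pi)$, where the policy $\pi$ is an unknown function with parameters $\theta$; it generally takes two forms.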

posted @ 2019-04-23 12:37 Ruidongch
