Post archive for July 30, 2019 - Junfei_Wang

July 30, 2019

Temporal-Difference Learning for Prediction

Summary: In Monte Carlo learning, we obtained an estimate of the value function in which G_t is the episode return from time t. Please recall, ...
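The formulas for the return and the value estimate did not survive in this summary, so the following is only a minimal sketch under stated assumptions: it uses the standard discounted return G_t = R_{t+1} + γR_{t+2} + ... and the usual incremental Monte Carlo update V(S_t) ← V(S_t) + α(G_t − V(S_t)). The function names, `gamma`, and `alpha` are illustrative and are not taken from the post.

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute G_t for every step of one finished episode.

    Assumption: rewards[t] holds R_{t+1}, the reward received after step t.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Work backwards so each return reuses the one after it:
    # G_t = R_{t+1} + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_update(value, g, alpha=0.1):
    """Move the current estimate V(S_t) a fraction alpha toward the return G_t."""
    return value + alpha * (g - value)


if __name__ == "__main__":
    episode_rewards = [1.0, 0.0, 2.0]  # hypothetical episode
    for t, g in enumerate(discounted_returns(episode_rewards, gamma=0.5)):
        print(f"G_{t} = {g}")
```

Because the return is only known once the episode ends, this update has to wait for a complete episode, which is the limitation that temporal-difference learning addresses.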

posted @ 2019-07-30 11:01 Junfei_Wang
