Post archive for August 13, 2023 - 鸽鸽的书房

August 13, 2023

Summary: ## TD learning of state values The data/experience required by the algorithm: - $\left(s_0, r_1, s_1, \ldots, s_t, r_{t+1}, s_{t+1}, \ldots\right)$ or …

posted @ 2023-08-13 16:47 鸽鸽的书房
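The excerpt stops before the update rule itself. As a hedged illustration, the sketch below applies the standard tabular TD(0) update for state values, v(s_t) ← v(s_t) − α[v(s_t) − (r_{t+1} + γ v(s_{t+1}))], to experience stored as (s_t, r_{t+1}, s_{t+1}) triples. The function name `td0_state_values`, the constant step size `alpha`, the repeated sweeps over stored episodes, and the convention that terminal states keep the value 0 are assumptions made for this example, not details taken from the post.

```python
from collections import defaultdict


def td0_state_values(episodes, alpha=0.1, gamma=0.9, num_sweeps=1):
    """Hypothetical helper: tabular TD(0) estimate of v_pi from experience.

    episodes: list of episodes; each episode is a list of (s_t, r_{t+1}, s_{t+1})
    triples collected while following the policy being evaluated.
    Terminal states never appear as s_t, so their value stays at the initial 0.
    """
    v = defaultdict(float)  # every state starts with v(s) = 0
    for _ in range(num_sweeps):  # replaying stored episodes is an assumption of this sketch
        for episode in episodes:
            for s, r, s_next in episode:
                td_target = r + gamma * v[s_next]  # r_{t+1} + gamma * v(s_{t+1})
                td_error = v[s] - td_target        # v(s_t) minus the TD target
                v[s] -= alpha * td_error           # only the visited state s_t is updated
    return dict(v)


# A two-step chain A -> B -> T (terminal), with reward 1 on the last step.
episodes = [[("A", 0.0, "B"), ("B", 1.0, "T")]]
v = td0_state_values(episodes, alpha=0.5, gamma=0.9, num_sweeps=200)
print(round(v["A"], 4), round(v["B"], 4), v["T"])  # 0.9 1.0 0.0
```

With γ = 0.9, v(B) approaches the immediate reward 1 and v(A) approaches 0.9, the same reward discounted by one extra step. Note that each update only needs one transition, not a complete episode.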

【RL】CH2-Bellman equation

Summary: ### The discounted return
$$
\begin{aligned}
G_t & =R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\ldots \\
& =R_{t+1}+\gamma\left(R_{t+2}+\gamma R_{t+3}+\ldots\right)
\end{aligned}
$$
…

posted @ 2023-08-13 16:05 鸽鸽的书房
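The bracketed sum in the discounted-return derivation is itself the return from the next step, which gives the recursion G_t = R_{t+1} + γG_{t+1}. A small sketch, assuming a finite episode whose rewards are stored as `rewards[k]` = R_{t+1+k} (all function names here are hypothetical), computes the return both term by term and with that recursion so the two can be compared:

```python
def discounted_return_direct(rewards, gamma):
    """G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ..., summed term by term.

    rewards[k] holds R_{t+1+k} for a finite episode.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def discounted_return_recursive(rewards, gamma):
    """The same G_t via G_t = R_{t+1} + gamma * G_{t+1}, evaluated from the end."""
    g = 0.0  # the return after the final reward is 0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def all_step_returns(rewards, gamma):
    """[G_t, G_{t+1}, ...] for every step of the episode, reusing the recursion."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]  # the loop ran backwards, so restore time order


print(discounted_return_direct([1, 1, 1], 0.5))     # 1.75
print(discounted_return_recursive([1, 1, 1], 0.5))  # 1.75
print(all_step_returns([1, 1, 1], 0.5))             # [1.75, 1.5, 1.0]
```

Both functions give 1 + 0.5 + 0.25 = 1.75 for this example. The backward recursion visits each reward once and yields the return for every step along the way, which is the same structure the Bellman equation exploits for state values.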

【RL】CH1-Basic Concepts

Summary: # 1.7 Markov decision processes This section presents these concepts in a more formal way under the framework of Markov decision processes (MDPs). An …

posted @ 2023-08-13 15:30 鸽鸽的书房
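To make the formal MDP framing concrete, here is a minimal sketch of a finite MDP written as plain Python data: a state set, an action set, reward probabilities p(r | s, a), and state-transition probabilities p(s' | s, a), together with a stochastic policy π(a | s). The class `FiniteMDP`, the helper `policy_induced_model`, and the two-state example are hypothetical and only illustrate how these elements fit together; for simplicity every action is assumed to be available in every state. The helper averages over the policy to get each state's expected immediate reward and next-state distribution.

```python
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    """Hypothetical container for the elements of a finite MDP.

    reward_prob[(s, a)] maps each reward r to p(r | s, a);
    transition_prob[(s, a)] maps each next state s' to p(s' | s, a).
    Every action is assumed to be available in every state.
    """

    states: list
    actions: list
    reward_prob: dict
    transition_prob: dict

    def validate(self, tol=1e-9):
        """Raise ValueError if some conditional distribution does not sum to 1."""
        for s in self.states:
            for a in self.actions:
                for name, dist in (("reward", self.reward_prob[(s, a)]),
                                   ("transition", self.transition_prob[(s, a)])):
                    if abs(sum(dist.values()) - 1.0) > tol:
                        raise ValueError(f"{name} probabilities for {(s, a)} do not sum to 1")


def policy_induced_model(mdp, policy):
    """Average the MDP over a stochastic policy, policy[s][a] = pi(a | s).

    Returns (r_pi, P_pi) with
      r_pi[s]     = sum_a pi(a|s) * sum_r p(r|s,a) * r
      P_pi[s][s2] = sum_a pi(a|s) * p(s2|s,a)
    Both depend only on the current state s (the Markov property).
    """
    r_pi, P_pi = {}, {}
    for s in mdp.states:
        r_pi[s] = 0.0
        P_pi[s] = {s2: 0.0 for s2 in mdp.states}
        for a, pa in policy[s].items():
            expected_r = sum(p * r for r, p in mdp.reward_prob[(s, a)].items())
            r_pi[s] += pa * expected_r
            for s2, p in mdp.transition_prob[(s, a)].items():
                P_pi[s][s2] += pa * p
    return r_pi, P_pi


# Two states: "move" from s1 reaches s2 with reward 1; staying pays 0, leaving s2 pays -1.
mdp = FiniteMDP(
    states=["s1", "s2"],
    actions=["stay", "move"],
    reward_prob={
        ("s1", "stay"): {0.0: 1.0},
        ("s1", "move"): {1.0: 1.0},
        ("s2", "stay"): {0.0: 1.0},
        ("s2", "move"): {-1.0: 1.0},
    },
    transition_prob={
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s2": 1.0},
        ("s2", "stay"): {"s2": 1.0},
        ("s2", "move"): {"s1": 1.0},
    },
)
mdp.validate()
policy = {"s1": {"stay": 0.5, "move": 0.5}, "s2": {"stay": 1.0}}
r_pi, P_pi = policy_induced_model(mdp, policy)
print(r_pi)  # {'s1': 0.5, 's2': 0.0}
print(P_pi)  # {'s1': {'s1': 0.5, 's2': 0.5}, 's2': {'s1': 0.0, 's2': 1.0}}
```

Keeping p(r | s, a) and p(s' | s, a) as separate tables mirrors how the MDP elements are usually listed; once a policy is fixed, the averaged pair (r_π, P_π) is the input that later chapters' Bellman equation works with.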

鸽鸽的书房

Be dignified and steady, humble and tolerant; guard against pride and idleness, and seek only perseverance.
