Summary: ## TD learning of state values The data/experience required by the algorithm: - $\left(s_0, r_1, s_1, \ldots, s_t, r_{t+1}, s_{t+1}, \ldots\right)$ or Read more
posted @ 2023-08-13 16:47 鸽鸽的书房
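To make the data requirement above concrete, here is a minimal sketch of how such experience could be consumed to estimate state values. It assumes the standard tabular TD(0) update, which the truncated summary does not spell out. The function name `td0_state_values`, the `(s_t, r_{t+1}, s_{t+1})` tuple layout, and the default values of `alpha` and `gamma` are illustrative assumptions, not taken from the post.

```python
from collections import defaultdict


def td0_state_values(trajectory, alpha=0.1, gamma=0.9):
    """Estimate state values from experience with tabular TD(0).

    trajectory: iterable of (s_t, r_{t+1}, s_{t+1}) tuples (assumed layout).
    alpha: step size (hypothetical default).
    gamma: discount rate (hypothetical default).
    """
    v = defaultdict(float)  # every state value starts at 0.0
    for s, r, s_next in trajectory:
        td_target = r + gamma * v[s_next]  # bootstrapped one-step target
        td_error = v[s] - td_target        # gap between estimate and target
        v[s] -= alpha * td_error           # move the estimate toward the target
    return dict(v)


# Usage: a two-step trajectory a -> b -> a with rewards 1.0 and 0.0.
values = td0_state_values([("a", 1.0, "b"), ("b", 0.0, "a")], alpha=0.5, gamma=0.5)
```

Only the value of the visited state `s_t` changes at each step, so the estimate needs nothing beyond the state, reward, and next-state sequence.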
Summary: ### The discounted return $$ \begin{aligned} G_t & =R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\ldots \\ & =R_{t+1}+\gamma\left(R_{t+2}+\gamma R_{t+3}+\ldots\right) \end{aligned} $$ Read more
posted @ 2023-08-13 16:05 鸽鸽的书房
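The factored form of the discounted return in the entry above amounts to the recursion $G_t = R_{t+1} + \gamma G_{t+1}$. The sketch below checks this numerically for a finite reward sequence. The function names and the sample rewards are illustrative assumptions, and truncating the infinite sum to a finite list is a simplification.

```python
def discounted_return(rewards, gamma):
    """Compute G_t for a finite list [R_{t+1}, R_{t+2}, ...] by working backwards."""
    g = 0.0
    # Each backward step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def discounted_return_direct(rewards, gamma):
    """Compute the same quantity as the explicit sum of gamma^k * R_{t+k+1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Usage: both forms agree on a short, made-up reward sequence.
rewards = [1.0, 2.0, 3.0]
g_recursive = discounted_return(rewards, gamma=0.5)
g_direct = discounted_return_direct(rewards, gamma=0.5)
```

The backward pass needs one multiplication per reward, whereas the direct sum recomputes a power of `gamma` for every term.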
Summary: # 1.7 Markov decision processes This section presents these concepts in a more formal way under the framework of Markov decision processes (MDPs). An Read more
posted @ 2023-08-13 15:30 鸽鸽的书房