Summary: TD learning of state values. The data/experience required by the algorithm: $(s_0, r_1, s_1, \ldots, s_t, r_{t+1}, s_{t+1}, \ldots)$ or … Read full text
posted @ 2023-08-13 16:47 鸽鸽的书房 Views (19) Comments (0) Recommendations (0)
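The TD learning of state values summarized above consumes exactly this kind of $(s_t, r_{t+1}, s_{t+1})$ experience. Below is a minimal sketch of the standard tabular TD(0) update $v(s_t) \leftarrow v(s_t) - \alpha\left[v(s_t) - (r_{t+1} + \gamma v(s_{t+1}))\right]$. The function name `td0_state_values`, the tuple-list input format, integer state indices, the constant step size `alpha`, and the lack of terminal-state handling are all illustrative assumptions, not taken from the post.

```python
def td0_state_values(experience, num_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0) estimate of state values (illustrative sketch).

    Assumes `experience` is a list of (s_t, r_{t+1}, s_{t+1}) tuples with
    integer state indices in [0, num_states). Terminal states are not
    treated specially here (an assumption of this sketch).
    """
    v = [0.0] * num_states  # initial value estimates
    for s, r, s_next in experience:
        td_target = r + gamma * v[s_next]   # r_{t+1} + gamma * v(s_{t+1})
        td_error = v[s] - td_target         # v(s_t) - TD target
        v[s] = v[s] - alpha * td_error      # move v(s_t) toward the target
    return v
```

Feeding the same self-loop transition repeatedly shows the estimate approaching the fixed point of $v = r + \gamma v$; for example, with `r = 1.0` and `gamma = 0.5` the value tends to `2.0`.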
Summary: The discounted return: $G_t = R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\ldots = R_{t+1}+\gamma\left(R_{t+2}+\gamma R_{t+3}+\ldots\right)$ … Read full text
posted @ 2023-08-13 16:05 鸽鸽的书房 Views (10) Comments (0) Recommendations (0)
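The second line of the discounted-return formula factors out $\gamma$, which leads to the well-known recursion $G_t = R_{t+1} + \gamma G_{t+1}$. A minimal sketch that uses this recursion to compute every $G_t$ of a finite episode in one backward pass follows. The function name `discounted_returns` and the convention that `rewards[k]` holds $R_{k+1}$ (with all rewards after the episode ends taken as zero) are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode (illustrative sketch).

    Assumes rewards[k] holds R_{k+1} and that every reward after the last
    one is zero, so G_{T-1} = R_T.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # G_T = 0 past the end of the episode
    for k in reversed(range(len(rewards))):
        g = rewards[k] + gamma * g  # G_k = R_{k+1} + gamma * G_{k+1}
        returns[k] = g
    return returns


print(discounted_returns([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```

The backward pass costs one multiply-add per step, whereas summing $\gamma^i R_{t+i+1}$ directly for every $t$ would be quadratic in the episode length.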
Summary: 1.7 Markov decision processes. This section presents these concepts more formally, within the framework of Markov decision processes (MDPs). An … Read full text
posted @ 2023-08-13 15:30 鸽鸽的书房 Views (8) Comments (0) Recommendations (0)