Summary:
## TD learning of state values The data/experience required by the algorithm: - or Read more
Summary:
### the discounted return $$ \begin{aligned} G_t & =R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\ldots \\ & =R_{t+1}+\gamma\left(R_{t+2}+\gamma R_{t+3}+\l Read more
Summary:
# 1.7 Markov decision processes This section presents these concepts in a more formal way under the framework of Markov decision processes (MDPs). An Read more