Summary: Contents: Introduction; TD learning of state values; TD learning of action values: Sarsa; TD learning of action values: Expected Sarsa; TD learning of action values: n-step Sarsa; TD le…
posted @ 2024-10-29 21:10 cxy8 Views (99) Comments (0) Recommendations (0)
Summary: Contents: Robbins-Monro algorithm; Stochastic gradient descent; BGD, MBGD, and SGD; Summary. Robbins-Monro algorithm: an iterative algorithm for computing the mean. Stochastic approximation (SA)…
posted @ 2024-10-29 14:02 cxy8 Views (90) Comments (0) Recommendations (0)
Summary: Contents: MC Basic; MC Exploring Starts; MC Epsilon-Greedy. MC Basic: moving from model-based reinforcement learning to model-free Reinforceme…
posted @ 2024-10-29 09:44 cxy8 Views (76) Comments (0) Recommendations (0)