Summary:
Contents: Introduction; TD learning of state values; TD learning of action values: Sarsa; TD learning of action values: Expected Sarsa; TD learning of action values: n-step Sarsa; TD le
Summary:
Contents: Robbins-Monro algorithm; Stochastic gradient descent; BGD, MBGD, and SGD; Summary. Robbins-Monro algorithm. An iterative algorithm for computing the mean