action_repeat in gym

Specifically, we average performance over 10 random seeds, and reduce the number of training observations inverse proportionally to the action repeat value.

——— SAC_AE

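The point here is that action_repeat in dm_control is a hyperparameter, usually set by following the values used in earlier papers. It is still important to be clear about how action_repeat, frame_stack, and env steps relate. An env step normally counts real environment (simulator) steps, which equals the number of agent-environment interactions when action_repeat=1. Suppose the agent converges at 1,000,000 env steps: with action_repeat=1, the number of training steps and the number of environment interaction steps are both 1,000,000.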

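When action_repeat=2, each action the agent selects is executed twice, i.e. the same action is applied to the environment for 2 consecutive steps. The number of training steps then becomes 1,000,000 / 2 = 500,000, so only 500,000 training steps are needed to cover the same 1,000,000 env steps.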

Likewise, as action_repeat grows, the number of training steps shrinks in proportion: it is inversely proportional to action_repeat.
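The relationship above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CountingEnv`, `ActionRepeat`, and `agent_steps_for_budget` are hypothetical names made up for this example, not the gym or dm_control API, and the toy environment does nothing except count calls to `step()`.

```python
class CountingEnv:
    """Hypothetical toy environment: it only counts real env steps."""

    def __init__(self):
        self.env_steps = 0

    def reset(self):
        self.env_steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.env_steps += 1
        obs, reward, done = self.env_steps, 1.0, False
        return obs, reward, done


class ActionRepeat:
    """Hypothetical wrapper: apply each chosen action `action_repeat` times."""

    def __init__(self, env, action_repeat):
        self.env = env
        self.action_repeat = action_repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.action_repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward  # rewards of the repeated steps are summed
            if done:
                break
        return obs, total_reward, done


def agent_steps_for_budget(env_step_budget, action_repeat):
    """Number of agent decisions needed to consume a budget of env steps."""
    return env_step_budget // action_repeat


env = CountingEnv()
wrapped = ActionRepeat(env, action_repeat=2)
wrapped.reset()
for _ in range(10):      # 10 agent decisions...
    wrapped.step(0)
print(env.env_steps)     # ...consume 20 real env steps
print(agent_steps_for_budget(1_000_000, 2))
```

With a fixed budget of 1,000,000 env steps, `agent_steps_for_budget` gives 500,000 for action_repeat=2 and 250,000 for action_repeat=4, which is the proportional reduction described above.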

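The frame_stack number, by contrast, does not affect the number of training steps; it only determines the shape of the input fed to training. For example, if the frame_stack number is 3 and each frame is a single-channel 84×84 image, the training input has shape 3*84*84 (with RGB frames the channel dimension would be 3 × 3 = 9).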
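To make the input shape concrete, here is a minimal frame-stacking sketch in plain Python. The `FrameStack` class and the helpers `make_frame` and `shape` are hypothetical names for this example, not a library API. Frames are stored as nested lists in channel-first (C, H, W) layout, and stacking is assumed to concatenate along the channel axis.

```python
from collections import deque


def make_frame(channels, height, width, fill=0):
    """Build a dummy frame as nested lists with layout (C, H, W)."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


def shape(frame):
    """Return the (C, H, W) shape of a nested-list frame."""
    return (len(frame), len(frame[0]), len(frame[0][0]))


class FrameStack:
    """Hypothetical helper: keep the last `num_stack` frames."""

    def __init__(self, num_stack):
        self.num_stack = num_stack
        self.frames = deque(maxlen=num_stack)

    def reset(self, first_frame):
        # At episode start, fill the buffer with copies of the first frame.
        for _ in range(self.num_stack):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        # The oldest frame is dropped automatically by the deque.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Concatenate the stored frames along the channel axis.
        stacked = []
        for frame in self.frames:
            stacked.extend(frame)
        return stacked


gray_obs = FrameStack(3).reset(make_frame(1, 84, 84))
rgb_obs = FrameStack(3).reset(make_frame(3, 84, 84))
print(shape(gray_obs))  # grayscale frames: 3 stacked channels
print(shape(rgb_obs))   # RGB frames: 3 frames x 3 channels
```

The stack changes only the observation shape; the count of env steps and training steps is untouched by it.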

posted @ 2022-08-21 20:48 呦呦南山
