Humanoid Robots: Reinforcement Learning Reward Function Design for Standing and Walking
Related:
Reward Shaping
General Configuration for Standing
A general configuration for standing involves making sure the default pose of the URDF (Unified Robot Description Format) model already corresponds to the desired standing position. The goal is then to minimize deviation from this default pose during training.
If necessary, an orientation reward can be included to encourage the robot to maintain an upright posture. This can be achieved by adding a term to the reward function that penalizes deviations from the desired orientation.
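The following is a minimal sketch of these standing terms. The state layout (base position, base quaternion in (w, x, y, z) order, default joint angles from the URDF) and all weights are illustrative assumptions rather than settings from any specific framework.

```python
import numpy as np

def standing_reward(base_pos, init_pos, base_quat, default_q, q,
                    w_pos=1.0, w_orient=0.5, w_posture=0.5):
    """Sketch of a standing reward: stay near the spawn point, stay upright,
    stay close to the default URDF joint configuration."""
    # Penalize horizontal drift of the base away from its initial position.
    xy_drift = np.linalg.norm(base_pos[:2] - init_pos[:2])
    r_pos = -w_pos * xy_drift

    # Penalize tilt: the world-frame z-component of the base's up axis is
    # 1 - 2*(x^2 + y^2) for a unit quaternion (w, x, y, z), and equals 1.0
    # when the torso is perfectly upright.
    w_, x, y, z = base_quat
    up_z = 1.0 - 2.0 * (x * x + y * y)
    r_orient = -w_orient * (1.0 - up_z)

    # Penalize deviation of the joints from the default standing pose.
    r_posture = -w_posture * np.sum(np.square(q - default_q))

    return r_pos + r_orient + r_posture
```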
Walking Rewards
For training the robot to walk, we have an additional set of rewards that are added to the standing rewards. Crucially, maintaining the original standing position accounts for 80% of the total reward during initial training, which ensures the policy first learns a stable standing position. This is essential since standing represents the base distribution from which other behaviors must develop.
Forward Velocity Reward: This reward encourages the robot to move forward. It can be defined as a function of the robot’s forward velocity, but is weighted to be less significant initially to prevent premature optimization for walking before stability is achieved.
Additional rewards such as feet clearance and contact forces are crucial for achieving sim2real transfer and handling various real-world properties like friction coefficients. These rewards ensure the policy learns realistic locomotion patterns that can translate to physical robots. The action smoothness reward particularly helps generate commands that are feasible for real-world actuators to execute under typical PID control schemes.
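A sketch of these additional walking terms is shown below. The exponential velocity-tracking kernel, the target foot clearance, and all weights are assumptions chosen for illustration, not values taken from a particular codebase.

```python
import numpy as np

def walking_reward(lin_vel_x, cmd_vel_x,
                   feet_height, feet_contact,
                   actions, last_actions,
                   w_vel=1.0, w_clear=0.2, w_smooth=0.01,
                   target_clearance=0.08):
    """Sketch of walking-specific rewards added on top of the standing terms."""
    # Track the commanded forward velocity with an exponential kernel so the
    # reward saturates smoothly instead of growing without bound.
    r_vel = w_vel * np.exp(-np.square(lin_vel_x - cmd_vel_x) / 0.25)

    # Reward foot clearance only for feet in the swing phase
    # (i.e. not currently in contact with the ground).
    swing = ~feet_contact
    clearance_err = np.abs(feet_height - target_clearance)
    r_clear = -w_clear * np.sum(clearance_err * swing)

    # Penalize large changes between consecutive actions so that the joint
    # commands remain feasible for real actuators under PID control.
    r_smooth = -w_smooth * np.sum(np.square(actions - last_actions))

    return r_vel + r_clear + r_smooth
```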
Reward function for standing: the robot should not end up far from its initial position, so a penalty is applied based on the horizontal distance between the post-standing position and the initial position. The orientation after standing should be upright, so a further penalty is set according to the difference between the robot's posture and the upright posture.

Training the robot to walk requires several stages, for example an initial stage and a normal stage. The initial stage is essentially standing training: the penalty for the distance between the robot's standing position and its initial position accounts for 80% of the total reward/penalty, so the focus is on learning to stand. Once the standing behavior has matured, the walking-related rewards are gradually increased and the share of the standing reward in the total is reduced. The walking rewards include a forward-velocity reward, but to keep the gait stable its weight is kept relatively small. The smoothness of the actuator (motor) output also needs to be rewarded or penalized; smoothness can be judged by how far the actuator commands deviate from the outputs a PID controller would produce, and a reward or penalty is assigned accordingly.
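One way to implement this staged weighting is sketched below: standing terms dominate (around 80% of the total) early in training and walking terms are phased in as training progresses. The linear schedule, the iteration threshold, and the final weight split are assumptions for illustration only.

```python
def stage_weights(iteration, warmup_iters=2000):
    """Return (standing weight, walking weight) for the current iteration."""
    # Fraction of the warm-up (standing-focused) stage completed so far.
    progress = min(iteration / warmup_iters, 1.0)
    # Standing weight decays from 0.8 toward 0.4 as the standing behavior
    # matures; the walking weight grows correspondingly.
    w_stand = 0.8 - 0.4 * progress
    w_walk = 1.0 - w_stand
    return w_stand, w_walk

def total_reward(r_stand, r_walk, iteration):
    # Combine the standing and walking reward groups with the staged weights.
    w_stand, w_walk = stage_weights(iteration)
    return w_stand * r_stand + w_walk * r_walk
```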
posted on 2024-12-06 23:17 Angry_Panda