Genesis: A generative and simulated physical realm for general-purpose embodied-AI learning.
Genesis is a generative simulation platform for embodied AI, currently still under active development. It can autonomously generate simulation scenes and supports differentiable simulation. Repository: Genesis-Embodied-AI/Genesis
In the paper Towards Generalist Robots: A Promising Paradigm via Generative Simulation, the authors lay out their thinking on generative simulation.
1. generative simulation
1.1 The Question
In the paper, the authors note that multimodal foundation models (MFMs) can already complete a wide range of tasks in the virtual world. In the physical world, however, robots or embodied agents must execute physical actions and interact with their environment. Because existing MFMs are trained only on virtual data, they still lack the ability to take physical actions.
Moreover, over the past decade or so, robotics research has largely fallen into two threads: 1) improving robots' high-level cognition, reasoning, and planning capabilities; and 2) equipping robots with low-level motor skills. Thanks to progress in MFMs, the high-level cognition and planning problems are now largely tractable; what is still missing is a diverse range of low-level motor skills that can be seamlessly activated in response to the specific sub-tasks a high-level planner generates for a given high-level task or goal. In other words, the authors argue that existing MFMs already handle high-level cognition, reasoning, and planning reasonably well, but no comparable capability exists for the wide variety of low-level motor skills.
Given the capability gains achieved by simply scaling up language and vision models, the authors believe that continuing to grow model size while supplying large-scale data may be the simplest and most effective route to generalist robots. The question we face, then, is: how do we scale up data collection for robotics to the level that matches the data scale used in training existing large language and vision models, thereby enabling robots to master a variety of low-level motor skills across diverse task settings with a unified neural policy?
1.2 The Idea
The authors argue that collecting data from the real world faces many limitations, and that large-scale simulation is a possible alternative. However, in existing work, simulation-trained policies remain limited to specific tasks and short horizons, for two reasons: 1) RL or trajectory optimization is effective for generating short-horizon, task-specific low-level motor skills, but effectively extending them to long-horizon tasks involving sub-goal planning and skill switching still remains an open research problem; and 2) there is no mechanism to scale up the diversity of tasks and scenes in simulation.
The authors, however, observe that: 1) existing MFMs can help decompose a high-level task into sub-tasks with arbitrarily fine granularity, which allows control policies to concentrate on accomplishing low-level sub-tasks; and 2) they can further help significantly scale up the diversity of task and scene configurations in simulation, via automated task and scene generation.
In other words, existing MFMs can drive this process: generating diverse low-level sub-task descriptions, and subsequently task-specific scene components, configurations, and training supervision such as reward or loss functions.
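The idea above can be sketched in a few lines of Python. This is a hypothetical illustration only: `query_mfm` stands in for a real multimodal-model API and here just returns canned responses; the prompts, task names, and spec layout are all assumptions, not anything specified in the paper.

```python
# Hypothetical sketch: an MFM decomposes a high-level goal into low-level
# sub-tasks, then proposes scene components and a reward for each one.
# `query_mfm` is a stand-in for a real model API; here it returns canned text.

def query_mfm(prompt: str) -> str:
    canned = {
        "decompose": "open the drawer; pick up the mug; place the mug on the shelf",
        "components": "drawer, mug, shelf, robot arm",
        "reward": "negative distance between gripper and mug",
    }
    for key, response in canned.items():
        if key in prompt:
            return response
    return ""

def generate_training_spec(high_level_goal: str) -> dict:
    # Stage 1: decompose the goal into fine-grained sub-tasks.
    sub_tasks = query_mfm(f"decompose: {high_level_goal}").split("; ")
    spec = {"goal": high_level_goal, "sub_tasks": []}
    # Stage 2: for each sub-task, ask for scene components and supervision.
    for task in sub_tasks:
        spec["sub_tasks"].append({
            "description": task,
            "scene_components": query_mfm(f"components for: {task}"),
            "reward": query_mfm(f"reward for: {task}"),
        })
    return spec

spec = generate_training_spec("tidy up the kitchen")
print(len(spec["sub_tasks"]))  # → 3
```

With a real MFM behind `query_mfm`, the same loop would emit an open-ended stream of task specifications rather than one canned decomposition.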
1.3 How
In the paper, the authors lay out a paradigm for realizing generative simulation, which proceeds in the following stages:

- Sub-task generation, either a) given high-level goals, or b) given scene configurations. That is, there are two routes: a) given a high-level goal, the MFM generates low-level sub-tasks; or b) given a scene configuration, the MFM proposes which low-level sub-tasks could take place in that scene.
- Scene components. Once a sub-task is generated, the MFM can next enumerate the elements the corresponding scene should contain.
- Scene configuration, either a) bottom-up (in language space), or b) top-down (in image space). Again two routes: a) given the scene components, the MFM generates the spatial layout of the scene; or b) the MFM generates an image of the scene, which is then converted into a scene layout.
- Textures, objects, and dynamics. This stage concerns the simulation assets, which can be generated, drawn from existing datasets, and so on.
- Reward or loss functions. Likewise, the MFM can generate the required reward and loss functions.
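As a concrete illustration of the last stage, the kind of dense reward an MFM might emit for a reaching sub-task could look like the following. This is a hypothetical example, not code from the paper: the function name, the success-bonus shaping, and the threshold value are all assumptions.

```python
import math

# Hypothetical example of an MFM-generated dense reward for a
# "reach the mug" sub-task: negative Euclidean distance between the
# gripper and the object, plus a bonus once within a success radius.

def reach_reward(gripper_pos, object_pos, success_radius=0.05):
    dist = math.dist(gripper_pos, object_pos)  # Euclidean distance
    reward = -dist
    if dist < success_radius:
        reward += 1.0  # success bonus
    return reward

print(reach_reward((0.0, 0.0, 0.0), (0.3, 0.4, 0.0)))   # → -0.5
print(reach_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.01)))  # ~0.99 (within radius)
```

Because such rewards are plain code over simulator state, each generated sub-task can carry its own supervision without any human reward engineering.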
Finally, the authors note that once diverse data for a vast number of individual tasks and configurations are obtained, we can start training a large-scale policy model either by learning from scratch through practice in the generated scenes, or by training task-specific policies first and distilling them (e.g., via behavior cloning) into a single generalist policy model conditioned on visual input and language instructions.
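The distillation route can be sketched as follows. This is a toy illustration of behavior cloning, not the paper's method: the expert policies, observations, and the "mean action per instruction" fit are all stand-ins for real task-specific policies and a real instruction-conditioned neural network.

```python
# Toy sketch of distillation via behavior cloning: several task-specific
# "expert" policies produce (instruction, observation, action) triples,
# then a single generalist policy is fit to imitate all of them.

experts = {
    "push left": lambda obs: -1.0,
    "push right": lambda obs: 1.0,
}

# Step 1: collect demonstrations from each expert across a few observations.
dataset = []
for instruction, policy in experts.items():
    for obs in (0.0, 0.5, 1.0):
        dataset.append((instruction, obs, policy(obs)))

# Step 2: "distill" by averaging each expert's actions per instruction --
# a stand-in for training one instruction-conditioned policy network.
actions_by_instruction = {}
for instruction, obs, action in dataset:
    actions_by_instruction.setdefault(instruction, []).append(action)
generalist = {k: sum(v) / len(v) for k, v in actions_by_instruction.items()}

def generalist_policy(instruction, obs):
    # One model serves every instruction the experts covered.
    return generalist[instruction]

print(generalist_policy("push left", 0.2))   # → -1.0
print(generalist_policy("push right", 0.2))  # → 1.0
```

The point of the sketch is the data flow: many narrow policies become one dataset, and one conditioned model is trained on all of it.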
1.4 Limitations
For example, the paradigm requires a general-purpose simulation platform, among other prerequisites.
(to be continued)