One AI Model to Rule All Robots
Original article: Robotic Control Module: One AI Model for Any Robot - IEEE Spectrum
A new model can operate virtually any robot design, including arms, quadrupeds, and drones
UC Berkeley/Carnegie Mellon University
The software used to control a robot is normally highly adapted to its specific physical setup. But now researchers have created a single general-purpose robotic control policy that can operate robotic arms, wheeled robots, quadrupeds, and even drones.
One of the biggest challenges when it comes to applying machine learning to robotics is the paucity of data. While computer vision and natural language processing can piggyback off the vast quantities of image and text data found on the Internet, collecting robot data is costly and time-consuming.
To get around this, there have been growing efforts to pool data collected by different groups on different kinds of robots, including the Open X-Embodiment and DROID datasets. The hope is that training on diverse robotics data will lead to “positive transfer,” which refers to when skills learned from training on one task help to boost performance on another.
The problem is that robots often have very different embodiments—a term used to describe their physical layout and suite of sensors and actuators—so the data they collect can vary significantly. For instance, a robotic arm might be static, have a complex arrangement of joints and fingers, and collect video from a camera on its wrist. In contrast, a quadruped robot is regularly on the move and relies on force feedback from its legs to maneuver. The kinds of tasks and actions these machines are trained to carry out are also diverse: The arm may pick and place objects, while the quadruped needs keen navigation.
That makes training a single AI model on these large collections of data challenging, says Homer Walke, a Ph.D. student at the University of California, Berkeley. So far, most attempts have either focused on data from a narrower selection of similar robots or relied on researchers manually tweaking data to make observations from different robots more similar. But in research to be presented at the Conference on Robot Learning (CoRL) in Munich in November, Walke and his colleagues unveiled a new model called CrossFormer that can train on data from a diverse set of robots and control them just as well as specialized control policies.
How to control diverse robots with the same AI model
The team used the same model architecture that powers large language models, known as a transformer. In many ways, the challenge the researchers were trying to solve is not dissimilar to that facing a chatbot, says Walke. In language modeling, the AI has to pick out similar patterns in sentences with different lengths and word orders. Robot data can also be arranged in a sequence much like a written sentence, but depending on the particular embodiment, observations and actions vary in length and order too.
“Words might appear in different locations in a sentence, but they still mean the same thing,” says Walke. “In our task, an observation image might appear in different locations in the sequence, but it’s still fundamentally an image and we still want to treat it like an image.”
Most machine learning approaches work through a sequence one element at a time, but transformers can process the entire stream of data at once. This allows them to analyze the relationship between different elements and makes them better at handling sequences that are not standardized, much like the diverse data found in large robotics datasets.
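To make the idea of processing a whole sequence at once concrete, here is a minimal sketch of single-head self-attention in plain Python. It is not CrossFormer's code: the token embeddings, the names `WRIST_IMAGE`, `OVERHEAD_IMAGE`, and `JOINT_POSITIONS`, and the use of identity projections instead of learned ones are all assumptions made for illustration. It only shows that the same function accepts sequences of different lengths and orderings.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head dot-product self-attention with identity projections
    (an assumption to keep the sketch short). Each output token is a
    weighted mix of every token in the sequence."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score this token against every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * value[i] for w, value in zip(weights, tokens))
                        for i in range(dim)])
    return outputs

# Made-up 4-dimensional embeddings standing in for encoded observations.
WRIST_IMAGE = [1.0, 0.0, 0.5, 0.0]
OVERHEAD_IMAGE = [0.8, 0.1, 0.4, 0.0]
JOINT_POSITIONS = [0.0, 1.0, 0.0, 0.5]

# Different embodiments yield sequences of different length and order.
arm_tokens = [OVERHEAD_IMAGE, WRIST_IMAGE, JOINT_POSITIONS]
quadruped_tokens = [JOINT_POSITIONS, OVERHEAD_IMAGE]

arm_out = self_attention(arm_tokens)
quadruped_out = self_attention(quadruped_tokens)
print(len(arm_out), len(quadruped_out))  # prints: 3 2
```

In this stripped-down form, reordering the input tokens only reorders the outputs, so an image token is treated the same wherever it sits in the sequence. Real transformers typically add positional or modality information on top of this, which the sketch leaves out.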
Walke and his colleagues aren’t the first to train transformers on large-scale robotics data. But previous approaches have either trained solely on data from robotic arms with broadly similar embodiments or manually converted input data to a common format to make it easier to process. In contrast, CrossFormer can process images from cameras positioned above a robot, at head height, or on a robotic arm’s wrist, as well as joint position data from both quadrupeds and robotic arms, without any tweaks.
The result is a single control policy that can operate single robotic arms, pairs of robotic arms, quadrupeds, and wheeled robots on tasks as varied as picking and placing objects, cutting sushi, and obstacle avoidance. Crucially, it matched the performance of specialized models tailored for each robot and outperformed previous approaches trained on diverse robotic data. The team even tested whether the model could control an embodiment not included in the dataset—a small quadcopter. While they simplified things by making the drone fly at a fixed altitude, CrossFormer still outperformed the previous best method.
“That was definitely pretty cool,” says Ria Doshi, an undergraduate student at Berkeley. “I think that as we scale up our policy to be able to train on even larger sets of diverse data, it’ll become easier to see this kind of zero shot transfer onto robots that have been completely unseen in the training.”
The limitations of one AI model for all robots
The team admits there’s still work to do, however. The model is too big for any of the robots’ embedded chips and instead has to be run from a server. Even then, processing times are only just fast enough to support real-time operation, and Walke admits that could break down if they scale up the model. “When you pack so much data into a model it has to be very big and that means running it for real-time control becomes difficult.”
One potential workaround would be to use an approach called distillation, says Oier Mees, a postdoctoral researcher at Berkeley and part of the CrossFormer team. This essentially involves training a smaller model to mimic the larger model, and if successful can result in similar performance for a much smaller computational budget.
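As a rough illustration of distillation, the sketch below trains a tiny linear "student" to reproduce the actions of a fixed nonlinear "teacher". Everything here is hypothetical: `teacher_policy`, its weights, the two-number observation, and the grid of training points are stand-ins chosen for the example, not anything from the CrossFormer work.

```python
import math

def teacher_policy(x1, x2):
    """Stand-in for a large policy: a fixed nonlinear map from a
    two-number observation to a single action value."""
    h1 = math.tanh(1.5 * x1 - 0.8 * x2)
    h2 = math.tanh(0.5 * x1 + 1.0 * x2)
    return 0.7 * h1 + 0.3 * h2

def student_policy(w, x1, x2):
    """Much smaller model: three parameters, purely linear."""
    return w[0] * x1 + w[1] * x2 + w[2]

def mse(w, data):
    """Mean squared gap between student and teacher actions."""
    return sum((student_policy(w, x1, x2) - t) ** 2
               for x1, x2, t in data) / len(data)

def distill(data, steps=200, lr=0.1):
    """Full-batch gradient descent on the student-teacher gap."""
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, t in data:
            err = student_policy(w, x1, x2) - t
            grad[0] += 2 * err * x1
            grad[1] += 2 * err * x2
            grad[2] += 2 * err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

# Label a grid of observations with the teacher's actions.
grid = [i / 5 - 1 for i in range(11)]  # -1.0 to 1.0 in steps of 0.2
data = [(x1, x2, teacher_policy(x1, x2)) for x1 in grid for x2 in grid]

loss_before = mse([0.0, 0.0, 0.0], data)
student = distill(data)
loss_after = mse(student, data)
print(loss_before, loss_after)
```

The student never sees the original training data, only the teacher's outputs. A linear student cannot match the teacher exactly, which mirrors the trade-off Mees describes: a smaller budget with performance that is similar rather than identical.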
More important than the computing resource problem, however, is that the team failed to see any positive transfer in their experiments: CrossFormer simply matched the performance of specialized models rather than exceeding it. Walke thinks progress in computer vision and natural language processing suggests that training on more data could be the key.
Others say it might not be that simple. Jeannette Bohg, a professor of robotics at Stanford University, says the ability to train on such a diverse dataset is a significant contribution. But she wonders whether part of the reason why the researchers didn’t see positive transfer is their insistence on not aligning the input data. Previous research that trained on robots with similar observation and action data has shown evidence of such cross-overs. “By getting rid of this alignment, they may have also gotten rid of this significant positive transfer that we’ve seen in other work,” Bohg says.
It’s also not clear if the approach will boost performance on tasks specific to particular embodiments or robotic applications, says Ram Ramamoorthy, a robotics professor at Edinburgh University. The work is a promising step towards helping robots capture concepts common to most robots, like “avoid this obstacle,” he says. But it may be less useful for tackling control problems specific to a particular robot, such as how to knead dough or navigate a forest, which are often the hardest to solve.