[Paper Reading] IROS 2022: Dynamics-Aware Spatiotemporal Occupancy Prediction in Urban Environments
0. References and Preface
Full title: Dynamics-Aware Spatiotemporal Occupancy Prediction in Urban Environments
Paper link: https://arxiv.org/abs/2209.13172
Code link: none
Abbreviations: occupancy grid map (OGM), sensor grid map (SGM), residual grid map (RGM)
1. Motivation
Task: detection and segmentation of moving obstacles
Both functions are achieved in a single framework: ① detecting and segmenting the dynamic obstacles in the scene; ② predicting the spatiotemporal evolution of the environment
Representation: occupancy-based environment representations
The OGMs discretize the environment into grid cells and consider the binary free or occupied hypotheses. Each cell in the OGMs contains the belief of its respective occupancy probability
For occluded regions, there is the evidential occupancy grid map (eOGM) [2], where each cell carries additional information on the occluded occupancy hypothesis in its information channel, in addition to the occupied and free channels
Because the discretized spatial structure of OGMs is similar to that of RGB images, most prediction methods developed for images can be applied directly
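To make the grid discretization concrete, here is a minimal sketch of my own (not the paper's code). It only marks cells that contain at least one 2D point as occupied; it does not do ray tracing for free space or any probabilistic update. The function name `points_to_occupancy` and the ego-centered layout are assumptions, and the default extent and resolution are taken from the experiment setting in Section 3.

```python
import numpy as np

def points_to_occupancy(points_xy, extent_m=42.0, resolution_m=0.33):
    """Mark grid cells containing at least one 2D point as occupied.

    points_xy: (N, 2) array of x, y coordinates in meters, with the ego
    vehicle assumed to sit at the center of the grid (an assumption).
    Returns a 2D float array: 1.0 for occupied cells, 0.0 elsewhere.
    """
    n_cells = int(round(extent_m / resolution_m))
    grid = np.zeros((n_cells, n_cells), dtype=np.float32)
    # Shift so the lower-left corner of the grid is the origin, then bin.
    idx = np.floor((np.asarray(points_xy) + extent_m / 2.0) / resolution_m).astype(int)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < n_cells), axis=1)
    idx = idx[inside]
    grid[idx[:, 1], idx[:, 0]] = 1.0  # row = y index, column = x index
    return grid

# Small example on a 4 m x 4 m grid with 1 m cells; the last point is out of range.
pts = np.array([[0.1, 0.1], [-1.5, 0.5], [100.0, 0.0]])
grid = points_to_occupancy(pts, extent_m=4.0, resolution_m=1.0)
```

The result is an image-like 2D array, which is why image prediction architectures can consume it directly.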
Related work
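[3] repurposes a predictive coding network (PredNet) built on convolutional LSTMs. It captures both the static and dynamic parts of the scene well, but suffers from vanishing dynamic objects in the predictions at longer time horizons.
[6] builds on the PredNet of [3] to develop a double-prong model: one prong takes the static OGMs [its motion model learns the relative motion of the static environment], and the other mainly takes the dynamic OGMs; the output combines the two prongs. The drawback is that it needs fairly accurate object detection and tracking information.
This paper mainly extends [6] by performing static-dynamic object segmentation together with prediction, so detection and tracking results are no longer needed as priors.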
[11] Range images are used as an intermediate representation of the point clouds to reduce computational complexity
Contribution
Method-wise: segmentation with SalsaNext [11, 12]
- We develop a method that integrates static-dynamic object segmentation and local environment prediction together, without assuming knowledge of static and dynamic objects in the scene.
- We propose using an occupancy-based environment representation across the entire system to enable direct integration.
2. Method
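Unlike [11], which uses range images, the inputs here are SGMs and RGMs.
- First, as in [3], the ground is removed using a Markov random field.
- The pipeline then has two parts. The static-dynamic object segmentation module separates static objects from dynamic ones.
- The prediction module predicts the future OGMs.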
2.1 Framework
The OGMs are generated following [3].
2.2 Static-Dynamic Object Segmentation
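Input: SGMs and RGMs; output: discretized dynamic masks.
An SGM is in \(\R^{W \times H}\); each cell takes one of three states (free, occupied, or occluded), and ray tracing determines the free-space and occupied classes.
An RGM is in \(\R^{W \times H}\) and is generated from the current and past SGMs: the past SGMs are first transformed into the current frame using the ego motion, and the cell states are then compared. A cell is set to 1 if it changes from one known class to another, and to 0 otherwise. Note that occluded cells are not considered here.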
The RGMs are then concatenated to the current SGMs as an extra channel; overall, the SGM provides spatial information and the RGM provides temporal information.
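The RGM rule can be sketched as follows. This is my own illustration under stated assumptions: the integer encoding `FREE`, `OCCUPIED`, `OCCLUDED` is hypothetical, and the past SGM is assumed to be already warped into the current frame using the ego motion (that alignment step is not shown).

```python
import numpy as np

# Hypothetical integer encoding of the three SGM cell states.
FREE, OCCUPIED, OCCLUDED = 0, 1, 2

def residual_grid_map(sgm_now, sgm_past_aligned):
    """Return 1.0 where a cell changed from one known class to another, else 0.0.

    Cells that are occluded in either frame are ignored (set to 0).
    """
    known = (sgm_now != OCCLUDED) & (sgm_past_aligned != OCCLUDED)
    changed = sgm_now != sgm_past_aligned
    return (known & changed).astype(np.float32)

def segmentation_input(sgm_now, sgm_past_aligned):
    """Stack the current SGM and the RGM as two channels, shape (W, H, 2)."""
    rgm = residual_grid_map(sgm_now, sgm_past_aligned)
    return np.stack([sgm_now.astype(np.float32), rgm], axis=-1)

now = np.array([[FREE, OCCUPIED], [OCCLUDED, FREE]])
past = np.array([[OCCUPIED, OCCUPIED], [FREE, OCCLUDED]])
rgm = residual_grid_map(now, past)   # only the top-left cell changed between known classes
x = segmentation_input(now, past)
```

Only the top-left cell is flagged: the bottom row involves an occluded state in one of the two frames, so it is skipped.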
Output: a binary mask \(M_d \in \R^{W \times H}\), where 1 denotes dynamic and 0 denotes static.
2.3 Environment Prediction
The dynamic mask is \(M_d\), so the static mask is \(1 - M_d\).
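A minimal sketch of how the two masks could split an OGM into static and dynamic parts. The elementwise multiplication is my assumption about how the masks are applied, and `split_static_dynamic` is a hypothetical helper name.

```python
import numpy as np

def split_static_dynamic(ogm, dynamic_mask):
    """Split an OGM into static and dynamic parts with a binary mask M_d.

    Assumes the masks are applied by elementwise multiplication:
    dynamic = OGM * M_d, static = OGM * (1 - M_d).
    """
    dynamic_ogm = ogm * dynamic_mask
    static_ogm = ogm * (1.0 - dynamic_mask)
    return static_ogm, dynamic_ogm

ogm = np.array([[0.9, 0.2], [0.5, 1.0]])
m_d = np.array([[1.0, 0.0], [0.0, 1.0]])
static_ogm, dynamic_ogm = split_static_dynamic(ogm, m_d)
```

Because the two masks sum to one in every cell, the static and dynamic parts add back up to the original OGM.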
eOGMs are an alternative representation that uses Dempster–Shafer Theory (DST) to update the grid cells.
Each allowable hypothesis is associated with its corresponding Dempster–Shafer belief mass, which represents the degree of occupancy belief in that cell [10].
An eOGM is in \(\R^{W \times H \times C}\), where the channels hold the Dempster–Shafer belief masses for occupied, \(m(\{O\}) \in [0,1]\), and free, \(m(\{F\}) \in [0,1]\), i.e., two channels.
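To relate the two mass channels to a single occupancy probability, one standard option is the pignistic transformation, which splits the unassigned mass equally between the two hypotheses. Whether this paper uses exactly this conversion is my assumption; the sketch only shows the arithmetic, and `pignistic_occupancy` is a hypothetical name.

```python
import numpy as np

def pignistic_occupancy(m_occ, m_free):
    """Convert Dempster-Shafer masses to an occupancy probability.

    The mass not assigned to occupied or free, 1 - m_occ - m_free, is treated
    as belonging to the unknown set {O, F} and is split equally.
    Assumes m_occ + m_free <= 1 in every cell.
    """
    m_unknown = 1.0 - m_occ - m_free
    return m_occ + 0.5 * m_unknown

m_occ = np.array([0.0, 1.0, 0.0, 0.6])
m_free = np.array([0.0, 0.0, 1.0, 0.2])
p_occ = pignistic_occupancy(m_occ, m_free)
```

A cell with no evidence at all (both masses zero) maps to 0.5, which matches the usual "unknown" value of a probabilistic OGM.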
3. Experiments and Results
Setting
Experimental setup: the grid resolution is 0.33 m and the grid covers 42 m × 42 m; the resolution is chosen so that each vehicle is covered by a sufficient number of cells.
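A quick back-of-the-envelope check of what this resolution means. The vehicle footprint of 4.5 m × 1.8 m is a hypothetical value chosen for illustration, not a number from the paper, and the cell count ignores the vehicle's orientation and its offset within the cells.

```python
import math

def cells_per_side(extent_m, resolution_m):
    """Number of cells along one side of the grid (not rounded)."""
    return extent_m / resolution_m

def cells_covering(length_m, width_m, resolution_m):
    """Rough count of cells needed to cover an axis-aligned rectangle."""
    return math.ceil(length_m / resolution_m) * math.ceil(width_m / resolution_m)

side = cells_per_side(42.0, 0.33)             # a bit over 127 cells per side
car_cells = cells_covering(4.5, 1.8, 0.33)    # hypothetical 4.5 m x 1.8 m car
```

Under these assumptions a car spans roughly 14 × 6 cells, which gives the segmentation module a reasonable number of cells per vehicle to work with.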
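The model first uses SGMs and RGMs to segment the static and dynamic objects in the environment; the RGMs are generated from the SGMs at t and t-5 (0.5 s earlier). The data are split 0.6, 0.1, and 0.3 into training, validation, and test sets.
The prediction model receives the static and dynamic OGMs produced with the masks as input and outputs complete OGM predictions of the environment. Each sequence has 20 consecutive frames, i.e., 2 s of driving data.
[6] uses the past 5 OGM frames to predict the next 15 OGM frames. Following the suggestion of [8], this paper instead trains in two modes: the first predicts the next OGM frame from the current OGM; the second finetunes the model to recursively predict the next 15 OGMs, with weights initialized from the first mode.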
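The recursive mode can be pictured as feeding each prediction back in as the next input. Below is a minimal sketch under my own assumptions: `predict_next` is a hypothetical stand-in for the trained one-step model, and the recurrent hidden state that a ConvLSTM-based network would carry between steps is ignored.

```python
import numpy as np

def rollout(predict_next, ogm_now, horizon=15):
    """Recursively predict `horizon` future OGMs from the current one.

    predict_next: callable mapping one OGM to the next (hypothetical stand-in).
    Returns an array of shape (horizon, W, H).
    """
    preds = []
    current = ogm_now
    for _ in range(horizon):
        current = predict_next(current)  # feed the prediction back in
        preds.append(current)
    return np.stack(preds, axis=0)

# Toy stand-in "model": everything moves one cell along axis 1 per frame.
def shift_one_cell(grid):
    return np.roll(grid, 1, axis=1)

start = np.zeros((4, 4))
start[0, 0] = 1.0
preds = rollout(shift_one_cell, start, horizon=3)
```

Since every step consumes the previous prediction, errors compound over the horizon, which is presumably why the model is first trained on one-step prediction and only then finetuned recursively.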
Runtime on an i7-5930K 3.5 GHz CPU and a single TITAN X GPU is 82 ms per frame (about 12 Hz).
Results
Evaluation metrics
- MSE metric is used to assess how well the predicted occupancy probability for each cell corresponds to its ground truth value.
- IS metric is used to measure how well the structure of the scene is maintained in the OGM predictions. To calculate the IS metric, the minimum Manhattan distance is calculated between two grid cells (one from the target OGM and the other from the predicted OGM) with the same occupancy classes (occupied, free, and unknown)
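Both metrics can be sketched in a few lines. The MSE part is straightforward. For IS (image similarity, as I understand the abbreviation), I assume the common formulation that averages, per class, the minimum Manhattan distance in both directions and sums over classes. The exact normalization and the handling of a class that is missing from one map are my assumptions, and the integer class encoding is hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and ground-truth occupancy probabilities."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

def _directed_distance(a, b, c):
    """Average over class-c cells in `a` of the Manhattan distance to the nearest class-c cell in `b`."""
    pa = np.argwhere(a == c)
    pb = np.argwhere(b == c)
    if len(pa) == 0 or len(pb) == 0:
        return 0.0  # assumption: a class absent from either map contributes nothing
    # Brute-force pairwise Manhattan distances; only suitable for small grids.
    dists = np.abs(pa[:, None, :] - pb[None, :, :]).sum(axis=2)
    return float(dists.min(axis=1).mean())

def image_similarity(a, b, classes=(0, 1, 2)):
    """Sum over classes of the directed distances in both directions (lower is better)."""
    return sum(_directed_distance(a, b, c) + _directed_distance(b, a, c) for c in classes)

a = np.array([[1, 0], [0, 0]])
b = np.array([[0, 1], [0, 0]])
err = mse([[0, 1], [1, 0]], [[0, 1], [0, 0]])
score = image_similarity(a, b, classes=(0, 1))
```

The pairwise distance matrix grows quadratically with the number of cells, so for full-size grids a distance transform (for example `scipy.ndimage.distance_transform_cdt` with `metric='taxicab'`) would be the more practical route.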
Quantitative table
Qualitative analysis
From the paper: Future work will consider incorporating semantic segmentation as well. We hypothesize that the model can perform better if given the ability to learn semantics in the scene, which can help with predicting the motion models of different object types.
Ramblings
I plan to listen to this talk online at IROS 2022: October 26, 2022, 15:00-15:10, Paper WeB-3.3
Our IoU metric over the static and moving objects are 99.5% and 54.5%, respectively, and the average IoU is 77.0%.
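- I asked the authors about this. Apparently the cell size causes some static and dynamic points to fall into the same cell, but because of the Zoom audio I could not catch everything…
- From the many papers I have read in this area, this seems to be a common issue.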
Most of the references mentioned in the paper are open source:
- [3] Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning https://github.com/mitkina/EnvironmentPrediction
- [6] Double-Prong ConvLSTM for Spatiotemporal Occupancy Prediction in Dynamic Environments https://github.com/sisl/Double-Prong-Occupancy