[Paper Reading] IROS2017: Voxblox & RAL2019: Voxblox++

Status: Finished
Type: RAL
Year: 2019
Organization/Sensor: ETH-ASL

References and Preface

This note covers two papers, from Voxblox (IROS 2017) to Voxblox++ (RAL 2019), but the main focus is on Voxblox.

Paper links: https://arxiv.org/abs/1611.03631 and https://arxiv.org/abs/1903.00268

Code links: https://github.com/ethz-asl/voxblox and https://github.com/ethz-asl/voxblox-plusplus

Voxblox documentation: https://voxblox.readthedocs.io/en/latest/index.html

For follow-up work on semantics, see also this paper from the same lab, ETH-ASL:

ICRA2022: Panoptic Multi-TSDFs: a Flexible Representation for Online Multi-resolution Volumetric Mapping and Long-term Dynamic Scene Consistency

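Voxblox implements three different integration strategies (integrators):

Fast

Merged: aimed at large scenes; the points that project into the same voxel are bundled together and projected once

Simple: a direct traversal that handles every point on its own, plain and simple

Runtime compared with Octomap: Octomap maps every voxel individually, whereas Voxblox, when facing large-scale scenes, can bundle the mapping per voxel (using the Merged strategy), which saves runtime without a noticeable drop in accuracy.

Copyright notice: the passage above is an original article by the CSDN blogger 「憨憨2号」, released under the CC 4.0 BY-SA license; reposts must include the original source link and this notice. Source: https://blog.csdn.net/qq_45401419/article/details/125125993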

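[Semantic Mapping] Voxblox++: Volumetric Semantic Mapping
Using the scans of an already localized RGB-D camera, the system incrementally builds a volumetric object-level map online.

A per-frame segmentation framework combines an unsupervised geometric method with instance-aware semantic predictions to detect both recognized scene elements and previously unseen objects.

Data association: tracks the predicted object instances across frames.

A map integration strategy fuses the information about their 3D shape, location, and semantics into a global volume.
Source: https://zhuanlan.zhihu.com/p/117665107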

Background:

ESDFs (Euclidean Signed Distance Fields) are a voxel grid where every point contains its Euclidean distance to the nearest obstacle
TSDFs (Truncated Signed Distance Fields) use projective distance, which is the distance along the sensor ray to the measured surface, and calculate these distances only within a short truncation radius around the surface boundary.

1. Motivation

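Planning for small UAVs needs the distance to nearby obstacles, and this distance information is obtained from ESDFs. Voxblox builds the map as a TSDF and then incrementally constructs the ESDF from it.

Voxblox++ moves to the object level; in effect it adds semantic labels on top of Voxblox.

The following covers the points made in the Voxblox paper and its successor, Voxblox++. First, the Voxblox paper explains why it works with TSDFs: TSDFs are fast to build and smooth out sensor noise over many observations, and are designed to produce surface meshes.

Voxblox focuses on the fact that the UAV needs the map for planning, so the final product is an ESDF used by the planner. Compared with prior work:

[3] can build a distance map incrementally, but the maximum size of the map must be known in advance and cannot be changed dynamically.
Octomap [4] is usable, but is difficult for humans to parse.

To address these problems, the authors propose Voxblox, a system that builds the ESDF incrementally while keeping the underlying map representation easy to visualize, and that extracts distance information directly from the TSDF to build the ESDF.

Voxblox++ points out that robotic manipulation usually needs more information, including the size and category of the 3D object models, but real-world environments exhibit large variability in object appearance, shape, placement and location, posing a direct challenge to robotic perception. Computer vision offers pixel-level segmentation, but it only recognizes the classes seen during training. Purely geometric methods work in open-set conditions, but they tend to over-segment the reconstructed objects and additionally fail to provide any semantic information about them, making high-level scene understanding and task planning impractical.

The Voxblox++ system incrementally builds volumetric maps with accurate geometry while labeling every object instance. It starts from the incremental geometry-based scene segmentation approach of [7] and extends it to full instance-aware semantic mapping.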

Contribution

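The main contributions of Voxblox: it is the first to propose building ESDFs incrementally from TSDFs, and it analyzes different ways of constructing TSDFs to improve construction speed and surface accuracy at large voxel sizes.

Voxblox++ focuses on the semantic side. First, it combines geometric-semantic segmentation that extends object detection. Second, it addresses how the predicted labels are tracked and matched across multiple frames.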

2. Method

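Voxblox:

For exploration and mapping, Voxblox uses the voxel hashing approach proposed in [12]. The mapping between block positions and their locations in memory is stored in a hash table, which gives O(1) insertion and lookup. This data structure adapts flexibly to growing maps and is faster than Octomap, whose access cost is O(log n).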
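To make the O(1) block lookup concrete, here is a minimal sketch of a hash-indexed block grid. It is not the Voxblox implementation: the names `BlockIndex`, `BlockIndexHash`, `Block`, `BlockHashMap`, and `allocateOrGet`, the prime-based hash, and the 8 voxels per block side are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Integer index of a block in an unbounded grid (hypothetical type).
struct BlockIndex {
  int x, y, z;
  bool operator==(const BlockIndex& o) const {
    return x == o.x && y == o.y && z == o.z;
  }
};

// Assumed hash: mixes the three coordinates with large primes.
struct BlockIndexHash {
  std::size_t operator()(const BlockIndex& i) const {
    return static_cast<std::size_t>(i.x) * 73856093u ^
           static_cast<std::size_t>(i.y) * 19349669u ^
           static_cast<std::size_t>(i.z) * 83492791u;
  }
};

struct Voxel {
  float distance = 0.0f;
  float weight = 0.0f;
};

// A block is a small dense cube of voxels (8 per side is an assumption).
struct Block {
  static constexpr int kVoxelsPerSide = 8;
  std::vector<Voxel> voxels =
      std::vector<Voxel>(kVoxelsPerSide * kVoxelsPerSide * kVoxelsPerSide);
};

class BlockHashMap {
 public:
  explicit BlockHashMap(float block_size) : block_size_(block_size) {}

  // Block index that contains a metric point.
  BlockIndex indexFromPoint(float px, float py, float pz) const {
    return {static_cast<int>(std::floor(px / block_size_)),
            static_cast<int>(std::floor(py / block_size_)),
            static_cast<int>(std::floor(pz / block_size_))};
  }

  // Average O(1): returns the block, allocating it on first access,
  // so the map grows without a predefined maximum size.
  Block& allocateOrGet(float px, float py, float pz) {
    return blocks_[indexFromPoint(px, py, pz)];
  }

  bool hasBlock(const BlockIndex& idx) const { return blocks_.count(idx) > 0; }
  std::size_t numBlocks() const { return blocks_.size(); }

 private:
  float block_size_;
  std::unordered_map<BlockIndex, Block, BlockIndexHash> blocks_;
};
```

Because blocks are only allocated when a point first lands in them, memory follows the observed space, which is the property the paragraph above contrasts with a fixed-size distance map.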

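Voxblox++:

Seen this way, the two papers tackle rather different tasks and perhaps should not have been read together, haha. Voxblox++ feels like an advanced version that adds semantics. A summary of the pipeline:

A frame-wise segmentation scheme combines an unsupervised geometric segmentation of depth images [9] with semantic object predictions from RGB [1]. The use of semantics allows the system to infer the category of some of the 3D segments predicted in a frame, as well as to group segments by the object instance to which they belong. In short, the depth image is segmented geometrically without supervision, masks are predicted on the RGB image, and the two are combined to obtain refined segments.
The tracking of the individual predicted instances across multiple frames is addressed by matching per-frame predictions to existing segments in the global map via a data association strategy.
Observed surface geometry and segmentation information are integrated into a global Truncated Signed Distance Field (TSDF) map volume.

Essentially, the Mask R-CNN output is carried over to the points of the point cloud to give them labels. Note that the points of different objects may overlap; the Voxblox++ paper sets a threshold to handle this.

The overall integration uses Voxblox for the mapping and then assigns label information to each voxel, selecting for each voxel the object label and the semantic class with the largest value.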
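As a rough illustration of "selecting the largest label per voxel", the sketch below keeps a per-voxel count for every observed label and returns the most frequently observed one. This is an assumption about how such a selection could work, not the Voxblox++ code: the names `LabelVoxel`, `addObservation`, and `mostObservedLabel`, the use of plain observation counts, and the convention that label 0 means "unlabeled" are all hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>

using Label = std::uint32_t;

// Hypothetical per-voxel label storage: one counter per observed label.
struct LabelVoxel {
  std::unordered_map<Label, int> counts;
};

// Record that a measurement carrying `label` was integrated into the voxel.
inline void addObservation(LabelVoxel& voxel, Label label) {
  ++voxel.counts[label];
}

// Return the label with the highest count; 0 means "unlabeled" (assumption).
// Ties are resolved arbitrarily by the container's iteration order.
inline Label mostObservedLabel(const LabelVoxel& voxel) {
  Label best = 0;
  int best_count = 0;
  for (const auto& entry : voxel.counts) {
    if (entry.second > best_count) {
      best = entry.first;
      best_count = entry.second;
    }
  }
  return best;
}
```

The same voting scheme could be applied separately to instance labels and to semantic classes, which is how the note describes the two being chosen.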

2.2 TSDF Construction

The update rule in the paper is given by the following equations, with \(\mathbf{x}, \mathbf{p}, \mathbf{s} \in \mathbb{R}^3\):

\[\begin{aligned}d(\mathbf{x}, \mathbf{p}, \mathbf{s}) &=\|\mathbf{p}-\mathbf{x}\| \operatorname{sign}((\mathbf{p}-\mathbf{x}) \bullet(\mathbf{p}-\mathbf{s})) \\w_{\text {const }}(\mathbf{x}, \mathbf{p}) &=1 \\D_{i+1}(\mathbf{x}, \mathbf{p}) &=\frac{W_i(\mathbf{x}) D_i(\mathbf{x})+w(\mathbf{x}, \mathbf{p}) d(\mathbf{x}, \mathbf{p})}{W_i(\mathbf{x})+w(\mathbf{x}, \mathbf{p})} \\W_{i+1}(\mathbf{x}, \mathbf{p}) &=\min \left(W_i(\mathbf{x})+w(\mathbf{x}, \mathbf{p}), W_{\max }\right)\end{aligned} \]

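x is the center position of the current voxel
p is the 3D position of the point measured by the sensor
s is the sensor origin
d is the new distance measurement that the sensor point contributes to the voxel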
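The equations above translate almost line by line into code. The following is a minimal sketch under a few assumptions: `Vec3`, `signedDistance`, `TsdfVoxel`, and `updateVoxel` are hypothetical names, the weight `w` is passed in by the caller (1 for the constant weight), and truncation of the distance is left out.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;

inline Vec3 sub(const Vec3& a, const Vec3& b) {
  return {{a[0] - b[0], a[1] - b[1], a[2] - b[2]}};
}
inline float dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// d(x, p, s) = ||p - x|| * sign((p - x) . (p - s)):
// positive when the voxel x lies between the sensor s and the point p,
// negative when x lies behind the measured surface.
inline float signedDistance(const Vec3& x, const Vec3& p, const Vec3& s) {
  const Vec3 px = sub(p, x);
  const float side = dot(px, sub(p, s));
  const float sign = (side > 0.0f) ? 1.0f : ((side < 0.0f) ? -1.0f : 0.0f);
  return std::sqrt(dot(px, px)) * sign;
}

struct TsdfVoxel {
  float distance = 0.0f;  // D_i(x)
  float weight = 0.0f;    // W_i(x)
};

// Weighted running average of the distance, with the weight capped at w_max.
inline void updateVoxel(TsdfVoxel& voxel, float d, float w, float w_max) {
  const float new_weight = voxel.weight + w;
  if (new_weight <= 0.0f) return;  // nothing observed yet, avoid dividing by 0
  voxel.distance = (voxel.weight * voxel.distance + w * d) / new_weight;
  voxel.weight = std::min(new_weight, w_max);
}
```

Capping the weight at `w_max` keeps old observations from dominating forever, so the voxel can still adapt when the scene changes.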

Finally, on how newly received data is merged into the existing voxel grid:

For each point in the sensor scan, we project its position to the voxel grid, and group it with all other points mapping to the same voxel.
Then we take the weighted mean of all points and colors within each voxel, and do raycasting only once on this mean position

In terms of speed, this is reported to be 20 times faster than the plain raycasting approach.
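The grouping step described above can be sketched as follows. This is an illustration only, not the Voxblox merged integrator: `Point`, `VoxelKey`, `voxelKey`, and `bundlePoints` are hypothetical names, colors are omitted, and every point gets an equal weight, whereas the paper takes a weighted mean.

```cpp
#include <array>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

using Point = std::array<float, 3>;
using VoxelKey = std::tuple<int, int, int>;

// Index of the voxel that contains a point.
inline VoxelKey voxelKey(const Point& p, float voxel_size) {
  return VoxelKey(static_cast<int>(std::floor(p[0] / voxel_size)),
                  static_cast<int>(std::floor(p[1] / voxel_size)),
                  static_cast<int>(std::floor(p[2] / voxel_size)));
}

// Groups the scan by voxel and returns one mean point per occupied voxel.
// A ray would then be cast once per returned point instead of once per
// raw measurement.
inline std::vector<Point> bundlePoints(const std::vector<Point>& scan,
                                       float voxel_size) {
  struct Accumulator {
    Point sum{{0.0f, 0.0f, 0.0f}};
    int count = 0;
  };
  std::map<VoxelKey, Accumulator> groups;
  for (const Point& p : scan) {
    Accumulator& acc = groups[voxelKey(p, voxel_size)];
    for (int i = 0; i < 3; ++i) acc.sum[i] += p[i];
    ++acc.count;
  }

  std::vector<Point> means;
  means.reserve(groups.size());
  for (const auto& entry : groups) {
    const Accumulator& acc = entry.second;
    Point mean;
    for (int i = 0; i < 3; ++i) {
      mean[i] = acc.sum[i] / static_cast<float>(acc.count);
    }
    means.push_back(mean);
  }
  return means;
}
```

The saving comes from the number of rays: with large voxels many measurements fall into the same voxel, so the number of raycasts drops from the number of points to the number of occupied voxels.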

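Above, the weight is the constant 1, but the paper proposes a more sophisticated weight. It builds on [19], which found for RGB-D sensors that the \(\sigma\) of a single ray measurement varied predominantly with \(z^2\), where \(z\) is the measured depth in the camera frame. Combined with a simple assumption of a behind-surface drop-off for the RGB-D model, the weight is set as follows: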

\[w_{\text {quad }}(\mathbf{x}, \mathbf{p})=\left\{\begin{array}{lr}\frac{1}{z^2} & -\epsilon<d \\\frac{1}{z^2} \frac{1}{\delta-\epsilon}(d+\delta) & -\delta<d<-\epsilon \\0 & d<-\delta\end{array}\right. \]

Here the truncation distance is \(\delta = 4v\) and \(\epsilon = v\), where \(v\) is the voxel size.
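For reference, the piecewise weight can be written out directly. This is a sketch of the formula as stated in the paper, not code taken from the Voxblox repository; the function name `quadraticWeight` and the guard for non-positive depth are assumptions.

```cpp
#include <cmath>

// w_quad(x, p) from the paper's formula.
//   d: signed distance of the voxel to the measured surface
//   z: depth of the measurement in the camera frame
//   voxel_size: v, giving delta = 4v and epsilon = v
inline float quadraticWeight(float d, float z, float voxel_size) {
  const float delta = 4.0f * voxel_size;
  const float epsilon = voxel_size;
  if (z <= 0.0f) return 0.0f;  // assumed guard: ignore invalid depths

  const float base = 1.0f / (z * z);
  if (d > -epsilon) {
    return base;  // in front of, or just behind, the surface
  }
  if (d > -delta) {
    // Linear drop-off from `base` at d = -epsilon to 0 at d = -delta.
    return base * (d + delta) / (delta - epsilon);
  }
  return 0.0f;  // beyond the truncation distance behind the surface
}
```

With \(v = 0.1\) and \(z = 2\), a voxel at \(d = 0\) gets weight 0.25, a voxel at \(d = -0.25\) sits halfway down the ramp and gets 0.125, and a voxel at \(d = -0.5\) gets 0.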

The corresponding code in Voxblox does not seem to match the middle case in the paper:

// Thread safe.
float TsdfIntegratorBase::getVoxelWeight(const Point& point_C) const {
  if (config_.use_const_weight) {
    return 1.0f;
  }
  const FloatingPoint dist_z = std::abs(point_C.z());
  if (dist_z > kEpsilon) {
    return 1.0f / (dist_z * dist_z);
  }
  return 0.0f;
}

2.3 TSDF → ESDF

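A screenshot taken from the Voxblox documentation:

The code is mainly in the file esdf_integrator.cc

3. Experiments and Results

Figure 5 shows that the smaller the voxel size, the smaller the error, and that at larger voxel sizes the quadratic weight also helps reduce the error. In terms of speed, the method proposed in the paper is the fastest and takes the least time.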

voxblox++

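The quantitative results mainly compare IoU against a 3D semantic instance-segmentation method.

Qualitative results

The paper also reports the time taken by each component and states the computing platform that was used.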

4. Conclusion

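So Voxblox fuses the incoming RGB-D data into a colored map, uses a TSDF to store and build up the distance information, and incrementally generates an ESDF directly from the TSDF for the planner. It is a map type designed around a clearly defined downstream task, real-time and efficient. The paper does not discuss future work.

Voxblox++ adds object-level and segmentation label information to every point. This of course costs time, so the future work includes reducing the runtime, and also involves investigating the optimal way to fuse RGB and depth information within a unified per-frame object detection, discovery and segmentation framework.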


posted @ 2022-10-12 10:16 Kin_Zhang


张聪明 (Kin_Zhang)

HKUST MPhil student, working on autonomous driving and reinforcement learning
