DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors (Reading Notes)
Introduction
Previous Work
- bundle adjustment or loop closure
- VoxelHashing with the Signed Distance Function (SDF)
- deep geometry learning
Challenges
- a representation explicitly modeled against sensor noise and view occlusion
- an accurate camera tracking formulation
- an efficient surface mapping strategy
DI-Fusion
- extend the original local implicit grids
- adapt it into PLIVox
- an additional uncertainty encoding
- approximate gradient for solving the camera tracking problem efficiently
- encoder-decoder network design
Datasets: 3D RGB-D benchmarks (ICL-NUIM, ScanNet)
Related Works
- Online 3D Reconstruction and SLAM.
- Learned Probabilistic Reconstruction.
- Implicit Representation.
Method
Given a sequential RGB-D stream, DI-Fusion incrementally builds up a 3D scene based on a novel PLIVox representation, which is implicitly parameterized by a neural network and effectively encodes useful local scene priors.
We represent the reconstructed 3D scene with PLIVoxs (Sec. 3.1).
Given input RGB-D frames, we first estimate the camera pose by finding the best alignment between the current depth point cloud and the map (Sec. 3.2), then the depth observations are integrated (Sec. 3.3) for surface mapping.
A scene mesh can be extracted at any time on demand, at any resolution.
Note that both the camera tracking and surface mapping are performed directly on the deep implicit representation.
PLIVox Representation
The reconstructed scene is sparsely partitioned into evenly-spaced voxels (PLIVoxs), each storing a centroid c_m, a latent vector l_m, and a weight w_m.
For a point x, we first query its corresponding PLIVox index m = q(x).
The local coordinate of x in PLIVox v_m is calculated as x_bar = (x - c_m) / a (a being the voxel size).
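A minimal sketch of this quantization, assuming floor-based voxel indexing and centroid-centered local coordinates (the exact conventions of the paper's implementation may differ):

```python
import numpy as np

VOXEL_SIZE = 0.1  # a: voxel side length in meters (illustrative value, an assumption)

def plivox_index(x, a=VOXEL_SIZE):
    """Map a world-space point to its integer PLIVox index (floor quantization)."""
    return np.floor(np.asarray(x) / a).astype(int)

def local_coordinate(x, a=VOXEL_SIZE):
    """Normalized local coordinate of x inside its PLIVox, roughly in [-0.5, 0.5)."""
    idx = plivox_index(x, a)
    centroid = (idx + 0.5) * a  # voxel centroid c_m
    return (np.asarray(x) - centroid) / a
```

Normalizing by the voxel size keeps the decoder's input in a fixed range regardless of the chosen resolution.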
Probabilistic Signed Distance Function.
The output at every position is not a single SDF value but an SDF distribution p(s | x).
Here the SDF distribution is modeled as a canonical Gaussian distribution N(mu, sigma^2).
We encode the PSDF with a latent vector l_m using an encoder-decoder deep neural network.
Encoder-Decoder Neural Network.
- Encoder phi: converts the measurements from each depth point observation at frame t into observation latent vectors.
  Each point measurement's local coordinate and normal direction are transformed into an L-dimensional feature vector using only FC (Fully Connected) layers.
  Then the feature vectors from multiple points are aggregated into one latent vector using a mean-pooling layer.
- Decoder psi: takes the concatenation of the local coordinate and the latent vector as input; the output is a 2-tuple (mu, sigma), the Gaussian parameters described before.
Note that the observation latent vector produced by the encoder and the geometry latent vector stored in a PLIVox are different latent vectors.
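A toy numpy sketch of this PointNet-style design, with random weights and illustrative dimensions; the real network's layer count, widths, and activations are not specified here and are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8  # latent dimension (illustrative)

# Encoder phi: per-point FC layer on (local coordinate, normal) -> L-dim feature,
# then mean-pooling over all points -> one observation latent vector.
W_enc = rng.standard_normal((6, L)) * 0.1

def encode(points, normals):
    feats = np.maximum(np.hstack([points, normals]) @ W_enc, 0.0)  # FC + ReLU
    return feats.mean(axis=0)  # mean-pool: order-invariant aggregation

# Decoder psi: FC layer on (local coordinate ++ latent vector) -> (mu, sigma).
W_dec = rng.standard_normal((3 + L, 2)) * 0.1

def decode(x_local, latent):
    mu, raw = np.concatenate([x_local, latent]) @ W_dec
    return mu, np.log1p(np.exp(raw))  # softplus keeps sigma positive
```

Mean-pooling is what makes the encoder handle a variable number of points per PLIVox.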
Network Training.
Train the encoder phi and decoder psi jointly in an end-to-end fashion, setting l_m = phi(D^e_m).
Two datasets per PLIVox:
- D^e for the encoder: a set of tuples (p_i, n_i) of points p_i and normals n_i sampled from the scene surface.
- D^d for the decoder: a set of tuples (y_i, s_i), where points y_i are randomly sampled within a PLIVox using a near-surface sampling strategy similar to DeepSDF, s_i being the SDF at point y_i.
The goal of training is to maximize the likelihood of the dataset D^d for all training PLIVoxs.
Specifically, the loss function for each PLIVox is the Gaussian negative log-likelihood:
  L_D = (1 / |D^d|) * sum_i -log N(s_i; mu(y_i, l_m), sigma(y_i, l_m)^2)
We also regularize the norm of the latent vector with an l2-loss, which reflects the prior distribution of the latent space.
The final loss function is L = L_D + delta * ||l_m||^2.
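The loss above can be sketched as follows; the value of delta and the exact constant terms of the NLL are assumptions:

```python
import numpy as np

def gaussian_nll(s, mu, sigma):
    """-log N(s; mu, sigma^2), averaged over decoder samples."""
    return np.mean(0.5 * ((s - mu) / sigma) ** 2 + np.log(sigma)
                   + 0.5 * np.log(2.0 * np.pi))

def plivox_loss(s, mu, sigma, latent, delta=1e-2):
    """Per-PLIVox loss: Gaussian NLL plus l2 prior on the latent vector.
    delta is an illustrative regularization weight, not the paper's value."""
    return gaussian_nll(s, mu, sigma) + delta * np.sum(latent ** 2)
```

Note that the NLL lets the network raise sigma where the SDF is genuinely ambiguous instead of being forced to a single value, which is exactly what the PSDF is for.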
Camera Tracking
A frame-to-model camera tracking method: the learned deep priors carry enough information about the 3D scene for accurate camera pose estimation.
The PSDF function is formulated as an objective for camera pose estimation, with an approximate gradient of the objective over the camera pose.
Tracking
We denote the RGB-D observation at frame t as O_t = (I_t, D_t).
The depth measurement D_t can be re-projected to 3D as point measurements x_i = pi^{-1}(u_i, D_t(u_i)), where pi is the projection function and pi^{-1} is its inverse.
The goal is to estimate O_t's camera pose T_t by optimizing the relative pose xi between frames t-1 and t, i.e. T_t = T_{t-1} * exp(xi).
The following objective function is minimized in the system:
  E(xi) = E_sdf(xi) + w * E_int(xi)
where E_sdf and E_int are the SDF term and intensity term respectively, and w is a weight parameter.
SDF Term
Perform frame-to-model alignment of the point measurements to the on-surface geometry decoded by psi.
We choose to minimize the signed distance value of each point x_i when transformed by the optimized camera pose, weighted by the decoded uncertainty.
The objective function is:
  E_sdf(xi) = sum_i rho( mu(T(xi) x_i) / sigma(T(xi) x_i) )
One important step in optimizing the SDF term is the computation of the objective's gradient with respect to xi.
Treating sigma as constant during the local linearization, the gradient factors through the chain rule into the decoder's spatial gradient d mu / d x and the pose Jacobian d(T x) / d xi = [ R | -R [x]_x ],
where R is the rotation part of T and [.]_x denotes the skew-symmetric (cross-product) matrix.
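A sketch of this pose Jacobian, assuming a right-multiplied se(3) perturbation ordered as (translation, rotation); combining it with the decoder's spatial gradient gives the approximate SDF-term gradient (the robust kernel rho is taken as identity here for simplicity):

```python
import numpy as np

def skew(v):
    """[v]_x: cross-product (skew-symmetric) matrix of a 3-vector."""
    x, y, z = v
    return np.array([[0, -z, y], [z, 0, -x], [-y, x, 0]], dtype=float)

def point_pose_jacobian(R, x):
    """d(T exp(xi) x)/d xi at xi = 0, for xi = (rho, phi): the 3x6 matrix [R | -R [x]_x]."""
    return np.hstack([R, -R @ skew(x)])

def sdf_term_gradient(dmu_dx, R, x, sigma):
    """Chain rule for one residual mu/sigma, treating sigma as a constant."""
    return (dmu_dx @ point_pose_jacobian(R, x)) / sigma
```

In the full system, d mu / d x would come from back-propagating through the decoder network rather than being supplied by hand.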
Intensity Term
It is defined as the photometric error between consecutive frames:
  E_int(xi) = sum_{u in Omega} ( I_t(u) - I_{t-1}( pi( T(xi) pi^{-1}(u, D_t(u)) ) ) )^2
where Omega is the image domain.
This intensity term takes effect when the SDF term fails in areas with few geometric details, such as walls or floors.
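A simplified sketch of the photometric residual, assuming the warped pixel locations pi(T(xi) pi^{-1}(u, D_t(u))) have already been computed, and using nearest-neighbor lookup instead of the bilinear sampling a real system would use:

```python
import numpy as np

def intensity_term(I_t, I_prev, uv_warped):
    """Sum of squared photometric residuals between frame-t pixels (u, v)
    and their warped locations (uw, vw) in frame t-1."""
    h, w = I_prev.shape
    err = 0.0
    for (u, v), (uw, vw) in uv_warped:
        ui, vi = int(round(uw)), int(round(vw))
        if 0 <= ui < w and 0 <= vi < h:  # stay inside the image domain Omega
            r = I_t[v, u] - I_prev[vi, ui]
            err += r * r
    return err
```

Because the residual depends on image gradients rather than surface geometry, it stays informative on flat textured regions where every nearby pose gives a near-zero SDF.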
Surface Mapping
After the camera pose of RGB-D observation O_t is estimated, we update the map from this observation based on the deep implicit representation, fusing new (noisy) scene geometry into the existing one; this is also referred to as geometry integration.
Geometry Integration.
We perform geometry integration by updating the geometry latent vector l_m with the observation latent vector encoded from the point measurements.
The points are transformed into world coordinates according to T_t and the normal of each point measurement is estimated, giving tuples (p_i, n_i).
In each PLIVox v_m, the point measurements are encoded into an observation latent vector l_obs = phi({(p_i, n_i)}).
The geometry latent vector is then updated as a weighted running average:
  l_m <- (w_m * l_m + w_obs * l_obs) / (w_m + w_obs),  w_m <- w_m + w_obs
where the observation weight w_obs is set to the number of points within the PLIVox.
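The running weighted average can be sketched directly (symbol names follow the notes above):

```python
import numpy as np

def integrate_latent(l_m, w_m, l_obs, n_points):
    """Fuse an observation latent into the stored geometry latent.
    The observation weight is the number of points that fell in this PLIVox."""
    w_obs = float(n_points)
    l_new = (w_m * l_m + w_obs * l_obs) / (w_m + w_obs)
    return l_new, w_m + w_obs
```

This mirrors classic TSDF weighted averaging, except the quantity being averaged is a latent vector rather than per-voxel signed distances.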
Mesh Extraction
Divide each PLIVox into equally-spaced volumetric grids and query the SDF of each grid with the decoder, using the PLIVox's latent vector; a mesh is then extracted (e.g. via Marching Cubes).
Double each PLIVox's domain such that the volumetric grids of neighboring PLIVoxs overlap with each other.
The final SDF of each volumetric grid is trilinearly interpolated with the SDFs decoded from the overlapping PLIVoxs.
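A minimal sketch of the trilinear blending step over one cell of the overlap region; a real implementation would blend SDFs decoded from up to eight overlapping PLIVoxs:

```python
import numpy as np

def trilinear_blend(corner_sdfs, frac):
    """Trilinearly interpolate 8 corner SDF values at fractional position
    frac = (fx, fy, fz) in [0, 1]^3; corner_sdfs is indexed [x][y][z]."""
    c = np.asarray(corner_sdfs, dtype=float)  # shape (2, 2, 2)
    fx, fy, fz = frac
    c = c[0] * (1 - fx) + c[1] * fx   # collapse x -> shape (2, 2)
    c = c[0] * (1 - fy) + c[1] * fy   # collapse y -> shape (2,)
    return c[0] * (1 - fz) + c[1] * fz  # collapse z -> scalar
```

Blending across the doubled domains removes the seams that would otherwise appear at PLIVox boundaries, since each voxel's decoder only fits its own local geometry.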

Permalink: https://www.cnblogs.com/zjp-shadow/p/16021270.html