IC-GVINS: A Robust, Real-Time, INS-Centric GNSS-Visual-Inertial Navigation System

Xiaoji Niu, Hailiang Tang, Tisheng Zhang, Jing Fan, and Jingnan Liu

Abstract-Visual navigation systems are susceptible to complex environments, while inertial navigation systems (INS) are not affected by external factors. Hence, we present IC-GVINS, a robust, real-time, INS-centric global navigation satellite system (GNSS)-visual-inertial navigation system that fully utilizes the advantages of the INS. The Earth rotation is compensated in the INS to exploit the accuracy of high-grade inertial measurement units (IMUs). To promote the system robustness in high-dynamic conditions, the precise INS information is employed to assist the feature tracking and landmark triangulation. With a GNSS-aided initialization, the IMU, visual, and GNSS measurements are tightly fused in a unified world frame within the factor graph optimization framework. Dedicated experiments were conducted on public vehicle and private robot datasets to evaluate the proposed method. The results demonstrate that IC-GVINS, with its INS-centric architecture, yields improved robustness and accuracy in complex environments compared to state-of-the-art methods.

Index Terms-Factor graph optimization, multisensor fusion navigation, state estimation, visual-inertial navigation system.

I. INTRODUCTION

Continuous, robust, and accurate positioning is essential for autonomous vehicles and robots in complex environments [1]. The visual-inertial navigation system (VINS) has become a practical solution for autonomous navigation due to its higher accuracy and lower cost [2]. It has been historically difficult to achieve robust and reliable positioning for VINS in complex environments, because the visual system is susceptible to illumination changes and moving objects [3]. In contrast, the inertial measurement unit (IMU) is not affected by these external environmental factors, and the inertial navigation system (INS) can achieve continuous high-frequency positioning independently [4]. The low-cost micro-electro-mechanical system (MEMS) INS cannot provide long-term (e.g., longer than 1 minute) high-accuracy positioning; nevertheless, it can achieve decimeter-level positioning within several seconds [5]. However, most current VINSs are visual-centric or visual-driven, and the INS precision has not been well considered, such as in [6], [7]. Furthermore, the INS information contributes little to the visual processes in these systems, which might degrade system robustness and accuracy in visually degraded environments. Hence, we propose an INS-centric VINS (IC-VINS) to fully utilize the INS advantages. We further incorporate the global navigation satellite system (GNSS) into the proposed IC-VINS to construct an INS-centric GNSS-visual-inertial navigation system (IC-GVINS) that performs continuous, robust, and accurate positioning in large-scale challenging environments.

Conventionally, the state estimation problem in VINS is addressed through filtering [8], [9], [10], [11]. However, we have noticed insufficient usage of the INS in recent filtering-based approaches. For example, OpenVINS [9] is a visual-driven system, because the system pauses if no image is received; the independent INS should instead be adopted to maintain real-time navigation. Besides, the INS does not contribute to the feature tracking in [9]. Similarly, direct image intensity patches were employed as landmark descriptors, allowing for tracking non-corner features, in an IEKF-based visual-inertial odometry (VIO) [10]. R-VIO [11] is a robocentric visual-inertial odometry within the multi-state constraint Kalman filter (MSCKF) framework. Though the filtering-based VINSs have exhibited considerable accuracy, they theoretically suffer from significant linearization errors, which may ruin the estimator and further degrade the robustness and accuracy [12].

By solving maximum a posteriori (MAP) estimation, factor graph optimization (FGO) has been proven to be more efficient and accurate than the filtering-based approaches for VINS [2], [12]. Nevertheless, the INS information has not been fully used in most FGO-based VINSs; the IMU measurements have only been employed to construct a relative constraint factor, such as the IMU preintegration factor [5], [6], [7], [13], [14]. VINS-Mono [6] adopts a sliding-window optimizer to achieve pose estimation, but its estimator relies more on high-frequency visual observations. Besides, its visual processes [6] are relatively rough, which limits its accuracy in large-scale complex environments. In ORB_SLAM3 [7], the camera pose predicted by the INS is used to assist the ORB feature tracking instead of using an unreliable ad-hoc motion model. ORB_SLAM3 is still driven by visual images and is thus unsuitable for real-time navigation. Similarly, Kimera-VIO [13] is a keyframe-based visual-inertial estimator that can perform both full and fixed-lag smoothing using GTSAM [15]. A novel approach is proposed in [14], which combines the strengths of an accurate VIO with globally consistent keyframe-based bundle adjustment (BA). The work in [14] is built upon the assumption that the INS accuracy degrades quickly after several seconds of integration. However, as mentioned above, the INS can maintain decimeter-level positioning within several seconds [5], even for a MEMS IMU.


Manuscript received 28 July 2022; accepted 17 November 2022. Date of publication 23 November 2022; date of current version 30 November 2022. This letter was recommended for publication by Associate Editor M. Kaess and Editor S. Behnke upon evaluation of the reviewers' comments. This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFB0505803 and in part by the National Natural Science Foundation of China under Grant 41974024. (Corresponding author: Tisheng Zhang.)

The authors are with the GNSS Research Center, Wuhan University, Wuhan 430079, China (e-mail: xjniu@whu.edu.cn; thl@whu.edu.cn; zts@whu.edu.cn; jingfan@whu.edu.cn; jnliu@whu.edu.cn).

We open-source the proposed IC-GVINS and the multisensor datasets on GitHub (https://github.com/i2Nav-WHU/IC-GVINS).


The high-accuracy industrial-grade MEMS IMU has been widely used for autonomous navigation, as its cost has decreased while its accuracy has improved [4]. However, the INS information has not been well considered, and the INS mechanization algorithm is relatively rough in these optimization-based VINSs. Besides, most of these VINSs are driven by visual images, making them unsuitable for real-time applications that need stable and continuous positioning. Moreover, the visual system is delicate and easily affected by the environment, especially in complex scenes. Hence, the independent INS can play a more critical role in both the state estimation and the visual processes of VINS to improve robustness and accuracy.

The GNSS can achieve absolute positioning in large-scale environments, and thus it has been widely used for outdoor navigation. By using real-time kinematic (RTK) positioning [4], the GNSS can perform centimeter-level positioning in open-sky environments. In VINS-Fusion [16], the GNSS is integrated into a global estimator, while the local estimator is a VINS. The GNSS can help estimate the IMU biases, but it is separated from the VINS estimator in [16]. The GNSS raw measurements are tightly incorporated into a VINS in GVINS [17], which can provide global estimation in indoor-outdoor environments. The approach in [17] is based on [6], but the visual processes have not been improved; hence, GVINS [17] might also suffer degraded robustness and accuracy in GNSS-denied environments. The GNSS can also help to initialize the VINS. In [18], the GNSS/INS integration and the VINS are launched simultaneously to initialize a GNSS-visual-inertial navigation system for a land vehicle, but the approach is loosely coupled. G-VIDO [19] is a similar system, but it further incorporates the vehicle dynamics to improve the system accuracy. In [20], a tightly coupled optimization-based GNSS-visual-inertial odometry is proposed, but the GNSS does not contribute to the initialization of the visual system. Moreover, the GNSS works in a different world frame from the VINS in all these systems [16], [17], [18], [19], [20], and the VINS has to be initialized separately. The GNSS can instead help initialize the INS first and further initialize the VINS; the GNSS and VINS can then work in a unified world frame without extra transformation.

The visual system may be affected by various degenerated scenes in complex environments. The INS can independently provide precise and high-frequency poses in the short term and may not be affected by external environmental factors. Inspired by these advantages of the INS, we propose an INS-centric GNSS-visual-inertial navigation system to utilize the precise INS information fully. The GNSS is adopted to achieve an accurate initialization and perform absolute positioning in large-scale environments. The main contributions of our work are as follows:

  • We propose a tightly-coupled INS-centric GNSS-visual-inertial navigation system (IC-GVINS) within the FGO framework to fully utilize the precise INS information. The INS-centric designs include the precise INS with the Earth rotation compensated, the GNSS-aided initialization, and the INS-aided visual processes.

Fig. 1. System pipeline of the proposed IC-GVINS. The filled blocks denote the proposed works in this letter.

  • IC-VINS, the VINS subsystem of IC-GVINS, is a keyframe-based estimator with strict outlier-culling algorithms. The precise INS information is employed to assist the feature tracking and landmark triangulation and improve the robustness in high-dynamic conditions.

  • The proposed method is evaluated on both public vehicle and private robot datasets. Dedicated experiment results indicate that the proposed method yields improved robustness and accuracy compared to state-of-the-art (SOTA) methods in complex environments.

  • We open-source the proposed IC-GVINS and the well-synchronized multisensor robot datasets on GitHub.

II. SYSTEM OVERVIEW

The proposed IC-GVINS is driven by a precise INS mechanization, as depicted in Fig. 1. A GNSS/INS integration is conducted first to initialize the INS to obtain the rough IMU biases and absolute attitude estimation. The absolute attitude is aligned to the local navigation frame (gravity aligned) [4], [5], and thus the GNSS can be directly incorporated into the FGO without extra transformation. Once the INS is initialized, the prior pose from the INS is employed to assist the feature tracking and the landmark triangulation. Finally, the IMU, visual, and GNSS measurements are tightly fused within the FGO framework to achieve MAP estimation. The estimated states are fed back to the INS mechanization module to update the newest INS states for real-time navigation.

III. Methodology

In this section, the methodology of the proposed IC-GVINS is presented. The system core is a precise INS mechanization with the Earth rotation compensated. A GNSS/INS integration is conducted first to initialize the INS. The visual processes are assisted by the prior pose from the INS. Finally, all the measurements are tightly fused within the FGO framework.

A. INS Mechanization

The Earth rotation compensation is not a negligible factor for industrial-grade or higher-grade MEMS IMUs. To fully utilize the INS precision, we follow our previous work in [5] to adopt the precise INS mechanization algorithm, compensating for the Earth rotation and the Coriolis acceleration [4]. The INS kinematic model is defined as follows:

\[{\dot{\mathbf{p}}}_{\mathrm{{wb}}}^{\mathrm{w}} = {\mathbf{v}}_{\mathrm{{wb}}}^{\mathrm{w}} \]

\[{\dot{\mathbf{v}}}_{\mathrm{{wb}}}^{\mathrm{w}} = {\mathbf{R}}_{\mathrm{b}}^{\mathrm{w}}{\mathbf{f}}^{\mathrm{b}} + {\mathbf{g}}^{\mathrm{w}} - 2\left\lbrack {{\mathbf{w}}_{\mathrm{{ie}}}^{\mathrm{w}} \times }\right\rbrack {\mathbf{v}}_{\mathrm{{wb}}}^{\mathrm{w}}, \]

\[{\dot{\mathbf{q}}}_{\mathrm{b}}^{\mathrm{w}} = \frac{1}{2}{\mathbf{q}}_{\mathrm{b}}^{\mathrm{w}} \otimes \left\lbrack \begin{matrix} 0 \\ {\mathbf{w}}_{\mathrm{{wb}}}^{\mathrm{b}} \end{matrix}\right\rbrack ,{\mathbf{w}}_{\mathrm{{wb}}}^{\mathrm{b}} = {\mathbf{w}}_{\mathrm{{ib}}}^{\mathrm{b}} - {\mathbf{R}}_{\mathrm{w}}^{\mathrm{b}}{\mathbf{w}}_{\mathrm{{ie}}}^{\mathrm{w}}, \tag{1} \]

where $ {\mathbf{p}}_{\mathrm{wb}}^{\mathrm{w}} $ and $ {\mathbf{v}}_{\mathrm{wb}}^{\mathrm{w}} $ are the position and velocity of the IMU frame (b-frame) in the world frame (w-frame), respectively; the quaternion $ {\mathbf{q}}_{\mathrm{b}}^{\mathrm{w}} $ and the rotation matrix $ {\mathbf{R}}_{\mathrm{b}}^{\mathrm{w}} $ denote the rotation of the b-frame with respect to the w-frame; the w-frame is defined at the initial position of the navigation frame (n-frame), i.e., the local geodetic north-east-down (NED) frame; the IMU frame is defined as the body frame (b-frame); $ {\mathbf{g}}^{\mathrm{w}} $ and $ {\mathbf{w}}_{\mathrm{ie}}^{\mathrm{w}} $ are the gravity vector and the Earth rotation rate in the w-frame; $ {\mathbf{w}}_{\mathrm{ib}}^{\mathrm{b}} $ is the compensated angular velocity from the gyroscope; $ \otimes $ denotes the quaternion product. The precise INS mechanization can be formulated by adopting the kinematic model in (1) [5]. The INS pose is directly used for real-time navigation and provides aid for the visual processes, as depicted in Fig. 1.

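
To make the mechanization concrete, the following is a minimal single-step sketch of (1) in pure Python. It assumes a north-east-down w-frame at latitude `lat`, first-order (trapezoidal) integration, and scalar-first quaternions; the implementation in [5] uses a more precise two-sample scheme, so the function names and structure here are illustrative only.

```python
import math

OMEGA_E = 7.2921151467e-5  # Earth rotation rate (rad/s)

def quat_mul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def quat_rotate(q, v):
    # Rotate v from the b-frame to the w-frame with unit quaternion q_b^w.
    qc = (q[0], -q[1], -q[2], -q[3])
    r = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), qc)
    return (r[1], r[2], r[3])

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def rotvec_to_quat(phi):
    n = math.sqrt(phi[0]**2 + phi[1]**2 + phi[2]**2)
    if n < 1e-12:
        return (1.0, 0.5*phi[0], 0.5*phi[1], 0.5*phi[2])
    s = math.sin(0.5*n) / n
    return (math.cos(0.5*n), s*phi[0], s*phi[1], s*phi[2])

def ins_step(p, v, q, f_b, w_ib_b, dt, lat, g=9.80665):
    """One mechanization step of (1) in a NED w-frame at latitude lat."""
    w_ie_w = (OMEGA_E*math.cos(lat), 0.0, -OMEGA_E*math.sin(lat))
    g_w = (0.0, 0.0, g)
    # Velocity: v_dot = R f^b + g^w - 2 [w_ie^w x] v  (Coriolis term)
    f_w = quat_rotate(q, f_b)
    cor = cross(w_ie_w, v)
    v_new = tuple(v[i] + (f_w[i] + g_w[i] - 2.0*cor[i])*dt for i in range(3))
    # Position: p_dot = v (trapezoidal average)
    p_new = tuple(p[i] + 0.5*(v[i] + v_new[i])*dt for i in range(3))
    # Attitude: w_wb^b = w_ib^b - R^T w_ie^w  (Earth rotation compensated)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    w_ie_b = quat_rotate(q_conj, w_ie_w)
    phi = tuple((w_ib_b[i] - w_ie_b[i])*dt for i in range(3))
    q_new = quat_mul(q, rotvec_to_quat(phi))
    return p_new, v_new, q_new
```

Note that a gyroscope that senses exactly the Earth rotation while the platform is level leaves the attitude unchanged, which is precisely the effect the compensation term is meant to capture.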
B. GNSS-Aided Initialization

The initialization is an essential procedure for VINS, which determines the system robustness and accuracy [6], [7]. As an INS-centric system, the most critical task is to initialize the INS. An FGO-based GNSS/INS integration is adopted to initialize the INS, and the FGO framework is described in Section III.C. A rough estimation of roll, pitch, and gyroscope biases can be obtained during stationary states by detecting zero-velocity conditions [21]. Dynamic conditions are needed to obtain the absolute attitude from the GNSS. Travelling along a straight line for land vehicles [21] or rarely moving sideways for unmanned aerial vehicles (UAVs) [22] is assumed during the initialization. The absolute attitude is essential for IC-GVINS as we can incorporate the GNSS directly without other coordinate transformations. Besides, the precise IMU preintegration needs the absolute attitude to compensate for the Earth rotation [5]. The GNSS is necessary to initialize the INS in the current implementation for IC-GVINS. Nevertheless, for non-GNSS applications, a stationary condition or a wheeled odometer can help to initialize the INS.

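
The stationary part of this initialization can be sketched as follows. The `static_align` helper is an illustrative simplification, not the paper's estimator: the averaged accelerometer output levels the IMU (roll and pitch), and the averaged gyroscope output approximates the gyroscope biases, ignoring the small Earth-rate component in the gyroscope average.

```python
import math

def static_align(accel_samples, gyro_samples):
    """Rough roll/pitch and gyroscope biases from a stationary interval.

    Samples are (x, y, z) tuples in a front-right-down b-frame; at rest the
    accelerometer senses the specific force f = -g, which levels the platform.
    """
    n = len(accel_samples)
    fx, fy, fz = (sum(s[i] for s in accel_samples) / n for i in range(3))
    roll = math.atan2(-fy, -fz)
    pitch = math.atan2(fx, math.hypot(fy, fz))
    # Averaged gyro output approximates the biases (Earth rate ignored here).
    m = len(gyro_samples)
    bias = tuple(sum(s[i] for s in gyro_samples) / m for i in range(3))
    return roll, pitch, bias
```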
The initialized INS can provide prior pose for the visual processes; thus, the visual system is directly initialized with the INS aiding. Once the landmarks have been triangulated, the visual reprojection factors can be constructed using visual observations. A joint optimization is conducted to refine the state estimation further and improve the INS precision. According to our experiments, only 5 seconds of GNSS positioning (in dynamic conditions) is needed to perform an accurate initialization for the proposed method. In comparison, the GNSS-visual-inertial initialization time is 9 seconds in [18] and $ 4 \sim 9 $ seconds in [17]. Once the initialization is finished, the VINS subsystem IC-VINS can work independently without the GNSS.

C. INS-Aided Visual Processes

The VINS subsystem IC-VINS is a keyframe-based visual-inertial navigation system. The prior pose from the INS is utilized in the whole visual processes, including the feature tracking and the landmark triangulation. Strict outlier-culling algorithms are conducted to improve the robustness and accuracy further.

  1) Feature Detection and Tracking: Shi-Tomasi corner features are detected in our visual front end. The image is first divided into grids of a set size, e.g., 200 pixels. The visual features are detected separately in each grid, and a minimum separation between neighboring features is enforced to maintain a uniform distribution. Multi-thread technology is employed to improve detection efficiency.

The Lucas-Kanade optical flow algorithm is adopted to track the features. With a limited number of pyramid levels, optical flow is challenging in high-dynamic scenes. Hence, we propose an INS-aided feature-tracking algorithm to improve the system robustness. For features without an initial depth, we predict the initial optical-flow estimates by compensating for the rotation, and RANSAC is employed to reject outliers. For features with depth, the initial optical-flow estimates are calculated by projecting the landmarks into the image plane. We also track the features in the backward direction (from the current to the previous frame) and remove failed matches. The continuity of the feature tracking can be significantly improved with the INS aiding, especially in high-dynamic conditions. Nevertheless, the prior pose only provides the initial estimates, and the optical flow algorithm determines the final ones. The tracked features are undistorted for further processing.

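
A minimal sketch of the two prediction modes described above, assuming a pinhole camera with hypothetical intrinsics `K = (fx, fy, cx, cy)` and a relative pose `R_cur_prev`, `t_cur_prev` from the INS prior; these predictions would only seed the Lucas-Kanade tracker, which still determines the final estimates.

```python
def mat_vec(R, v):
    # Apply a 3x3 rotation (tuple of rows) to a 3-vector.
    return tuple(sum(R[i][j]*v[j] for j in range(3)) for i in range(3))

def back_project(K, uv):
    fx, fy, cx, cy = K
    return ((uv[0] - cx)/fx, (uv[1] - cy)/fy, 1.0)

def project(K, p_c):
    fx, fy, cx, cy = K
    return (fx*p_c[0]/p_c[2] + cx, fy*p_c[1]/p_c[2] + cy)

def predict_feature(K, uv_prev, R_cur_prev, t_cur_prev, depth=None):
    """Initial optical-flow guess from the INS prior pose: rotation-only when
    the depth is unknown, full reprojection when the depth is known."""
    ray = back_project(K, uv_prev)
    if depth is None:
        p_c = mat_vec(R_cur_prev, ray)              # rotation compensation only
    else:
        p_prev = tuple(depth*r for r in ray)        # scale the ray by the depth
        rotated = mat_vec(R_cur_prev, p_prev)
        p_c = tuple(rotated[i] + t_cur_prev[i] for i in range(3))
    return project(K, p_c)
```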
Once the features are tracked, the keyframe selection is conducted. We first calculate the average parallax between the current frame and the last keyframe. The prior pose from the INS is adopted to compensate for the rotation, rather than the raw gyroscope measurements as in [6]. If the average parallax is larger than a fixed threshold, e.g., 20 pixels, the current frame is selected as a new keyframe. The selected keyframe will be used to triangulate landmarks and further construct the reprojection factors in the FGO. However, if the vehicle is stationary or the average parallax stays below the threshold for a long time, no new optimization will be conducted in the FGO, which might degrade the accuracy. Hence, if no new keyframe is selected after a long time, e.g., 0.5 seconds, a new observation frame is inserted into the keyframe queue. The observation frame is used only once and is removed after the optimization.

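
The keyframe decision can be sketched as follows; the thresholds mirror the examples in the text (20 pixels, 0.5 seconds), and the function names and return labels are illustrative.

```python
import math

def average_parallax(predicted_and_tracked):
    """Mean pixel distance between the INS rotation-compensated predictions
    and the tracked features in the current frame."""
    d = [math.hypot(c[0] - p[0], c[1] - p[1]) for p, c in predicted_and_tracked]
    return sum(d) / len(d)

def select_frame(parallax_px, dt_since_keyframe, thr_px=20.0, thr_s=0.5):
    if parallax_px > thr_px:
        return "keyframe"
    if dt_since_keyframe > thr_s:
        return "observation"  # used once and removed after the optimization
    return "skip"
```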
  2) Triangulation: With the prior pose from the INS, the triangulation has become part of the visual front end. When a new keyframe is selected, the triangulation is conducted using the current and previous keyframes. The triangulation determines the initial depths of the landmarks, which are further estimated in the FGO. Hence, a strict outlier-culling algorithm is conducted in the triangulation to prevent outlier or poorly initialized landmarks from ruining the FGO estimator. The parallax is first calculated between the feature in the current keyframe and the corresponding feature in its first observed keyframe. If the parallax is smaller than a threshold, e.g., 10 pixels, the visual feature is tracked until the parallax is sufficient, which improves the precision of the triangulated depths. Then, the prior pose from the INS is used to triangulate the landmarks, and the depth of each landmark in its first observed keyframe is obtained. We further check the depths to ensure the correctness of the triangulation. Only depths within a range, e.g., $ 1 \sim {100} $ meters, are added to the landmark queue; the rest are treated as outliers.

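
A sketch of the depth initialization and range check, using midpoint triangulation of two rays in the first keyframe's camera frame as a stand-in for the triangulation described above. The argument names are assumptions of this sketch: `ray2_in_f1` is the second ray rotated into the first frame by the INS prior pose, and `c2_in_f1` is the second camera center in that frame.

```python
def triangulate_depth(ray1, ray2_in_f1, c2_in_f1):
    """Midpoint triangulation in the first keyframe's camera frame.

    ray1 = (x, y, 1) from frame 1; ray2_in_f1 = R_12 @ ray2; c2_in_f1 is the
    second camera center. Returns the depth along ray1 (unit z), or None.
    """
    dot = lambda a, b: sum(x*y for x, y in zip(a, b))
    d1, d2, c = ray1, ray2_in_f1, c2_in_f1
    a11, a12, a22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    det = a11*a22 - a12*a12
    if abs(det) < 1e-12:
        return None  # parallel rays: no parallax
    # Closest point on ray1 to ray2, from the 2x2 normal equations.
    s = (dot(c, d1)*a22 - dot(c, d2)*a12) / det
    return s

def accept_depth(depth, d_min=1.0, d_max=100.0):
    """Range check as in the text (e.g. 1-100 m); anything else is an outlier."""
    return depth is not None and d_min <= depth <= d_max
```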
Fig. 2. FGO framework of the IC-GVINS. The visual landmarks are represented by a single block for better visualization.

D. Factor Graph Optimization

A sliding-window optimizer is adopted to fuse all the measurements within the FGO framework tightly. When a new keyframe is selected or a new GNSS-RTK measurement is valid, a new time node will be inserted into the sliding window, and the FGO will be carried out to perform MAP estimation. The IMU preintegration factor is constructed between each consecutive time node. The FGO framework of the proposed IC-GVINS is depicted in Fig. 2.

  1) Formulation: The state vector $ \mathbf{X} $ in the sliding window of IC-GVINS can be defined as

\[\mathbf{X} = \left\lbrack {{\mathbf{x}}_{0},{\mathbf{x}}_{1},\ldots ,{\mathbf{x}}_{n},{\mathbf{x}}_{\mathrm{c}}^{\mathrm{b}},{\delta }_{0},{\delta }_{1},\ldots ,{\delta }_{l}}\right\rbrack , \]

\[{\mathbf{x}}_{k} = \left\lbrack {{\mathbf{p}}_{{\mathrm{{wb}}}_{k}}^{\mathrm{w}},{\mathbf{q}}_{{\mathrm{b}}_{k}}^{\mathrm{w}},{\mathbf{v}}_{{\mathrm{{wb}}}_{k}}^{\mathrm{w}},{\mathbf{b}}_{{g}_{k}},{\mathbf{b}}_{{a}_{k}}}\right\rbrack , k \in \left\lbrack {0, n}\right\rbrack , \]

\[{\mathbf{x}}_{\mathrm{c}}^{\mathrm{b}} = \left\lbrack {{\mathbf{p}}_{\mathrm{{bc}}}^{\mathrm{b}},{\mathbf{q}}_{\mathrm{c}}^{\mathrm{b}}}\right\rbrack \tag{2} \]

where $ {\mathbf{x}}_{k} $ is the IMU state at each time node, as shown in Fig. 2; the IMU state includes the position, attitude quaternion, and velocity in the w-frame, and the gyroscope biases $ {\mathbf{b}}_{g} $ and accelerometer biases $ {\mathbf{b}}_{a} $; $ n $ is the number of time nodes in the sliding window; $ {\mathbf{x}}_{\mathrm{c}}^{\mathrm{b}} $ is the extrinsic parameters between the camera frame (c-frame) and the IMU b-frame; $ \delta $ is the inverse depth parameter of the landmark in its first observed keyframe.

The MAP estimation in IC-GVINS can be formulated by minimizing the sum of the prior and the Mahalanobis norm of all measurements as

\[\mathop{\min }\limits_{\mathbf{X}}\left\{ \begin{array}{l} {\begin{Vmatrix}{\mathbf{r}}_{p} - {\mathbf{H}}_{p}\mathbf{X}\end{Vmatrix}}^{2} + \mathop{\sum }\limits_{{k \in \left\lbrack {1, n}\right\rbrack }}{\begin{Vmatrix}{\mathbf{r}}_{Pre}\left( {\widetilde{\mathbf{z}}}_{k - 1, k}^{Pre},\mathbf{X}\right) \end{Vmatrix}}_{{\mathbf{\Sigma }}_{k - 1, k}^{Pre}}^{2} \\ + \mathop{\sum }\limits_{{l \in \mathbf{L}}}{\begin{Vmatrix}{\mathbf{r}}_{V}\left( {\widetilde{\mathbf{z}}}_{l}^{{V}_{i, j}},\mathbf{X}\right) \end{Vmatrix}}_{{\mathbf{\Sigma }}_{l}^{{V}_{i, j}}}^{2} \\ + \mathop{\sum }\limits_{{h \in \left\lbrack {0, m}\right\rbrack }}{\begin{Vmatrix}{\mathbf{r}}_{GNSS}\left( {\widetilde{\mathbf{z}}}_{h}^{GNSS},\mathbf{X}\right) \end{Vmatrix}}_{{\mathbf{\Sigma }}_{h}^{GNSS}}^{2} \end{array}\right\} \tag{3} \]

where $ {\mathbf{r}}_{Pre} $ are the residuals of the IMU preintegration measurements; $ {\mathbf{r}}_{V} $ are the residuals of the visual measurements; $ {\mathbf{r}}_{GNSS} $ are the residuals of the GNSS-RTK measurements; $ \mathbf{\Sigma} $ is the covariance of each measurement; $ \left\{ {\mathbf{r}}_{p},{\mathbf{H}}_{p}\right\} $ represents the prior from marginalization [6]; $ m $ is the number of GNSS-RTK measurements in the sliding window; $ \mathbf{L} $ is the landmark map in the sliding window, and $ l $ is a landmark in the map; $ i $ denotes the reference keyframe of the landmark $ l $, and $ j $ is another keyframe. The Ceres solver [23] is adopted to solve this FGO problem.

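
As a toy analogue of (3), the following fuses scalar positions from relative (preintegration-like) factors chaining consecutive nodes and absolute (GNSS-like) factors anchoring individual nodes. Because this toy problem is linear, a single solve of the weighted normal equations already gives the MAP estimate, whereas the real estimator iterates with Ceres; all names here are illustrative.

```python
def solve(H, b):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(H)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0]*n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c]*x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fuse(rel_meas, abs_meas, sigma_rel, sigma_abs, n):
    """MAP estimate of n scalar positions from relative and absolute factors."""
    H = [[0.0]*n for _ in range(n)]
    b = [0.0]*n
    w_r, w_a = 1.0/sigma_rel**2, 1.0/sigma_abs**2
    for k, dz in enumerate(rel_meas):        # residual: x[k+1] - x[k] - dz
        H[k][k] += w_r; H[k+1][k+1] += w_r
        H[k][k+1] -= w_r; H[k+1][k] -= w_r
        b[k] -= w_r*dz; b[k+1] += w_r*dz
    for idx, z in abs_meas:                  # residual: x[idx] - z
        H[idx][idx] += w_a
        b[idx] += w_a*z
    return solve(H, b)
```

Note that without at least one absolute factor the normal matrix of the relative chain is singular, which mirrors why the GNSS (or another absolute anchor) is needed to fix the trajectory in the world frame.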
  2) IMU Preintegration Factor: The Earth rotation compensation has been proven to improve the accuracy of the industrial-grade MEMS-IMU preintegration, and thus we follow our refined IMU preintegration [5] in this letter. The residual of the IMU preintegration measurement can be written as

\[{\mathbf{r}}_{Pre}\left( {\widetilde{\mathbf{z}}}_{k - 1, k}^{Pre},\mathbf{X}\right) = \left\lbrack \begin{matrix} {\left( {\mathbf{R}}_{{\mathrm{b}}_{k - 1}}^{\mathrm{w}}\right) }^{T}\left( {\mathbf{p}}_{{\mathrm{wb}}_{k}}^{\mathrm{w}} - {\mathbf{p}}_{{\mathrm{wb}}_{k - 1}}^{\mathrm{w}} - {\mathbf{v}}_{{\mathrm{wb}}_{k - 1}}^{\mathrm{w}}\Delta {t}_{k - 1, k} - \frac{1}{2}{\mathbf{g}}^{\mathrm{w}}\Delta {t}_{k - 1, k}^{2} - \Delta {\mathbf{p}}_{g/{cor}, k - 1, k}^{\mathrm{w}}\right) - \Delta {\widehat{\mathbf{p}}}_{k - 1, k}^{Pre} \\ {\left( {\mathbf{R}}_{{\mathrm{b}}_{k - 1}}^{\mathrm{w}}\right) }^{T}\left( {\mathbf{v}}_{{\mathrm{wb}}_{k}}^{\mathrm{w}} - {\mathbf{v}}_{{\mathrm{wb}}_{k - 1}}^{\mathrm{w}} - {\mathbf{g}}^{\mathrm{w}}\Delta {t}_{k - 1, k} - \Delta {\mathbf{v}}_{g/{cor}, k - 1, k}^{\mathrm{w}}\right) - \Delta {\widehat{\mathbf{v}}}_{k - 1, k}^{Pre} \\ 2{\left\lbrack {\left( {\widehat{\mathbf{q}}}_{k - 1, k}^{Pre}\right) }^{-1} \otimes {\left( {\mathbf{q}}_{{\mathrm{w}}_{\mathrm{i}\left( {k - 1}\right) }}^{\mathrm{w}}\left( t\right) \otimes {\mathbf{q}}_{{\mathrm{b}}_{k - 1}}^{\mathrm{w}}\right) }^{-1} \otimes {\mathbf{q}}_{{\mathrm{b}}_{k}}^{\mathrm{w}}\right\rbrack }_{xyz} \\ {\mathbf{b}}_{{g}_{k}} - {\mathbf{b}}_{{g}_{k - 1}} \\ {\mathbf{b}}_{{a}_{k}} - {\mathbf{b}}_{{a}_{k - 1}} \end{matrix}\right\rbrack \tag{4} \]

where $ \Delta {\mathbf{p}}_{g/{cor}, k - 1, k}^{\mathrm{w}} $ and $ \Delta {\mathbf{v}}_{g/{cor}, k - 1, k}^{\mathrm{w}} $ are the Coriolis correction terms for the position and velocity preintegration, respectively; $ \Delta {\widehat{\mathbf{p}}}_{k - 1, k}^{Pre} $, $ \Delta {\widehat{\mathbf{v}}}_{k - 1, k}^{Pre} $, and $ \Delta {\widehat{\mathbf{q}}}_{k - 1, k}^{Pre} $ are the position, velocity, and attitude preintegration measurements, respectively; the quaternion $ {\mathbf{q}}_{{\mathrm{w}}_{\mathrm{i}\left( {k - 1}\right) }}^{\mathrm{w}}\left( {t}\right) $ is the rotation caused by the Earth rotation [5].

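The position, velocity, and bias blocks of residual (4) can be sketched as follows. This is a minimal NumPy illustration with assumed variable names; the attitude (quaternion) block and the construction of the preintegration measurements and Coriolis terms are omitted.

```python
import numpy as np

def preintegration_residual(Rwb_km1, p_km1, v_km1, p_k, v_k,
                            bg_km1, ba_km1, bg_k, ba_k,
                            dp_meas, dv_meas, dt, g_w,
                            dp_cor=np.zeros(3), dv_cor=np.zeros(3)):
    """Position, velocity, and bias parts of the preintegration residual (4).

    dp_cor and dv_cor stand for the Coriolis correction terms
    Δp_{g/cor} and Δv_{g/cor}; the attitude block is omitted for brevity.
    """
    # Predicted relative motion expressed in the body frame at epoch k-1,
    # compared against the preintegration measurements.
    r_p = Rwb_km1.T @ (p_k - p_km1 - v_km1 * dt
                       - 0.5 * g_w * dt**2 + dp_cor) - dp_meas
    r_v = Rwb_km1.T @ (v_k - v_km1 - g_w * dt + dv_cor) - dv_meas
    # Gyroscope and accelerometer biases are modeled as random walks.
    r_bg = bg_k - bg_km1
    r_ba = ba_k - ba_km1
    return np.concatenate([r_p, r_v, r_bg, r_ba])
```

With states that exactly match the preintegration measurements, the residual vanishes, which is the consistency property the optimizer exploits.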

2) Visual Reprojection Factor: We follow [6], [17] to construct the visual reprojection factor in the unit camera frame. The observed feature in the pixel plane can be expressed as $ {\widetilde{\mathbf{p}}}_{\mathrm{p}} $. For a landmark $ l $ with inverse depth $ {\delta }_{l} $ in its first observed keyframe $ i $, and another observed keyframe $ j $, we can write the visual reprojection residual as


\[{\mathbf{r}}_{V}\left( {{\widetilde{\mathbf{z}}}_{l}^{{V}_{i, j}},\mathbf{X}}\right) = {\left\lbrack \begin{array}{ll} {\mathbf{b}}_{1} & {\mathbf{b}}_{2} \end{array}\right\rbrack }^{T} \cdot \left( {\frac{{\widehat{\mathbf{p}}}_{{\mathrm{c}}_{j}}}{\begin{Vmatrix}{\widehat{\mathbf{p}}}_{{\mathrm{c}}_{j}}\end{Vmatrix}} - {\pi }_{\mathrm{c}}^{-1}\left( {\widetilde{\mathbf{p}}}_{{\mathrm{p}}_{j}}\right) }\right) , \]

\[{\widehat{\mathbf{p}}}_{{\mathrm{c}}_{j}} = {\mathbf{R}}_{\mathrm{b}}^{\mathrm{c}}\left( {{\mathbf{R}}_{\mathrm{w}}^{{\mathrm{b}}_{j}}\left( \begin{matrix} {\mathbf{R}}_{{\mathrm{b}}_{i}}^{\mathrm{w}}\left( {{\mathbf{R}}_{\mathrm{c}}^{\mathrm{b}}\frac{1}{{\delta }_{l}}{\pi }_{\mathrm{c}}^{-1}\left( {\widetilde{\mathbf{p}}}_{{\mathrm{p}}_{i}}\right) + {\mathbf{p}}_{\mathrm{{bc}}}^{\mathrm{b}}}\right) \\ + {\mathbf{p}}_{{\mathrm{w}}_{{\mathrm{b}}_{i}}}^{\mathrm{w}} - {\mathbf{p}}_{{\mathrm{{wb}}}_{j}}^{\mathrm{w}} \end{matrix}\right) - {\mathbf{p}}_{\mathrm{{bc}}}^{\mathrm{b}}}\right) , \]

(5)

where $ {\pi }_{\mathrm{c}}^{-1} $ is the camera back-projection function, which transforms a feature in the pixel plane $ {\widetilde{\mathbf{p}}}_{\mathrm{p}} $ into the unit camera frame using the camera intrinsic parameters; $ {\mathbf{b}}_{1} $ and $ {\mathbf{b}}_{2} $ are two orthogonal bases that span the tangent plane of $ {\widehat{\mathbf{p}}}_{{\mathrm{c}}_{j}} $ .

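The tangent-plane projection in residual (5) can be illustrated with a short NumPy sketch. The construction of the orthogonal bases $ {\mathbf{b}}_{1} $ and $ {\mathbf{b}}_{2} $ below is one common choice, not necessarily the exact one used in IC-GVINS.

```python
import numpy as np

def tangent_basis(p):
    """Two orthonormal vectors b1, b2 spanning the tangent plane of
    the unit vector along p (one standard construction)."""
    p = p / np.linalg.norm(p)
    # Pick a helper axis that is not parallel to p.
    tmp = np.array([1.0, 0.0, 0.0]) if abs(p[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(p, tmp)
    b1 /= np.linalg.norm(b1)
    b2 = np.cross(p, b1)
    return b1, b2

def reprojection_residual(p_hat_cj, f_cj_obs):
    """2-D residual of (5): the difference between the predicted landmark
    direction and the observed unit-camera-frame bearing, projected onto
    the tangent plane of the prediction."""
    u = p_hat_cj / np.linalg.norm(p_hat_cj)
    b1, b2 = tangent_basis(p_hat_cj)
    d = u - f_cj_obs
    return np.array([b1 @ d, b2 @ d])
```

Projecting onto the tangent plane reduces the 3-D direction difference to the 2 degrees of freedom a bearing observation actually constrains.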

3) GNSS-RTK Factor: The GNSS-RTK positioning in geodetic coordinates can be converted to the local w-frame as $ {\widehat{\mathbf{p}}}_{GNSS}^{\mathrm{w}} $ [4]. Considering the GNSS lever arm $ {\mathbf{l}}^{\mathrm{b}} $ in the b-frame, the residual of the GNSS-RTK measurement can be written as


\[{\mathbf{r}}_{GNSS}\left( {{\widetilde{\mathbf{z}}}_{h}^{GNSS},\mathbf{X}}\right) = {\mathbf{p}}_{{\mathrm{{wb}}}_{h}}^{\mathrm{w}} + {\mathbf{R}}_{{\mathrm{b}}_{h}}^{\mathrm{w}}{\mathbf{l}}_{GNSS}^{\mathrm{b}} - {\widehat{\mathbf{p}}}_{{GNSS}, h}^{\mathrm{w}}. \tag{6} \]

The GNSS RTK is directly incorporated into the FGO without extra coordinate transformation or yaw alignment as in [16], [17], [18], [19], [20], which benefits from the INS-centric architecture.

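Residual (6) is a direct position comparison once the lever arm is accounted for; a minimal sketch with assumed variable names:

```python
import numpy as np

def gnss_rtk_residual(p_wb, R_wb, lever_arm_b, p_gnss_w):
    """Residual (6): the INS-predicted antenna position (body position plus
    the body-frame lever arm rotated into the w-frame) minus the GNSS-RTK
    fix already converted to the w-frame."""
    return p_wb + R_wb @ lever_arm_b - p_gnss_w
```

Because the INS already maintains the state in the w-frame, no extra coordinate transformation or yaw alignment is needed before evaluating this factor.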

5) Outlier Culling: A two-step optimization is employed in IC-GVINS. After the first optimization, the chi-square test is adopted to remove all failed visual reprojection factors from the optimizer, rather than from the landmark map. The second optimization is then carried out to achieve a better state estimation. Once these two optimizations are finished, the outlier-culling process is implemented. The positions of the landmarks in the w-frame are first calculated. Each landmark's depth and reprojection error are then evaluated in its observed keyframes. Unsatisfactory feature observations, e.g., those whose depths are not within $ 1 \sim {100} $ meters or whose reprojection errors exceed 4.5 pixels, are marked as outliers and are not used in the following optimization. Furthermore, the average reprojection error of each landmark is calculated, and the landmark is removed from the landmark map if the error exceeds a threshold, e.g., 1.5 pixels. We remove both landmark outliers and feature-observation outliers, which significantly improves the robustness and accuracy. We also employ the chi-square test to judge GNSS outliers after the first optimization. However, we do not remove the GNSS outliers but reweight them to mitigate their effects. This avoids removing valid GNSS observations and thus improves the system robustness.

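The two-level visual culling strategy (per-observation checks, then a per-landmark average check) can be sketched as follows. The thresholds come from the text; the function name and data layout are assumptions for illustration.

```python
import numpy as np

# Thresholds from the text (illustrative constants).
MIN_DEPTH, MAX_DEPTH = 1.0, 100.0   # valid landmark depth range [m]
MAX_REPROJ_ERR = 4.5                # per-observation limit [pixels]
MAX_MEAN_REPROJ_ERR = 1.5           # per-landmark average limit [pixels]

def cull_outliers(landmarks):
    """landmarks: dict id -> list of (depth_m, reproj_err_px) over the
    keyframes observing that landmark. Returns (kept_landmark_ids,
    outlier_observation_count): first flag bad observations, then drop
    landmarks whose average reprojection error is still too large."""
    kept, bad_obs = [], 0
    for lid, obs in landmarks.items():
        valid = [(d, e) for d, e in obs
                 if MIN_DEPTH <= d <= MAX_DEPTH and e <= MAX_REPROJ_ERR]
        bad_obs += len(obs) - len(valid)
        if valid and np.mean([e for _, e in valid]) <= MAX_MEAN_REPROJ_ERR:
            kept.append(lid)
    return kept, bad_obs
```

Culling at both granularities keeps a single bad observation from dragging down an otherwise healthy landmark, while still discarding landmarks that are consistently poor.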

IV. EXPERIMENTS AND RESULTS


A. Implementation and Evaluation Setup


The proposed IC-GVINS is implemented under the Robot Operating System (ROS) framework. The employed sensors include a monocular camera, a MEMS IMU, and a GNSS-RTK receiver. IC-VINS, the VINS subsystem of IC-GVINS, was adopted to evaluate the system robustness and accuracy during GNSS outages. IC-VINS uses 5 seconds of GNSS for system initialization; after initialization, it uses only the monocular camera and the MEMS IMU. The noise parameter for the visual features was set to 1.5 pixels without tuning, as in VINS-Mono [6]. The noise parameters for the employed MEMS IMUs were tuned in an optimization-based GNSS/INS integration by batch processing [5].


We performed comparisons with the SOTA visual-inertial navigation systems VINS-Mono (without relocalization) [6] and OpenVINS [9], and the loosely-coupled GNSS/VINS integration VINS-Fusion (without relocalization) [16]. VINS-Mono is employed because it is also a sliding-window VINS, similar to IC-VINS. Compared to VINS-Mono, our work improves the front end in feature detection, feature tracking, and triangulation, and the back end with the improved IMU preintegration and the outlier-culling algorithm, as depicted in Fig. 1. The temporal and spatial parameters between the camera and the IMU are all estimated and calibrated online. Evo [24] is adopted to quantitatively calculate the absolute rotation error (ARE) and the absolute translation error (ATE). All the results in the following parts were obtained in real time on a desktop PC (AMD R7-3700X). An onboard ARM computer (NVIDIA Xavier) was adopted to evaluate the real-time performance of IC-GVINS.

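The translation part of the absolute pose error is an RMSE after a rigid alignment of the estimated trajectory to the ground truth. A self-contained sketch of that computation (what evo's APE reports for translation, not evo's actual implementation) using a Kabsch-style alignment:

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute translation error (RMSE) after a rigid SE(3) alignment of
    the Nx3 estimated trajectory `est` to the ground truth `gt`."""
    est_c = est - est.mean(axis=0)
    gt_c = gt - gt.mean(axis=0)
    H = est_c.T @ gt_c                       # 3x3 cross-covariance (Kabsch)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # rotation aligning est to gt
    aligned = est_c @ R.T + gt.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

An exactly rotated-and-translated copy of the ground truth aligns perfectly and yields zero error, which is a useful sanity check for any evaluation pipeline.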


Fig. 3. The trajectories in the KAIST urban38 dataset. VINS-Mono almost fails in this dataset, and VINS-Fusion also exhibits a large deviation. The cyan rectangle denotes the GNSS-degenerated scenes in Fig. 5.



Fig. 4. The trajectories in the KAIST urban39 dataset. The cyan rectangle denotes the GNSS-degenerated scenes in Fig. 5.


B. Public Dataset


We evaluated the proposed method on the KAIST Complex Urban Dataset [25]. This dataset was collected by a vehicle in complex urban environments, with a maximum speed of around $ 15\,\mathrm{m/s} $. The employed sensors include the left camera (with a resolution of $ {1280} \times {560} $), the industrial-grade MEMS IMU MTi-300 (with a gyroscope bias instability of $ {10}^{ \circ }/\mathrm{hr} $), and the VRS-RTK GPS. The sequences urban38 and urban39 were adopted for the evaluation. The trajectory lengths are 11191 meters (2154 seconds) and 10678 meters (1856 seconds), respectively. As the vehicle travels very fast, we used a maximum of 200 features for all the systems to improve the robustness. We failed to run OpenVINS on this dataset, and thus it is not included in this part.


The urban38 and urban39 are the two most difficult sequences in the KAIST dataset because of the high-speed motion and the large number of moving objects (mainly vehicles and pedestrians). Nevertheless, the proposed method exhibits superior accuracy on this dataset, as depicted in Figs. 3 and 4. IC-VINS drifts very little in both sequences, while VINS-Mono drifts considerably, especially in urban38. These complex scenes may cause the visual system to degenerate, but they do not affect the INS. Hence, IC-VINS with the INS-centric architecture can survive and run well in these scenes. In contrast, VINS-Mono, which relies heavily on the visual system, demonstrates unsatisfactory robustness and accuracy and almost fails in urban38. With the help of the GNSS, IC-GVINS is well aligned to the ground truth, even though there are many GNSS-degenerated scenes, as depicted in Fig. 5. This benefits from the tightly-coupled structure of IC-GVINS, which allows GNSS outliers to be judged and reweighted. As can be seen in Figs. 3 and 4, VINS-Fusion exhibits inferior accuracy in these GNSS-degenerated scenes because no outlier-culling method is adopted.



Fig. 5. The GNSS-degenerated scenes in the KAIST dataset. These scenes are marked in Figs. 3 and 4.


TABLE I

ABSOLUTE POSE ERROR IN THE KAIST DATASET


| ARE / ATE (deg / m) | urban38 | urban39 |
| --- | --- | --- |
| VINS-Mono | 4.28 / 125.88 | 4.91 / 94.47 |
| VINS-Fusion | 8.64 / 32.05 | 6.33 / 10.01 |
| IC-VINS | 1.44 / 10.83 | 1.77 / 13.07 |
| IC-GVINS | 1.31 / 4.27 | 1.32 / 3.84 |

We calculated the absolute pose error in urban38 and urban39, as shown in Table I. IC-GVINS yields the best accuracy on this dataset, and its accuracy is significantly improved compared to IC-VINS. VINS-Fusion exhibits the worst rotation accuracy, mainly because of the effect of the GNSS outliers. IC-VINS also yields higher accuracy than VINS-Mono, and even than VINS-Fusion in urban38. The results demonstrate that the proposed method with the INS-centric architecture is practical in these complex urban environments. Specifically, by fully using the INS information, the proposed method can mitigate the impact of visual-challenging scenes and exhibit satisfactory robustness and accuracy.


C. Private Dataset


The private dataset, building, was collected by a wheeled robot in complex campus scenes with many trees and buildings. Many fast-moving objects around the road also make this dataset highly challenging. The sensors include a monocular camera (Allied Vision Mako-G131 with a resolution of $ {1280} \times {1024} $), an industrial-grade MEMS IMU (ADI ADIS16465 with a gyroscope bias instability of $ {2}^{ \circ }/\mathrm{hr} $), and a GNSS-RTK receiver (NovAtel OEM-718D). All the sensors are synchronized to GNSS time through a hardware trigger. The intrinsic and extrinsic parameters of the camera were calibrated using Kalibr [26]. The employed ground-truth system is a high-accuracy Position and Orientation System (POS), using GNSS RTK and a navigation-grade IMU. The ground truth (0.02 m for position and 0.01 deg for attitude) was generated by post-processing GNSS/INS integration software. The average speed of the wheeled robot is about $ 1.5\,\mathrm{m/s} $. The trajectory length of the building dataset is 1337 meters (950 seconds). As there are rich visual textures in this dataset, we used a maximum of 120 features.



Fig. 6. The test scenes in the building dataset. The cyan rectangle denotes the GNSS-outage area in Fig. 7.



Fig. 7. The trajectories in the building dataset. The cyan rectangle corresponds to the GNSS-outage area in Fig. 6.


As shown in Fig. 6, many GNSS-degenerated scenes exist in the building dataset, and the GNSS is even interrupted in a narrow corridor. VINS-Mono and OpenVINS exhibit large drifts, while IC-VINS exhibits only small drifts, as depicted in Fig. 7. Besides, IC-GVINS is well aligned to the ground truth, even though there are GNSS outliers and outages. In contrast, VINS-Fusion has a notable drift because of the impact of the GNSS outliers, as depicted in Fig. 7.


We also calculated the absolute pose error, as exhibited in Table II. The results demonstrate that IC-VINS yields higher accuracy than VINS-Mono and OpenVINS. In addition, VINS-Fusion shows worse accuracy than VINS-Mono, because the GNSS outliers may significantly degrade the estimator. In contrast, IC-GVINS exhibits improved accuracy compared to IC-VINS and performs the best in the building dataset. As can be seen, the proposed INS-centric architecture can fully utilize the INS information and thus mitigate the impact of visual-challenging scenes in complex environments. Moreover, the employed outlier-culling algorithm for visual and GNSS observations can significantly improve the system robustness.


TABLE II

ABSOLUTE POSE ERROR IN THE ROBOT DATASET


| ARE / ATE (deg / m) | VINS-Mono | VINS-Fusion | OpenVINS | IC-VINS | IC-GVINS |
| --- | --- | --- | --- | --- | --- |
| building | 0.67 / 5.46 | 8.30 / 5.53 | 2.98 / 6.01 | 0.41 / 1.83 | 0.40 / 0.86 |

TABLE III

ABSOLUTE POSE ERROR IN DIFFERENT CONFIGURATIONS


| ARE / ATE (deg / m) | urban38 | urban39 | building |
| --- | --- | --- | --- |
| IC-VINS | 1.44 / 10.83 | 1.77 / 13.07 | 0.41 / 1.83 |
| IC-VINS-E | 1.45 / 9.91 | 2.08 / 15.64 | 0.62 / 2.09 |
| IC-VINS-O | 1.62 / 12.88 | 1.65 / 12.12 | 0.70 / 2.34 |
| IC-VINS-I | 1.54 / 11.37 | 2.23 / 15.83 | 0.50 / 1.90 |

The IC-VINS-E denotes the method without the Earth rotation compensation in IMU preintegration. The IC-VINS-O denotes the method without the strict outlier-culling strategy. The IC-VINS-I denotes the method without the INS aiding in feature tracking.


D. Robustness Evaluation


To fully demonstrate the robustness of the proposed method, we further evaluated the effects of the Earth rotation compensation, the strict outlier-culling algorithm, and the INS aiding in feature tracking. Three extra configurations were employed for the evaluation, as shown in Table III.


1) Effect of the Earth Rotation Compensation: The gyroscope bias-instability parameters of the MTi-300 ($ {10}^{ \circ }/\mathrm{hr} $) in the KAIST dataset and the ADIS16465 ($ {2}^{ \circ }/\mathrm{hr} $) in the robot dataset are both smaller than the Earth rotation rate of $ {15}^{ \circ }/\mathrm{hr} $. Thus, it is necessary to compensate for the Earth rotation in the INS mechanization and the IMU preintegration. We compared the results of IC-VINS and IC-VINS-E (without compensating for the Earth rotation), as depicted in Table III. The results indicate that the Earth rotation compensation can improve the system accuracy in urban39 and building, while the translation accuracy degrades slightly in urban38. The MTi-300 is not precise enough to sense the Earth rotation; thus, the effect of the Earth rotation compensation for the MTi-300 in the KAIST dataset should not be significant. Besides, the impact of the Earth rotation compensation cannot be effectively determined if the visual observations are sufficient, as mentioned in [5].


As the ADIS16465 is more precise, we further evaluated the effect of the Earth rotation compensation by detecting different numbers of visual features in the robot dataset. As can be seen in Table IV, the effect of the Earth rotation compensation is more significant when there are fewer visual features. The results demonstrate that the Earth rotation compensation can improve the system accuracy, especially when the visual system is weak, i.e., in visual-challenging scenes. Hence, we suggest compensating for the Earth rotation when a high-grade IMU is employed, which can improve the system accuracy in complex environments.

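The magnitude argument is easy to verify numerically: the Earth rotation rate, expressed in deg/hr, exceeds the bias instability of both IMUs, so an uncompensated Earth rate shows up as a gyroscope bias of comparable or larger size. A small sketch:

```python
import numpy as np

OMEGA_IE = 7.292115e-5  # Earth rotation rate [rad/s]

def earth_rate_deg_per_hr():
    """Earth rotation rate in deg/hr, for comparison with the gyroscope
    bias instability (about 15 deg/hr)."""
    return np.degrees(OMEGA_IE) * 3600.0

def earth_rate_ned(lat_rad):
    """Earth rotation vector expressed in the local NED navigation frame at
    a given latitude; ignoring it biases the gyroscope measurements by up
    to this magnitude."""
    return OMEGA_IE * np.array([np.cos(lat_rad), 0.0, -np.sin(lat_rad)])
```

Since 15 deg/hr is larger than both 10 deg/hr (MTi-300) and 2 deg/hr (ADIS16465), the uncompensated Earth rate is not absorbed cleanly by the estimated biases, which motivates the explicit compensation.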

TABLE IV

ABSOLUTE POSE ERROR CONCERNING DIFFERENT VISUAL FEATURES IN THE ROBOT DATASET


| ARE / ATE (deg / m) | 120 | 60 | 30 |
| --- | --- | --- | --- |
| IC-VINS | 0.41 / 1.83 | 0.55 / 1.82 | 0.65 / 2.21 |
| IC-VINS-E | 0.62 / 2.09 | 0.90 / 2.45 | 0.69 / 2.40 |


Fig. 8. Comparison of the number of the landmarks in the building dataset. The green rectangles in the figure denote the areas where it occurs speed bumps and potholes.


2) Effect of the Strict Outlier-Culling Algorithm: The previous results have demonstrated that the employed GNSS outlier-culling algorithm can significantly improve the system robustness and accuracy. As for the visual outlier-culling algorithm, we compared the results of IC-VINS and IC-VINS-O, as exhibited in Table III. IC-VINS-O uses only the outlier-culling algorithm in VINS-Mono [6], without the strict outlier-culling algorithm described in Sections III.C.2 and III.D.5. The results indicate that IC-VINS outperforms IC-VINS-O in urban38 and building, while the accuracy degrades slightly in urban39. The strict outlier-culling algorithm results in fewer valid visual landmarks, and motion is needed to triangulate new landmarks. However, the vehicle frequently stops at traffic lights in the KAIST dataset, and passing vehicles may interrupt the feature tracking, resulting in fewer valid visual landmarks. New landmarks cannot be created during stationary periods with a monocular camera. Detecting more visual features in the KAIST dataset may solve this problem. Hence, we suggest employing the proposed outlier-culling algorithm to improve the robustness, especially in complex environments.


3) Effect of the INS Aiding in Feature Tracking: We also compared the results of IC-VINS and IC-VINS-I (without the INS aiding in feature tracking), as shown in Table III. The results illustrate that the INS aiding in feature tracking can improve the system accuracy, especially on the high-dynamic dataset, i.e., the KAIST dataset. For the low-speed wheeled robot, the effect of the INS aiding is limited. In the building dataset, there are several speed bumps and potholes that may cause aggressive motion, making feature tracking extremely challenging. Hence, we compared the numbers of landmarks in the building dataset to evaluate the effect of the INS aiding in feature tracking. As depicted in Fig. 8, without the INS aiding, the number of valid landmarks falls far below 20 in such cases and is even close to 0. With the INS aiding, there are more than 20 valid landmarks during the whole travel. The results demonstrate that the INS aiding can significantly improve the robustness of feature tracking, especially in high-dynamic scenes.

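The idea behind INS-aided feature tracking is to use the INS-predicted relative camera pose to predict where each tracked feature should land in the next frame, and to seed the optical-flow search from that prediction. A hedged sketch with assumed conventions (not the exact IC-GVINS implementation):

```python
import numpy as np

def predict_feature(px_i, depth, K, R_cjci, t_cjci):
    """Predict a feature's pixel location in frame j from its pixel location
    in frame i, a rough depth, and the INS-predicted relative camera pose
    (R_cjci, t_cjci map camera-i coordinates into camera j). The result can
    seed KLT optical-flow tracking under aggressive motion."""
    uv1 = np.array([px_i[0], px_i[1], 1.0])
    p_ci = depth * (np.linalg.inv(K) @ uv1)   # back-project into camera i
    p_cj = R_cjci @ p_ci + t_cjci             # apply INS-predicted motion
    uvw = K @ p_cj
    return uvw[:2] / uvw[2]                   # project into frame j
```

With a good pose prediction, the tracker only needs to refine a small residual displacement, which is why the valid-landmark count stays high through speed bumps and potholes.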

TABLE V

AVERAGE RUNNING TIME OF IC-GVINS


| PC / Onboard (ms) | urban38 | urban39 | building |
| --- | --- | --- | --- |
| Front-end | 11.5 / 35.9 | 11.8 / 39.8 | 14.4 / 32.4 |
| FGO | 18.4 / 73.2 | 18.3 / 76.5 | 17.4 / 101.5 |

Here, the FGO is only conducted when a new keyframe is selected.


E. Run Time Analysis


The average running times of IC-GVINS are shown in Table V. All the experiments run within the ROS framework, demonstrating that IC-GVINS can perform real-time positioning on both the desktop PC (AMD R7-3700X) and the onboard ARM computer (NVIDIA Xavier).


V. CONCLUSION


A robust, real-time, INS-centric GNSS-visual-inertial navigation system is presented in this letter. As the visual system may be affected by degenerated scenes, the precise INS information is fully employed in the visual processes and the state estimation to improve the system robustness and accuracy in complex environments. With the GNSS-aided initialization, the IMU, visual, and GNSS measurements can be tightly fused in a unified world frame within the FGO framework. We performed experiments on both a high-speed vehicle dataset and a low-speed robot dataset. IC-GVINS exhibits superior robustness and accuracy in degenerated and challenging scenes. The results demonstrate that the proposed method with the INS-centric architecture significantly improves the system robustness and accuracy compared to the SOTA methods in complex environments.


REFERENCES


[1] R. Siegwart, I. R. Nourbakhsh, and D. Scaramuzza, Introduction to Autonomous Mobile Robots, 2nd ed. Cambridge, MA, USA: MIT Press, 2011.

[2] C. Cadena et al., "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," IEEE Trans. Robot., vol. 32, no. 6, pp. 1309-1332, Dec. 2016.

[3] J. Janai, F. Güney, A. Behl, and A. Geiger, "Computer vision for autonomous vehicles: Problems, datasets and state of the art," Mar. 2021. [Online]. Available: http://arxiv.org/abs/1704.05519

[4] P. D. Groves, Principles of GNSS, Inertial, and Multisensor Integrated Navigation Systems. Norwood, MA, USA: Artech House, 2008.

[5] H. Tang, T. Zhang, X. Niu, J. Fan, and J. Liu, "Impact of the Earth rotation compensation on MEMS-IMU preintegration of factor graph optimization," IEEE Sens. J., vol. 22, no. 17, pp. 17194-17204, Sep. 2022.

[6] T. Qin, P. Li, and S. Shen, "VINS-mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004-1020, Aug. 2018.

[7] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM," IEEE Trans. Robot., vol. 37, no. 6, pp. 1874-1890, Dec. 2021.

[8] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Proc. IEEE Int. Conf. Robot. Automat., 2007, pp. 3565-3572.

[9] P. Geneva, K. Eckenhoff, W. Lee, Y. Yang, and G. Huang, "OpenVINS: A research platform for visual-inertial estimation," in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 4666-4672.

[10] M. Bloesch, M. Burri, S. Omari, M. Hutter, and R. Siegwart, "Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback," Int. J. Robot. Res., vol. 36, no. 10, pp. 1053-1072, Sep. 2017.

[11] Z. Huai and G. Huang, "Robocentric visual-inertial odometry," Int. J. Robot. Res., vol. 41, no. 7, pp. 667-689, Jul. 2022.

[12] G. Huang, "Visual-inertial navigation: A concise review," in Proc. IEEE Int. Conf. Robot. Automat., 2019, pp. 9572-9582.

[13] A. Rosinol, M. Abate, Y. Chang, and L. Carlone, "Kimera: An open-source library for real-time metric-semantic localization and mapping," in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 1689-1696.

[14] V. Usenko, N. Demmel, D. Schubert, J. Stückler, and D. Cremers, "Visual-inertial mapping with non-linear factor recovery," IEEE Robot. Automat. Lett., vol. 5, no. 2, pp. 422-429, Apr. 2020.

[15] F. Dellaert, Factor Graphs and GTSAM: A Hands-on Introduction. Atlanta, GA, USA: Georgia Inst. Technol., 2012.

[16] T. Qin, S. Cao, J. Pan, and S. Shen, "A general optimization-based framework for global pose estimation with multiple sensors," Jan. 2019, arXiv:1901.03642. [Online]. Available: http://arxiv.org/abs/1901.03642

[17] S. Cao, X. Lu, and S. Shen, "GVINS: Tightly coupled gnss-visual-inertial fusion for smooth and consistent state estimation," IEEE Trans. Robot., vol. 38, no. 4, pp. 2004-2021, Aug. 2022.

[18] R. Jin, J. Liu, H. Zhang, and X. Niu, "Fast and accurate initialization for monocular vision/INS/GNSS integrated system on land vehicle," IEEE Sens. J., vol. 21, no. 22, pp. 26074-26085, Nov. 2021.

[19] L. Xiong et al., "G-VIDO: A vehicle dynamics and intermittent GNSS-aided visual-inertial state estimator for autonomous driving," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 11845-11861, Aug. 2022.

[20] S. Han, F. Deng, T. Li, and H. Pei, "Tightly coupled optimization-based GPS-visual-inertial odometry with online calibration and initialization," Mar. 2022. [Online]. Available: http://arxiv.org/abs/2203.02677

[21] Q. Zhang, S. Li, Z. Xu, and X. Niu, "Velocity-based optimization-based alignment (VBOBA) of low-end MEMS IMU/GNSS for low dynamic applications," IEEE Sens. J., vol. 20, no. 10, pp. 5527-5539, May 2020.

[22] D. Wang, H. Lv, and J. Wu, "In-flight initial alignment for small UAV MEMS-based navigation via adaptive unscented Kalman filtering approach," Aerosp. Sci. Technol., vol. 61, pp. 73-84, Feb. 2017.

[23] S. Agarwal, K. Mierle, and the Ceres Solver team, "Ceres Solver - A large scale nonlinear optimization library," 2022. [Online]. Available: http://ceres-solver.org/

[24] M. Grupp, "Evo," Jul. 2022. [Online]. Available: https://github.com/MichaelGrupp/evo

[25] J. Jeong, Y. Cho, Y.-S. Shin, H. Roh, and A. Kim, "Complex urban dataset with multi-level sensors from highly diverse urban environments," Int. J. Robot. Res., vol. 38, no. 6, pp. 642-657, May 2019.

[26] J. Rehder, J. Nikolic, T. Schneider, T. Hinzmann, and R. Siegwart, "Extending Kalibr: Calibrating the extrinsics of multiple IMUs and of individual axes," in Proc. IEEE Int. Conf. Robot. Automat., 2016, pp. 4304-4311.
