视觉里程计1-SLAMCP7

——20.11.27

在CP2里面讲到，视觉SLAM主要分为视觉前端以及优化后端，后端也被称为视觉里程计（VO）。根据相邻图像的信息，估计粗略的相机运动，给后端提供较好的初始值。VO的实现方法按是否需要提取特征，分为特征点的前端以及不提特征的直接法前端。特征点的前端运行稳定，对光照和动态物体不敏感。

一、特征点法

1.特征点

VO的主要问题是如何根据图像来估计相机运动，图像本身是一个由亮度以及色彩组成的矩阵，如果从矩阵层面考虑运动估计，将会十分困难。采用另一做法。1.选取比较有代表性的点（-视角发生少量变化后，保持不变 -各个图像中相同的点）2.在这些点的基础上，讨论相机位姿估计问题，以及这些点的定位问题。在SLAM中又称这些点是位标，在视觉SLAM中路标则指图像特征。

特征是图像信息的另一种数字表达形式。数字图像在计算机中以灰度值矩阵的方式存储。在视觉里程计中，我们希望特征点在相机运动后保持稳定。但是灰度值受光照、形变、物体材质的影响严重，不同图像之间变化很大，不够稳定。理想情况下在相机发生少许变化之后，我们还能从图像中分辨出此时与上一帧对应的相同点。因此仅凭灰度值是不够的，我们需要对图像提取特征点。

我们通常可以把角点、边缘和区块作为图像中具有代表的地方。但在图像中很明显分辨度是：角点>边缘>区块。所以一种直观的提取方式就是在不用的图像中提取角点之后两个图像一一对应，此时角点就是特征。但我们找出的角点不一定符合我们的需求，比如在放大后发现其并不是角点或是经过旋转之后难以辨别出同一个角点。所以相对于朴素的角点，人工设计的特征点因具有以下特征：

1.可重复性，相同的区域可以在不同的图像识别出来。

2.可区别性，不同的区域有不同的表达。

3.高效率，特征点的数量应远小于图像的像素。

4.本地性，特征仅于一小片区有相关。

特征点主要由关键点与描述子两部分构成。关键点指的是特征点在图像中的位置（还可以包括朝向、大小等信息）。描述子通常是一个向量，描述了关键点周围像素的信息。描述子是按照“外观相似的特征应该有相同的描述子”这个原则设计的。所以两个特征点的描述子在向量空间上的距离相近，就能认为是同一特征点。

图形特征中SIFT（尺度不变特征变换）是最为经典的一种。充分考虑了在图像变换中出现的光照、尺度、旋转等变化，但随之而来的是极大的计算量。另外一种就是在考虑适当放弃精度以及鲁棒性，从而提升运行速度。就是FAST关键点（注意这里没有描述子，不能称之为特征点），而ORB特征是目前非常具有代表性的实时图像特征。它改进了FAST检测子不具有方向性的问题，并采用二进制描述子BRIEF，加速特征提取。

2.ORB特征

ORB特征也是由关键点和描述子组成。关键点是一种改进的FAST角点。描述子是BRIEF。FAST角点提取相较于FAST，ORB计算了特征点的主方向，为后续的BRIEF提供了旋转不变性。

FAST关键点主要是检测局部像素灰度明显的地方，以速度著称。思想是：如果一个像素与它周围的像素差距较大（过亮或过暗），那么它很可能是角点，FAST只关心像素的亮度。步骤如下：

1.在图像中选取像素p，设他的亮度为Ip

2.设置一个阈值T（比如Ip的20%）

3.以p为中心，选取半径为3的圆上的16个点

4.假如在选取的圆上，有连续N个点大于Ip+T后小于Ip-T，那么就能认为是特征点（N通常为12，即FAST-12，常用的有FAST-9，FAST-11）

5.对图像上面每个像素循环上述步骤

在FAST-12的算法中，为了更加高效可以增加一个测试，来排除不是角点的像素。就是直接检测1，5，9，13的像素亮度，只有这四个中有三个同时大于Ip+T或小于Ip-T，该像素才有可能是角点。FAST还可能出现"扎堆"现象，所以在第一次检测之后，还需要用非极大值抑制，在一定区域内仅保留响应最大值的角点。

FAST特征点的缺点，第一，就是FAST特征点数量大且不稳定，我们往往需要从图像中提取固定数量的特征，因此，在ORB中进行了改进，我们可以设定角点的数量N，对初始FAST角点分别计算Harris响应值，然后提取前N个角点。第二，FAST不具有方向信息，而且在它固定取半径为3的圆存在尺度问题，远处的角点到近处就可能不是角点了。ORB增加了尺度以及旋转的描述，尺度不变性由构建图像金字塔（对图像进行不同层次的降采样，以获得不同分辨率的图像），并在金字塔的每一次进行检测角点。特征的旋转是通过灰度质心法。质心是指以图像块灰度作为权重的中心。

通过上述方法，FAST角点具有了尺度与旋转的描述，增加了图像表述的鲁棒性。

BRIEF描述子是一种二进制描述子，他的描述由许多的0，1组成，这里的0，1编码了关键点附近两个像素（q和p）的大小关系：如果q>p,则为1，反之则为0.如果我们取了128对qp，最后得到128维由0，1组成的向量。qp的选取是依据某种概率分布，随机的挑选其位置。BRIEF使用了随机选点的方式，速度十分的快，并且使用二进制方式储存，适用于实时图像匹配。原始的BRIEF描述子不具有旋转不变性，当图像进行旋转的时候很容易数据丢失。ORB的FAST特征点提取阶段计算了关键点的方向，可以利用方向信息计算，计算旋转之后的“Steer BRIEF”特征，使得ORB具有好的旋转不变性（？这里不太明白）。

3.特征匹配

特征匹配是视觉SLAM关键的一步，因为特征匹配解决了SLAM中的数据关联问题，即当前看到的路标和之前看到的路标之间的对应关系。通过对比图像与图像，或图像与地图之间的描述子进行匹配，为我们后续的姿态估计，优化等操作减轻负担。但是，由于图像特征局部特征，误匹配的情况广泛存在，已经成为目前SLAM性能提升的一大瓶颈。部分原因是因为场景中经常存在大量的重复纹理，使得特征描述十分相似。在这种情况之下，单纯使用局部特征解决误匹配是十分困难的。

在获得两个时刻的特征点后，如何区找寻两个特征点集合元素之间的对应关系，最简单的方法就是暴力匹配（Brute-Force Matcher），即第一帧每一个特征点与第二帧所有特征点测量描述子的距离，然后排序取最近一个作为匹配点。描述子距离表示了两个特征之间的相似程度，在实际情况中可以寻找不同的距离度量范数，对于浮点类型的描述子，使用欧氏距离（m维中的两点之间的欧式距离就是两点距离，或者是向量的自然距离）进行度量，对于二进制的描述子，往往使用汉明距离（两个二进制串之间的汉明距离就是它们之间不同位数的个数）作为量度。

为了符合SLAM实时性需求，使用暴力匹配运算量会变得很大，此时快速近似最近邻（FLANN）算法更适合匹配点多的情况。

4.实践

先上代码。CMakeLists里面只需要包括OpenCV的库就行。就不多复述。提取部分代码讲解方法。报错方面有一点要注意的就是CV_LOAD_IMAGE_COLOR 要使用#include<opencv2/imgcodecs/legacy/constants_c.h>或者还有其他方法就不多赘述。

#include<iostream>
#include<opencv2/core/core.hpp>
#include<opencv2/features2d/features2d.hpp>
#include<opencv2/highgui/highgui.hpp>
#include<opencv2/imgcodecs/legacy/constants_c.h>

using namespace std;
using namespace cv;

int main(int argc, char** argv){
    if(argc!=3){
        cout<<"img"<<endl;
        return 1;
    }
    Mat img_1 = imread(argv[1], CV_LOAD_IMAGE_COLOR);
    Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
    
    std::vector<KeyPoint> keyPoints_1, keyPoints_2;
    Mat descriptors_1, descriptors_2;
    Ptr<ORB> orb = ORB::create(500, 1.2f, 8, 31, 0, 2, ORB::HARRIS_SCORE, 31, 20);
    
    orb->detect(img_1, keyPoints_1);
    orb->detect(img_2, keyPoints_2);
    
    orb->compute(img_1, keyPoints_1, descriptors_1);
    orb->compute(img_2, keyPoints_2, descriptors_2);
    
    Mat outimg1;
    drawKeypoints(img_1, keyPoints_1, outimg1, Scalar::all(-1), DrawMatchesFlags::DEFAULT);
    imshow("ORB", outimg1);
    
    vector<DMatch> matches;
    BFMatcher matcher(NORM_HAMMING);
    matcher.match(descriptors_1, descriptors_2, matches);
    
    double min_dist = 1000, max_dist=0;
    for(int i=0; i<descriptors_1.rows; i++){
        double dist = matches[i].distance;
        if(dist<min_dist) min_dist=dist;
        if(dist<max_dist) max_dist=dist;
    }
    
    cout<<"min:"<<min_dist<<endl;
    cout<<"max:"<<max_dist<<endl;
    
    std::vector<DMatch> good_matches;
    for(int i=0; i<descriptors_1.rows; i++){
        if(matches[i].distance<=max(2*min_dist, 30.0)){
            good_matches.push_back(matches[i]);
        }
    }
    
    Mat img_match;
    Mat img_goodmatch;
    drawMatches(img_1, keyPoints_1, img_2, keyPoints_2, matches, img_match);
    drawMatches(img_1, keyPoints_1, img_2, keyPoints_2, good_matches, img_goodmatch);
    imshow("all", img_match);
    imshow("good", img_goodmatch);
    waitKey(0);
    
    return 0;
}

View Code

Ptr<ORB> orb = ORB::create(500, 1.2f, 8, 31, 0, 2, ORB::HARRIS_SCORE, 31, 20);
    
orb->detect(img_1, keyPoints_1);
orb->detect(img_2, keyPoints_2);
    
orb->compute(img_1, keyPoints_1, descriptors_1);
orb->compute(img_2, keyPoints_2, descriptors_2);
BFMatcher matcher(NORM_HAMMING);
matcher.match(descriptors_1, descriptors_2, matches);
    
double min_dist = 1000, max_dist=0;
for(int i=0; i<descriptors_1.rows; i++){
    double dist = matches[i].distance;
    if(dist<min_dist) min_dist=dist;
    if(dist<max_dist) max_dist=dist;
}

std::vector<DMatch> good_matches;
for(int i=0; i<descriptors_1.rows; i++){
   if(matches[i].distance<=max(2*min_dist, 30.0)){
        good_matches.push_back(matches[i]);
    }
}

ORB::Create()对应的是创建一个ORB的方法，输入对应的参数，比如最大特征点保留，图像金字塔比例，图像金字塔级别，为检测边框大小等等，可以在OpenCV官网中查找对应的内容。BFMatcher是使用暴力搜索的方式进行特征匹配，用汉明距离的方式，对应的到能够存在BFMatcher也可以存在DMatch，因为对应的结构体中有封装了一个查询集queryId 和一个训练集trainId，对应的就是descriptors_1, descriptors_2，检测次数和descriptors_1的个数，对应是查询集的每一行对应训练集中找到最匹配的那一行。后面对应的就是匹配点筛选，找出匹配点之间最大和最小距离即最相似和最不相似两组点之间的距离，当描述子距离大于两倍的最小距离当最小距离过小则设置一个经验值。

min:4
max:94

上面是对应结果。

二、2D-2D:对极几何

当相机为单目，我们只知道2d像素坐标，我们需要根据两组2D点估计运动。则需要对极几何。

1.对极约束

现在我们假设从两张图像中获得了一对匹配好的特征点，如果我们我们有若干对匹配点，就能通过直接二维匹配点的关系恢复两帧之间相机的运动过程。

比如说我们希望求L1，L2两帧之间的运动，设第一帧和第二帧之间的运动为R，t，两个相机中心分别为O1，O2。在L1内的一个特征点p1对应L2内的一个特征点p2，如果匹配正确，可以说他们是同一个空间点在两个成像平面上的投影。首先连线O1p1和O2p2会在三维空间内交于P点，此时O1，O2，P可以确定一个平面，我们称之为极平面，O1，O2和像平面L1，L2分别交于e1，e2称之为极点，O1O2为基线，称及平面与来两个像平面L1，L2之间的相交线l1，l2为极线。

从第一帧看过去射线O1p1上的某点是图像L1像素所在的空间位置——因为射线上的所有点都会投影到同一个像素点，同时，不知道P的位置，我们在L2上看到e2p2（第二个图像的极线）就是P可能出现的投影位置，也就是O1p1射线在第二个相机的投影。我们现在通过特征点匹配，确定p2的像素位置，才能推断正确的P点，以及相机运动。

对极约束检阅了给出了两个匹配点的空间位置关系，相机问题变成以下两步：1.根据配对点像素位置求出E或者F。（只差相机内参，而相机内参在SLAM问题中是已知的）2.根据E或F，求出R，t

2.本质矩阵

根据定义本质矩阵E=t^R，是一个3*3的矩阵，有六个未知数。特征：1.本质矩阵是对极约束定义的，由于对极约束是等式为零的约束，所有对E乘以任意非零常数，对极约束依旧成立。所以E在不同尺度下是等价的2.根据本质矩阵的式子，本质矩阵的奇异值必定是[σ,σ,0]^T的形式，称为本质矩阵的内在性质3.由于平移和旋转各有三个自由度，所以t^R有六个自由度，但由于尺度等价性，E有五个自由度。

E具有五个自由度的事实，需要有五对点来求解R。但是，E的内在性质是一种非线性性质，在求解方程的时候会带来麻烦，因此可以只考虑尺度等价性，使用八点法来估计E，只利用E的线性性质，可以在线性代数框架下求解。

上图通过图像的方式表示分解本质矩阵得到的四个解，我们已知空间点在相机上的投影，想求解相机的运动，在保持红点不变的情况，可以画出四种解，幸运的是只有第一种解中具有正的深度，只要把任意一点带入四个解中，检测该点的在两个相机下的深度，就能知道那个是正确的。

3.单应矩阵

除了基本矩阵以及本质矩阵，还有一种单应矩阵H，描述了两个平面之间的映射关系，若场景的特征点都落在同一平面上，则可以使用单应性进行运动估计。

单应性在SLAM中十分重要，当特征点共面或发生纯旋转的时候，基础矩阵的自由度下降，此时就发生了退化，此时继续使用八点法求解多余出来的自由度，就会被噪声所决定。为了防止退化现象的发生，我们会同时估计基础矩阵以及单应矩阵，选择重投影误差（一个特征点在归一化相机坐标系下估计值和观测值的差）小的作为最后运动估计矩阵。

4.实践

#include<iostream>
#include<opencv2/core/core.hpp>
#include<opencv2/features2d/features2d.hpp>
#include<opencv2/highgui/highgui.hpp>
#include<opencv2/calib3d/calib3d.hpp>
#include<opencv2/imgcodecs/legacy/constants_c.h>

using namespace std;
using namespace cv;

void find_feature_matches(const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1, std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches);
void pose_estimation_2d2d(std::vector<KeyPoint> keypoints_1,
    std::vector<KeyPoint> keypoints_2, std::vector<DMatch> matches,
    Mat &R, Mat &t);

Point2d pixel2cam(const Point2d &p, const Mat &K);

int main(int argc, char** argv){
    if(argc!=3){
        cout<<"img1 img2"<<endl;
        return 1;
    }
    Mat img_1 = imread(argv[1], CV_LOAD_IMAGE_COLOR);
    Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
    assert(img_1.data && img_2.data && "Can not load images!");
    
    vector<KeyPoint> keypoints_1, keypoints_2;
    vector<DMatch> matches;
    find_feature_matches(img_1, img_2, keypoints_1, keypoints_2, matches);
    cout<<"find matches count:"<<matches.size()<<endl;
    
    Mat R, t;
    pose_estimation_2d2d(keypoints_1, keypoints_2, matches, R, t);
    
    Mat t_x=(Mat_<double>(3,3) << 0, -t.at<double>(2,0), t.at<double>(1,0),
        t.at<double>(2,0), 0, -t.at<double>(0,0),
        -t.at<double>(1,0), t.at<double>(0,0), 0);
    cout<<"t^R="<<endl<<t_x*R<<endl;
    
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    for(DMatch m:matches){
        Point2d pt1 = pixel2cam(keypoints_1[m.queryIdx].pt, K);
        Mat y1 = (Mat_<double>(3,1) << pt1.x, pt1.y, 1);
        Point2d pt2 = pixel2cam(keypoints_2[m.trainIdx].pt, K);
        Mat y2 = (Mat_<double>(3,1) << pt2.x, pt2.y, 1);
        Mat d = y2.t() * t_x * R * y1;
        cout<<"constraint"<<d<<endl;
    }
    return 0;
}

void find_feature_matches(const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1, std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches){
    Mat descriptors_1, descriptors_2;
    Ptr<FeatureDetector> detector = ORB::create();
    Ptr<DescriptorExtractor> descriptor=ORB::create();
    Ptr<DescriptorMatcher> matcher=DescriptorMatcher::create("BruteForce-Hamming");
    
    detector->detect(img_1, keypoints_1);
    detector->detect(img_2, keypoints_2);
    descriptor->compute(img_1, keypoints_1, descriptors_1);
    descriptor->compute(img_2, keypoints_2, descriptors_2);
    vector<DMatch> match;
    matcher->match(descriptors_1, descriptors_2, match);
    
    double min_dist = 10000, max_dist = 0;
    for(int i=0; i<descriptors_1.rows; i++){
        double dist = match[i].distance;
        if(dist<min_dist) min_dist=dist;
        if(dist>max_dist) max_dist=dist;
    }
    
    cout<<"max dist:"<<max_dist<<endl;
    cout<<"min dist:"<<min_dist<<endl;
    
    for(int i=0; i<descriptors_1.rows; i++){
        if(match[i].distance <= max(2*min_dist, 30.0)){
            matches.push_back(match[i]);
        }
    }
}

Point2d pixel2cam(const Point2d &p, const Mat &K){
    return Point2d((p.x - K.at<double>(0,2))/K.at<double>(0,0),
        (p.y - K.at<double>(1,2))/K.at<double>(1,1)
    );
}

void pose_estimation_2d2d(std::vector<KeyPoint> keypoints_1,
    std::vector<KeyPoint> keypoints_2, std::vector<DMatch> matches,
    Mat &R, Mat &t){
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    
    vector<Point2f> points1;
    vector<Point2f> points2;
    
    for(int i=0; i<(int)matches.size(); i++){
        points1.push_back(keypoints_1[matches[i].queryIdx].pt);
        points2.push_back(keypoints_2[matches[i].trainIdx].pt);
    }
    
    Mat fundamental_matrix;
    fundamental_matrix = findFundamentalMat(points1, points2, FM_8POINT);
    cout<< "fundamental_matrix is "<<endl<<fundamental_matrix<<endl;
    
    Point2d principal_point(325.1, 249.7);
    double focal_length = 521;
    Mat essential_matrix;
    essential_matrix = findEssentialMat(points1, points2, focal_length, principal_point);
    cout<< "essential_matrix is"<< endl<< essential_matrix<<endl;
    
    Mat homography_matrix;
    homography_matrix = findHomography(points1, points2, RANSAC, 3);
    cout<<"homography_matrix is "<<endl<<homography_matrix<<endl;
    
    recoverPose(essential_matrix, points1, points2, R, t, focal_length, principal_point);
    cout<<"R is"<<R<<endl;
    cout<<"t is"<<t<<endl;
}

View Code

Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);//相机内参

for(int i=0; i<(int)matches.size(); i++){
    points1.push_back(keypoints_1[matches[i].queryIdx].pt);//matches[i].queryIdx对应keypoints_1数组内特征点index，因为是通过keypoints_1特征点的描述子得到的位置
    points2.push_back(keypoints_2[matches[i].trainIdx].pt);//matches[i].trainIdx对应keypoints_2数组内特征点index，因为是通过keypoints_2特征点的描述子得到的位置
}

Mat fundamental_matrix;
fundamental_matrix = findFundamentalMat(points1, points2, FM_8POINT);
cout<< "fundamental_matrix is "<<endl<<fundamental_matrix<<endl;
    
Point2d principal_point(325.1, 249.7);//相机光心
double focal_length = 521;//相机焦距
Mat essential_matrix;
essential_matrix = findEssentialMat(points1, points2, focal_length, principal_point);
cout<< "essential_matrix is"<< endl<< essential_matrix<<endl;
    
Mat homography_matrix;
homography_matrix = findHomography(points1, points2, RANSAC, 3);
cout<<"homography_matrix is "<<endl<<homography_matrix<<endl;
    
recoverPose(essential_matrix, points1, points2, R, t, focal_length, principal_point);

函数find_feature_matches内的过程就不多赘述了，和前面的过程基本一致，就是换了对应查找描述子以及特征点的类，但都是衍生类（？）。我们可以看一下pose_estimation_2d2d函数里面的内容，里面添加了对应的注释queryIdx，trainIdx在上面的代码中有提到是查询集以及训练集的关系。我们知道基础矩阵和本质矩阵之间相差内参K。函数在OpenCVwiki都可以查得到。http://wiki.opencv.org.cn/index.php/Cv%E7%85%A7%E7%9B%B8%E6%9C%BA%E5%AE%9A%E6%A0%87%E5%92%8C%E4%B8%89%E7%BB%B4%E9%87%8D%E5%BB%BA

在main函数内还有就是验证对极约束的内容。

下面是对应的结果

max dist:94
min dist:4
find matches count:79
fundamental_matrix is
[4.54443750398184e-06, 0.0001333855576992603, -0.01798499246479044;
-0.0001275657012964255, 2.266794804645652e-05, -0.01416678429206633;
0.01814994639971766, 0.004146055870980492, 1]
essential_matrix is
[-0.008455114492964278, 0.05451570701059781, 0.1546375809484052;
-0.008287154708445212, 0.03351311565984172, -0.6896472136971504;
-0.1153993974485718, 0.6945899967012867, 0.02159624094256633]
homography_matrix is
[0.9261214237658335, -0.1445322040802305, 33.26921164265664;
0.04535424230636757, 0.9386696658342905, 8.570980713233848;
-1.006198269424755e-05, -3.008140685985328e-05, 1]
R is[0.9956584940813579, -0.05615340406690447, 0.07423582945816433;
0.05268846331440004, 0.9974645001566195, 0.04783823534446425;
-0.07673388428334535, -0.0437191735855581, 0.9960926386957119]
t is[-0.9726703113454949;
-0.2153829834753195;
0.08673313009645391]
t^R=
[0.01195733758736675, -0.07709685221674556, -0.2186905642298021;
0.01171980658216709, -0.04739470268352609, 0.9753084428633267;
0.1631993929614534, -0.9822985936236425, -0.03054169683725466]

对极约束是从x2=Rx1+t得到的，T是R，t组成的变换矩阵，x2=T21x1

讨论

从E，F，H中都可以分解出运动，不过H要建立在平面上。由于E本身具有尺度等价性，它分解得到的t，R也有尺度等价性。在分解过程中，t乘以任意非零常数，分解是成立的，所以我们要对t进行归一化。

尺度不确定性

因为对t进行归一化，直接导致了单目视觉的尺度不确定性，因为对t乘以任意比例常数之后，对极约束依旧成立，在单目SLAM中，对轨迹以及地图同时缩放任意倍数后，得到的图像是一样的。在单目视觉中，我们对两张图像的t进行归一化后，相当于固定了尺寸，这被称为单目SLAM的初始化，在初始化后，就可以用3d-2d来计算相机运动了，初始化后轨迹和地图的单位就是初始化固定的尺度。因此，单目SLAM有一步不可避免的初始化，初始化的图像一定要有平移，此后的轨迹以及地图都以此步平移为单位。

除了对t初始化，还可以令初始化的特征点平均深度为1，也可以固定一个尺度。把深度归一化可以控制场景的规模大小。

初始化的纯旋转问题

单目初始化不能只有纯旋转，必须要有一定程度的平移。因为从E分解到R，t如果t为零，E也会为零，

三、三角测量

前面我们通过对极几何约束估计相机运动之后。我们需要相机的运动估计特征点的空间位置，在单目相机中我们需要通过一张以上图像来获得相机的深度信息。

三角测量是指，通过两处观察同一点的夹角，来确定该点的距离。同样使用上一节的几何模型，左图为参考，右图的变换矩阵为T。理论上O1p1与O2p2会相交于一点P。但因为噪声的影响，两条直线往往无法相交，又需要最小二乘来求解。根据对极几何里面的定义，x1、x2为两个特征点的归一化坐标，深度对应s1，s2满足：

s1x1=s2Rx2+t

s1x1^x1=0=s2x1^Rx2+x1^t

这样就能求出对应的深度，但因为噪声的存在，我们估计的R，t不一定能使上式等于零，更常见的做法是最小二乘而不是零解。

实践

#include<iostream>
#include<opencv2/opencv.hpp>
#include<opencv2/imgcodecs/legacy/constants_c.h>
using namespace std;
using namespace cv;

void find_feature_matches(const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1, std::vector<KeyPoint> &keypoint_2,
    std::vector<DMatch> &matches
);

void pose_estimation_2d2d(
    const std::vector<KeyPoint> &keypoints_1, const std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches, Mat &R, Mat &t
);

void triangulation(
    const vector<KeyPoint> &keypoint_1, const vector<KeyPoint> &keypoints_2,
    const std::vector<DMatch> &matches, const Mat &R, const Mat &t, vector<Point3d> &points
);

inline cv::Scalar get_color(float depth){
    float up_th = 50, low_th = 10, th_range = up_th - low_th;
    if(depth > up_th) depth = up_th;
    if(depth < low_th) depth = low_th;
    return cv::Scalar(255 * depth / th_range, 0, 255 * (1-depth/th_range));
}

Point2f pixel2cam(const Point2d &p, const Mat &K);

int main(int argc, char **argv){
    if(argc!=3){
        cout<<"img1 img2"<<endl;
        return 1;
    }
    
    Mat img_1 = imread(argv[1], CV_LOAD_IMAGE_COLOR);
    Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
    
    vector<KeyPoint> keypoints_1, keypoints_2;
    vector<DMatch> matches;
    find_feature_matches(img_1, img_2, keypoints_1, keypoints_2, matches);
    cout<<matches.size()<<endl;
    
    Mat R, t;
    pose_estimation_2d2d(keypoints_1, keypoints_2, matches, R, t);
    
    vector<Point3d> points;
    triangulation(keypoints_1, keypoints_2, matches, R, t, points);
    
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    Mat img1_plot = img_1.clone();
    Mat img2_plot = img_2.clone();
    for(int i=0; i<matches.size(); i++){
        float depth1= points[i].z;
        cout<<"depth:"<<depth1<<endl;
        Point2d pt1_cam = pixel2cam(keypoints_1[matches[i].queryIdx].pt, K);
        cv::circle(img1_plot, keypoints_1[matches[i].queryIdx].pt, 2, get_color(depth1), 2);

        Mat pt2_trans=R*(Mat_<double>(3,1)<<points[i].x, points[i].y, points[i].z) + t;
        float depth2= pt2_trans.at<double>(2,0);
        cv::circle(img2_plot, keypoints_2[matches[i].trainIdx].pt, 2, get_color(depth2), 2);
    }
    cv::imshow("img 1", img1_plot);
    cv::imshow("img 2", img2_plot);
    cv::waitKey();
    return 0;
}

void find_feature_matches(const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1, std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches
){
    Mat descriptors_1, descriptors_2;
    Ptr<FeatureDetector> detector = ORB::create();
    Ptr<DescriptorExtractor> descriptor = ORB::create();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
    
    detector->detect(img_1, keypoints_1);
    detector->detect(img_2, keypoints_2);
    
    descriptor->compute(img_1, keypoints_1, descriptors_1);
    descriptor->compute(img_2, keypoints_2, descriptors_2);
    vector<DMatch> match;
    matcher->match(descriptors_1, descriptors_2, match);
    
    double min_dist = 10000, max_dist = 0;
    for(int i=0; i<descriptors_1.rows; i++){
        double dist = match[i].distance;
        if(dist<min_dist) min_dist=dist;
        if(dist>max_dist) max_dist=dist;
    }
    
    cout<<"max_dist :"<<max_dist<<endl;
    cout<<"min_dist :"<<min_dist<<endl;
    
    for(int i=0; i<descriptors_1.rows; i++){
        if(match[i].distance<=max(2*min_dist, 30.0)){
            matches.push_back(match[i]);
        }    
    }   
}

void pose_estimation_2d2d(
    const std::vector<KeyPoint> &keypoints_1, const std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches, Mat &R, Mat &t
){
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    
    vector<Point2f> points1;
    vector<Point2f> points2;
    
    for(int i=0; i<(int)matches.size(); i++){
        points1.push_back(keypoints_1[matches[i].queryIdx].pt);
        points2.push_back(keypoints_2[matches[i].trainIdx].pt);
    }
    
    Point2d principal_point(325.1,249.7);
    int focal_length = 521;
    Mat essential_matrix;
    essential_matrix = findEssentialMat(points1, points2, focal_length, principal_point);
    recoverPose(essential_matrix, points1, points2, R, t, focal_length, principal_point);
}

void triangulation(
    const vector<KeyPoint> &keypoint_1, const vector<KeyPoint> &keypoint_2,
    const std::vector<DMatch> &matches, const Mat &R, const Mat &t, vector<Point3d> &points
){
    Mat T1 = (Mat_<float>(3,4)<<1,0,0,0,
              0,1,0,0,
              0,0,1,0
             );
    Mat T2 = (Mat_<float>(3,4)<< R.at<double>(0,0), R.at<double>(0,1), R.at<double>(0,2), R.at<double>(0,0),
        R.at<double>(1,0), R.at<double>(1,1), R.at<double>(1,2), R.at<double>(1,0),
        R.at<double>(2,0), R.at<double>(2,1), R.at<double>(2,2), R.at<double>(2,0)
    );
    
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    vector<Point2f> pts_1, pts_2;
    for(DMatch m:matches){
        pts_1.push_back(pixel2cam(keypoint_1[m.queryIdx].pt, K));
        pts_2.push_back(pixel2cam(keypoint_2[m.trainIdx].pt, K));
    }
    
    Mat pts_4d;
    cv::triangulatePoints(T1, T2, pts_1, pts_2, pts_4d);
    
    for(int i=0; i<pts_4d.cols; i++){
        Mat x = pts_4d.col(i);
        x/= x.at<float>(3,0);
        Point3d p(x.at<float>(0,0), x.at<float>(1,0), x.at<float>(2,0));
        points.push_back(p);
    }
}

Point2f pixel2cam(const Point2d &p, const Mat &K){
    return Point2f((p.x-K.at<double>(0,2))/K.at<double>(0,0),
        (p.y-K.at<double>(1,2))/K.at<double>(1,1)
    );
}

View Code

cv::triangulatePoints(T1, T2, pts_1, pts_2, pts_4d);
    
for(int i=0; i<pts_4d.cols; i++){
    Mat x = pts_4d.col(i);
    x/= x.at<float>(3,0);
    Point3d p(x.at<float>(0,0), x.at<float>(1,0), x.at<float>(2,0));
    points.push_back(p);
}

inline cv::Scalar get_color(float depth,float up_th = 50, float low_th=10){
    float th_range = up_th - low_th;
    depth-=low_th;
    //if(depth > up_th) depth = up_th;
    //if(depth < low_th) depth = low_th;
    return cv::Scalar(255 * depth / th_range, 0, 255 *(1- depth / th_range));
}

这部分是三角测量部分，以及对得到特征点进行归一化处理。在main代码中使用颜色来表示深度，我对代码显示颜色稍作修改，映射到最大值和最小值区间内比较明显，可以得到以下结果：

讨论

三角测量是由平移得到的，有平移才有三角测量，当平移很小的时候，像素上的不确定性将导致较大的深度不确定性，也就是说，如果特征点运动一个像素,使得视角移动了一个角度，那么观测到深度值就有δd的变化，从上图来看，当t比较大的时候，δd将有明显变化，相同的相机分辨率下，三角化测量更加精确。因此要增加三角的精度，其一提高特征点的提取精度（分辨率），其二就是增大平移量，但又会导致图像外观发生明显变化。但只要我们愿意，有也能够定量的计算每个特征点和位置的不确定性。所以，如果假设特征点服从高斯分布，我们就能够期望他的方差不断减小乃至收敛，这就得到了一个深度滤波器。

三、3D-2D PnP

这是求解3D到2D点运动的方法，描述了当我们知道n个3D点的空间点以及他的投影位置时，如何估计相机所在的位姿。然而，如果两张图像中，其中一张的特征点已知，那么至少需要三个点对（需要一个附加点来验证）就能估计相机运动。特征点的3D位置可以通过三角化，或者RGB-D的深度图来确定。因此，在双目或RGB-D中我们能够直接通过PnP估计相机运动。在单目相机中则需要先初始化，才能使用。3D-2D方法不需要对极约束。

PnP有多种求解方法，P3P、DLT、EPnP、UPnP等。此外还可以使用非线性优化的方法，构建最小二乘并迭代求解，就是万金油的Bundle Adjustment。

1.直接线性变换

//空

2.P3P

输入数据为三对3D-2D匹配点，记A、B、C，2D为a，b，c，其中小写字母表示的点是大写字母在相机成像平面的投影。此外P3P还需要一对验证点D-d，相机光心为O。ABC是世界坐标的坐标。

式子内三个余弦值已知，且u=BC²/AB²，w=AC²/AB²在世界坐标的系计算出来的比例，变换到相机坐标系中不会改变，此式中x，y是未知的，会根据相机的移动发生变化。解析的求解该方程组是一个复杂的过程。需要使用吴消元法。类似与求解E的过程，有四个解，我们可以通过验证点，得到最有可能的的解，得到ABC在相机坐标系的3D坐标，再通过3D-3D的对点计算相机的运动R，t。但P3P存在一些问题：

1.P3P只利用三个点的信息，当信息多余三个点的时候，能以提供更多信息。

2.如果3D，2D点受噪声影响，或误匹配，则算法失败。

在SLAM中优先使用P3P/EPnP等方法估计相机位姿，然后构建最小二乘优化问题对估计值进行调整。

3.Bundle Adjustment

//空

实践

#include<iostream>
#include<opencv2/core/core.hpp>
#include<opencv2/features2d/features2d.hpp>
#include<opencv2/highgui/highgui.hpp>
#include<opencv2/calib3d/calib3d.hpp>
#include<opencv2/imgcodecs/legacy/constants_c.h>
#include<Eigen/Core>
#include<g2o/core/base_vertex.h>
#include<g2o/core/base_unary_edge.h>
#include<g2o/core/sparse_optimizer.h>
#include<g2o/core/block_solver.h>
#include<g2o/core/solver.h>
#include<g2o/core/optimization_algorithm_gauss_newton.h>
#include<g2o/solvers/dense/linear_solver_dense.h>
#include<sophus/se3.hpp>
#include<chrono>

using namespace std;
using namespace cv;

void find_feature_matches(
    const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1,
    std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches);

Point2d pixel2cam(const Point2d &p, const Mat &K);

typedef vector<Eigen::Vector2d, Eigen::aligned_allocator<Eigen::Vector2d>> VecVector2d;
typedef vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> VecVector3d;

void bundleAdjustmentG2O(const VecVector3d &points_3d,
    const VecVector2d &points_2d, const Mat &K, 
    Sophus::SE3d &pose);

void bundleAdjustmentGaussNewton(const VecVector3d &points_3d, const VecVector2d &points_2d, const Mat &K,
    Sophus::SE3d &pose);

int main(int argc, char **argv){
    if(argc!=5){
        cout<<"img_1, img_2, depth1, depth2"<<endl;
        return 1;
    }
    Mat img_1 = imread(argv[1], CV_LOAD_IMAGE_COLOR);
    Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
    assert(img_1.data && img_2.data && "Can not load images!");
    
    vector<KeyPoint> keypoints_1, keypoints_2;
    vector<DMatch> matches;
    find_feature_matches(img_1, img_2, keypoints_1, keypoints_2, matches);
    cout<<"matches"<<matches.size()<<endl;
    
    Mat d1 = imread(argv[3], CV_LOAD_IMAGE_UNCHANGED);
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    vector<Point3f> pts_3d;
    vector<Point2f> pts_2d;
    for(DMatch m:matches){
        ushort d = d1.ptr<unsigned short>(int(keypoints_1[m.queryIdx].pt.y))[int(keypoints_1[m.queryIdx].pt.x)];
        if(d==0) continue;
        float dd=d/5000.0;
        Point2d p1=pixel2cam(keypoints_1[m.queryIdx].pt, K);
        pts_3d.push_back(Point3f(p1.x*dd, p1.y * dd, dd));
        pts_2d.push_back(keypoints_2[m.trainIdx].pt);
    }
    
    cout<<"3d-2d pair:"<<pts_3d.size()<<endl;
    
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    Mat r, t;
    solvePnP(pts_3d, pts_2d, K, Mat(), r, t, false);
    Mat R;
    cv::Rodrigues(r,R);
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_used = chrono::duration_cast<chrono::duration<double>>(t2-t1);
    cout<<"pnp in opencv time:"<<time_used.count()<<endl;
    cout<<"R="<<endl<<R<<endl;
    cout<<"t="<<endl<<t<<endl;
    
    VecVector3d pts_3d_eigen;
    VecVector2d pts_2d_eigen;
    for(size_t i=0; i<pts_3d.size(); ++i){
        pts_3d_eigen.push_back(Eigen::Vector3d(pts_3d[i].x, pts_3d[i].y, pts_3d[i].z));
        pts_2d_eigen.push_back(Eigen::Vector2d(pts_2d[i].x, pts_2d[i].y));
    }
    
    cout<<"call BA by gauss newton "<<endl;
    
    Sophus::SE3d pose_gn;
    t1 = chrono::steady_clock::now();
    bundleAdjustmentGaussNewton(pts_3d_eigen, pts_2d_eigen, K, pose_gn);
    t2 = chrono::steady_clock::now();
    time_used = chrono::duration_cast<chrono::duration<double>>(t2-t1);
    cout<<"pnp by gauss newton time:"<<time_used.count()<<endl;
    
    cout<<"call BA by g2o";
    Sophus::SE3d pose_g2o;
    t1=chrono::steady_clock::now();
    bundleAdjustmentG2O(pts_3d_eigen, pts_2d_eigen, K, pose_g2o);
    t2=chrono::steady_clock::now();
    time_used = chrono::duration_cast<chrono::duration<double>>(t2-t1);
    cout<<"pnp by g2o time:"<<time_used.count()<<endl;
    return 0;
}

void find_feature_matches(const cv::Mat& img_1, const cv::Mat& img_2, std::vector<KeyPoint>& keypoints_1, std::vector<KeyPoint>& keypoints_2, std::vector<DMatch>& matches)
{
    Mat descriptors_1, descriptors_2;
    Ptr<FeatureDetector> detector = ORB::create();
    Ptr<DescriptorExtractor> descriptor = ORB::create();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
    detector->detect(img_1, keypoints_1);
    detector->detect(img_2, keypoints_2);
    descriptor->compute(img_1, keypoints_1, descriptors_1);
    descriptor->compute(img_2, keypoints_2, descriptors_2);
    
    vector<DMatch> match;
    matcher->match(descriptors_1, descriptors_2, match);
    double min_dist= 10000, max_dist = 0;
    for(int i=0; i<descriptors_1.rows; i++){
        double dist = match[i].distance;
        if(dist<min_dist) min_dist=dist;
        if(dist>max_dist) max_dist=dist;
    }
    
    cout<<"max_dist"<<max_dist<<endl;
    cout<<"min_dist"<<min_dist<<endl;
    
    for(int i=0; i<descriptors_1.rows; i++){
        if(match[i].distance <= max(2 * min_dist, 30.0)){
            matches.push_back(match[i]);
        }
    }
}

Point2d pixel2cam(const cv::Point2d& p, const cv::Mat& K)
{
    return Point2d(
        (p.x-K.at<double>(0,2))/K.at<double>(0,0),
        (p.y-K.at<double>(1,2))/K.at<double>(1,1)
    );
}

void bundleAdjustmentGaussNewton(const VecVector3d& points_3d, const VecVector2d& points_2d, const cv::Mat& K, Sophus::SE3d& pose)
{
    typedef Eigen::Matrix<double, 6, 1> Vector6d;
    const int iterations = 10;
    double cost = 0, lastCost = 0;
    double fx = K.at<double>(0,0);
    double fy = K.at<double>(1,1);
    double cx = K.at<double>(0,2);
    double cy = K.at<double>(1,2);
    
    for(int iter=0; iter<iterations; iter++){
        Eigen::Matrix<double, 6, 6> H = Eigen::Matrix<double, 6, 6>::Zero();
        Vector6d b = Vector6d::Zero();
        
        cost = 0;
        for(int i=0; i<points_3d.size(); i++){
            Eigen::Vector3d pc = pose * points_3d[i];
            double inv_z = 1.0/pc[2];
            double inv_z2 = inv_z * inv_z;
            Eigen::Vector2d proj(fx * pc[0]/pc[2] + cx, fy * pc[1]/pc[2]+cy);
            Eigen::Vector2d e = points_2d[i] - proj;
            cost += e.squaredNorm();
            Eigen::Matrix<double, 2, 6> J;
            J<<-fx * inv_z,
               0,
               fx*pc[0]*inv_z2,
               fx * pc[0]*pc[1]*inv_z2,
               -fx - fx * pc[0]*pc[0] * inv_z2,
               fx * pc[1] * inv_z,
               0,
               -fy * inv_z,
               fy * pc[1] *inv_z2,
               fy + fy *pc[1]*pc[1]*inv_z2,
               -fy * pc[0]*pc[1]*inv_z2,
               -fy * pc[0] * inv_z;
            
            H+=J.transpose() * J;
            b+=-J.transpose() * e;
        }
        Vector6d dx;
        dx = H.ldlt().solve(b);
        
        if(isnan(dx[0])){
            cout<<"result is nan"<<endl;
            break;
        }
        
        if(iter>0 && cost>=lastCost){
            cout<<"cost:"<<cost<<" lastCost:"<<lastCost<<endl;
            break;
        }
        
        pose = Sophus::SE3d::exp(dx) * pose;
        lastCost = cost;
        
        cout<<"iterations"<<iter<<" cost="<<std::setprecision(12)<<cost<<endl;
        
        if(dx.norm() < 1e-6){
            break;
        }
    }
    
    cout<<"pose by g-n"<<endl<<pose.matrix()<<endl;
}

class VertexPose : public g2o::BaseVertex<6, Sophus::SE3d>{
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;
    
    virtual void setToOriginImpl() override{
        _estimate = Sophus::SE3d();
    }
    
    virtual void oplusImpl(const double *update) override{
        Eigen::Matrix<double, 6, 1> update_eigen;
        update_eigen<<update[0], update[1], update[2], update[3], update[4], update[5];
        _estimate = Sophus::SE3d::exp(update_eigen) * _estimate;
    }
    
    virtual bool read(std::istream & in) override {}
    virtual bool write(std::ostream & out) const override{}
};

class EdgeProjection : public g2o::BaseUnaryEdge<2, Eigen::Vector2d, VertexPose> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;
    
    EdgeProjection(const Eigen::Vector3d &pos, const Eigen::Matrix3d &K) : _pos3d(pos), _K(K) {}
    
    virtual void computeError() override{
        const VertexPose *v = static_cast<VertexPose *> (_vertices[0]);
        Sophus::SE3d T = v->estimate();
        Sophus::Vector3d pos_pixel = _K * (T * _pos3d);
        pos_pixel /= pos_pixel[2];
        _error = _measurement - pos_pixel.head<2>();
    }
    
    virtual void linearizeOplus() override{
        const VertexPose *v = static_cast<VertexPose *> (_vertices[0]);
        Sophus::SE3d T = v->estimate();
        Eigen::Vector3d pos_cam = T * _pos3d;
        double fx = _K(0,0);
        double fy = _K(1,1);
        double cx = _K(0,2);
        double cy = _K(1,2);
        double X = pos_cam[0];
        double Y = pos_cam[1];
        double Z = pos_cam[2];
        double Z2 = Z*Z;
        _jacobianOplusXi << -fx/Z, 0, fx*X/Z2, fx*X*Y/Z2, -fx - fx * X * X/Z2, fx * Y/Z, 0, -fy / Z, fy*Y/(Z * Z), fy + fy * Y * Y/Z2  ,-fy*X*Y/Z2, -fy*X/Z;
    }
    
    virtual bool read(istream &in) override {}
    virtual bool write(std::ostream & os) const override {}

private:
    Eigen::Vector3d _pos3d;
    Eigen::Matrix3d _K;
};

void bundleAdjustmentG2O(const VecVector3d& points_3d, const VecVector2d& points_2d, const cv::Mat& K, Sophus::SE3d& pose)
{
    typedef g2o::BlockSolver<g2o::BlockSolverTraits<6,3>> BlockSolverType;
    typedef g2o::LinearSolverDense<BlockSolverType::PoseMatrixType> LinearSolverType;
    auto solver = new g2o::OptimizationAlgorithmGaussNewton(g2o::make_unique<BlockSolverType>(g2o::make_unique<LinearSolverType>()));
    g2o::SparseOptimizer optimizer;
    optimizer.setAlgorithm(solver);
    optimizer.setVerbose(true);
    
    VertexPose *vertex_pose = new VertexPose();
    vertex_pose->setId(0);
    vertex_pose->setEstimate(Sophus::SE3d());
    optimizer.addVertex(vertex_pose);
    
    Eigen::Matrix3d K_eigen;
    K_eigen<<K.at<double>(0,0), K.at<double>(0,1), K.at<double>(0,2), K.at<double>(1,0), K.at<double>(1,1), K.at<double>(1,2), K.at<double>(2,0), K.at<double>(2,1), K.at<double>(2,2);
    
    int index = 1;
    for(size_t i = 0;i<points_2d.size(); ++i){
        auto p2d = points_2d[i];
        auto p3d = points_3d[i];
        EdgeProjection *edge = new EdgeProjection(p3d, K_eigen);
        edge->setId(index);
        edge->setVertex(0, vertex_pose);
        edge->setMeasurement(p2d);
        edge->setInformation(Eigen::Matrix2d::Identity());
        optimizer.addEdge(edge);
        index++;
    }
    
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    optimizer.setVerbose(true);
    optimizer.initializeOptimization();
    optimizer.optimize(10);
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_used = chrono::duration_cast<chrono::duration<double>>(t2-t1);
    cout<<"cost time:"<<time_used.count()<<endl;
    cout<<"pose by g2o=\n"<<vertex_pose->estimate().matrix()<<endl;
    pose=vertex_pose->estimate();
}

View Code

    Mat d1 = imread(argv[3], CV_LOAD_IMAGE_UNCHANGED);
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    vector<Point3f> pts_3d;
    vector<Point2f> pts_2d;
    for(DMatch m:matches){
        ushort d = d1.ptr<unsigned short>(int(keypoints_1[m.queryIdx].pt.y))[int(keypoints_1[m.queryIdx].pt.x)];
        if(d==0) continue;
        float dd=d/5000.0;
        Point2d p1=pixel2cam(keypoints_1[m.queryIdx].pt, K);
        pts_3d.push_back(Point3f(p1.x*dd, p1.y * dd, dd));
        pts_2d.push_back(keypoints_2[m.trainIdx].pt);
    }
    
    Mat r, t;
    solvePnP(pts_3d, pts_2d, K, Mat(), r, t, false);
    Mat R;
    cv::Rodrigues(r,R);

Point2d pixel2cam(const cv::Point2d& p, const cv::Mat& K)
{
    return Point2d(
        (p.x-K.at<double>(0,2))/K.at<double>(0,0),
        (p.y-K.at<double>(1,2))/K.at<double>(1,1)
    );
}

我们可以先看pixel2cam这个函数，这个是将像素坐标系，转化到图像坐标系。[u,v,1]^T=K[x,y,1]^T.对应的u=xfu + uo V=yfv + Vo。然后从图像坐标系转到世界坐标需要乘以深度Z。然后使用solvePnP函数得到旋转向量以及平移向量。通过Rodrigues函数转成矩阵。

max_dist94
min_dist4
matches79
3d-2d pair:75
pnp in opencv time:0.0300834
R=
[0.9979059095501289, -0.05091940089111062, 0.03988747043647115;
0.04981866254254162, 0.9983623157438141, 0.02812094175381178;
-0.04125404886071617, -0.02607491352889358, 0.9988083912027663]
t=
[-0.1267821389556796;
-0.008439496817594587;
0.06034935748886031]

可以与2d2d的结果进行比较，可以发现差距不大。

代码中同样使用了非线性优化，L-M法以及G-N。后面有需要我会再详细写。

四、3D-3D：ICP

假设我们有一组配对好的3D点。我们需要找到一个欧氏变换R，t，使得：

这个问题可以用迭代最近点（ICP），在3D-3D相机位姿估计中没有出现相机模型，仅考虑两组3D点之间的变换。认为距离最近的两个点为一个就是迭代最近点。

1.SVD

2.非线性优化

实践

#include<iostream>
#include<opencv2/core/core.hpp>
#include<opencv2/features2d/features2d.hpp>
#include<opencv2/highgui/highgui.hpp>
#include<opencv2/calib3d/calib3d.hpp>
#include<opencv2/imgcodecs/legacy/constants_c.h>
#include<Eigen/Core>
#include<Eigen/Dense>
#include<Eigen/Geometry>
#include<Eigen/SVD>
#include<g2o/core/base_vertex.h>
#include<g2o/core/base_unary_edge.h>
#include<g2o/core/block_solver.h>
#include<g2o/core/optimization_algorithm_gauss_newton.h>
#include<g2o/core/optimization_algorithm_levenberg.h>
#include<g2o/solvers/dense/linear_solver_dense.h>
#include<chrono>
#include<sophus/se3.hpp>
using namespace std;
using namespace cv;

void find_feature_matches(
    const Mat &img_1, const Mat &img_2,
    std::vector<KeyPoint> &keypoints_1,
    std::vector<KeyPoint> &keypoints_2,
    std::vector<DMatch> &matches);

Point2d pixel2cam(const Point2d &p, const Mat &K);

void pose_estimation_3d3d(
    const vector<Point3f> &pts1,
    const vector<Point3f> &pts2,
    Mat &R, Mat &t);

void bundleAdjustment(
    const vector<Point3f> &points_3d,
    const vector<Point3f> &points_2d,
    Mat &R, Mat &t);

class VertexPose: public g2o::BaseVertex<6, Sophus::SE3d>{
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;
    
    virtual void setToOriginImpl() override{
        _estimate = Sophus::SE3d();
    }
    
    virtual void oplusImpl(const double *update) override{
        Eigen::Matrix<double, 6, 1> update_eigen;
        update_eigen<<update[0], update[1], update[2], update[3], update[4], update[5];
        _estimate = Sophus::SE3d::exp(update_eigen) * _estimate;
    }
    
    virtual bool read(istream & in) override{}
    virtual bool write(ostream & out) const override{}
};

class EdgeProjectXYZRGBDPoseOnly : public g2o::BaseUnaryEdge<3, Eigen::Vector3d, VertexPose>{
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;
    
    EdgeProjectXYZRGBDPoseOnly(const Eigen::Vector3d &point):_point(point) {}
    
    virtual void computeError() override{
        const VertexPose *pose = static_cast<const VertexPose *>(_vertices[0]);
        _error = _measurement - pose->estimate() * _point;
    }
    
    virtual void linearizeOplus() override{
        VertexPose *pose = static_cast<VertexPose *>(_vertices[0]);
        Sophus::SE3d T = pose->estimate();
        Eigen::Vector3d xyz_trans = T * _point;
        _jacobianOplusXi.block<3,3>(0,0) = -Eigen::Matrix3d::Identity();
        _jacobianOplusXi.block<3,3>(0,3) = Sophus::SO3d::hat(xyz_trans);
    }
    
    bool read(std::istream &in) {}
    bool write(std::ostream & out) const {}
protected:
    Eigen::Vector3d _point;
};

int main(int argc, char **argv){
    if(argc != 5){
        cout<<"img1 img2 depth1 depth2"<<endl;
        return 1;
    }
    
    Mat img_1 = imread(argv[1],CV_LOAD_IMAGE_COLOR);
    Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
    vector<KeyPoint> keypoints_1, keypoints_2;
    vector<DMatch> matches;
    find_feature_matches(img_1, img_2, keypoints_1, keypoints_2, matches);
    cout<<"matchs:"<<matches.size()<<endl;
    Mat depth1 = imread(argv[3], CV_LOAD_IMAGE_UNCHANGED);
    Mat depth2 = imread(argv[4], CV_LOAD_IMAGE_UNCHANGED);
    Mat K = (Mat_<double>(3,3)<<520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1);
    vector<Point3f> pts1, pts2;
    
    for(DMatch m:matches){
        ushort d1 = depth1.ptr<unsigned short>(int(keypoints_1[m.queryIdx].pt.y))[int(keypoints_1[m.queryIdx].pt.x)];
        ushort d2 = depth2.ptr<unsigned short>(int(keypoints_2[m.trainIdx].pt.y))[int(keypoints_2[m.trainIdx].pt.x)];
        if(d1==0||d2==0) continue;
        Point2d p1 = pixel2cam(keypoints_1[m.queryIdx].pt, K);
        Point2d p2 = pixel2cam(keypoints_2[m.trainIdx].pt, K);
        float dd1 = float(d1)/5000.0;
        float dd2 = float(d2)/5000.0;
        pts1.push_back(Point3f(p1.x * dd1, p1.y * dd1, dd1));
        pts2.push_back(Point3f(p2.x * dd2, p2.y * dd2, dd2));
    }
    cout<<"3d-3d pairs: "<<pts1.size()<<endl;
    Mat R,t;
    pose_estimation_3d3d(pts1, pts2, R, t);
    cout<<"ICP via SVD result: "<<endl;
    cout<<"R = "<<R<<endl;
    cout<<"t = "<<t<<endl;
    cout<<"R_inv = "<<R.t()<<endl;
    cout<<"t_inv = "<<-R.t() * t<<endl;
    
    cout<<"calling bundle adjustment"<<endl;
    
    bundleAdjustment(pts1, pts2, R, t);
    for(int i=0; i<5; i++){
        cout<<"p1="<<pts1[i]<<endl;
        cout<<"p2="<<pts2[i]<<endl;
        cout<<"(R*p2+t)="<<R*(Mat_<double>(3,1)<<pts2[i].x, pts2[i].y, pts2[i].z) + t<<endl<<endl;
    }
}

void find_feature_matches(const cv::Mat& img_1, const cv::Mat& img_2, std::vector<KeyPoint>& keypoints_1, std::vector<KeyPoint>& keypoints_2, std::vector<DMatch>& matches)
{
    Mat descriptors_1,descriptors_2;
    Ptr<FeatureDetector> detector = ORB::create();
    Ptr<DescriptorExtractor> descriptor = ORB::create();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
    detector->detect(img_1, keypoints_1);
    detector->detect(img_2, keypoints_2);
    descriptor->compute(img_1, keypoints_1, descriptors_1);
    descriptor->compute(img_2, keypoints_2, descriptors_2);
    vector<DMatch> match;
    matcher->match(descriptors_1, descriptors_2, match);
    double min_dist = 10000, max_dist = 0;
    for(int i=0;i<descriptors_1.rows; i++){
        double dist = match[i].distance;
        if(dist>max_dist) max_dist=dist;
        if(dist<min_dist) min_dist=dist;
    }
    cout<<"max_dist = "<<max_dist<<endl;
    cout<<"min_dist = "<<min_dist<<endl;
    
    for(int i=0; i<descriptors_1.rows; i++){
        if(match[i].distance<=max(2*min_dist, 30.0)){
            matches.push_back(match[i]);
        }
    }
}

cv::Point2d pixel2cam(const cv::Point2d& p, const cv::Mat& K)
{
    return Point2d((p.x - K.at<double>(0,2))/K.at<double>(0,0),
        (p.y-K.at<double>(1,2))/K.at<double>(1,1)
    );
}

void pose_estimation_3d3d(const vector<cv::Point3f>& pts1, const vector<cv::Point3f>& pts2, cv::Mat& R, cv::Mat& t)
{
    Point3f p1, p2;
    int N = pts1.size();
    for(int i=0; i<N; i++){
        p1+=pts1[i];
        p2+=pts2[i];
    }
    p1=Point3f(Vec3f(p1)/N);
    p2=Point3f(Vec3f(p2)/N);
    vector<Point3f> q1(N), q2(N);
    for(int i=0; i<N; i++){
        q1[i] = pts1[i] - p1;
        q2[i] = pts2[i] - p2;
    }
    
    Eigen::Matrix3d W = Eigen::Matrix3d::Zero();
    for(int i=0;i<N;i++){
        W+=Eigen::Vector3d(q1[i].x, q1[i].y, q1[i].z) * Eigen::Vector3d(q2[i].x, q2[i].y, q2[i].z).transpose();
    }
    cout<<"W="<<W<<endl;
    
    //SVD
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(W, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d U = svd.matrixU();
    Eigen::Matrix3d V = svd.matrixV();
    cout<<"U="<<U<<endl;
    cout<<"V="<<V<<endl;
    Eigen::Matrix3d R_ = U * (V.transpose());
    if(R_.determinant()<0){
        R_ = -R_;
    }
    Eigen::Vector3d t_ = Eigen::Vector3d(p1.x, p1.y, p1.z) - R_ * Eigen::Vector3d(p2.x, p2.y, p2.z);
    
    R = (Mat_<double>(3,3)<< R_(0,0), R_(0,1), R_(0,2),
        R_(1,0), R_(1,1), R_(1,2),
        R_(2,0), R_(2,1), R_(2,2)
    );
    t=(Mat_<double>(3,1)<<t_(0,0), t_(1,0), t_(2,0));
}

void bundleAdjustment(const vector<cv::Point3f>& pts1, const vector<cv::Point3f>& pts2, cv::Mat& R, cv::Mat& t)
{
    typedef g2o::BlockSolverX BlockSolverType;
    typedef g2o::LinearSolverDense<BlockSolverType::PoseMatrixType> LinearSolverType;
    auto solver = new g2o::OptimizationAlgorithmLevenberg(g2o::make_unique<BlockSolverType>(g2o::make_unique<LinearSolverType>()));
    g2o::SparseOptimizer optimizer;
    optimizer.setAlgorithm(solver);
    optimizer.setVerbose(true);
    
    VertexPose *pose = new VertexPose();
    pose->setId(0);
    pose->setEstimate(Sophus::SE3d());
    optimizer.addVertex(pose);
    
    for(size_t i=0; i<pts1.size(); i++){
        EdgeProjectXYZRGBDPoseOnly *edge = new EdgeProjectXYZRGBDPoseOnly(Eigen::Vector3d(pts2[i].x, pts2[i].y, pts2[i].z));
        edge->setVertex(0, pose);
        edge->setMeasurement(Eigen::Vector3d(pts1[i].x, pts1[i].y, pts1[i].z));
        edge->setInformation(Eigen::Matrix3d::Identity());
        optimizer.addEdge(edge);
    }
    
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    optimizer.initializeOptimization();
    optimizer.optimize(10);
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_used = chrono::duration_cast<chrono::duration<double>>(t2-t1);
    cout<<"optimization time :"<<time_used.count()<<endl;
    
    Eigen::Matrix3d R_ = pose->estimate().rotationMatrix();
    Eigen::Vector3d t_ = pose->estimate().translation();
    R=(Mat_<double>(3,3) << 
        R_(0,0), R_(0,1), R_(0,2),
        R_(1,0), R_(1,1), R_(1,2),
        R_(2,0), R_(2,1), R_(2,2)
    );
    
    t=(Mat_<double>(3,1) << t_(0,0), t_(1,0), t_(2,0));
}

View Code

下面是对应的结果，ICP问题是从第二步推回第一步。这里作了逆变换。

max_dist = 94
min_dist = 4
matchs:79
3d-3d pairs: 72
W= 10.871 -1.01948 2.54771
-2.16033 3.85307 -5.77742
3.94738 -5.79979 9.62203
U= 0.558087 -0.829399 -0.0252034
-0.428009 -0.313755   0.847565
0.710878   0.462228   0.530093
V= 0.617887 -0.784771 -0.0484806
-0.399894 -0.366747   0.839989
0.676979   0.499631   0.540434
ICP via SVD result:
R = [0.9969452351705235, 0.0598334759429696, -0.05020112774999549;
-0.05932607556034211, 0.9981719680327525, 0.01153858709846634;
0.05079975225724825, -0.008525103530306, 0.9986724727258676]
t = [0.1441598281917405;
-0.06667849447794799;
-0.03009747343724256]
R_inv = [0.9969452351705235, -0.05932607556034211, 0.05079975225724825;
0.0598334759429696, 0.9981719680327525, -0.008525103530306;
-0.05020112774999549, 0.01153858709846634, 0.9986724727258676]
t_inv = [-0.1461462830262246;
0.0576744363694081;
0.03806387978797152]
calling bundle adjustment
iteration= 0   chi2= 1.816112   time= 6.0297e-05   cumTime= 6.0297e-05   edges= 72   schur= 0   lambda= 0.000758   levenbergIter= 1
iteration= 1   chi2= 1.815514   time= 3.1219e-05   cumTime= 9.1516e-05   edges= 72   schur= 0   lambda= 0.000505   levenbergIter= 1
iteration= 2   chi2= 1.815514   time= 2.7632e-05   cumTime= 0.000119148   edges= 72   schur= 0   lambda= 0.000337   levenbergIter= 1
iteration= 3   chi2= 1.815514   time= 2.6113e-05   cumTime= 0.000145261   edges= 72   schur= 0   lambda= 0.000225   levenbergIter= 1
iteration= 4   chi2= 1.815514   time= 2.6208e-05   cumTime= 0.000171469   edges= 72   schur= 0   lambda= 0.000150   levenbergIter= 1
iteration= 5   chi2= 1.815514   time= 2.7833e-05   cumTime= 0.000199302   edges= 72   schur= 0   lambda= 0.000299   levenbergIter= 1
optimization time :0.000497761
p1=[-0.243698, -0.117719, 1.5848]
p2=[-0.297211, -0.0956614, 1.6558]
(R*p2+t)=[-0.2409901495364604;
-0.1254270500587826;
1.609221205029395]

p1=[0.402045, -0.341821, 2.2068]
p2=[0.378811, -0.262859, 2.2196]
(R*p2+t)=[0.3946591022539743;
-0.3259188829495218;
2.20803983035825]

p1=[-0.522843, -0.214436, 1.4956]
p2=[-0.58581, -0.208584, 1.6052]
(R*p2+t)=[-0.532923946912698;
-0.2216052393093164;
1.54499035805527]

p1=[-0.627753, 0.160186, 1.3396]
p2=[-0.709645, 0.159033, 1.4212]
(R*p2+t)=[-0.6251478068660965;
0.1505624195985039;
1.351809862638435]

p1=[0.594266, -0.0256024, 1.5332]
p2=[0.514795, 0.0391393, 1.5332]
(R*p2+t)=[0.5827556962439571;
-0.04046060384335358;
1.526884519595548]

感谢你看到这里，这部分的内容很多，有一些我看的不是很懂，还是带入实际的项目中，从实践里面出真理，再返过来研究。Cheers！

posted @ 2020-12-13 20:15 xkyl 阅读(434) 评论(0) 编辑收藏举报

刷新页面返回顶部

xkyl

视觉里程计1-SLAMCP7

公告