Image Stitching 1: OpenCV Stitcher
1. Introduction
Image stitching is a direction that brings together many classic computer-vision techniques. The main steps are feature extraction, feature matching, image registration, and image blending. Figure 1.1 below shows the flow chart of OpenCV's stitching pipeline. Stitching touches many research areas: for feature extraction alone there are the commonly used SIFT, SURF, and ORB, which are also widely applied in SLAM. If you have the time, working through these implementation details is well worth it for building up your own knowledge base.
2. OpenCV Stitcher
OpenCV ships with a ready-made stitching class, cv::Stitcher. A single call to its interface runs all of the stitching steps and produces the stitched image. See the referenced test images for the examples below.
2.1 Example code
The following example shows how to call the interface:
#include "opencv2/opencv.hpp"
#include "logging.hpp"
#include <string>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Set the warp mode for stitching; there are two modes: PANORAMA and SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching.
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched with a plain affine transform.
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if (cv::Stitcher::OK != status) {
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}
int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";
    if (2 == argc) {
        pic_path = std::string(argv[1]);
    } else if (3 == argc) {
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    } else {
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if (img_names.empty()) {
        LOG(INFO) << "no images";
        return -1;
    }
    for (size_t i = 0; i < img_names.size(); ++i) {
        cv::Mat img = cv::imread(img_names[i]);
        imgs.push_back(img.clone());
    }
    cv::Mat pano;
    stitchImg(imgs, pano);
    if (!pano.empty()) {
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}
2.2 Example results
- mode = PANORAMA: CMU scene stitching result 1
- mode = SCANS: CMU scene stitching result 2
The two CMU comparison images above illustrate the difference between PANORAMA and SCANS: the former projects the images onto a cylinder, so the resulting panorama shows visible bending, while SCANS applies only an affine transform, so the stitched image largely preserves the straight lines and parallelism of the originals.
3. A simplified stitcher
This section lays some groundwork, leaving a few gaps to fill later. Before digging into the details of the OpenCV stitcher, let's imitate the SCANS mode with a simple implementation and see how the result looks. The basic idea is:
- extract and match features to find the correspondences between the images;
- estimate the transformation matrix between the images for alignment: pick the ten strongest matches, draw them, and use three correctly matched points to estimate the affine transform;
- allocate a canvas whose width is the sum of all image widths and whose height is the maximum of all image heights, initialized to zero;
- project the strongest matching point onto the canvas and use it as the junction between the left and right images;
- take the right image as the reference, i.e. warp the left image and then blend it with the right one.
3.1 Feature extraction
The commonly used feature extractors are SIFT, SURF, and ORB. ORB is fast and widely used in other vision tasks as well, but its accuracy is lower than that of the first two.
void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    keyPoints.clear();
    imageDescs.clear();
    // Extract feature points. The first argument of cv::ORB::create is the
    // maximum number of features to retain.
    int nfeatures = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(nfeatures);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        // Convert to grayscale
        cv::Mat image;
        cv::cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        keyPoints.push_back(keyPoint);
        cv::Mat imageDesc;
        orbDetector->compute(image, keyPoint, imageDesc);
        /* The descriptors must be converted to float, otherwise FLANN fails
         * with "Unsupported format or combination of formats in buildIndex".
         */
        imageDesc.convertTo(imageDesc, CV_32F);
        imageDescs.push_back(imageDesc.clone());
    }
}
3.2 Feature matching
This step establishes the correspondences between the feature points of the two images, from which the transformation matrix H is estimated. This H transforms the whole image; to mitigate parallax, some approaches divide the image into a grid and compute a separate H for each cell.
static const int MAX_OPTIMAL_POINT_NUM = 10;

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
    optimalMatchePoint.clear();
    // Match the features and keep the best pairs. The images are assumed to be
    // given in order; this test assumes exactly two images.
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());
    std::sort(matchePoints.begin(), matchePoints.end()); // sort by match distance
    // Keep the top-N best matches
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    for (int i = 0; i < MAX_OPTIMAL_POINT_NUM; i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints2[0], imagePoints2[3], imagePoints2[6]});
}
With ORB features there are many false matches at this stage; the three points above were picked because they visibly match correctly, and they will be used to estimate the affine transform H. OpenCV handles this internally with the RANSAC algorithm; I skip that step here.
3.3 Estimating the affine transform
The previous step produced the three strongest matches, so H can now be computed directly. Before computing it, the right image is first moved to the right side of the canvas.
void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    // Transform for the left image: after the right image's feature points are
    // shifted, the left image must map onto those shifted points on the canvas.
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    // Transform for the right image: simply move it to the right of the canvas.
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);
    Hs.push_back(homo1);
    Hs.push_back(homo2);
}
3.4 Compositing the images
Once the transforms are known, the strongest-responding feature point is taken as the blending center of the two images; the areas left and right of it come from the left and right image respectively. This is a very crude way to composite: for images taken with pure translation it can still stitch them together, but with rotation, or when the optical centers are not aligned during capture, the misalignment is severe. Blending is another weak point: pixels are taken from one image or the other along a hard boundary, so the transition is not smooth and seams show.
void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H,
              cv::Point2f &optimalPt, cv::Mat &pano)
{
    // Take the right image as the reference, warp the left image so that it
    // overlaps the right one, and use the strongest feature point as the
    // blending center. Default panorama canvas size:
    //   width  = left.width + right.width
    //   height = max(left.height, right.height)
    int pano_width = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0, img_trans1;
    // After the affine warp, each source image sits at its final position
    // on the panorama canvas.
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());
    // Strongest-responding feature point
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0);
    // Its position on the canvas
    trans_pt = H[0] * trans_pt;
    // Regions to take from each image (the original "+ 1" on the right width
    // pushed the ROI outside the canvas and would throw at runtime)
    int cx = (int)trans_pt.at<double>(0, 0);
    cv::Rect left_roi = cv::Rect(0, 0, cx, pano_height);
    cv::Rect right_roi = cv::Rect(cx, 0, pano_width - cx, pano_height);
    // Copy the selected regions onto the canvas
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}
int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);
    std::vector<cv::Mat> Hs;
    getAffineMat(optimalMatchePoint, imgs[0].cols, Hs);
    cv::Mat pano;
    //getPano1(imgs, Hs, pano);
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}
3.5 Results of the simplified stitcher
- Result for images with translation only
- Result for images with rotation
The result is nothing to boast about. Figure 3.5.2 clearly shows misalignment, and the whole stitched image leans to the left; the red box on the left marks the left image's region and the red line in the middle marks the boundary between the two. The misalignment has several causes: no good blending or transition algorithm, camera rotation not modeled, and a poorly chosen seam position. The tilt, which looks unnatural, comes from picking a single image as the reference and warping the others into its coordinate frame.
4. The OpenCV stitcher module
OpenCV's samples include stitching_detailed.cpp, which walks through the implementation of every module. Real applications usually require real-time stitching, and simply calling the high-level interface cannot meet that requirement, especially on embedded ARM targets, so we need to understand the implementation details to find optimization opportunities. Here I am only interested in parts of stitching_detailed.cpp, so the timing statistics and the scaled-down seam-region search have been stripped out.
4.1 Parameter overview
stitching_detailed.cpp exposes a large number of configuration parameters. As Figure 1.1, the OpenCV stitching flow chart, shows, the main steps of the OpenCV stitcher are:
- registration
  - feature extraction
  - feature matching
  - image registration
  - camera parameter estimation
  - wave correction
- compositing
  - image warping
  - exposure compensation
  - seam finding
  - image blending
The registration part establishes the matching relations between the images, estimates the camera intrinsics and extrinsics, and refines the parameters with bundle adjustment (BA); it is mainly responsible for the stitching order and the estimated transforms. The compositing part then warps and blends the images with those parameters, using exposure compensation and related algorithms to improve visual consistency. The parameters are listed below:
static void printUsage(char** argv)
{
cout <<
"Rotation model images stitcher.\n\n"
<< argv[0] << " img1 img2 [...imgN] [flags]\n\n"
"Flags:\n"
" --preview\n"
" Run stitching in the preview mode. Works faster than usual mode,\n"
" but output image will have lower resolution.\n"
" --try_cuda (yes|no)\n"
" Try to use CUDA. The default value is 'no'. All default values\n"
" are for CPU mode.\n"
"\nMotion Estimation Flags:\n"
" --work_megapix <float>\n"
" Resolution for image registration step. The default is 0.6 Mpx.\n"
" --features (surf|orb|sift|akaze)\n"
" Type of features used for images matching.\n"
" The default is surf if available, orb otherwise.\n"
" --matcher (homography|affine)\n"
" Matcher used for pairwise image matching.\n"
" --estimator (homography|affine)\n"
" Type of estimator used for transformation estimation.\n"
" --match_conf <float>\n"
" Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
" --conf_thresh <float>\n"
" Threshold for two images are from the same panorama confidence.\n"
" The default is 1.0.\n"
" --ba (no|reproj|ray|affine)\n"
" Bundle adjustment cost function. The default is ray.\n"
" --ba_refine_mask (mask)\n"
" Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
" where 'x' means refine respective parameter and '_' means don't\n"
" refine one, and has the following format:\n"
" <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
" adjustment doesn't support estimation of selected parameter then\n"
" the respective flag is ignored.\n"
" --wave_correct (no|horiz|vert)\n"
" Perform wave effect correction. The default is 'horiz'.\n"
" --save_graph <file_name>\n"
" Save matches graph represented in DOT language to <file_name> file.\n"
" Labels description: Nm is number of matches, Ni is number of inliers,\n"
" C is confidence.\n"
"\nCompositing Flags:\n"
" --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
" compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
" compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
" paniniPortraitA1.5B1|mercator|transverseMercator)\n"
" Warp surface type. The default is 'spherical'.\n"
" --seam_megapix <float>\n"
" Resolution for seam estimation step. The default is 0.1 Mpx.\n"
" --seam (no|voronoi|gc_color|gc_colorgrad)\n"
" Seam estimation method. The default is 'gc_color'.\n"
" --compose_megapix <float>\n"
" Resolution for compositing step. Use -1 for original resolution.\n"
" The default is -1.\n"
" --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
" Exposure compensation method. The default is 'gain_blocks'.\n"
" --expos_comp_nr_feeds <int>\n"
" Number of exposure compensation feed. The default is 1.\n"
" --expos_comp_nr_filtering <int>\n"
" Number of filtering iterations of the exposure compensation gains.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 2.\n"
" --expos_comp_block_size <int>\n"
" Block size in pixels used by the exposure compensator.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 32.\n"
" --blend (no|feather|multiband)\n"
" Blending method. The default is 'multiband'.\n"
" --blend_strength <float>\n"
" Blending strength from [0,100] range. The default is 5.\n"
" --output <result_img>\n"
" The default is 'result.jpg'.\n"
" --timelapse (as_is|crop) \n"
" Output warped images separately as frames of a time lapse movie, "
" with 'fixed_' prepended to input file names.\n"
" --rangewidth <int>\n"
" uses range_width to limit number of images to match with.\n";
}
4.2 Meaning of the Motion Estimation flags
- work_megapix: during registration (feature extraction and related steps), images are downscaled to reduce runtime; this sets the target resolution for that scaling;
- features: which feature type to use (surf|orb|sift|akaze);
- matcher: pairwise matching method (homography|affine), i.e. homography versus affine models, corresponding to BestOf2NearestMatcher and AffineBestOf2NearestMatcher; the latter finds the best matches under an affine transform between two images;
- estimator: camera parameter estimation method (homography|affine);
- conf_thresh: threshold for deciding whether two images belong to the same panorama;
- match_conf: float, the inlier threshold used during matching;
- ba: cost function for bundle adjustment of the camera parameters (no|reproj|ray|affine);
- ba_refine_mask: during bundle adjustment certain parameters can be kept fixed via a mask; 'x' means refine the parameter and '_' means keep it fixed, in the order fx, skew, ppx, aspect, ppy;
- wave_correct: wave-correction flag (no|horiz|vert); it constrains the panorama to the horizontal or vertical direction, avoiding the "spread wings" tilt effect;
- save_graph: save the match graph between images in DOT format.
4.3 Meaning of the Compositing flags
- warp: warping method; OpenCV supports quite a few projections, including spherical and cylindrical;
- seam_megapix: images are downscaled for seam estimation; together with work_scale this controls the scale factor;
- seam: seam-finding method;
- compose_megapix: sets the resolution used during compositing and for the final panorama (useful for previews);
- expos_comp: exposure compensation method;
- blend: blending method, commonly (feather|multiband).
4.4 Summary
If the number and resolution of the input images are moderate, the resolution-scaling steps in the source, along with the timing code, can be removed to simplify the pipeline; in practice this flow is rarely used as-is for real-time stitching anyway. Understanding the algorithms behind each configuration parameter helps us grasp the finer details, and that is what I plan to cover step by step in later posts.