Image Stitching 1: OpenCV Stitcher

1. Introduction

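  Image stitching draws together much of classical computer vision. Its main steps are feature extraction, feature matching, image registration, and image blending. Figure 1.1 shows the OpenCV stitching pipeline. The field touches many research directions: feature extraction alone has several commonly used methods, such as SIFT, SURF, and ORB, and these are also widely used in SLAM. If you have the time, working through these implementation details is well worth it for building up your own body of knowledge.
Figure 1.1 OpenCV stitching pipeline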

2. OpenCV Stitcher

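  OpenCV ships a ready-made stitching class, Stitcher. Calling a single interface runs every stitching step and returns the stitched image. See the reference for the test images.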

2.1 Sample code


#include "opencv2/opencv.hpp"
#include "logging.hpp"
#include <string>
#include <vector>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Warp mode for stitching: PANORAMA or SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching.
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched directly after an affine transform.
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if(cv::Stitcher::OK != status){
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}

int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";

    if(2 == argc){
        pic_path = std::string(argv[1]);
    }else if(3 == argc){
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    }else{
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if(img_names.empty()){
        LOG(INFO) << "no images";
        return -1;
    }
    for(size_t i = 0; i < img_names.size(); ++i){
        cv::Mat img = cv::imread(img_names[i]);
        imgs.push_back(img.clone());
    }
    cv::Mat pano;
    stitchImg(imgs, pano);
    // Only display the result if stitching produced an image.
    if(!pano.empty()){
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}

2.2 Sample results

  • mode = panorama

    CMU scene stitching 1
  • mode = scans

    CMU scene stitching 2


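3. Simplified stitching

  This section opens a few topics to be filled in later. While reading through the details of the OpenCV stitcher, I first wrote a simple imitation of SCANS-mode stitching to see how well it works. The basic idea is:

  • Extract and match features to find the correspondences between images;
  • Estimate the transformation matrix so the images can be aligned: take the ten best-matching feature points, draw them, and pick three correctly matched points to estimate the affine transformation matrix;
  • Create a canvas whose width is the sum of all image widths and whose height is the maximum image height, initialized to 0;
  • Project the best-matching point onto the canvas and use it as the center where the left and right images meet;
  • Use the right image as the reference image, i.e. transform the left image and then blend it with the right image.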
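
The canvas and seam steps in the list above can be sketched without OpenCV. This is only a minimal illustration, assuming 2x3 affine matrices stored row-major; the names `Size2i`, `Pt2d`, `Affine2x3`, `canvasFor`, `applyAffine`, and `seamColumn` are made up for this example and are not OpenCV APIs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper types for illustration; not part of OpenCV.
struct Size2i { int width; int height; };
struct Pt2d { double x; double y; };
using Affine2x3 = std::array<double, 6>; // row-major: [a b tx; c d ty]

// Canvas: width is the sum of all widths, height is the largest height.
Size2i canvasFor(const std::vector<Size2i>& images) {
    assert(!images.empty());
    Size2i canvas{0, 0};
    for (const Size2i& s : images) {
        canvas.width += s.width;
        canvas.height = std::max(canvas.height, s.height);
    }
    return canvas;
}

// Apply a 2x3 affine matrix to a point (the homogeneous coordinate is 1).
Pt2d applyAffine(const Affine2x3& m, const Pt2d& p) {
    return {m[0] * p.x + m[1] * p.y + m[2],
            m[3] * p.x + m[4] * p.y + m[5]};
}

// Column where the left and right images meet, clamped to the canvas.
int seamColumn(const Affine2x3& leftToCanvas, const Pt2d& bestMatch,
               const Size2i& canvas) {
    const double x = applyAffine(leftToCanvas, bestMatch).x;
    const int col = static_cast<int>(std::lround(x));
    return std::min(std::max(col, 0), canvas.width);
}
```

Rounding and clamping the seam column once, then deriving both regions from it, keeps the left and right regions adjacent and inside the canvas.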

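3.1 Feature extraction

  The commonly used feature extractors are SIFT, SURF, and ORB. ORB is faster and is widely used in other vision tasks, but it is less accurate than the other two.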

void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    // The first argument of ORB::create is the maximum number of features
    // (the original name "minHessian" is a SURF term).
    int nFeatures = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(nFeatures);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        cv::Mat image;
        cv::cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        cv::Mat imageDesc1;
        orbDetector->compute(image, keyPoint, imageDesc1);
        // Convert to float, otherwise FlannBasedMatcher fails with:
        //   "Unsupported format or combination of formats
        //    in buildIndex using FLANN algorithm"
        imageDesc1.convertTo(imageDesc1, CV_32F);
        keyPoints.push_back(keyPoint);
        imageDescs.push_back(imageDesc1.clone());
    }
}
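
ORB descriptors are binary strings, so they are usually compared by Hamming distance (in OpenCV, `cv::BFMatcher` with `cv::NORM_HAMMING`) rather than converted to `CV_32F` for FLANN as done above. The following is a minimal sketch of that comparison, assuming each descriptor is a fixed-length byte vector (32 bytes for ORB); `Descriptor`, `hammingDistance`, and `bruteForceMatch` are illustrative names, not OpenCV functions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Descriptor = std::vector<std::uint8_t>; // e.g. 32 bytes for ORB

// Number of differing bits between two equal-length binary descriptors.
int hammingDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dist += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return dist;
}

// For each query descriptor, the index of the closest train descriptor
// (-1 if there are no train descriptors).
std::vector<int> bruteForceMatch(const std::vector<Descriptor>& query,
                                 const std::vector<Descriptor>& train) {
    std::vector<int> best(query.size(), -1);
    for (std::size_t q = 0; q < query.size(); ++q) {
        int bestDist = std::numeric_limits<int>::max();
        for (std::size_t t = 0; t < train.size(); ++t) {
            const int d = hammingDistance(query[q], train[t]);
            if (d < bestDist) {
                bestDist = d;
                best[q] = static_cast<int>(t);
            }
        }
    }
    return best;
}
```

Brute-force search is quadratic in the number of descriptors, which is usually acceptable for a few hundred features per image.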

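3.2 Feature matching

  This step uses the feature points to determine which points correspond across images, from which the transformation matrix H is computed. This H transforms the whole image. To deal with parallax, some approaches divide the image into a grid and compute a separate H for each cell.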

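// Assumed value: the text above selects the ten best matches.
static const int MAX_OPTIMAL_POINT_NUM = 10;

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());
    if ((int)matchePoints.size() < MAX_OPTIMAL_POINT_NUM) return;

    std::sort(matchePoints.begin(), matchePoints.end()); // sort matches by distance
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    for (int i = 0; i < MAX_OPTIMAL_POINT_NUM; i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    // Three matches that were verified by eye to be correct.
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints2[0], imagePoints2[3], imagePoints2[6]});
}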

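  With ORB features there are many mismatched points here. The three points above were chosen by inspecting the drawn matches and picking correct ones; they are used to estimate the affine transformation matrix H. OpenCV performs this estimation internally with RANSAC, a step I skipped here.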
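
For reference, the skipped RANSAC step could look roughly like the sketch below. It is a plain C++ illustration, not OpenCV's implementation (the library counterpart is `cv::estimateAffine2D`): it repeatedly fits an affine map to three random correspondences and keeps the map with the most inliers. The names `Pt`, `Affine`, `affineFrom3`, `countInliers`, and `ransacAffine`, and the default iteration count, threshold, and seed, are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Pt { double x; double y; };
using Affine = std::array<double, 6>; // row-major 2x3: [a b tx; c d ty]

// Affine map sending three source points to three destination points.
// Returns false if the source points are (nearly) collinear.
bool affineFrom3(const std::array<Pt, 3>& s, const std::array<Pt, 3>& d, Affine& m) {
    const double ux = s[1].x - s[0].x, uy = s[1].y - s[0].y;
    const double vx = s[2].x - s[0].x, vy = s[2].y - s[0].y;
    const double det = ux * vy - uy * vx;
    if (std::fabs(det) < 1e-9) return false;
    const double px = d[1].x - d[0].x, qx = d[2].x - d[0].x;
    const double py = d[1].y - d[0].y, qy = d[2].y - d[0].y;
    m[0] = (px * vy - qx * uy) / det;
    m[1] = (ux * qx - vx * px) / det;
    m[2] = d[0].x - m[0] * s[0].x - m[1] * s[0].y;
    m[3] = (py * vy - qy * uy) / det;
    m[4] = (ux * qy - vx * py) / det;
    m[5] = d[0].y - m[3] * s[0].x - m[4] * s[0].y;
    return true;
}

// Number of correspondences whose reprojection error is below `thresh` pixels.
int countInliers(const Affine& m, const std::vector<Pt>& src,
                 const std::vector<Pt>& dst, double thresh) {
    int n = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const double ex = m[0] * src[i].x + m[1] * src[i].y + m[2] - dst[i].x;
        const double ey = m[3] * src[i].x + m[4] * src[i].y + m[5] - dst[i].y;
        if (std::hypot(ex, ey) < thresh) ++n;
    }
    return n;
}

// Keep the candidate model supported by the most correspondences.
bool ransacAffine(const std::vector<Pt>& src, const std::vector<Pt>& dst,
                  Affine& best, int iterations = 200, double thresh = 0.5,
                  unsigned seed = 42) {
    assert(src.size() == dst.size());
    if (src.size() < 3) return false;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, src.size() - 1);
    int bestCount = 0;
    for (int it = 0; it < iterations; ++it) {
        const std::size_t i = pick(rng), j = pick(rng), k = pick(rng);
        if (i == j || j == k || i == k) continue; // need three distinct samples
        const std::array<Pt, 3> s = {{src[i], src[j], src[k]}};
        const std::array<Pt, 3> d = {{dst[i], dst[j], dst[k]}};
        Affine m;
        if (!affineFrom3(s, d, m)) continue; // skip degenerate triples
        const int c = countInliers(m, src, dst, thresh);
        if (c > bestCount) { bestCount = c; best = m; }
    }
    return bestCount >= 3;
}
```

A production version would normally refit the model to all inliers by least squares; that refinement is left out to keep the sketch short.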

3.3 Estimating the affine transformation matrix


void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    // Target positions on the canvas: the right image's points shifted
    // right by the width of the left image.
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);
    Hs.push_back(homo1);
    Hs.push_back(homo2);
}


3.4 Stitching the images


void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H,
              cv::Point2f &optimalPt, cv::Mat &pano)
{
    // width  = left.width + right.width
    // height = std::max(left.height, right.height)
    int pano_width  = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano            = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0, img_trans1;
    img_trans0 = cv::Mat::zeros(pano.size(), CV_8UC3);
    img_trans1 = cv::Mat::zeros(pano.size(), CV_8UC3);
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());

    // Project the best match onto the canvas; its x coordinate is the seam.
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0);
    trans_pt = H[0]*trans_pt;

    // Round once and clamp so both ROIs stay inside the canvas.
    int seam_x = cvRound(trans_pt.at<double>(0, 0));
    seam_x     = std::max(0, std::min(seam_x, pano_width));
    cv::Rect left_roi  = cv::Rect(0, 0, seam_x, pano_height);
    cv::Rect right_roi = cv::Rect(seam_x, 0, pano_width - seam_x, pano_height);
    // Copy each side into the canvas (this step is not shown in the
    // original listing and is assumed here).
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}

int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    if (image01.empty() || image02.empty()) {
        return -1; // input images could not be read
    }
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);
    if (optimalMatchePoint.size() < 2) {
        return -1; // not enough matches to estimate the transforms
    }

    // Shift the right image's points by the left image's width.
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += imgs[0].cols;
        newMatchingPt.push_back(pt);
    }
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);

    std::vector<cv::Mat> Hs = {homo1, homo2};
    cv::Mat pano;
    //getPano1(imgs, Hs, pano);
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}

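3.5 Simplified stitching results

  • Stitching images that differ only by translation
Figure 3.5.1 Snow scene stitched with an affine transform
  • Stitching images that also differ by rotation
Figure 3.5.2 CMU scene stitched with an affine transform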


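4. OpenCV stitcher module

  OpenCV's samples include stitching_detailed.cpp, which walks through the implementation steps of each module. In practice we usually need real-time stitching, and calling the high-level interface directly basically cannot meet that requirement, especially on embedded ARM targets, so we need to understand the implementation details in order to find optimization points. I am only interested in some of the details in stitching_detailed.cpp, so I removed the timing statistics and the image scaling used when searching for the blending region.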

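4.1 Parameter overview

  stitching_detailed.cpp has a large number of configuration parameters. As Figure 1.1 (the OpenCV stitching pipeline) shows, the main steps of the OpenCV stitcher are:

  • registration
    • feature extraction
    • feature matching
    • image registration
    • camera intrinsics estimation
    • wave correction
  • compositing
    • image warping
    • exposure compensation
    • seam finding
    • image blending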


static void printUsage(char** argv)
{
    cout <<
         "Rotation model images stitcher.\n\n"
         << argv[0] << " img1 img2 [...imgN] [flags]\n\n"
                       "  --preview\n"
                       "      Run stitching in the preview mode. Works faster than usual mode,\n"
                       "      but output image will have lower resolution.\n"
                       "  --try_cuda (yes|no)\n"
                       "      Try to use CUDA. The default value is 'no'. All default values\n"
                       "      are for CPU mode.\n"
                       "\nMotion Estimation Flags:\n"
                       "  --work_megapix <float>\n"
                       "      Resolution for image registration step. The default is 0.6 Mpx.\n"
                       "  --features (surf|orb|sift|akaze)\n"
                       "      Type of features used for images matching.\n"
                       "      The default is surf if available, orb otherwise.\n"
                       "  --matcher (homography|affine)\n"
                       "      Matcher used for pairwise image matching.\n"
                       "  --estimator (homography|affine)\n"
                       "      Type of estimator used for transformation estimation.\n"
                       "  --match_conf <float>\n"
                       "      Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
                       "  --conf_thresh <float>\n"
                       "      Threshold for two images are from the same panorama confidence.\n"
                       "      The default is 1.0.\n"
                       "  --ba (no|reproj|ray|affine)\n"
                       "      Bundle adjustment cost function. The default is ray.\n"
                       "  --ba_refine_mask (mask)\n"
                       "      Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
                       "      where 'x' means refine respective parameter and '_' means don't\n"
                       "      refine one, and has the following format:\n"
                       "      <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
                       "      adjustment doesn't support estimation of selected parameter then\n"
                       "      the respective flag is ignored.\n"
                       "  --wave_correct (no|horiz|vert)\n"
                       "      Perform wave effect correction. The default is 'horiz'.\n"
                       "  --save_graph <file_name>\n"
                       "      Save matches graph represented in DOT language to <file_name> file.\n"
                       "      Labels description: Nm is number of matches, Ni is number of inliers,\n"
                       "      C is confidence.\n"
                       "\nCompositing Flags:\n"
                       "  --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
                       "     compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
                       "      compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
                       "      paniniPortraitA1.5B1|mercator|transverseMercator)\n"
                       "      Warp surface type. The default is 'spherical'.\n"
                       "  --seam_megapix <float>\n"
                       "      Resolution for seam estimation step. The default is 0.1 Mpx.\n"
                       "  --seam (no|voronoi|gc_color|gc_colorgrad)\n"
                       "      Seam estimation method. The default is 'gc_color'.\n"
                       "  --compose_megapix <float>\n"
                       "      Resolution for compositing step. Use -1 for original resolution.\n"
                       "      The default is -1.\n"
                       "  --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
                       "      Exposure compensation method. The default is 'gain_blocks'.\n"
                       "  --expos_comp_nr_feeds <int>\n"
                       "      Number of exposure compensation feed. The default is 1.\n"
                       "  --expos_comp_nr_filtering <int>\n"
                       "      Number of filtering iterations of the exposure compensation gains.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 2.\n"
                       "  --expos_comp_block_size <int>\n"
                       "      BLock size in pixels used by the exposure compensator.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 32.\n"
                       "  --blend (no|feather|multiband)\n"
                       "      Blending method. The default is 'multiband'.\n"
                       "  --blend_strength <float>\n"
                       "      Blending strength from [0,100] range. The default is 5.\n"
                       "  --output <result_img>\n"
                       "      The default is 'result.jpg'.\n"
                       "  --timelapse (as_is|crop) \n"
                       "      Output warped images separately as frames of a time lapse movie, "
                       "      with 'fixed_' prepended to input file names.\n"
                       "  --rangewidth <int>\n"
                       "      uses range_width to limit number of images to match with.\n";
}

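4.2 Motion Estimation Flags

  • work_megapix : during registration steps such as feature extraction, images are scaled down to reduce run time; this parameter determines the scale factor;
  • features : the feature type to extract (surf|orb|sift|akaze);
  • matcher : the feature matching method (homography|affine), i.e. homography-based or affine-based matching, corresponding to BestOf2NearestMatcher and AffineBestOf2NearestMatcher respectively; the latter finds the best matches under an affine transform between two images;
  • estimator : (homography|affine), the method used to estimate camera parameters;
  • match_conf : a float; the confidence threshold used to accept matches in the matching stage;
  • conf_thresh : the confidence threshold for deciding that two images belong to the same panorama;
  • ba : the cost function used by bundle adjustment (BA) to refine camera parameters (no|reproj|ray|affine);
  • ba_refine_mask : during BA some parameters can be held fixed by specifying a mask; 'x' means refine the parameter, '_' means keep it fixed, in the order fx, skew, ppx, aspect, ppy;
  • wave_correct : wave correction flag with three options (no|horiz|vert); it constrains the panorama to the horizontal or vertical direction and avoids a result that bows like a bird spreading its wings;
  • save_graph : saves the match relationships between images in DOT language format.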
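
To make the ba_refine_mask format concrete, the sketch below turns the five-character string into a 3x3 matrix laid out like the camera intrinsic matrix K, which is, to my understanding, how the sample maps the mask before handing it to the bundle adjuster. It uses plain arrays instead of `cv::Mat`; `RefineMask` and `parseRefineMask` are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <string>

using RefineMask = std::array<std::array<int, 3>, 3>;

// Convert a mask such as "x_xxx" (order: fx, skew, ppx, aspect, ppy) into a
// 3x3 matrix of 0/1 flags positioned like the entries of K.
RefineMask parseRefineMask(const std::string& mask) {
    assert(mask.size() == 5);
    RefineMask m{}; // all zeros: nothing is refined
    if (mask[0] == 'x') m[0][0] = 1; // fx
    if (mask[1] == 'x') m[0][1] = 1; // skew
    if (mask[2] == 'x') m[0][2] = 1; // ppx
    if (mask[3] == 'x') m[1][1] = 1; // aspect
    if (mask[4] == 'x') m[1][2] = 1; // ppy
    return m;
}
```

For example, "x_xxx" refines everything except the skew term, so only the entry in row 0, column 1 stays 0 among the five positions.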

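4.3 Compositing Flags

  • warp : the image warping method, including spherical projection, cylindrical projection, and others; OpenCV supports quite a few projections;
  • seam_megapix : images are scaled when searching for seams; this parameter, together with work_scale, controls the scale factor;
  • seam : the seam finding method;
  • compose_megapix : sets the resolution used during compositing and for the final panorama, for example when previewing;
  • expos_comp : the exposure compensation method;
  • blend : the image blending method; the common choices are (feather|multiband).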
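
The *_megapix parameters are turned into scale factors with a formula of the form min(1, sqrt(megapix * 1e6 / area)), so large images are shrunk to the target pixel count and small ones are left alone. The helper below is a minimal sketch of that computation; the function name `scaleForMegapix` is made up here, and treating a negative value as "keep the original resolution" follows the -1 convention described in the usage text.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scale factor that shrinks a `width` x `height` image to roughly `megapix`
// megapixels. Never upscales; a negative `megapix` keeps the original size.
double scaleForMegapix(double megapix, int width, int height) {
    assert(width > 0 && height > 0);
    if (megapix < 0) return 1.0;
    const double area = static_cast<double>(width) * height;
    return std::min(1.0, std::sqrt(megapix * 1e6 / area));
}
```

For a 4000 x 3000 input, `scaleForMegapix(0.6, 4000, 3000)` gives the registration scale and `scaleForMegapix(0.1, 4000, 3000)` the seam scale; the ratio of the two is what relates seam-resolution images to the camera parameters estimated at work resolution.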

4.4 Summary


