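Image Stitching 1: OpenCV Stitcher

1. Introduction

  Image stitching is something of a culmination of traditional computer vision. Its main steps are feature extraction, feature matching, image registration, and image blending. Figure 1.1 shows the OpenCV stitching pipeline. Stitching touches many research areas: feature extraction alone has commonly used methods such as SIFT, SURF, and ORB, which are also widely used in SLAM. If you have the time, working through these implementation details is well worth it for building up your own body of knowledge.

Figure 1.1 OpenCV stitching pipeline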

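2. OpenCV Stitcher

  OpenCV provides a ready-made stitching class, Stitcher. Calling a single interface is basically enough to run every stitching step and obtain the stitched image. Test image reference:

2.1 Sample code

The sample code below calls the interface: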

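#include "opencv2/opencv.hpp"
#include "logging.hpp"  // the author's logging header; LOG(INFO) is assumed to come from here
#include <string>
#include <vector>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Choose the stitching mode; there are two: PANORAMA and SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched directly with affine transforms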
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if(cv::Stitcher::OK != status){
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}

int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";

    if(2 == argc){
        pic_path = std::string(argv[1]);
    }else if(3 == argc){
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    }else{
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if(img_names.empty()){
        LOG(INFO) << "no images";
        return -1;
    }
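    for(size_t i = 0; i < img_names.size(); ++i){
        cv::Mat img = cv::imread(img_names[i]);
        // imread returns an empty Mat when a file cannot be read; skip it
        if(img.empty()){
            LOG(INFO) << "failed to read image: " << img_names[i];
            continue;
        }
        imgs.push_back(img);
    }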
    cv::Mat pano;
    stitchImg(imgs, pano);
    if(!pano.empty()){
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}

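2.2 Sample results

  • mode = PANORAMA
    CMU scene stitching 1
  • mode = SCANS
    CMU scene stitching 2

  The two CMU panoramas above show the difference between PANORAMA and SCANS. The former projects the images onto a curved (spherical or cylindrical) surface, so the resulting panorama looks bent. SCANS applies only affine transforms, so the stitched image largely preserves the straight and parallel lines of the source images.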
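  To see why a cylindrical projection bends straight lines, here is a minimal sketch of the standard forward cylindrical mapping. It is not OpenCV's warper: `Pt`, `cylindricalProject`, and the focal length `f` (in pixels) are names assumed for illustration, and coordinates are measured from the image centre.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type, not an OpenCV class.
struct Pt { double x; double y; };

// Maps a pixel offset (x, y), measured from the image centre, onto a
// cylinder of radius f (the focal length in pixels).
Pt cylindricalProject(double x, double y, double f)
{
    assert(f > 0.0);
    Pt out;
    out.x = f * std::atan(x / f);               // arc length along the cylinder
    out.y = f * y / std::sqrt(x * x + f * f);   // height shrinks away from the centre
    return out;
}
```

  Points on the horizontal line y = 100 keep their height at the centre column but move towards the horizon as |x| grows, which is the bending visible in the PANORAMA result. An affine transform has no such x-dependent scaling, so lines stay straight.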

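3. A simplified stitcher

  This section opens a few threads to pick up later. While reading the details of the OpenCV stitcher, I first wrote a simple imitation of SCANS-mode stitching to see how well it works. The basic idea:

  • Extract and match features to find the correspondences between images;
  • Estimate the transformation matrix so the images can be aligned: take the ten best-matching feature points, draw them, and pick three correctly matched points to estimate the affine matrix;
  • Create a canvas whose width is the sum of all image widths and whose height is the maximum image height, with pixels initialized to 0;
  • Project the best-matching point onto the canvas and use it as the boundary between the left and right images;
  • Use the right image as the reference: transform the left image, then merge it with the right image.

3.1 Feature extraction

  The commonly used feature extractors are SIFT, SURF, and ORB. ORB is fast and widely used in other vision tasks, but it is less accurate than the other two.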

void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    keyPoints.clear();
    imageDescs.clear();
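    // Extract feature points. For ORB, the first argument of create() is the
    // maximum number of features to retain.
    int nFeatures = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(nFeatures);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        // Convert to grayscale
        cv::Mat image;
        cv::cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        cv::Mat imageDesc1;
        // compute() may drop keypoints it cannot describe, so store the
        // keypoints only afterwards to keep them aligned with the descriptor rows
        orbDetector->compute(image, keyPoint, imageDesc1);
        keyPoints.push_back(keyPoint);
        /* imageDesc must be converted to floating point, otherwise FLANN fails with:
        ** Unsupported format or combination of formats
        ** in buildIndex using FLANN algorithm
        */
        imageDesc1.convertTo(imageDesc1, CV_32F);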
        imageDescs.push_back(imageDesc1.clone());
    }
}
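  ORB descriptors are binary strings, and they are normally compared with the Hamming distance (in OpenCV, for example, through cv::BFMatcher with cv::NORM_HAMMING) rather than converted to float for FLANN as above. The sketch below only illustrates what that distance measures; `hammingDistance` is a hypothetical helper written with the standard library, not an OpenCV function.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the bits that differ between two binary descriptors of equal length.
// A smaller count means the two descriptors are more similar.
int hammingDistance(const std::vector<std::uint8_t>& a,
                    const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the two bytes disagree
        dist += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return dist;
}
```

  Treating the same bytes as floats and using an L2 distance, as the FLANN workaround does, still runs but no longer measures bit-level similarity, which may contribute to the mismatches seen later.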

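3.2 Feature matching

  This step uses the feature points to determine which points correspond between the images, from which the transformation matrix H is computed. This H transforms the whole image. To deal with parallax, some approaches now divide the image into a grid and compute a separate H for each cell.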

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
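    optimalMatchePoint.clear();
    // Match features and keep the best pairs. Images are assumed to be given
    // in order, and this test assumes exactly two images.
    const size_t maxOptimalPoints = 10;  // the ten best matches described in section 3
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());

    std::sort(matchePoints.begin(), matchePoints.end());  // ascending distance; needs <algorithm>
    // Keep the top N matches, guarding against fewer than N results
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    const size_t n = std::min(matchePoints.size(), maxOptimalPoints);
    for (size_t i = 0; i < n; i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    if (n < 7) {
        return;  // the hand-picked indices below need at least seven matches
    }
    // Indices 0, 3 and 6 were chosen by eye as correct matches for this image pair
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints2[0], imagePoints2[3], imagePoints2[6]});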
}

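  With ORB features there are many mismatched points here. The three points above were picked by eye from the displayed matches as correct ones, and they are used to estimate the affine matrix H. OpenCV uses the RANSAC algorithm internally for this estimation; I skipped that step here.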
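  As a rough illustration of how consensus-based outlier rejection could replace the hand-picked points, the sketch below tries every match as a hypothesis for a pure translation and keeps the largest set of matches that agree with it. This is a deterministic, exhaustive stand-in for RANSAC's random sampling, restricted to a translation model; it is not what OpenCV does internally, and `PointPair` and `largestTranslationConsensus` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One match: (x1, y1) in the first image, (x2, y2) in the second.
struct PointPair { double x1, y1, x2, y2; };

// Returns the indices of the largest group of matches that share
// (within tol pixels) the translation suggested by a single match.
std::vector<std::size_t> largestTranslationConsensus(
        const std::vector<PointPair>& pairs, double tol)
{
    assert(tol >= 0.0);
    std::vector<std::size_t> best;
    for (std::size_t h = 0; h < pairs.size(); ++h) {
        // Hypothesis: the translation implied by match h
        const double dx = pairs[h].x2 - pairs[h].x1;
        const double dy = pairs[h].y2 - pairs[h].y1;
        std::vector<std::size_t> inliers;
        for (std::size_t i = 0; i < pairs.size(); ++i) {
            const double ex = pairs[i].x1 + dx - pairs[i].x2;
            const double ey = pairs[i].y1 + dy - pairs[i].y2;
            if (std::hypot(ex, ey) <= tol) {
                inliers.push_back(i);
            }
        }
        if (inliers.size() > best.size()) {
            best = inliers;
        }
    }
    return best;
}
```

  The surviving indices could then feed the transform estimation instead of fixed positions such as 0, 3 and 6. A real RANSAC would sample minimal sets at random and fit the full affine or homography model.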

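3.3 Estimating the affine matrix

  The previous step produced three matched point pairs, so H can now be computed directly. Before computing it, the right image is first shifted to the right side of the canvas.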

void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    // Transform for the left image: the right image's points have been shifted,
    // so the left image must be warped onto their positions on the canvas
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    // Transform for the right image: moves it to the right side of the canvas
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);

    Hs.push_back(homo1);
    Hs.push_back(homo2);
}
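  Three non-collinear point pairs determine an affine transform exactly, because each row of the 2x3 matrix has three unknowns. The sketch below solves that small linear system with Cramer's rule to show the idea behind a call such as getAffineTransform. It is an independent illustration, not OpenCV's implementation, and `P2` and `affineFromThreePoints` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct P2 { double x, y; };

// Returns the 2x3 affine matrix [a b c; d e f] in row-major order such that
// dst[i] = (a*x + b*y + c, d*x + e*y + f) for each src[i] = (x, y).
std::array<double, 6> affineFromThreePoints(const std::array<P2, 3>& src,
                                            const std::array<P2, 3>& dst)
{
    const double x0 = src[0].x, y0 = src[0].y;
    const double x1 = src[1].x, y1 = src[1].y;
    const double x2 = src[2].x, y2 = src[2].y;
    // Determinant of [[x0 y0 1], [x1 y1 1], [x2 y2 1]]
    const double det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1);
    assert(std::fabs(det) > 1e-12);  // the three source points must not be collinear

    std::array<double, 6> m{};
    for (int row = 0; row < 2; ++row) {
        // Right-hand side: target x coordinates for row 0, y coordinates for row 1
        const double t0 = (row == 0) ? dst[0].x : dst[0].y;
        const double t1 = (row == 0) ? dst[1].x : dst[1].y;
        const double t2 = (row == 0) ? dst[2].x : dst[2].y;
        // Cramer's rule: replace one column at a time with the right-hand side
        m[row * 3 + 0] = (t0 * (y1 - y2) - y0 * (t1 - t2) + (t1 * y2 - t2 * y1)) / det;
        m[row * 3 + 1] = (x0 * (t1 - t2) - t0 * (x1 - x2) + (x1 * t2 - x2 * t1)) / det;
        m[row * 3 + 2] = (x0 * (y1 * t2 - y2 * t1) - y0 * (x1 * t2 - x2 * t1)
                          + t0 * (x1 * y2 - x2 * y1)) / det;
    }
    return m;
}
```

  Because the solution is exact for any three pairs, a single wrong match distorts the whole matrix, which is why robust estimation over many matches is preferred in practice.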

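3.4 Stitching the images

  Once the transformation matrices are known, the strongest matched feature point is used as the boundary between the two images: the canvas to its left comes from the left image and the canvas to its right from the right image. This is a very crude way to stitch. For images shot with translation only it can still join them, but once rotation is added, or the optical centres are not aligned when shooting, the misalignment becomes severe. The other issue is blending: here a single dividing line decides which source image each pixel is taken from, so the transition is not smooth and misalignment shows up as well.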

void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H, 
			  cv::Point2f &optimalPt, cv::Mat &pano)
{
    // Use the right image as the reference: warp the left image with the affine
    // transform so that it overlaps the right image, and take the strongest
    // matched feature point as the boundary between the two images.
    // Default panorama canvas size:
    //   width  = left.width + right.width
    //   height = std::max(left.height, right.height)
    int pano_width  = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano            = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0, img_trans1;
    img_trans0 = cv::Mat::zeros(pano.size(), CV_8UC3);
    img_trans1 = cv::Mat::zeros(pano.size(), CV_8UC3);
    // After the affine warp, each source image already sits at its position on the canvas
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());

    // Strongest matched feature point, in homogeneous coordinates
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0);
    // Its position on the canvas
    trans_pt = H[0]*trans_pt;

    // Regions to take from each image. Round and clamp the seam column so that
    // both ROIs stay inside the canvas.
    int seam_x = cvRound(trans_pt.at<double>(0, 0));
    seam_x = std::max(0, std::min(seam_x, pano_width));
    cv::Rect left_roi  = cv::Rect(0, 0, seam_x, pano_height);
    cv::Rect right_roi = cv::Rect(seam_x, 0, pano_width - seam_x, pano_height);
    // Copy the selected regions onto the canvas
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}

int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    if (image01.empty() || image02.empty()) {
        return -1;  // imread returns an empty Mat when a file cannot be read
    }
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);

    // Reuse getAffineMat from section 3.3 instead of repeating its body here
    std::vector<cv::Mat> Hs;
    getAffineMat(optimalMatchePoint, imgs[0].cols, Hs);
    cv::Mat pano;
    //getPano1(imgs, Hs, pano);
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}

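3.5 Results of the simplified stitcher

  • Stitching result for images that differ only by translation
    Figure 3.5.1 Affine stitching of the snow scene
  • Stitching result for images with rotation
    Figure 3.5.2 Affine stitching of the CMU scene

  Not much of a result, really. Figure 3.5.2 clearly shows misalignment, and the left part of the stitched image is visibly tilted. The red box on the left marks the region of the left image, and the red line in the middle marks the boundary between the left and right images. The misalignment has several causes: there is no good blending algorithm, camera rotation is not taken into account, and the seam position is poorly chosen. The tilt, which makes the picture look unnatural, comes from picking a single image as the reference and transforming the other images into its coordinate frame.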
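  One way to soften the hard dividing line is to cross-fade the two warped images in a band around the seam. The sketch below does this for a single row of grey values with a linear weight ramp. It is a simplified illustration of feather-style blending, not OpenCV's blender, and `featherBlendRow`, `seamX`, and `halfWidth` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Blends one row of the warped left and right images around column seamX.
// Inside [seamX - halfWidth, seamX + halfWidth] the left weight falls
// linearly from 1 to 0; outside the band one image is used as-is.
std::vector<double> featherBlendRow(const std::vector<double>& left,
                                    const std::vector<double>& right,
                                    int seamX, int halfWidth)
{
    assert(left.size() == right.size());
    assert(halfWidth > 0);
    std::vector<double> out(left.size());
    for (std::size_t x = 0; x < left.size(); ++x) {
        double wLeft = (seamX + halfWidth - static_cast<double>(x)) / (2.0 * halfWidth);
        wLeft = std::min(1.0, std::max(0.0, wLeft));  // clamp to [0, 1]
        out[x] = wLeft * left[x] + (1.0 - wLeft) * right[x];
    }
    return out;
}
```

  A ramp like this hides brightness steps but turns geometric misalignment into ghosting, so it complements better registration rather than replacing it.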

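4. The OpenCV stitcher module

  OpenCV ships a stitching_detailed.cpp sample that contains the implementation steps of each module. In practice we usually need real-time stitching, which calling the high-level interface directly basically cannot deliver, especially on embedded ARM targets. That means we need to understand the implementation details to find where to optimize. I am only interested in some of the details in stitching_detailed.cpp, so I removed the timing statistics and the downscaled search for the blending region.

4.1 Parameter overview

stitching_detailed.cpp in OpenCV has a great many configuration parameters. As the pipeline in Figure 1.1 shows, the main steps of the OpenCV stitcher are:

  • registration
    • feature extraction
    • feature matching
    • image registration
    • camera intrinsics estimation
    • wave correction
  • compositing
    • image warping
    • exposure compensation
    • seam finding
    • image blending

  The registration part obtains the matching relationships between images, estimates the camera intrinsics and extrinsics, and refines these parameters with bundle adjustment (BA). This module mainly determines the stitching order and estimates the transformation matrices. The compositing part then warps and blends the images using those parameters, and applies exposure compensation and similar algorithms to improve visual consistency. The parameters are listed below: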

static void printUsage(char** argv)
{
    cout <<
         "Rotation model images stitcher.\n\n"
         << argv[0] << " img1 img2 [...imgN] [flags]\n\n"
                       "Flags:\n"
                       "  --preview\n"
                       "      Run stitching in the preview mode. Works faster than usual mode,\n"
                       "      but output image will have lower resolution.\n"
                       "  --try_cuda (yes|no)\n"
                       "      Try to use CUDA. The default value is 'no'. All default values\n"
                       "      are for CPU mode.\n"
                       "\nMotion Estimation Flags:\n"
                       "  --work_megapix <float>\n"
                       "      Resolution for image registration step. The default is 0.6 Mpx.\n"
                       "  --features (surf|orb|sift|akaze)\n"
                       "      Type of features used for images matching.\n"
                       "      The default is surf if available, orb otherwise.\n"
                       "  --matcher (homography|affine)\n"
                       "      Matcher used for pairwise image matching.\n"
                       "  --estimator (homography|affine)\n"
                       "      Type of estimator used for transformation estimation.\n"
                       "  --match_conf <float>\n"
                       "      Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
                       "  --conf_thresh <float>\n"
                       "      Threshold for two images are from the same panorama confidence.\n"
                       "      The default is 1.0.\n"
                       "  --ba (no|reproj|ray|affine)\n"
                       "      Bundle adjustment cost function. The default is ray.\n"
                       "  --ba_refine_mask (mask)\n"
                       "      Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
                       "      where 'x' means refine respective parameter and '_' means don't\n"
                       "      refine one, and has the following format:\n"
                       "      <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
                       "      adjustment doesn't support estimation of selected parameter then\n"
                       "      the respective flag is ignored.\n"
                       "  --wave_correct (no|horiz|vert)\n"
                       "      Perform wave effect correction. The default is 'horiz'.\n"
                       "  --save_graph <file_name>\n"
                       "      Save matches graph represented in DOT language to <file_name> file.\n"
                       "      Labels description: Nm is number of matches, Ni is number of inliers,\n"
                       "      C is confidence.\n"
                       "\nCompositing Flags:\n"
                       "  --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
                       "     compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
                       "      compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
                       "      paniniPortraitA1.5B1|mercator|transverseMercator)\n"
                       "      Warp surface type. The default is 'spherical'.\n"
                       "  --seam_megapix <float>\n"
                       "      Resolution for seam estimation step. The default is 0.1 Mpx.\n"
                       "  --seam (no|voronoi|gc_color|gc_colorgrad)\n"
                       "      Seam estimation method. The default is 'gc_color'.\n"
                       "  --compose_megapix <float>\n"
                       "      Resolution for compositing step. Use -1 for original resolution.\n"
                       "      The default is -1.\n"
                       "  --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
                       "      Exposure compensation method. The default is 'gain_blocks'.\n"
                       "  --expos_comp_nr_feeds <int>\n"
                       "      Number of exposure compensation feed. The default is 1.\n"
                       "  --expos_comp_nr_filtering <int>\n"
                       "      Number of filtering iterations of the exposure compensation gains.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 2.\n"
                       "  --expos_comp_block_size <int>\n"
                       "      BLock size in pixels used by the exposure compensator.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 32.\n"
                       "  --blend (no|feather|multiband)\n"
                       "      Blending method. The default is 'multiband'.\n"
                       "  --blend_strength <float>\n"
                       "      Blending strength from [0,100] range. The default is 5.\n"
                       "  --output <result_img>\n"
                       "      The default is 'result.jpg'.\n"
                       "  --timelapse (as_is|crop) \n"
                       "      Output warped images separately as frames of a time lapse movie, "
                       "      with 'fixed_' prepended to input file names.\n"
                       "  --rangewidth <int>\n"
                       "      uses range_width to limit number of images to match with.\n";
}

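4.2 Meaning of the Motion Estimation Flags

  • work_megapix : during registration steps such as feature extraction, images are downscaled to save time, which requires a scale factor; this flag sets the target resolution in megapixels;
  • features : the feature type to extract, (surf|orb|sift|akaze);
  • matcher : the feature matching method, (homography | affine), for homography and affine transforms respectively. These correspond to BestOf2NearestMatcher and AffineBestOf2NearestMatcher; the latter finds the best matches under an affine transform between two images;
  • estimator : (homography | affine), the method used to estimate camera parameters;
  • match_conf : a float; the confidence threshold used in the matching step to decide which matches are kept;
  • conf_thresh : the confidence threshold for deciding that two images belong to the same panorama;
  • ba : the cost function used by bundle adjustment to refine the camera parameters, (no|reproj|ray|affine);
  • ba_refine_mask : during bundle adjustment some parameters can be held fixed by specifying a mask. 'x' means refine the parameter and '_' means keep it fixed; the order is fx, skew, ppx, aspect, ppy;
  • wave_correct : the wave correction flag, one of (no|horiz|vert). It constrains the panorama to the horizontal or vertical direction, which avoids the "spread wings" shape where the panorama curves upward at both ends;
  • save_graph : saves the matching relationships between images in DOT language format;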
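  The ba_refine_mask string can be read as a set of switches over the upper triangle of the 3x3 camera matrix. The sketch below expands the five characters into such a matrix. The cell positions follow my understanding of how stitching_detailed.cpp fills its refinement mask, so treat them as an assumption to check against your OpenCV version; `parseRefineMask` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <string>

// Expands a 5-character mask such as "x_xxx" into a 3x3 matrix laid out
// like the camera matrix K: 1 = refine this entry, 0 = keep it fixed.
std::array<std::array<int, 3>, 3> parseRefineMask(const std::string& mask)
{
    assert(mask.size() == 5);
    std::array<std::array<int, 3>, 3> m{};   // all zeros
    if (mask[0] == 'x') m[0][0] = 1;         // fx
    if (mask[1] == 'x') m[0][1] = 1;         // skew
    if (mask[2] == 'x') m[0][2] = 1;         // ppx
    if (mask[3] == 'x') m[1][1] = 1;         // aspect
    if (mask[4] == 'x') m[1][2] = 1;         // ppy
    return m;
}
```

  For example, "x_xxx" refines everything except the skew, which is the pattern quoted in the usage text above.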

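4.3 Meaning of the Compositing Flags

  • warp : the image warping method, including spherical projection, cylindrical projection, and others; OpenCV supports quite a few projections;
  • seam_megapix : images are downscaled when searching for the seam; this parameter, together with work_scale, controls the scale factor;
  • seam : the seam finding method;
  • compose_megapix : sets the resolution used during compositing and of the output panorama, for example when previewing;
  • expos_comp : the exposure compensation method;
  • blend : the image blending method; the common choices are (feather|multiband);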
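  The *_megapix flags are target resolutions, and each is turned into a per-image scale factor. The sketch below shows the usual conversion: the scale is the square root of the ratio between the target pixel count and the image area, capped at 1 so images are never enlarged. It follows the formula as I understand it from stitching_detailed.cpp; `scaleForMegapix` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Converts a target resolution in megapixels into a scale factor for an
// image of the given size. A negative target means "keep the original size".
double scaleForMegapix(double megapix, int width, int height)
{
    assert(width > 0 && height > 0);
    if (megapix < 0.0) {
        return 1.0;
    }
    const double area = static_cast<double>(width) * height;
    // sqrt because the scale applies to both width and height
    return std::min(1.0, std::sqrt(megapix * 1e6 / area));
}
```

  For a 4000x3000 input, the default work_megapix of 0.6 gives a work scale of about 0.22 and the default seam_megapix of 0.1 gives a seam scale of about 0.09. The ratio of the two is what relates seam-stage coordinates to registration-stage coordinates.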

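4.4 Summary

  If the number and resolution of the input images are modest, the resolution scaling steps and the timing steps in the sample source can be removed to simplify the stitching pipeline. In real stitching applications this pipeline is generally not used as-is for real-time stitching anyway. The algorithms behind each configuration parameter help us understand more of the details, and they are what I plan to cover step by step in later posts.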

posted @ 2022-01-25 17:10  wangnb