Image Stitching 1: OpenCV Stitcher
1. Introduction
Image stitching is a direction that brings together many classic computer-vision techniques. The main steps are feature extraction, feature matching, image registration, and image blending. Figure 1.1 below shows the flow chart of OpenCV's stitching pipeline. Stitching touches many research areas: for feature extraction alone there are the commonly used SIFT, SURF, and ORB, which are also widely applied in SLAM. If you have the time, working through these implementation details is well worth it for building up your own knowledge base.
2. OpenCV Stitcher
OpenCV ships with a ready-made stitching class, cv::Stitcher. A single call to its interface runs all of the stitching steps and produces the stitched image. See the referenced test images for the examples below.
2.1 Example code
The following example shows how to call the interface:
#include "opencv2/opencv.hpp"
#include "logging.hpp"
#include <string>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Set the warp mode for stitching; there are two modes: PANORAMA and SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching.
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched with a plain affine transform.
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if (cv::Stitcher::OK != status) {
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}
int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";
    if (2 == argc) {
        pic_path = std::string(argv[1]);
    } else if (3 == argc) {
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    } else {
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if (img_names.empty()) {
        LOG(INFO) << "no images";
        return -1;
    }
    for (size_t i = 0; i < img_names.size(); ++i) {
        cv::Mat img = cv::imread(img_names[i]);
        imgs.push_back(img.clone());
    }
    cv::Mat pano;
    stitchImg(imgs, pano);
    if (!pano.empty()) {
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}
2.2 Example results
- mode = PANORAMA: CMU scene stitching result 1
- mode = SCANS: CMU scene stitching result 2
The two CMU comparison images above illustrate the difference between PANORAMA and SCANS: the former projects the images onto a cylinder, so the resulting panorama shows visible bending, while SCANS applies only an affine transform, so the stitched image largely preserves the straight lines and parallelism of the originals.
3. A simplified stitcher
This section lays some groundwork, leaving a few gaps to fill later. Before digging into the details of the OpenCV stitcher, let's imitate the SCANS mode with a simple implementation and see how the result looks. The basic idea is:
- extract and match features to find the correspondences between the images;
- estimate the transformation matrix between the images for alignment: pick the ten strongest matches, draw them, and use three correctly matched points to estimate the affine transform;
- allocate a canvas whose width is the sum of all image widths and whose height is the maximum of all image heights, initialized to zero;
- project the strongest matching point onto the canvas and use it as the junction between the left and right images;
- take the right image as the reference, i.e. warp the left image and then blend it with the right one.
3.1 Feature extraction
The commonly used feature extractors are SIFT, SURF, and ORB. ORB is fast and widely used in other vision tasks as well, but its accuracy is lower than that of the first two.
void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    keyPoints.clear();
    imageDescs.clear();
    // Extract feature points. The first argument of cv::ORB::create is the
    // maximum number of features to retain.
    int nfeatures = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(nfeatures);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        // Convert to grayscale
        cv::Mat image;
        cv::cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        keyPoints.push_back(keyPoint);
        cv::Mat imageDesc;
        orbDetector->compute(image, keyPoint, imageDesc);
        /* The descriptors must be converted to float, otherwise FLANN fails
         * with "Unsupported format or combination of formats in buildIndex".
         */
        imageDesc.convertTo(imageDesc, CV_32F);
        imageDescs.push_back(imageDesc.clone());
    }
}
3.2 Feature matching
This step establishes the correspondences between the feature points of the two images, from which the transformation matrix H is estimated. This H transforms the whole image; to mitigate parallax, some approaches divide the image into a grid and compute a separate H for each cell.
static const int MAX_OPTIMAL_POINT_NUM = 10;

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
    optimalMatchePoint.clear();
    // Match the features and keep the best pairs. The images are assumed to be
    // given in order; this test assumes exactly two images.
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());
    std::sort(matchePoints.begin(), matchePoints.end()); // sort by match distance
    // Keep the top-N best matches
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    for (int i = 0; i < MAX_OPTIMAL_POINT_NUM; i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints2[0], imagePoints2[3], imagePoints2[6]});
}
With ORB features there are many false matches at this stage; the three points above were picked because they visibly match correctly, and they will be used to estimate the affine transform H. OpenCV handles this internally with the RANSAC algorithm; I skip that step here.
3.3 Estimating the affine transform
The previous step produced the three strongest matches, so H can now be computed directly. Before computing it, the right image is first moved to the right side of the canvas.
void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    // Transform for the left image: after the right image's feature points are
    // shifted, the left image must map onto those shifted points on the canvas.
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    // Transform for the right image: simply move it to the right of the canvas.
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);
    Hs.push_back(homo1);
    Hs.push_back(homo2);
}
3.4 Compositing the images
Once the transforms are known, the strongest-responding feature point is taken as the blending center of the two images; the areas left and right of it come from the left and right image respectively. This is a very crude way to composite: for images taken with pure translation it can still stitch them together, but with rotation, or when the optical centers are not aligned during capture, the misalignment is severe. Blending is another weak point: pixels are taken from one image or the other along a hard boundary, so the transition is not smooth and seams show.
void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H,
              cv::Point2f &optimalPt, cv::Mat &pano)
{
    // Take the right image as the reference, warp the left image so that it
    // overlaps the right one, and use the strongest feature point as the
    // blending center. Default panorama canvas size:
    //   width  = left.width + right.width
    //   height = max(left.height, right.height)
    int pano_width = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0, img_trans1;
    // After the affine warp, each source image sits at its final position
    // on the panorama canvas.
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());
    // Strongest-responding feature point
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0);
    // Its position on the canvas
    trans_pt = H[0] * trans_pt;
    // Regions to take from each image (the original "+ 1" on the right width
    // pushed the ROI outside the canvas and would throw at runtime)
    int cx = (int)trans_pt.at<double>(0, 0);
    cv::Rect left_roi = cv::Rect(0, 0, cx, pano_height);
    cv::Rect right_roi = cv::Rect(cx, 0, pano_width - cx, pano_height);
    // Copy the selected regions onto the canvas
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}
int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);
    std::vector<cv::Mat> Hs;
    getAffineMat(optimalMatchePoint, imgs[0].cols, Hs);
    cv::Mat pano;
    //getPano1(imgs, Hs, pano);
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}
3.5 Results of the simplified stitcher
- Result for images with translation only
- Result for images with rotation
The result is nothing to boast about. Figure 3.5.2 clearly shows misalignment, and the whole stitched image leans to the left; the red box on the left marks the left image's region and the red line in the middle marks the boundary between the two. The misalignment has several causes: no good blending or transition algorithm, camera rotation not modeled, and a poorly chosen seam position. The tilt, which looks unnatural, comes from picking a single image as the reference and warping the others into its coordinate frame.
4. The OpenCV stitcher module
OpenCV's samples include stitching_detailed.cpp, which walks through the implementation of every module. Real applications usually require real-time stitching, and simply calling the high-level interface cannot meet that requirement, especially on embedded ARM targets, so we need to understand the implementation details to find optimization opportunities. Here I am only interested in parts of stitching_detailed.cpp, so the timing statistics and the scaled-down seam-region search have been stripped out.
4.1 Parameter overview
stitching_detailed.cpp exposes a large number of configuration parameters. As Figure 1.1, the OpenCV stitching flow chart, shows, the main steps of the OpenCV stitcher are:
- registration
  - feature extraction
  - feature matching
  - image registration
  - camera parameter estimation
  - wave correction
- compositing
  - image warping
  - exposure compensation
  - seam finding
  - image blending
The registration part establishes the matching relations between the images, estimates the camera intrinsics and extrinsics, and refines the parameters with bundle adjustment (BA); it is mainly responsible for the stitching order and the estimated transforms. The compositing part then warps and blends the images with those parameters, using exposure compensation and related algorithms to improve visual consistency. The parameters are listed below:
static void printUsage(char** argv)
{
cout <<
"Rotation model images stitcher.\n\n"
<< argv[0] << " img1 img2 [...imgN] [flags]\n\n"
"Flags:\n"
" --preview\n"
" Run stitching in the preview mode. Works faster than usual mode,\n"
" but output image will have lower resolution.\n"
" --try_cuda (yes|no)\n"
" Try to use CUDA. The default value is 'no'. All default values\n"
" are for CPU mode.\n"
"\nMotion Estimation Flags:\n"
" --work_megapix <float>\n"
" Resolution for image registration step. The default is 0.6 Mpx.\n"
" --features (surf|orb|sift|akaze)\n"
" Type of features used for images matching.\n"
" The default is surf if available, orb otherwise.\n"
" --matcher (homography|affine)\n"
" Matcher used for pairwise image matching.\n"
" --estimator (homography|affine)\n"
" Type of estimator used for transformation estimation.\n"
" --match_conf <float>\n"
" Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
" --conf_thresh <float>\n"
" Threshold for two images are from the same panorama confidence.\n"
" The default is 1.0.\n"
" --ba (no|reproj|ray|affine)\n"
" Bundle adjustment cost function. The default is ray.\n"
" --ba_refine_mask (mask)\n"
" Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
" where 'x' means refine respective parameter and '_' means don't\n"
" refine one, and has the following format:\n"
" <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
" adjustment doesn't support estimation of selected parameter then\n"
" the respective flag is ignored.\n"
" --wave_correct (no|horiz|vert)\n"
" Perform wave effect correction. The default is 'horiz'.\n"
" --save_graph <file_name>\n"
" Save matches graph represented in DOT language to <file_name> file.\n"
" Labels description: Nm is number of matches, Ni is number of inliers,\n"
" C is confidence.\n"
"\nCompositing Flags:\n"
" --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
" compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
" compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
" paniniPortraitA1.5B1|mercator|transverseMercator)\n"
" Warp surface type. The default is 'spherical'.\n"
" --seam_megapix <float>\n"
" Resolution for seam estimation step. The default is 0.1 Mpx.\n"
" --seam (no|voronoi|gc_color|gc_colorgrad)\n"
" Seam estimation method. The default is 'gc_color'.\n"
" --compose_megapix <float>\n"
" Resolution for compositing step. Use -1 for original resolution.\n"
" The default is -1.\n"
" --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
" Exposure compensation method. The default is 'gain_blocks'.\n"
" --expos_comp_nr_feeds <int>\n"
" Number of exposure compensation feed. The default is 1.\n"
" --expos_comp_nr_filtering <int>\n"
" Number of filtering iterations of the exposure compensation gains.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 2.\n"
" --expos_comp_block_size <int>\n"
" Block size in pixels used by the exposure compensator.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 32.\n"
" --blend (no|feather|multiband)\n"
" Blending method. The default is 'multiband'.\n"
" --blend_strength <float>\n"
" Blending strength from [0,100] range. The default is 5.\n"
" --output <result_img>\n"
" The default is 'result.jpg'.\n"
" --timelapse (as_is|crop) \n"
" Output warped images separately as frames of a time lapse movie, "
" with 'fixed_' prepended to input file names.\n"
" --rangewidth <int>\n"
" uses range_width to limit number of images to match with.\n";
}
4.2 Meaning of the Motion Estimation flags
- work_megapix: during registration (feature extraction and related steps), images are downscaled to reduce runtime; this sets the target resolution for that scaling;
- features: which feature type to use (surf|orb|sift|akaze);
- matcher: pairwise matching method (homography|affine), i.e. homography versus affine models, corresponding to BestOf2NearestMatcher and AffineBestOf2NearestMatcher; the latter finds the best matches under an affine transform between two images;
- estimator: camera parameter estimation method (homography|affine);
- conf_thresh: threshold for deciding whether two images belong to the same panorama;
- match_conf: float, the inlier threshold used during matching;
- ba: cost function for bundle adjustment of the camera parameters (no|reproj|ray|affine);
- ba_refine_mask: during bundle adjustment certain parameters can be kept fixed via a mask; 'x' means refine the parameter and '_' means keep it fixed, in the order fx, skew, ppx, aspect, ppy;
- wave_correct: wave-correction flag (no|horiz|vert); it constrains the panorama to the horizontal or vertical direction, avoiding the "spread wings" tilt effect;
- save_graph: save the match graph between images in DOT format.
4.3 Meaning of the Compositing flags
- warp: warping method; OpenCV supports quite a few projections, including spherical and cylindrical;
- seam_megapix: images are downscaled for seam estimation; together with work_scale this controls the scale factor;
- seam: seam-finding method;
- compose_megapix: sets the resolution used during compositing and for the final panorama (useful for previews);
- expos_comp: exposure compensation method;
- blend: blending method, commonly (feather|multiband).
4.4 Summary
If the number and resolution of the input images are moderate, the resolution-scaling steps in the source, along with the timing code, can be removed to simplify the pipeline; in practice this flow is rarely used as-is for real-time stitching anyway. Understanding the algorithms behind each configuration parameter helps us grasp the finer details, and that is what I plan to cover step by step in later posts.