从图像融合谈起

图像融合将多个源图像(可能来自不同的传感器、不同视角、不同时间点)的信息整合到单一的输出融合图像中，这一过程旨在通过有效合并各源图像中的互补、冗余或独特信息，生成一个既包含所有关键信息又具有增强特性的融合图像。根据不同的融合任务需要制定不同的准则，然后对每张图像计算一个权重图。根据原始图像和融合权重图像就可以根据融合算法计算融合结果。图像融合算法要保证结果自然过渡，清晰度不出现丢失。其余如色彩自然度、细节保留、边缘过渡、伪影抑制等方面也要尽善尽美。

假如需要融合的图像有两幅图像，例如同一场景的红外图像和可见光图像，通常红外图像反映场景的温度辐射信息，而可见光图像反映场景的细节纹理信息，因此如果需要一幅融合的图像既要显示整个场景的温度信息，同时要兼顾场景的细节，则需要将两者进行融合。通常，两者图像的简单权重相加就可以得到不错的融合效果，因为就像日本料理的基本原则一样，对食材简单烹饪，甚至是生吃，才能保留食材最原始的鲜美的味道。

下面的公式即为按照权重相加的融合方式，对于参与融合的两幅图像，对应像素灰度值按照权重比例进行相加，其中参数alpha控制两者之间的比例，最终输出融合结果：

这样就完成了两幅不同成像方式的图像的简单融合，即两幅图像的简单加权融合，就可以达到一个很不错的融合效果。

但是，如果仅仅是将两幅图像的简单加权融合的结果实际应用到具体的项目中时，难免带来融合结果图像中细节的模糊，因此有必要得到不同成像来源图像中的细节信息。通常的简单思路是，采用一些图像模糊的算法对原始图像做图像模糊，然后使用原始图像与模糊图像相减，相减得到的图像就可以认为是细节图像，photoshop中的USM锐化就是采用这样的思路，即使用高斯模糊对图像做模糊处理，然后使用原始图像与模糊图像相减，得到边缘部分等细节信息。下面以两篇论文为例，介绍一些简单的图像融合算法，在达到较好的融合的效果的同时，能够保证算法的实时有效性。

Fusion of Infrared and Visible Sensor Images Based on Anisotropic Diffusion and Karhunen-Loeve Transform

Durga Prasad Bavirisetti and Ravindra Dhuli

文章使用各向异性扩散算法对两幅原始图像做模糊，得到两幅图像对应的base layers，然后将两幅原始图像与其对应的base layers图像相减，得到detail layers，分别将base layers和detail layers做融合，分别得到base layers fusion和detail layers fusion，最后将base layers fusion和detail layers fusion做线性叠加(linear combination)，即可得到最终的融合结果，算法流程图如下图所示：

文中选择各向异性扩散模糊算法对原始图像做模糊操作，但是因为各向异性模糊算法需要多次迭代，才能得到一个比较好的模糊效果，因此实时性较差，而且此处想要得到的是图像的模糊效果，高斯模糊就是一个非常不错的图像模糊算法，所以在此使用高斯模糊代替各向异性模糊。

设原始两幅待融合图像为，n=1,2，分别对两幅图像做高斯模糊，得到两幅图像对应的base layers为，将两者相减，可以得到两幅图像对应的detail layers为：

Base Layer Fusion Using Weighted Superposition：对于每一幅base layer图像，其前面均放置一个系数，然后将加权的base layer图像相加，即可得到base layer的融合图像：

若源图像分别为一幅红外图像和一幅可见光图像，则简单地设置两个系数均为0.5即可：

以一组红外图像和可见光图像作为源数据，示例如下：

使用高斯模糊，分别对两幅图像做模糊操作，得到的模糊效果如下：

将两幅源图像与模糊得到的图像相减，可以得到两幅图像对应的细节图像：

Detail Layer Fusion Based on KL-Transform：不同于base layers融合的系数是两者都是0.5的简单加权平均，使用KL变换确定detail layers融合时的加权系数：(1)将两幅图像的细节层detail layers变为m*n(m和n分别为图像的长和宽方向上的像素个数)行，两列的矩阵；(2)计算矩阵的协方差矩阵；(3)计算协方差矩阵的两个特征值和，并得到这两个特征值对应的特征向量和；(4)取特征值和两者之间的较大值作为，且其对应的特征向量为；(5)中有两个元素，分别两个元素数值与两者数值之和相除，即可得到detail layers融合时的加权系数和；(6)将两个系数分别与两个细节层相乘并相加，即可得到细节层的融合结果。

现在将本文算法和简单的源图像加权相加的效果进行对比，说明本文算法在细节处理方面的有效性：左图为源图像的简单加权融合结果，右图为本文算法结果。对比来看，本文算法在整体清晰度和对比度上更具优势，在房屋边缘等一些细节处理上凸显了细节边缘，使其更加清晰可辨。

使用C/ C++实线本文算法的代码如下(为了计算矩阵的特征值和特征向量，需要配置Eigen库；为了显示融合图像效果，需要配置OpenCV库)：

#include <iostream>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
#include <cmath>
#include <algorithm>
#include <opencv2/opencv.hpp>

#include <Eigen/Dense>
#include <Eigen/Eigenvalues>

using namespace std;
using namespace cv;
using namespace Eigen;

#define PI 3.14159265358979323846

/********************************************* the functions for gaussian filtering *************************************************/
float gaussian(float x, float sigma);
void gaussian_kernel(float* kernel, int size, float sigma);
void gaussian_filter(float* input, float* output, int width, int height, int mask_size, float sigma);
/********************************************* the functions for gaussian filtering *************************************************/

/*********************************************** the functions for KL-transform ***************************************************/
float mean(float* arr, int size);
float* covariance(float** data, int rows, int cols, float* means);
/*********************************************** the functions for KL-transform ***************************************************/

/******************************************** the functions for calculating layers ************************************************/
void array_add(float* a_array, float* b_array, float* output, int width, int height);
void array_add_weighted(float* a_array, float* b_array, float* output, int width, int height, float a_weight, float b_weight);
void array_sub(float* a_array, float* b_array, float* output, int width, int height);
void array_sub_abs(float* a_array, float* b_array, float* output, int width, int height);
/******************************************** the functions for calculating layers ************************************************/

//高斯函数
float gaussian(float x, float sigma) {
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * PI));
}

//高斯模板生成
void gaussian_kernel(float* kernel, int size, float sigma) {
    int center = size / 2; float sum = 0.0;

    for (int i = 0; i < size; i++) {
        kernel[i] = gaussian(i - center, sigma);
        sum += kernel[i];
    }

    for (int i = 0; i < size; i++) {
        kernel[i] /= sum;
    }
}

//高斯模糊
void gaussian_filter(float* input, float* output, int width, int height, int mask_size, float sigma)
{
    /* ensure the mask_size be odd number greater than 3 */
    if (mask_size < 3 || mask_size % 2 == 0) {
        std::cout << "invalid mask size!" << std::endl;
        return;
    }

    int radius = (mask_size - 1) / 2; int center_pos = mask_size / 2;

    float* kernel = (float*)malloc(mask_size * sizeof(float));
    gaussian_kernel(kernel, mask_size, sigma); //生成高斯模板

    float* temp = (float*)malloc(width * height * sizeof(float)); //临时数组

    //水平方向模糊
    for (int h = 0; h < height; h++)
    {
        for (int w = 0; w < width; w++)
        {
            float row_sum = 0.0;
            for (int i = -radius; i <= radius; i++) {
                int w_shift = w + i;
                if (w_shift < 0) w_shift = 0;
                if (w_shift > width - 1) w_shift = width - 1;
                row_sum += kernel[center_pos + i] * input[h * width + w_shift];
            }
            temp[h * width + w] = row_sum;
        }
    }

    //垂直方向模糊
    for (int h = 0; h < height; h++)
    {
        for (int w = 0; w < width; w++)
        {
            float col_sum = 0.0;
            for (int i = -radius; i <= radius; i++) {
                int h_shift = h + i;
                if (h_shift < 0) h_shift = 0;
                if (h_shift > height - 1) h_shift = height - 1;
                col_sum += kernel[center_pos + i] * temp[h_shift * width + w];
            }
            output[h * width + w] = col_sum;
        }
    }

    free(kernel);
    free(temp);
}

void array_add(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        for (int w = 0; w < width; w++) {
            *outputPtr = (*a_arrayPtr) + (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++; outputPtr++;
        }
    }
}

void array_add_weighted(float* a_array, float* b_array, float* output, int width, int height, float a_weight, float b_weight)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        for (int w = 0; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * a_weight + (*b_arrayPtr) * b_weight;
            a_arrayPtr++; b_arrayPtr++; outputPtr++;
        }
    }
}

void array_sub(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        for (int w = 0; w < width; w++) {
            *outputPtr = (*a_arrayPtr) - (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++; outputPtr++;
        }
    }
}


void array_sub_abs(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        for (int w = 0; w < width; w++) {
            //abs求取整数绝对值，fabs求取浮点数绝对值
            *outputPtr = fabs((*a_arrayPtr) - (*b_arrayPtr));
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }

    }
}


//计算一维数组的均值
float mean(float* arr, int size)
{
    float sum = 0.0;
    for (int i = 0; i < size; i++) {
        sum += arr[i];
    }
    return sum / size;
}

//计算一个阵列的协方差矩阵
float* covariance(float** data, int rows, int cols, float* means)
{
    float* cov = (float*)malloc(sizeof(float) * cols * cols);
    for (int i = 0; i < cols; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            float sum = 0.0;
            for (int k = 0; k < rows; k++) {
                sum += (data[k][i] - means[i]) * (data[k][j] - means[j]);
            }
            cov[i * cols + j] = sum / rows;
        }
    }
    return cov;
}


int main()
{
    // 读取红外灰度图像和可见光灰度图像
    cv::Mat image_IR = cv::imread("Marne_IR.png", cv::IMREAD_GRAYSCALE);
    cv::Mat image_Vis = cv::imread("Marne_Vis.png", cv::IMREAD_GRAYSCALE);

    // 检查图像是否成功读取
    if (image_IR.empty() || image_Vis.empty()) {
        std::printf("无法读取图像文件\n");
        return -1;
    }

    // 检查红外图像和可见光图像的尺寸是否一致
    if ((image_IR.cols != image_Vis.cols) || (image_IR.rows != image_Vis.rows)) {
        std::printf("红外图像和可见光图像尺寸不一致\n");
        return -1;
    }

    // 获取图像宽度和高度
    int width = image_IR.cols; int height = image_IR.rows;

    /******************************* convert the image data into float type ********************************/
    //转换原始红外图像和可见光图像数据
    cv::Mat image_IR_float(height, width, CV_32FC1);
    image_IR.convertTo(image_IR_float, CV_32FC1, 1 / 255.0);

    cv::Mat image_Vis_float(height, width, CV_32FC1);
    image_Vis.convertTo(image_Vis_float, CV_32FC1, 1 / 255.0);

    float* image_IR_data = new float[width * height * sizeof(float)];
    float* image_Vis_data = new float[width * height * sizeof(float)];

    std::memcpy(image_IR_data, image_IR_float.data, width * height * sizeof(float));
    std::memcpy(image_Vis_data, image_Vis_float.data, width * height * sizeof(float));
    /******************************* convert the image data into float type ********************************/

    /*************************** the gaussian-blur for infrared and visible images **************************/
    //对红外图像和可见光图像进行高斯滤波
    float* image_IR_base = new float[width * height * sizeof(float)];
    float* image_Vis_base = new float[width * height * sizeof(float)];

    int64 st = cv::getTickCount();

    float sigma = 2.4; int mask_size = 6 * sigma + 1;
    gaussian_filter(image_IR_data, image_IR_base, width, height, mask_size, sigma);
    gaussian_filter(image_Vis_data, image_Vis_base, width, height, mask_size, sigma);

    double duration = (cv::getTickCount() - st) / cv::getTickFrequency() * 1000;
    std::cout << "gaussian filtering time delay: " << duration << "ms;" << std::endl;

    cv::Mat IR_anis_blur(height, width, CV_32FC1, image_IR_base);
    IR_anis_blur.convertTo(IR_anis_blur, CV_8UC1, 255.0);
    cv::imwrite("IR_anis_blur.png", IR_anis_blur);

    cv::Mat Vis_anis_blur(height, width, CV_32FC1, image_Vis_base);
    Vis_anis_blur.convertTo(Vis_anis_blur, CV_8UC1, 255.0);
    cv::imwrite("Vis_anis_blur.png", Vis_anis_blur);
    /*************************** the gaussian-blur for infrared and visible images **************************/

    /********************************* define the original detail layers **********************************/
    float* image_IR_detail = new float[width * height * sizeof(float)];
    float* image_Vis_detail = new float[width * height * sizeof(float)];

    // 此处只能是array_sub，而不能是array_sub_abs，否则融合结果不自然
    array_sub(image_IR_data, image_IR_base, image_IR_detail, width, height);
    array_sub(image_Vis_data, image_Vis_base, image_Vis_detail, width, height);

    cv::Mat IR_detail(height, width, CV_32FC1, image_IR_detail);
    IR_detail.convertTo(IR_detail, CV_8UC1, 255.0);
    cv::imwrite("IR_detail.png", IR_detail);

    cv::Mat Vis_detail(height, width, CV_32FC1, image_Vis_detail);
    Vis_detail.convertTo(Vis_detail, CV_8UC1, 255.0);
    cv::imwrite("Vis_detail.png", Vis_detail);
    /********************************* define the original detail layers **********************************/

    /******************************* calculate the coefficients kl1 and kl2 ********************************/
    //将两幅detail图像转换为行数为width* height，列数为2的矩阵
    float** detail_data = (float**)malloc(sizeof(float*) * width * height);
    for (int h = 0; h < height; h++)
    {
        for (int w = 0; w < width; w++)
        {
            detail_data[h * width + w] = (float*)malloc(sizeof(float) * 2); 
            detail_data[h * width + w][0] = image_IR_detail[h * width + w];
            detail_data[h * width + w][1] = image_Vis_detail[h * width + w];
        }
    }

    float detail_IR_mean = mean(image_IR_detail, width * height);
    float detail_Vis_mean = mean(image_Vis_detail, width * height);

    float* means = (float*)malloc(sizeof(float) * 2);
    means[0] = detail_IR_mean; means[1] = detail_Vis_mean;

    float* cov_data = (float*)malloc(sizeof(float) * 2 * 2);
    cov_data = covariance(detail_data, width * height, 2, means);

    //std::cout << cov_data[0] << "---" << cov_data[1] << "---" << cov_data[2] << "---" << cov_data[3] << std::endl;

    Eigen::Matrix2f cov_matrix(2, 2);
    cov_matrix << cov_data[0], cov_data[1], cov_data[2], cov_data[3];

    Eigen::EigenSolver<Eigen::Matrix2f> solver(cov_matrix);

    Eigen::Matrix2f D = solver.pseudoEigenvalueMatrix();
    Eigen::Matrix2f V = solver.pseudoEigenvectors();

    std::cout << "The covariance matrix is: " << std::endl << cov_matrix << std::endl;
    std::cout << "The pseudo-eigenvalue matrix D is:" << std::endl << D << std::endl;
    std::cout << "The pseudo-eigenvector matrix V is:" << std::endl << V << std::endl;
    std::cout << "Finally, V * D * V^(-1) = " << std::endl << V * D * V.inverse() << std::endl;

    int col_index, row_index;
    D.maxCoeff(&row_index, &col_index);

    std::cout << "the row_index for the max-eigenvalue is: "<< std::endl << row_index << std::endl \
        << "the col_index for the max-eigenvalue is: " << col_index << std::endl;

    Eigen::Vector2f vec_max = V.col(col_index);
    cout << "the max eigenvalue's vector is: " << std::endl << vec_max << std::endl;

    float kl1 = vec_max[0] / vec_max.sum();
    float kl2 = vec_max[1] / vec_max.sum();

    std::cout << "the number for kl1 is: " << kl1 << std::endl << "the number for kl2 is: " << kl2 << std::endl;
    /******************************* calculate the coefficients kl1 and kl2 ********************************/
    
    /******************* calculate the fusion map for both base layer and detail layer ********************/
    //计算base layers的fusion map
    float* base_layer = new float[width * height * sizeof(float)];
    array_add_weighted(image_IR_base, image_Vis_base, base_layer, width, height, 0.5, 0.5);
    //计算detail layers的fusion map
    float* detail_layer = new float[width * height * sizeof(float)];
    array_add_weighted(image_IR_detail, image_Vis_detail, detail_layer, width, height, kl1, kl2);
    /******************* calculate the fusion map for both base layer and detail layer ********************/

    /********************************** calculate the fusion map finally **********************************/
    float* fusion_map = new float[width * height * sizeof(float)];
    array_add(base_layer, detail_layer, fusion_map, width, height);
    /********************************** calculate the fusion map finally **********************************/

    cv::Mat fusion_result(height, width, CV_32FC1, fusion_map);
    fusion_result.convertTo(fusion_result, CV_8UC1, 255.0);
    cv::imwrite("fusion_result.png", fusion_result);

    //原图按照0.5--0.5的比例直接相加，就能达到不错的融合效果，但是比较模糊，损失了不少细节信息
    float* simple_fusion_data = new float[width * height * sizeof(float)];
    array_add_weighted(image_IR_data, image_Vis_data, simple_fusion_data, width, height, 0.5, 0.5);

    cv::Mat simple_fusion(height, width, CV_32FC1, simple_fusion_data);
    simple_fusion.convertTo(simple_fusion, CV_8UC1, 255.0);
    cv::imwrite("simple_fusion.png", simple_fusion);

    return 0;
}

上述融合算法可以得到不错的融合效果，但是如果将其应用到实际的工程项目之中，不一定具有很高的运行效率。为此，下面介绍和上述算法一致的融合算法--TIF融合算法，该算法具有简单易行的特点，可将其直接应用到实际的工程项目中，对于需要实时运行图像融合算法的需求来说，是一种不错的选择。

Two-scale image fusion of visible and infrared images using saliency detection

Durga Prasad Bavirisetti，Ravindra Dhuli

School of Electronics Engineering, VIT University, India

TIF融合算法主要通过以下三个步骤实线融合：

1，图像分解/分析：图像分解使用的是均值滤波(盒子滤波)以获取图像的基础层(base layer)和细节层(detail layer)；

2，图像融合：对待基础层和细节层分别使用不同的融合算法；

3，图像重建：以融合后的基础层和细节层重建图像。

考虑到原始的同样大小的(宽和高)的两幅图像和，将这两幅图像分解成基础层(base layer)和细节层(detail layer)。作者在文中使用均值滤波器来对原始图像进行模糊处理，来生成基础层：

在得到基础层后，通过原始图像与基础层图像相减，得到细节层图像：

基于可视显著检测理论，作者在文中使用了一种很有效的可视显著检测方法，就是使用均值滤波得到的图像减去中值滤波得到的图像(作者通过验证，中值滤波时使用的窗口大小为3比较合适)，并取绝对值：

对于原始的两幅输入图像和，通过分别计算可以得到和，通过这两个矩阵，得到细节层的融合矩阵系数。

基础层的融合，使用平均的策略进行融合：

细节层的融合，使用加权平均的策略进行融合，而加权矩阵就是之前求得的和：

在分别得到基础层和细节层的融合结果后，通过简单的加法得到最终的融合结果：

总之，TIF融合算法所使用的公式和思想都是数字图像处理技术中最基础的内容，作者通过相关评价指标证实了算法的有效性，而且实现起来也比较方便，算法运行速度很快，没有非常耗时的操作，因此TIF融合算法适合应用于对实时性有较高要求的应用上。如果是直接在PC端运行该融合算法的代码，则只要CPU处理速度够快，基本上是可以满足每秒25帧的帧率，实时以视频流的形式展示融合效果是完全没有问题的。但是，如果是在海思等嵌入式平台上运行时，大概每秒25帧的帧率就难以保证，因此需要对算法的原始C/C++代码进行优化加速，以达到实时以视频流的形式显示融合效果。

TIF算法使用盒子滤波对原始图像进行模糊，因此需要对盒子滤波这个关键的模糊函数进行优化。通常在进行盒子滤波，即均值滤波时，往往是计算滑动矩形窗口中所有像素灰度值的平均值，以这个平均值代替中心位置像素的灰度值。在对盒子滤波进行优化时，采取行列分离的方法，即对每一行的每一个像素，计算每一个像素的行方向邻域的灰度值之和，将该和的数值存储在每一个像素位置处，然后计算每一列的每一个像素列方向的邻域灰度值之和，这样就计算出了整个窗口所有像素灰度值之和。在进行行方向或列方向上的邻域像素灰度值和的计算时，采取一个更为快速简便的计算方法：像素每向右(行方向)移动一位时，减去邻域中最左边像素的灰度值，加上邻域右侧一个新的像素的灰度值；像素每向下(列方向)移动一位时，减去邻域中最上方像素的灰度值，加上邻域下侧一个新的像素的灰度值。

除了使用算法层面上的优化方法之外，SSE指令集加速也是一个不错的优化手段，在此贴出一些常见的SSE指令集：

////关于指令集的含义解释及其使用方法，可以在如下VS的官方网站搜寻：
https://learn.microsoft.com/zh-cn/previous-versions/visualstudio/visual-studio-2010

基本知识及定义：
__m128    包含4个float类型数字的向量
__m128d    包含2个double类型数字的向量
__m128i    包含若干个整型数字的向量
__m256    包含8个float类型数字的向量
__m256d    包含4个double类型数字的向量
__m256i    包含若干个整型数字的向量

ps 包含float类型的向量 s--single
pd 包含double类型的向量 d--double
epi8/epi16/epi32/epi64 包含8位/16位/32位/64位的有符号整数
epu8/epu16/epu32/epu64 包含8位/16位/32位/64位的无符号整数
si128/si256 未指定的128位或者256位向量

//// 整型变量运算指令集：
__m128i表示加载128个bit，若函数名称epi后面数字是8，则说明有128/8=16个元素；若函数名称epi后面数字是16，则说明有128/16=8个元素；若函数名称epi后面数字是32，则说明有128/32=4个元素

//// __m128i _mm_sub_epi32 (__m128i a, __m128i b); Subtracts the 4 signed or unsigned 32-bit integers of b from the 4 signed or unsigned 32-bit integers of a. r0 := a0 - b0 r1 := a1 - b1 r2 := a2 - b2 r3 := a3 - b3
//// __m128i _mm_set1_epi32 (int i); Sets the 4 signed 32-bit integer values to i. r0 := i r1 := i r2 := i r3 := i

////__m128i _mm_mullo_epi16 (__m128i a, __m128i b); Multiplies the 8 signed or unsigned 16-bit integers from a by the 8 signed or unsigned 16-bit integers from b. r0 := (a0 * b0)[15:0] r1 := (a1 * b1)[15:0] ... r7 := (a7 * b7)[15:0] 2个16bit数据相乘，取低16位

//// __m128i _mm_srli_epi16 (__m128i a, int count); Shifts the 8 signed or unsigned 16-bit integers in a right by count bits while shifting in zeros. r0 := srl(a0, count) r1 := srl(a1, count) ... r7 := srl(a7, count)

使用两个__m128i的数值，交替到结果中，也是16个元素，原来每一个元素是16bit，压缩至8bit
//// __m128i _mm_packus_epi16 (__m128i a, __m128i b); Packs the 16 signed 16-bit integers from a and b into 8-bit unsigned integers and saturates. 
r0 := UnsignedSaturate(a0)   ////UnsignedSaturate表示将16bit截断至8bit，即取16bit的低8位为止
r1 := UnsignedSaturate(a1)
...
r7 := UnsignedSaturate(a7)
r8 := UnsignedSaturate(b0)
r9 := UnsignedSaturate(b1)
...
r15 := UnsignedSaturate(b7)

//// __m128i _mm_shuffle_epi8(__m128i a, __m128i mask); This instruction shuffles 16-byte parameters from a 128-bit parameter.
给定mask中16个数字，除以16，余数为索引，然后到a中取对应索引的数值在该位置上 例如mask中第一个元素为0，则取a中第一个元素作为结果中的第一个元素

//// __m128i _mm_setr_epi8 (char b0, char b1, char b2, char b3, char b4, char b5, char b6, char b7, char b8, char b9, char b10, char b11, char b12, char b13, char b14, char b15); Sets the 16 signed 8-bit integer values in reverse order.
r0 := b0 r1 := b1 ... r15 := b15

//// __m128i _mm_loadu_si128 (__m128i *p); Loads 128-bit value.
//// __m128i _mm_loadl_epi64(__m128i const*p); Load the lower 64 bits of the value pointed to by p into the lower 64 bits of the result, zeroing the upper 64 bits of the result. r0:= *p[63:0] r1:=0x0

//// __m128i _mm_subs_epu8 (__m128i a, __m128i b); Subtracts the 16 unsigned 8-bit integers of b from the 16 unsigned 8-bit integers of a and saturates. 
//// r0 := UnsignedSaturate(a0 - b0) r1: = UnsignedSaturate(a1 - b1) ... r15 : = UnsignedSaturate(a15 - b15)

//// __m128i _mm_set1_epi8 (char b); Sets the 16 signed 8-bit integer values to b. r0 := b r1: = b ... r15 : = b
//// __m128i _mm_setzero_si128 (); Sets the 128-bit value to zero. r := 0x0

//// __m128i _mm_unpacklo_epi8 (__m128i a, __m128i b); Interleaves the lower 8 signed or unsigned 8-bit integers in a with the lower 8 signed or unsigned 8-bit integers in b.
//// r0 := a0 ; r1 := b1 r2: = a1; r3: = b1 ... r14 : = a7; r15: = b7
//// __m128i _mm_unpackhi_epi8 (__m128i a, __m128i b); Interleaves the upper 8 signed or unsigned 8-bit integers in a with the upper 8 signed or unsigned 8-bit integers in b.
//// r0 := a8 ; r1 := b8 r2: = a9; r3: = b9 ... r14 : = a15; r15: = b15

//// __m128i _mm_unpacklo_epi16 (__m128i a, __m128i b); Interleaves the lower 4 signed or unsigned 16-bit integers in a with the lower 4 signed or unsigned 16-bit integers in b. //取值时，每个元素为16-bit
//// r0 := a0 ; r1 := b0 r2 := a1 ; r3 := b1 r4 := a2 ; r5 := b2 r6 := a3 ; r7 := b3
//// __m128i _mm_unpackhi_epi16 (__m128i a, __m128i b); Interleaves the upper 4 signed or unsigned 16-bit integers in a with the upper 4 signed or unsigned 16-bit integers in b.
//// r0 := a4 ; r1 := b4 r2 := a5 ; r3 := b5 r4 := a6 ; r5 := b6 r6 := a7 ; r7 := b7

//// __m128i _mm_max_epu8 (__m128i a, __m128i b);Computes the pairwise maximum of the 16 unsigned 8-bit integers from a and the 16 unsigned 8-bit integers from b.
//// r0 := max(a0, b0) r1 := max(a1, b1) ... r15 := max(a15, b15)

//// __m128i _mm_min_epu8 (__m128i a, __m128i b); Computes the pairwise minimum of the 16 unsigned 8-bit integers from a and the 16 unsigned 8-bit integers from b.
//// r0 := min(a0, b0) r1 := min(a1, b1) ... r15 := min(a15, b15)


//// __m128i _mm_unpackhi_epi64 (__m128i a, __m128i b); Interleaves the upper signed or unsigned 64-bit integer in a with the upper signed or unsigned 64-bit integer in b. r0 := a1 ; r1 := b1
//// __m128i _mm_unpacklo_epi64 (__m128i a, __m128i b); Interleaves the lower signed or unsigned 64-bit integer in a with the lower signed or unsigned 64-bit integer in b. r0 := a0 ; r1 := b0

//// __m128i _mm_packus_epi32(__m128i a, __m128i b); This instruction converts four signed 32-bit integers into 8 unsigned 16-bit integers by using unsigned saturation.
r0 := (a0 < 0) ? 0 : ((a0 > 0xffff) ? 0xffff : a0)
r1 := (a1 < 0) ? 0 : ((a1 > 0xffff) ? 0xffff : a1)
r2 := (a2 < 0) ? 0 : ((a2 > 0xffff) ? 0xffff : a2)
r3 := (a3 < 0) ? 0 : ((a3 > 0xffff) ? 0xffff : a3)
r4 := (b0 < 0) ? 0 : ((b0 > 0xffff) ? 0xffff : b0)
r5 := (b1 < 0) ? 0 : ((b1 > 0xffff) ? 0xffff : b1)
r6 := (b2 < 0) ? 0 : ((b2 > 0xffff) ? 0xffff : b2)
r7 := (b3 < 0) ? 0 : ((b3 > 0xffff) ? 0xffff : b3)


//// __m128i _mm_packus_epi16 (__m128i a, __m128i b); Packs the 16 signed 16-bit integers from a and b into 8-bit unsigned integers and saturates.
r0 := UnsignedSaturate(a0)
r1 := UnsignedSaturate(a1)
...
r7 := UnsignedSaturate(a7)
r8 := UnsignedSaturate(b0)
r9 := UnsignedSaturate(b1)
...
r15 := UnsignedSaturate(b7)


整型和浮点型之间相互转换的指令集：
//// __m128 _mm_cvtepi32_ps (__m128i a); Converts the four signed 32-bit integer values of a to single-precision, floating-point values. r0 := (float) a0 r1 := (float) a1 r2 := (float) a2 r3 := (float) a3
//// __m128i _mm_cvtps_epi32 (__m128 a); Converts the four single-precision, floating-point values of a to signed 32-bit integer values. r0 := (int) a0 r1 := (int) a1 r2 := (int) a2 r3 := (int) a3


一些常见的浮点型变量运算指令集：
//// __m128 _mm_loadu_ps(float * p); r0 := p[0] r1: = p[1] r2 : = p[2] r3 : = p[3] 128 / 4 = 32
//// void _mm_store_ps(float *p, __m128 a ); Stores four single-precision, floating-point values. p[0] := a0 p[1] : = a1 p[2] : = a2 p[3] : = a3

//// __m128 _mm_add_ps(__m128 a , __m128 b ); r0 := a0 + b0 r1: = a1 + b1 r2 : = a2 + b2 r3 : = a3 + b3
//// __m128 _mm_sub_ps(__m128 a , __m128 b ); r0 := a0 - b0 r1: = a1 - b1 r2 : = a2 - b2 r3 : = a3 - b3 Subtracts the four single-precision, floating-point values of a and b.
//// __m128 _mm_mul_ps(__m128 a , __m128 b ); Multiplies the four single-precision, floating-point values of a and b. r0 := a0 * b0 r1 := a1 * b1 r2 := a2 * b2 r3 := a3 * b3
//// __m128 _mm_div_ps(__m128 a, __m128 b ); Divides the four single-precision, floating-point values of a and b. r0 := a0 / b0 r1 := a1 / b1 r2 := a2 / b2 r3 := a3 / b3

//// __m128 _mm_set1_ps(float w ); Sets the four single-precision, floating-point values to w. r0 := r1 := r2 := r3 := w
//// __m128 _mm_shuffle_ps(__m128 a , __m128 b , int i ); Selects four specific single-precision, floating-point values from a and b, based on the mask i. 

//// __m128 _mm_and_ps(__m128 a , __m128 b ); Computes the bitwise AND of the four single-precision, floating-point values of a and b. r0 := a0 & b0 r1 := a1 & b1 r2 := a2 & b2 r3 := a3 & b3
//// __m128 _mm_or_ps(__m128 a , __m128 b ); Computes the bitwise OR of the four single-precision, floating-point values of a and b. r0 := a0 | b0 r1 := a1 | b1 r2 := a2 | b2 r3 := a3 | b3
//// __m128 _mm_andnot_ps(__m128 a , __m128 b ); Computes the bitwise AND-NOT of the four single-precision, floating-point values of a and b. r0 := ~a0 & b0 r1 := ~a1 & b1 r2 := ~a2 & b2 r3 := ~a3 & b3
//// __m128 _mm_xor_ps(__m128 a , __m128 b ); Computes bitwise EXOR (exclusive-or) of the four single-precision, floating-point values of a and b. r0 := a0 ^ b0 r1 := a1 ^ b1 r2 := a2 ^ b2 r3 := a3 ^ b3
//// a ^ b中，^表示异或，即两个输入相同时为0，不同则为1

//// __m128 _mm_max_ps(__m128 a , __m128 b ); Computes the maximums of the four single-precision, floating-point values of a and b. r0 := max(a0, b0) r1 := max(a1, b1) r2 := max(a2, b2) r3 := max(a3, b3)
//// __m128 _mm_min_ps(__m128 a , __m128 b ); Computes the minima of the four single-precision, floating-point values of a and b. r0 := min(a0, b0) r1 := min(a1, b1) r2 := min(a2, b2) r3 := min(a3, b3)

//// __m128 _mm_exp_ps(__m128 x); 返回指数运算结果 VS2019及以上版本自带这个函数


sse中计算某一个数值的绝对值：浮点数字0x 7fff ffff 表示int的最大值，0x表示是16进制，7表示二进制0111，f表示二进制1111
也可以表示为二进制的：0111 1111 1111 1111 1111 1111 1111 1111(32bit)
其中第一个数字为0，表示为正数，若第一个数字为1，表示为负数
////浮点数据的绝对值计算
inline __m128 _mm_abs_ps(__m128 v) {
    static const int i = 0x7fffffff;
    float mask = *(float*)&i;
    return _mm_and_ps(v, _mm_set_ps1(mask));
}

在此给出C/C++对TIF融合算法的实现代码(只需要配置OpenCV库即可)，关闭__SSE2__的宏定义即运行纯粹的C语言实现代码，打开这个宏定义，即可使用SSE指令集对代码进行加速：

#include <iostream>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
#include <cmath>
#include <algorithm>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

#define __SSE2__

#if defined(__SSE2__)
#include <mmintrin.h>  //mmx
#include <xmmintrin.h>  //sse
#include <emmintrin.h>  //sse2
#endif

/*************************** the basic functions for the fusion of infrared and visible images ***************************/
void boxfilter(float* input, float* output, int width, int height, int mask_size);
void array_add(float* a_array, float* b_array, float* output, int width, int height);
void array_add_weighted(float* a_array, float* b_array, float* output, int width, int height, float a_weight, float b_weight);
void array_sub(float* a_array, float* b_array, float* output, int width, int height);
void array_sub_abs(float* a_array, float* b_array, float* output, int width, int height);
void array_dot_mul(float* a_array, float* b_array, float* output, int width, int height);
void array_dot_div(float* a_array, float* b_array, float* output, int width, int height, float eps);
/*************************** the basic functions for the fusion of infrared and visible images ***************************/

/// <summary>
/// boxfilter的sse指令集优化，一次处理8个像素(测试数据长和宽都是8的倍数)
/// </summary>
/// <param name="input"></param> 输入图像地址
/// <param name="output"></param> 输出图像地址
/// <param name="width"></param> 图像的列数
/// <param name="height"></param> 图像的行数
/// <param name="mask_size"></param> 滤波矩形尺寸，大小通常为mask_size * mask_size
void boxfilter(float* input, float* output, int width, int height, int mask_size)
{
    /* ensure the mask_size be odd number greater than 3 */
    if (mask_size < 3 || mask_size % 2 == 0) {
        std::cout << "invalid mask size!" << std::endl;
        return;
    }

    int radius = (mask_size - 1) / 2;
    float InvertAmount = 1.0f / (mask_size * mask_size);

    std::vector<float> cache(width * height);
    std::vector<float> colSum(width);
    float* cachePtr = &(cache[0]);

    // sum horizonal
    for (int h = 0; h < height; ++h) {
        //int shift = h * width;

        // 每一行第一个像素初始化为：左侧半径相同值求和 + 右侧半径像素求和
        float tmp = (radius + 1) * input[h * width + 0];

        for (int w = 0; w < radius; ++w) {
            tmp += input[h * width + w];
        }

        // 处理左侧radius个像素的边界
        for (int w = 0; w <= radius; ++w) {

            tmp += input[h * width + w + radius]; // 新加的一个像素
            tmp -= input[h * width + 0]; // 减去的左边像素，这里是左侧边界像素值
            cachePtr[h * width + w] = tmp;
        }

        int start = radius + 1;
        int end = width - 1 - radius;

        for (int w = start; w <= end; ++w) {
            tmp += input[h * width + w + radius];
            tmp -= input[h * width + w - radius - 1];
            cachePtr[h * width + w] = tmp;
        }

        // 处理右侧radius个像素的边界
        start = width - radius;

        for (int w = start; w < width; ++w) {
            tmp += input[h * width + width - 1]; // 加上最右边的新元素，此处为图像最右侧边界的值
            tmp -= input[h * width + w - radius - 1]; // 减去最左侧旧的值
            cachePtr[h * width + w] = tmp;
        }
    }

    // 列指针数值初始化
    float* colSumPtr = &(colSum[0]);

    // 考虑向上的边界，以radius相同的上边界值代替
    for (int indexW = 0; indexW < width; ++indexW) {
        colSumPtr[indexW] = (radius + 1) * cachePtr[0 + indexW]; // h = 0 第一行，第indexW列
    }

    ////__m128 _mm_loadu_ps(float * p); r0 := p[0] r1: = p[1] r2 : = p[2] r3 : = p[3] 128 / 4 = 32
    ////__m128 _mm_add_ps(__m128 a , __m128 b ); r0 := a0 + b0 r1: = a1 + b1 r2 : = a2 + b2 r3 : = a3 + b3
    ////__m128 _mm_sub_ps(__m128 a , __m128 b ); r0 := a0 - b0 r1: = a1 - b1 r2 : = a2 - b2 r3 : = a3 - b3 Subtracts the four single-precision, floating-point values of a and b.
    ////void _mm_store_ps(float *p, __m128 a ); Stores four single-precision, floating-point values. p[0] := a0 p[1] : = a1 p[2] : = a2 p[3] : = a3
    ////__m128 _mm_set1_ps(float w ); Sets the four single-precision, floating-point values to w. r0 := r1 := r2 := r3 := w

    // 列方向向下相加radius个像素值
    for (int h = 0; h < radius; ++h) {
        float* tmpColSumPtr = colSumPtr;
        float* tmpCachePtr = cachePtr + h * width;

        int w = 0;

#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _colSum_0 = _mm_loadu_ps(tmpColSumPtr);
            __m128 _colSum_1 = _mm_loadu_ps(tmpColSumPtr + 4);

            __m128 _cache_0 = _mm_loadu_ps(tmpCachePtr);
            __m128 _cache_1 = _mm_loadu_ps(tmpCachePtr + 4);

            __m128 _addTmp_0 = _mm_add_ps(_colSum_0, _cache_0);
            __m128 _addTmp_1 = _mm_add_ps(_colSum_1, _cache_1);

            _mm_store_ps(tmpColSumPtr, _addTmp_0);
            _mm_store_ps(tmpColSumPtr + 4, _addTmp_1);

            tmpColSumPtr += 8;
            tmpCachePtr += 8;
        }

        for (; w < width; ++w) {
            //colSumPtr[w] += cachePtr[h * width + w];
            *tmpColSumPtr += *tmpCachePtr;
            tmpColSumPtr++;
            tmpCachePtr++;
        }
#else
        for (; w < width; ++w) {
            //colSumPtr[w] += cachePtr[h * width + w];
            *tmpColSumPtr += *tmpCachePtr;
            tmpColSumPtr++;
            tmpCachePtr++;
        }
#endif

    }

    // sum vertical
    // 处理上面radius个像素边界
    for (int h = 0; h <= radius; ++h) {
        float* addPtr = cachePtr + (h + radius) * width;
        float* subPtr = cachePtr + 0 * width;  // 最上面的像素以边界值代替
        float* outPtr = output + h * width;

        float* tmpAddPtr = addPtr; float* tmpSubPtr = subPtr;
        float* tmpColSumPtr = colSumPtr; float* tmpOutPtr = outPtr;
        int w = 0;

#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _colSum_0 = _mm_loadu_ps(tmpColSumPtr);
            __m128 _colSum_1 = _mm_loadu_ps(tmpColSumPtr + 4);

            __m128 _outTmp_0 = _mm_loadu_ps(colSumPtr);
            __m128 _outTmp_1 = _mm_loadu_ps(colSumPtr + 4);

            __m128 _tmp_0 = _mm_add_ps(_colSum_0, _mm_loadu_ps(tmpAddPtr));
            _tmp_0 = _mm_sub_ps(_tmp_0, _mm_loadu_ps(tmpSubPtr));
            __m128 _tmp_out_0 = _mm_mul_ps(_tmp_0, _mm_set1_ps(InvertAmount));

            __m128 _tmp_1 = _mm_add_ps(_colSum_1, _mm_loadu_ps(tmpAddPtr + 4));
            _tmp_1 = _mm_sub_ps(_tmp_1, _mm_loadu_ps(tmpSubPtr + 4));
            __m128 _tmp_out_1 = _mm_mul_ps(_tmp_1, _mm_set1_ps(InvertAmount));

            _mm_store_ps(tmpColSumPtr, _tmp_0);
            _mm_store_ps(tmpColSumPtr + 4, _tmp_1);

            _mm_store_ps(tmpOutPtr, _tmp_out_0);
            _mm_store_ps(tmpOutPtr + 4, _tmp_out_1);

            tmpAddPtr += 8; tmpSubPtr += 8;
            tmpColSumPtr += 8; tmpOutPtr += 8;
        }

        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }
#else
        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }
#endif

    }

    int start = radius + 1;
    int end = height - 1 - radius;

    for (int h = start; h <= end; ++h) {
        float* addPtr = cachePtr + (h + radius) * width;
        float* subPtr = cachePtr + (h - radius - 1) * width;
        float* outPtr = output + h * width;

        float* tmpAddPtr = addPtr; float* tmpSubPtr = subPtr;
        float* tmpColSumPtr = colSumPtr; float* tmpOutPtr = outPtr;
        int w = 0;

#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _colSum_0 = _mm_loadu_ps(tmpColSumPtr);
            __m128 _colSum_1 = _mm_loadu_ps(tmpColSumPtr + 4);

            __m128 _outTmp_0 = _mm_loadu_ps(colSumPtr);
            __m128 _outTmp_1 = _mm_loadu_ps(colSumPtr + 4);

            __m128 _tmp_0 = _mm_add_ps(_colSum_0, _mm_loadu_ps(tmpAddPtr));
            _tmp_0 = _mm_sub_ps(_tmp_0, _mm_loadu_ps(tmpSubPtr));
            __m128 _tmp_out_0 = _mm_mul_ps(_tmp_0, _mm_set1_ps(InvertAmount));

            __m128 _tmp_1 = _mm_add_ps(_colSum_1, _mm_loadu_ps(tmpAddPtr + 4));
            _tmp_1 = _mm_sub_ps(_tmp_1, _mm_loadu_ps(tmpSubPtr + 4));
            __m128 _tmp_out_1 = _mm_mul_ps(_tmp_1, _mm_set1_ps(InvertAmount));

            _mm_store_ps(tmpColSumPtr, _tmp_0);
            _mm_store_ps(tmpColSumPtr + 4, _tmp_1);

            _mm_store_ps(tmpOutPtr, _tmp_out_0);
            _mm_store_ps(tmpOutPtr + 4, _tmp_out_1);

            tmpAddPtr += 8; tmpSubPtr += 8;
            tmpColSumPtr += 8; tmpOutPtr += 8;
        }

        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }
#else
        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }

#endif

    }

    // 处理下面radius个像素边界
    start = height - radius;
    for (int h = start; h < height; ++h) {
        float* addPtr = cachePtr + (height - 1) * width;  // 最下面边界索引
        float* subPtr = cachePtr + (h - radius - 1) * width; // 最上面旧的一行索引
        float* outPtr = output + h * width;

        float* tmpAddPtr = addPtr; float* tmpSubPtr = subPtr;
        float* tmpColSumPtr = colSumPtr; float* tmpOutPtr = outPtr;
        int w = 0;

#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _colSum_0 = _mm_loadu_ps(tmpColSumPtr);
            __m128 _colSum_1 = _mm_loadu_ps(tmpColSumPtr + 4);

            __m128 _outTmp_0 = _mm_loadu_ps(colSumPtr);
            __m128 _outTmp_1 = _mm_loadu_ps(colSumPtr + 4);

            __m128 _tmp_0 = _mm_add_ps(_colSum_0, _mm_loadu_ps(tmpAddPtr));
            _tmp_0 = _mm_sub_ps(_tmp_0, _mm_loadu_ps(tmpSubPtr));
            __m128 _tmp_out_0 = _mm_mul_ps(_tmp_0, _mm_set1_ps(InvertAmount));

            __m128 _tmp_1 = _mm_add_ps(_colSum_1, _mm_loadu_ps(tmpAddPtr + 4));
            _tmp_1 = _mm_sub_ps(_tmp_1, _mm_loadu_ps(tmpSubPtr + 4));
            __m128 _tmp_out_1 = _mm_mul_ps(_tmp_1, _mm_set1_ps(InvertAmount));

            _mm_store_ps(tmpColSumPtr, _tmp_0);
            _mm_store_ps(tmpColSumPtr + 4, _tmp_1);

            _mm_store_ps(tmpOutPtr, _tmp_out_0);
            _mm_store_ps(tmpOutPtr + 4, _tmp_out_1);

            tmpAddPtr += 8; tmpSubPtr += 8;
            tmpColSumPtr += 8; tmpOutPtr += 8;
        }

        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];  // 最下面的像素以边界值代替
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }
#else
        for (; w < width; ++w) {
            //colSumPtr[w] += addPtr[w];  // 最下面的像素以边界值代替
            //colSumPtr[w] -= subPtr[w];
            //outPtr[w] = colSumPtr[w] * InvertAmount;
            *tmpColSumPtr += *tmpAddPtr;
            *tmpColSumPtr -= *tmpSubPtr;
            *tmpOutPtr = *tmpColSumPtr * InvertAmount;

            tmpAddPtr++; tmpSubPtr++;
            tmpColSumPtr++; tmpOutPtr++;
        }
#endif

    }
}

/// <summary>
/// 两幅相同尺寸的图像像素值相加
/// </summary>
/// <param name="a_array"></param> 输入图像，被加数 -- a
/// <param name="b_array"></param> 输入图像，加数 -- b
/// <param name="output"></param> 输出图像
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
void array_add(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            __m128 _output_0 = _mm_add_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_add_ps(_a_array_1, _b_array_1);

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8; 
            outputPtr += 8;
        }

        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) + (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++; 
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) + (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#endif    
    }
}

/// <summary>
/// 两幅相同尺寸的图像像素值相加，并取平均值：/2
/// </summary>
/// <param name="a_array"></param> 输入图像，被加数 -- a
/// <param name="b_array"></param> 输入图像，加数 -- b
/// <param name="output"></param> 输出图像
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
void array_add_weighted(float* a_array, float* b_array, float* output, int width, int height, float a_weight, float b_weight)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            _a_array_0 = _mm_mul_ps(_a_array_0, _mm_set1_ps(a_weight));
            _a_array_1 = _mm_mul_ps(_a_array_1, _mm_set1_ps(a_weight));

            _b_array_0 = _mm_mul_ps(_b_array_0, _mm_set1_ps(b_weight));
            _b_array_1 = _mm_mul_ps(_b_array_1, _mm_set1_ps(b_weight));

            __m128 _output_0 = _mm_add_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_add_ps(_a_array_1, _b_array_1);

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8;
            outputPtr += 8;
        }

        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * a_weight + (*b_arrayPtr) * b_weight;
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * a_weight + (*b_arrayPtr) * b_weight;
            a_arrayPtr++; b_arrayPtr++; 
            outputPtr++;
        }
#endif    
    }
}

/// <summary>
/// 两幅相同尺寸的图像像素值相减
/// </summary>
/// <param name="a_array"></param> 输入图像，被减数 -- a
/// <param name="b_array"></param> 输入图像，减数 -- b
/// <param name="output"></param> 输出图像
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
void array_sub(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            __m128 _output_0 = _mm_sub_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_sub_ps(_a_array_1, _b_array_1);

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8; 
            outputPtr += 8;
        }

        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) - (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) - (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++; 
            outputPtr++;
        }
#endif    
    }
}

/// <summary>
/// 两幅相同尺寸的图像像素值相减，并取差值的绝对值
/// </summary>
/// <param name="a_array"></param> 输入图像，被减数 -- a
/// <param name="b_array"></param> 输入图像，减数 -- b
/// <param name="output"></param> 输出图像
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
void array_sub_abs(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        const int i = 0x7fffffff;
        float abs_mask = *(float*)&i;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            __m128 _output_0 = _mm_sub_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_sub_ps(_a_array_1, _b_array_1);

            //将数值与mask按位相与，即可得到绝对值
            _output_0 = _mm_and_ps(_output_0, _mm_set1_ps(abs_mask));
            _output_1 = _mm_and_ps(_output_1, _mm_set1_ps(abs_mask));

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8; 
            outputPtr += 8;
        }

        for (; w < width; w++) {
            //abs求取整数绝对值，fabs求取浮点数绝对值
            *outputPtr = fabs((*a_arrayPtr) - (*b_arrayPtr));
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            //abs求取整数绝对值，fabs求取浮点数绝对值
            *outputPtr = fabs((*a_arrayPtr) - (*b_arrayPtr));
            a_arrayPtr++; b_arrayPtr++; 
            outputPtr++;
        }
#endif

    }
}

/// <summary>
/// 两幅相同尺寸的图像像素值点乘
/// </summary>
/// <param name="a_array"></param> 输入图像 -- a
/// <param name="b_array"></param> 输入图像 -- b
/// <param name="output"></param> 输出图像
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
void array_dot_mul(float* a_array, float* b_array, float* output, int width, int height)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            __m128 _output_0 = _mm_mul_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_mul_ps(_a_array_1, _b_array_1);

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8; 
            outputPtr += 8;
        }

        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * (*b_arrayPtr);
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#endif    
    }
}

/// <summary>
/// 两幅相同尺寸图像做像素值点除
/// </summary>
/// <param name="a_array"></param> 输入图像 -- a 被除数
/// <param name="b_array"></param> 输入图像 -- b 除数
/// <param name="output"></param> 输入结果
/// <param name="width"></param> 图像的宽
/// <param name="height"></param> 图像的高
/// <param name="eps"></param> 防止除数为0，通常为一个很小的数值
void array_dot_div(float* a_array, float* b_array, float* output, int width, int height, float eps)
{
    for (int h = 0; h < height; h++)
    {
        float* a_arrayPtr = a_array + h * width;
        float* b_arrayPtr = b_array + h * width;
        float* outputPtr = output + h * width;

        int w = 0;
#if defined(__SSE2__)
        for (; w + 7 < width; w += 8) {
            __m128 _a_array_0 = _mm_loadu_ps(a_arrayPtr);
            __m128 _a_array_1 = _mm_loadu_ps(a_arrayPtr + 4);

            __m128 _b_array_0 = _mm_loadu_ps(b_arrayPtr);
            __m128 _b_array_1 = _mm_loadu_ps(b_arrayPtr + 4);

            // _mm_rcp_ps 
            // Computes the approximations of reciprocals of the four single-precision, floating-point values of a. r0 := recip(a0) 
            // r1: = recip(a1) r2 : = recip(a2) r3 : = recip(a3)

            _b_array_0 = _mm_rcp_ps(_mm_add_ps(_b_array_0, _mm_set1_ps(eps)));
            _b_array_1 = _mm_rcp_ps(_mm_add_ps(_b_array_1, _mm_set1_ps(eps)));

            __m128 _output_0 = _mm_mul_ps(_a_array_0, _b_array_0);
            __m128 _output_1 = _mm_mul_ps(_a_array_1, _b_array_1);

            _mm_store_ps(outputPtr, _output_0);
            _mm_store_ps(outputPtr + 4, _output_1);

            a_arrayPtr += 8; b_arrayPtr += 8;
            outputPtr += 8;
        }

        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * (1 / (*b_arrayPtr + eps));
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#else
        for (; w < width; w++) {
            *outputPtr = (*a_arrayPtr) * (1 / (*b_arrayPtr + eps));
            a_arrayPtr++; b_arrayPtr++;
            outputPtr++;
        }
#endif    
    }
}

int main()
{
    // 读取红外灰度图像和可见光灰度图像
    cv::Mat image_IR = cv::imread("Marne_09_IR.bmp", cv::IMREAD_GRAYSCALE);
    cv::Mat image_Vis = cv::imread("Marne_09_Vis.bmp", cv::IMREAD_GRAYSCALE);

    // 检查图像是否成功读取
    if (image_IR.empty() || image_Vis.empty()) {
        std::printf("无法读取图像文件\n");
        return -1;
    }

    // 检查红外图像和可见光图像的尺寸是否一致
    if ((image_IR.cols != image_Vis.cols) || (image_IR.rows != image_Vis.rows)) {
        std::printf("红外图像和可见光图像尺寸不一致\n");
        return -1;
    }

    // 获取图像宽度和高度
    int width = image_IR.cols; int height = image_IR.rows;

    //开始计时，将中值滤波的时间算在内
    int64 st = cv::getTickCount();

    //作者通过验证，中值滤波窗口大小为3比较合适
    cv::Mat median_IR(height, width, CV_8UC1);
    cv::Mat median_Vis(height, width, CV_8UC1);

    cv::medianBlur(image_IR, median_IR, 3);
    cv::medianBlur(image_Vis, median_Vis, 3);

    /****************************** convert the image data into float type *****************************/
    //转换原始红外图像和可见光图像数据
    cv::Mat image_IR_float(height, width, CV_32FC1);
    image_IR.convertTo(image_IR_float, CV_32FC1, 1 / 255.0);

    cv::Mat image_Vis_float(height, width, CV_32FC1);
    image_Vis.convertTo(image_Vis_float, CV_32FC1, 1 / 255.0);

    float* image_IR_data = new float[width * height * sizeof(float)];
    float* image_Vis_data = new float[width * height * sizeof(float)];

    std::memcpy(image_IR_data, image_IR_float.data, width * height * sizeof(float));
    std::memcpy(image_Vis_data, image_Vis_float.data, width * height * sizeof(float));

    //转换红外图像和可见光图像中值滤波数据
    cv::Mat median_IR_float(height, width, CV_32FC1);
    median_IR.convertTo(median_IR_float, CV_32FC1, 1 / 255.0);

    cv::Mat median_Vis_float(height, width, CV_32FC1);
    median_Vis.convertTo(median_Vis_float, CV_32FC1, 1 / 255.0);

    float* median_IR_data = new float[width * height * sizeof(float)];
    float* median_Vis_data = new float[width * height * sizeof(float)];

    std::memcpy(median_IR_data, median_IR_float.data, width * height * sizeof(float));
    std::memcpy(median_Vis_data, median_Vis_float.data, width * height * sizeof(float));
    /****************************** convert the image data into float type *****************************/

    /*************************** calculate the base layers and detail layers ***************************/
    float* base_IR = new float[width * height * sizeof(float)];
    float* base_Vis = new float[width * height * sizeof(float)];

    int base_kernel = 31;
    boxfilter(image_IR_data, base_IR, width, height, base_kernel);
    boxfilter(image_Vis_data, base_Vis, width, height, base_kernel);

    float* detail_IR = new float[width * height * sizeof(float)];
    float* detail_Vis = new float[width * height * sizeof(float)];

    array_sub(image_IR_data, base_IR, detail_IR, width, height);
    array_sub(image_Vis_data, base_Vis, detail_Vis, width, height);
    /*************************** calculate the base layers and detail layers ***************************/

    /*************************** calculate the salience maps and wieght maps ***************************/
    float* salience_IR = new float[width * height * sizeof(float)];
    float* salience_Vis = new float[width * height * sizeof(float)];
    float* salience_Sum = new float[width * height * sizeof(float)];

    array_sub_abs(base_IR, median_IR_data, salience_IR, width, height);
    array_sub_abs(base_Vis, median_Vis_data, salience_Vis, width, height);
    array_add(salience_IR, salience_Vis, salience_Sum, width, height);

    float* weight_IR = new float[width * height * sizeof(float)];
    float* weight_Vis = new float[width * height * sizeof(float)];

    float eps = 0.000001;
    array_dot_div(salience_IR, salience_Sum, weight_IR, width, height, eps);
    array_dot_div(salience_Vis, salience_Sum, weight_Vis, width, height, eps);
    /*************************** calculate the salience maps and wieght maps ***************************/

    /***************** calculate the fusion map for both base layer and detail layer ******************/
    //计算base layer的fusion map
    float* base_layer = new float[width * height * sizeof(float)];
    array_add_weighted(base_IR, base_Vis, base_layer, width, height, 0.5, 0.5);

    float* weighted_detail_IR = new float[width * height * sizeof(float)];
    float* weighted_detail_Vis = new float[width * height * sizeof(float)];
    float* detail_layer = new float[width * height * sizeof(float)];

    array_dot_mul(weight_IR, detail_IR, weighted_detail_IR, width, height);
    array_dot_mul(weight_Vis, detail_Vis, weighted_detail_Vis, width, height);
    array_add(weighted_detail_IR, weighted_detail_Vis, detail_layer, width, height);
    /***************** calculate the fusion map for both base layer and detail layer ******************/

    /******************************** calculate the fusion map finally ********************************/
    float* fusion_map = new float[width * height * sizeof(float)];
    array_add(base_layer, detail_layer, fusion_map, width, height);
    /******************************** calculate the fusion map finally ********************************/

    double duration = (cv::getTickCount() - st) / cv::getTickFrequency() * 1000;
    std::cout << "image fusion time delay: " << duration << "ms;" << std::endl;

    cv::Mat fusion_result(height, width, CV_32FC1, fusion_map);
    fusion_result.convertTo(fusion_result, CV_8UC1, 255.0);
    cv::imwrite("fusion_result.png", fusion_result);

    return 0;
}

上面一组图像分别为同一场景的：可见光图像、红外图像、融合结果图像。从融合结果图像与可见光图像和红外图像对比效果来看，融合结果图像既体现了可见光图像和红外图像的主要信息，又保留了重要的边缘细节信息，因此基于TIF算法的图像融合方法不仅具有实时处理的高效特性，更能获得不错的融合效果，对于实时图像融合的项目需求是一个不错的选择方案。

posted @ 2024-07-12 21:53 彩英-pink 阅读(89) 评论(0) 编辑收藏举报

刷新页面返回顶部

彩英

起风了，唯有努力生存

从图像融合谈起

公告