论文解读-Image splicing forgery detection combining coarse to refined convolutional neural network and adaptive clustering
Abstract
This paper proposes a splicing forgery detection method with two parts: a coarse-to-refined convolutional neural network (C2RNet) and diluted adaptive clustering. The proposed C2RNet cascades a coarse convolutional neural network (C-CNN) and a refined CNN (R-CNN), and extracts the differences in image properties between un-tampered and tampered regions from image patches of different scales. Further, to decrease the computational complexity, an image-level CNN is introduced to replace the patch-level CNN in C2RNet. The proposed detection method learns the differences in various image properties to guarantee stable detection performance, and the image-level CNN tremendously decreases the computational time. After the suspicious forgery regions are located by the proposed C2RNet, the final detected forgery regions are generated by applying the proposed adaptive clustering approach. The experimental results demonstrate that the proposed detection method achieves relatively promising results compared with state-of-the-art splicing forgery detection methods, even under various attack conditions.
1 Introduction
To further explore the splicing forgery detection method based on a CNN, a two-stage hierarchical feature learning approach is proposed in this paper. The first stage of feature learning is based on a coarse CNN, which can roughly identify the differences in the image properties between un-tampered and tampered regions. The second stage of feature learning, using a refined CNN, further reveals the essential differences in the image properties between un-tampered and tampered regions. The proposed coarse-to-refined convolutional neural network (C2RNet) can learn the differences in various image properties between tampered and un-tampered regions, which addresses the defects of previous splicing forgery detection methods and maintains a stable detection performance across different splicing forgery images. Nevertheless, when the proposed C2RNet is implemented at the patch level, a high computational complexity occurs. Therefore, in this study, an image-level CNN is further utilized to replace the patch-level CNN in C2RNet, which significantly decreases the computational complexity of the proposed detection method. Finally, based on the suspicious forgery regions detected by C2RNet, an adaptive clustering is applied to obtain the final detected splicing forgery regions. The contributions of the proposed scheme are as follows:
- a coarse-to-refined CNN proposed for locating suspicious forgery regions at the pixel level;
- an image-level CNN used to replace the patch-level CNN in the proposed framework, which significantly reduces the time complexity;
- an adaptive clustering approach that filters falsely detected areas to better locate the forgery regions.
In the first part, feature learning is divided into two stages. The first stage is based on the coarse CNN, which roughly identifies the differences between tampered and un-tampered regions in the image, especially along the edges. The second stage is based on the refined CNN, which further learns the essential differences between tampered and un-tampered regions, i.e., it further filters the edges produced by the coarse CNN.
In the second part, after C2RNet produces the more accurate suspicious tampered regions (Net_out), an adaptive clustering algorithm determines the final tampered regions (FD_out), which are then filled in by a filling algorithm to obtain the final result.
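The two-part pipeline described above can be sketched as the following skeleton. The function names (`c_cnn`, `r_cnn`, `adaptive_cluster`, `fill_regions`) are hypothetical placeholders for the paper's stages, not the authors' code:

```python
import numpy as np

def c2rnet_pipeline(image, c_cnn, r_cnn, adaptive_cluster, fill_regions):
    """Structural sketch of the C2RNet detection pipeline (hypothetical API)."""
    # Stage 1: the coarse CNN flags suspicious pixels, mostly along
    # the edges of spliced regions (plus some false edges).
    coarse_mask = c_cnn(image)
    # Stage 2: the refined CNN re-examines the coarse detections and
    # drops edges that actually belong to un-tampered content.
    net_out = r_cnn(image, coarse_mask)
    # Post-processing: adaptive clustering filters falsely detected
    # regions, and a filling step produces the solid final regions (FD_out).
    fd_out = fill_regions(adaptive_cluster(net_out))
    return fd_out

# Dummy stand-ins just to show the data flow on a toy 8x8 image.
img = np.zeros((8, 8))
result = c2rnet_pipeline(
    img,
    c_cnn=lambda x: x > 0,        # dummy coarse detector
    r_cnn=lambda x, m: m,         # dummy refiner: pass-through
    adaptive_cluster=lambda m: m, # dummy cluster filter
    fill_regions=lambda m: m,     # dummy filler
)
print(result.shape)  # (8, 8)
```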
2 Proposed splicing forgery detection method
These suspicious forgery regions (Net_out) obtained by C2RNet are adaptively filtered and filled in through a post-processing step to generate the final detected forgery regions (FD_out). The proposed C2RNet and the adaptive filtering and filling applied in post-processing are described in Sections 2.1 and 2.2, respectively. To reduce the computational time of C2RNet, an image-level CNN is proposed to replace the patch-level CNN, which is also presented in Section 2.1.
2.1 Coarse-to-refined network (C2RNet)
Based on the above discussion, the proposed C2RNet is applied, which is cascaded using a coarse CNN (C-CNN) and a refined CNN (R-CNN). Through the training process, the trained C-CNN can detect suspicious coarse forgery regions, which mainly exposes the edges of the forgery regions and contains a few inaccurately detected edges that actually belong to the un-tampered regions in the host image. To locate more accurate suspicious forgery regions, the trained R-CNN will further filter out these inaccurately detected edges to obtain the refined suspicious forgery regions.
2.1.1 Coarse CNN (C-CNN) in C2RNet
Therefore, as shown in Fig. 3, the architecture of the coarse CNN (C-CNN) is based on VGG-16 [31], which includes 13 convolutional layers, five max-pooling layers, and two fully connected layers, and each convolutional layer is followed by a ReLU activation function [28]. Image patches of size W_c × W_c are extracted around each pixel along the contour of the tampered regions, and around the corresponding pixels in the original image, and these image patches are then used to train the C-CNN. Details on the extraction of image patches used to train the C-CNN are presented in Section 3.2. Through the training process, the C-CNN learns the differences in image properties between the tampered and un-tampered regions. During testing, the output of the C-CNN is a predicted score over two classes, namely the tampered and un-tampered classes. Patches classified as tampered reveal the coarse suspicious forgery regions, where the image properties differ between the tampered and un-tampered regions of the host image.
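As a rough illustration of this patch extraction, the following sketch finds the contour pixels of a binary tamper mask and cuts out W_c × W_c patches around them. This is a simplified stand-in for the paper's procedure, not the authors' implementation:

```python
import numpy as np

def contour_pixels(mask):
    """Tampered pixels (1) that have at least one un-tampered 4-neighbour."""
    p = np.pad(mask, 1, constant_values=0)
    nbr_min = np.minimum.reduce([p[:-2, 1:-1], p[2:, 1:-1],
                                 p[1:-1, :-2], p[1:-1, 2:]])
    return np.argwhere((mask == 1) & (nbr_min == 0))

def extract_patches(image, centers, w=32):
    """Cut out w x w patches centred on the given pixels (borders skipped)."""
    half = w // 2
    out = []
    for r, c in centers:
        if half <= r < image.shape[0] - half and half <= c < image.shape[1] - half:
            out.append(image[r - half:r + half, c - half:c + half])
    return out

# Toy example: a 64x64 image with a 20x20 tampered square.
img = np.random.default_rng(0).random((64, 64))
mask = np.zeros((64, 64), dtype=int)
mask[22:42, 22:42] = 1
patches = extract_patches(img, contour_pixels(mask), w=32)
print(len(patches), patches[0].shape)  # 76 (32, 32)
```

The 20 × 20 square has 76 contour pixels (20 × 20 − 18 × 18), so 76 patches of 32 × 32 are produced; on real data the same centers would also be sampled from the corresponding original image to form the un-tampered class.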
2.1.2 Refined CNN (R-CNN) in C2RNet
To allow the R-CNN to further learn the image properties from a sufficient amount of local information, the size of the input patch is W_r × W_r, satisfying W_c < W_r. These patches are extracted from the edges of the original images and the contour of the tampered regions. Details on extracting the image patches to train the R-CNN are provided in Section 3.2. From these training patches, the R-CNN learns the differences in image properties needed to filter out the inaccurate regions detected by the C-CNN and to obtain the refined suspicious forgery regions.
2.1.3 Image-level CNN for fast computations
As shown in Fig. 5, all filter and bias parameters of the two CNNs are identical, and the result of the patch-level CNN equals that of the image-level CNN. The differences between a patch-level CNN and an image-level CNN are as follows:
(1) the max-pooling layers of the patch-level CNN are replaced with overlapping max-pooling layers in the image-level CNN;
(2) a down-sampling step is added after each overlapping max-pooling layer in the image-level CNN.
By transforming the patch-level CNN into an image-level CNN, the computational complexity of the proposed detection method is dramatically reduced.
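The equivalence behind this transformation can be checked numerically: non-overlapping 2 × 2 max pooling applied inside a sliding patch equals stride-1 (overlapping) max pooling on the full image followed by a stride-2 down-sampling aligned with the patch offset. A minimal numpy check (my own illustration, not the paper's code):

```python
import numpy as np

def maxpool(x, k=2, stride=1):
    """Naive 2-D max pooling with window k and the given stride."""
    oh = (x.shape[0] - k) // stride + 1
    ow = (x.shape[1] - k) // stride + 1
    out = np.empty((oh, ow))
    for a in range(oh):
        for b in range(ow):
            out[a, b] = x[a*stride:a*stride + k, b*stride:b*stride + k].max()
    return out

img = np.random.default_rng(1).standard_normal((12, 12))
i, j, w = 3, 5, 4                      # patch offset and size

# Patch-level: non-overlapping pooling (stride 2) on one extracted patch.
pooled_patch = maxpool(img[i:i + w, j:j + w], k=2, stride=2)

# Image-level: overlapping pooling (stride 1) on the whole image,
# then a stride-2 down-sampling aligned with the patch offset.
full = maxpool(img, k=2, stride=1)
pooled_image = full[i:i + w - 1:2, j:j + w - 1:2]

print(np.allclose(pooled_patch, pooled_image))  # True
```

The image-level pass computes the overlapping pooling once for every patch position simultaneously, which is what removes the per-patch redundancy.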
2.2 Adaptive clustering approach
3.1 Training process of C2RNet
For the training image set of the C-CNN, the image patch size W_c is set to 32. In each splicing forgery image, pixels along the contour of the tampered regions are used as the center pixels to extract neighboring patches with a resolution of 32 × 32. Meanwhile, the same operation is applied to the same pixels in the corresponding original image. Finally, a total of 115,000 patches labeled as tampered and 115,000 patches labeled as un-tampered are obtained for training the proposed C-CNN. Because the functions of the C-CNN and the R-CNN differ, different training image sets are created. For the training set of the R-CNN, the patch size W_r is set to 96. First, the Canny edge detector [5] is used to obtain the edges of the original images. Next, pixels along the edges are used as the center pixels to extract patches with a resolution of 96 × 96, and these extracted patches are labeled as un-tampered. In a splicing forgery image, similar to the generation of the training set of the C-CNN, pixels along the contour of the tampered regions are used as the center pixels to extract neighboring patches with a resolution of 96 × 96, and these extracted patches are labeled as tampered. Finally, 230,000 patches in total (115,000 labeled as tampered and 115,000 labeled as un-tampered) are obtained as the training image set for the R-CNN.
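The un-tampered patches for the R-CNN are anchored on edge pixels of the original images. As a stand-in for the Canny detector used in the paper, a simple gradient-magnitude threshold already shows the sampling idea (the thresholding rule here is my simplification, not the paper's):

```python
import numpy as np

def edge_pixels(gray, frac=0.5):
    """Stand-in for Canny: keep pixels whose gradient magnitude exceeds
    frac * max magnitude (a hypothetical rule for illustration only)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    return np.argwhere(mag > frac * mag.max())

# Toy "original image": a bright square on a dark background.
gray = np.zeros((200, 200))
gray[60:140, 60:140] = 1.0
centers = edge_pixels(gray)

# Each centre would anchor a 96x96 patch labeled un-tampered (label 0);
# contour pixels of the tamper mask would anchor patches labeled 1,
# giving the balanced 115,000 + 115,000 training set described above.
print(len(centers) > 0)  # True
```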
3.2 Analysis of cluster number n
In the proposed post-processing approach, a K-means algorithm [13] is used to divide Net_out into the clusters {C_1, C_2, ..., C_{n-1}, C_n}, and then to calculate the adaptive threshold th used to filter out the clusters that are inaccurately detected by C2RNet. How the cluster number n is chosen is a key issue because it is related to the adaptive threshold th and thus affects the performance of the detection method. To test the performance of the proposed splicing forgery detection method against the number of clusters, experiments using 2, 4, 6, 8, 10, and 12 clusters in the K-means algorithm [13] are conducted. The experimental results of the proposed splicing forgery detection method are shown in Fig. 8.
In this figure, each point indicates the average value over 98 images. When the number of clusters n is 4, the F-measure of the corresponding experiment is the best. Based on this result, four clusters are used in the following experiments.
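Since the exact adaptive threshold is not spelled out here, the following sketch only illustrates the shape of this post-processing: K-means (n = 4) groups the candidate regions of Net_out by size, and clusters whose centers fall below a threshold th are discarded. The clustering feature (region area) and the threshold rule (mean area) are both my assumptions, not the paper's definitions:

```python
import numpy as np

def kmeans_1d(values, n=4, iters=50, seed=0):
    """Minimal 1-D K-means; returns a label per value and the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=n, replace=False).astype(float)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute centers.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

# Hypothetical candidate-region areas from Net_out (pixels per region):
# a few large true regions mixed with tiny false detections.
areas = np.array([5, 7, 6, 220, 240, 9, 8, 800, 210, 4])
labels, centers = kmeans_1d(areas, n=4)
th = areas.mean()  # hypothetical adaptive threshold
keep = np.isin(labels, np.where(centers >= th)[0])
print(sorted(areas[keep]))  # the large regions survive; tiny false hits drop
```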
Most of the source code of these comparative detection methods was implemented by Zampoglou et al. [41]:
M. Zampoglou, S. Papadopoulos, Y. Kompatsiaris, Large-scale evaluation of splicing localization algorithms for web images, Multimed. Tools Appl. 76 (4) (2017) 4801–4834.
4. Conclusion
In this paper, a novel image splicing forgery detection method based on a CNN was proposed. The proposed C2RNet is used to obtain the predicted results, and a post-processing approach is then applied to locate the final detected tampered regions. C2RNet, which cascades a C-CNN and an R-CNN, is a progressive network: the C-CNN roughly predicts suspicious coarse forgery regions, and the R-CNN then refines the detection results produced by the C-CNN. The proposed method was evaluated on two public datasets and compared with other state-of-the-art detection methods, and the experimental results show that it achieves better results than previous detection methods.
Personal notes:
- The experiments are fairly complete; the ablation setup in this paper is worth borrowing for future experiments.
- Starting from a sliding window, i.e., working on so-called patches, is an interesting choice.
- I did not fully understand the conversion from a patch-level CNN to an image-level CNN in Section 2.1.3 and need to study it together with the code; it also feels like it could be further improved with a transformer.
- I did not understand the adaptive-threshold part! I will come back and fill this in when I have time.
- https://www.cnblogs.com/wenshinlee/p/12082856.html This blogger's write-up is quite good.