Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)

Paper notes: Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)

Paper link: Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)

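Abstract

In this work, we propose a technique that utilizes a fully convolutional network (FCN) to localize image splicing attacks. We first evaluated a single-task FCN (SFCN) trained only on the surface label. Although the SFCN is shown to provide superior performance over existing methods, it still provides a coarse localization output in certain cases. Therefore, we propose the use of a multi-task FCN (MFCN) that utilizes two output branches for multi-task learning. One branch is used to learn the surface label, while the other branch is used to learn the edge or boundary of the spliced region. We trained the networks using the CASIA v2.0 dataset, and tested the trained models on the CASIA v1.0, Columbia Uncompressed, Carvalho, and DARPA/NIST Nimble Challenge 2016 SCI datasets. Experiments show that the SFCN and MFCN outperform existing splicing localization algorithms, and that the MFCN can achieve finer localization than the SFCN.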

1 Introduction

  • The base network architecture is the FCN VGG-16 architecture with skip connections, but we incorporate several modifications, including batch normalization layers and class weighting.

  • Thus, we next propose the use of a multi-task FCN (MFCN) that utilizes two output branches for multi-task learning. One branch is used to learn the surface label, while the other branch is used to learn the edge or boundary of the spliced region. It is shown that by simultaneously training on the surface and edge labels, we can achieve finer localization of the spliced region, as compared to the SFCN. Once the MFCN was trained, we evaluated two different inference approaches. The first approach utilizes only the surface output probability map in the inference step. The second approach, which is referred to as the edge-enhanced MFCN, utilizes both the surface and edge output probability maps to achieve finer localization.

  • Furthermore, we show that after applying various post-processing techniques such as JPEG compression, blurring, and added noise to the spliced images, the SFCN and MFCN methods still outperform the existing methods.

3 Proposed Methods

3.1 Brief Review of Fully Convolutional Networks (FCNs)

  • In [24], the authors adapted common classification networks into fully convolutional ones for the task of semantic segmentation. It was shown in [24] that FCNs can efficiently learn to make dense predictions for per-pixel tasks such as semantic segmentation.

3.2 Single-task Fully Convolutional Network (SFCN)

  • In addition, we incorporated several modifications, including batch normalization and class weighting. We utilized batch normalization to eliminate the bias and normalize the inputs at each layer [17]. Class weighting refers to the application of different weights to the different classes in the loss function.

  • We apply a larger weight to the spliced pixels (since there are fewer spliced pixels than non-spliced ones). In particular, we used median frequency class weighting [13, 2].
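
The class weighting described above can be sketched as follows. This is only a minimal illustration, assuming the common definition of median frequency balancing (weight of a class = median of all class frequencies divided by that class's frequency, where a class's frequency is counted over the images that contain it). The function name `median_frequency_weights` and the toy masks are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def median_frequency_weights(masks, num_classes=2):
    """Median frequency balancing (assumed definition).

    freq_c   = (pixels of class c) / (total pixels of images containing c)
    weight_c = median(freq) / freq_c
    `masks` is an iterable of integer label arrays (0 = authentic, 1 = spliced).
    """
    class_pixels = np.zeros(num_classes, dtype=np.float64)
    image_pixels = np.zeros(num_classes, dtype=np.float64)
    for mask in masks:
        counts = np.bincount(mask.ravel(), minlength=num_classes)
        class_pixels += counts
        # Only images in which the class appears count towards its denominator.
        image_pixels[counts > 0] += mask.size
    freq = class_pixels / np.maximum(image_pixels, 1.0)
    seen = freq > 0
    weights = np.zeros(num_classes, dtype=np.float64)
    # Classes that never appear keep a weight of 0.
    weights[seen] = np.median(freq[seen]) / freq[seen]
    return weights

# Toy data: one mask with a small spliced patch, one fully authentic mask.
mask_a = np.zeros((4, 4), dtype=np.int64)
mask_a[1:3, 1:3] = 1
mask_b = np.zeros((4, 4), dtype=np.int64)
weights = median_frequency_weights([mask_a, mask_b])
print(weights)  # the rarer spliced class receives the larger weight
```

Because spliced pixels are the minority, their frequency falls below the median and their weight comes out larger than 1, which matches the intent stated in the excerpt.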

3.3 Multi-task Fully Convolutional Network (MFCN)

  • In our work, we adopt the idea in [30] of utilizing a multi-task network, but we incorporate several modifications, including skip connections, batch normalization, and class weighting (as discussed in Section 3.2). In contrast to the SFCN, the MFCN utilizes two output branches for multi-task learning. One branch is used to learn the surface label, while the other branch is used to learn the edge or boundary of the spliced region.

  • The architecture of the MFCN used in our paper is shown in Fig. 3. In addition to the surface labels, the boundaries between inserted regions and their host background can be an important indicator of a manipulated area. This is what motivated us to use a multi-task learning network. The weights or parameters of the network are influenced by both the surface and edge labels during the training process. By simultaneously training on the surface and edge labels, we are able to obtain a finer localization of the spliced region, as compared to training only on the surface labels. Once the network was fully trained, we evaluated two different binary output mask generation approaches. In the first approach, we extract the surface output probability map, and then threshold it to yield the binary system output mask. In this approach, the edge output probability map is not utilized in the inference step. Please note that the edge label still influenced the weights of the network during the training process.

3.4 Edge-enhanced MFCN Inference

  • The second inference strategy, which we refer to as the edge-enhanced MFCN,
    utilizes both the surface and edge output probability maps, as described in the
    following steps:
    1. We threshold the surface probability map with a given threshold.
    2. We threshold the edge probability map with a given threshold.
    3. Next, we apply hole-filling to the output of step (2), yielding the hole-filled,
      thresholded edge mask.
    4. Finally, we generate the binary system output mask by computing the
      intersection of the output of step (1) and output of step (3).

It is shown in this paper that by utilizing both the edge and surface probability
maps in the inference step, we obtain finer localization of the spliced region. An
example illustrating inference with edge-enhancement is shown in Figure 4. It
can be seen that utilizing both the edge and surface probability maps leads to
a finer localization of the spliced region.
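
The four inference steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the thresholds of 0.5, the 4-connected flood fill used for hole filling, and the names `fill_holes` and `edge_enhanced_mask` are all hypothetical choices. SciPy's `scipy.ndimage.binary_fill_holes` provides a ready-made hole-filling routine that could be used instead.

```python
from collections import deque

import numpy as np

def fill_holes(mask):
    """Fill background regions that are not connected to the image border
    (4-connectivity), returning a boolean mask."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not mask[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    # Grow the "outside" region through background pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                queue.append((nr, nc))
    # Everything that is not outside is either edge or an enclosed hole.
    return ~outside

def edge_enhanced_mask(surface_prob, edge_prob, surface_thr=0.5, edge_thr=0.5):
    surface_mask = surface_prob >= surface_thr   # step 1
    edge_mask = edge_prob >= edge_thr            # step 2
    filled_edge = fill_holes(edge_mask)          # step 3
    return surface_mask & filled_edge            # step 4

# Toy maps: the edge branch outlines a 5x5 square, while the surface
# branch spills one column beyond it.
edge_prob = np.full((7, 7), 0.1)
edge_prob[1:6, 1:6] = 0.9
edge_prob[2:5, 2:5] = 0.1
surface_prob = np.full((7, 7), 0.1)
surface_prob[1:6, 1:7] = 0.8

surface_only = surface_prob >= 0.5
enhanced = edge_enhanced_mask(surface_prob, edge_prob)
print(int(surface_only.sum()), int(enhanced.sum()))
```

In this toy case the intersection trims the surface pixels that fall outside the closed edge contour, which is the sense in which the edge branch refines the localization.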

3.5 Training and Testing Procedure

For the MFCN, the total loss function, $L_t$, is the sum of the loss corresponding to the surface label and the loss corresponding to the edge label, denoted by $L_s$ and $L_e$, respectively. Thus, we have
$$L_t = L_s + L_e,$$
where $L_s$ and $L_e$ are per-pixel softmax loss functions. In addition, we apply median-frequency class weighting to the surface and edge loss functions, as described in Sections 3.2 and 3.3. For the SFCN, the total loss function is equal to the surface loss function $L_s$.
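
As a rough sketch of the loss just described, the snippet below computes a class-weighted per-pixel softmax cross-entropy for each branch and adds the two. The normalization by the sum of pixel weights, the function names, and the example class weights are assumptions made for illustration; the excerpt does not specify them.

```python
import numpy as np

def weighted_softmax_loss(logits, labels, class_weights):
    """Class-weighted per-pixel softmax cross-entropy.

    logits: (H, W, C) floats, labels: (H, W) ints, class_weights: (C,) floats.
    Returns the weight-normalized mean over pixels (an assumed normalization).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability assigned to the true class of each pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    pixel_weights = class_weights[labels]
    return float(-(pixel_weights * picked).sum() / pixel_weights.sum())

def total_loss(surface_logits, surface_labels, edge_logits, edge_labels,
               surface_weights, edge_weights):
    """Total MFCN loss: surface loss plus edge loss."""
    l_s = weighted_softmax_loss(surface_logits, surface_labels, surface_weights)
    l_e = weighted_softmax_loss(edge_logits, edge_labels, edge_weights)
    return l_s + l_e

# Toy check: with all-zero logits each branch predicts 50/50, so each
# branch contributes ln(2) regardless of the labels or class weights.
rng = np.random.default_rng(0)
labels_s = rng.integers(0, 2, size=(4, 4))
labels_e = rng.integers(0, 2, size=(4, 4))
zeros = np.zeros((4, 4, 2))
l_t = total_loss(zeros, labels_s, zeros, labels_e,
                 np.array([0.6, 2.3]), np.array([0.5, 8.0]))
print(l_t)
```

For the SFCN, the same `weighted_softmax_loss` applied to the surface branch alone would play the role of the total loss.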

4 Performance Evaluation Metrics

For each output map, we varied the threshold and picked the optimal threshold (this is done for each method). This technique of varying the threshold was also utilized by Zampoglou et al. in [37].

Once the MFCN or SFCN is trained, we use the trained model to evaluate other images not in the training set. We evaluated the performance of the proposed and existing methods using the F1 and Matthews Correlation Coefficient (MCC) metrics, which are per-pixel localization metrics.
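
The two per-pixel metrics and the threshold sweep can be sketched as below. The formulas for F1 and MCC are the standard ones; the threshold grid, the choice of F1 as the criterion for the "optimal" threshold, and the function names are assumptions made for illustration.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Per-pixel TP, FP, FN, TN for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn

def f1_score(pred, truth):
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc_score(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(prob_map, truth, thresholds=np.linspace(0.05, 0.95, 19)):
    """Sweep thresholds and keep the first one with the highest F1."""
    scores = [f1_score(prob_map >= t, truth) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), scores[best]

# Toy example: a 2x2 spliced patch with a cleanly separated probability map.
truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True
prob_map = np.where(truth, 0.73, 0.22)

best_thr, best_f1 = best_threshold(prob_map, truth)
best_mcc = mcc_score(prob_map >= best_thr, truth)
print(best_thr, best_f1, best_mcc)
```

Both metrics return 0.0 here when their denominator is zero (for example, an empty prediction against an empty ground truth), which is a convention chosen for this sketch.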

5 Experimental Results

6 Conclusion

It was demonstrated in this work that the application of FCN to the splicing localization problem yields a large improvement over current published techniques.
The FCN we utilized is based on the FCN VGG-16 architecture with skip connections, and we incorporated several modifications, such as batch normalization layers and class weighting. We first evaluated a single-task FCN (SFCN) trained only on the surface ground truth mask (which classifies each pixel in an image as spliced or authentic). Although the single-task network is shown to outperform existing techniques, it can still yield a coarse localization output in certain cases. Thus, we next proposed the use of a multi-task FCN (MFCN) that is simultaneously trained on the surface ground truth mask and the edge ground truth mask, which indicates whether each pixel belongs to the boundary of the spliced region. For the MFCN-based method, we presented two different inference approaches. In the first approach, we compute the binary system output mask by thresholding the surface output probability map. In this approach, the edge output probability map is not utilized in the inference step. This first MFCN-based inference approach is shown to outperform the SFCN-based approach. In the second MFCN-based inference approach, which we refer to as edge-enhanced MFCN, we utilize both the surface and edge output probability maps when generating the binary system output mask. The edge-enhanced MFCN is shown to yield finer localization of the spliced region, as compared to the SFCN-based approach and the MFCN without edge-enhanced inference. The proposed methods were evaluated on manipulated images from the Carvalho, CASIA v1.0, Columbia, and the DARPA/NIST Nimble Challenge 2016 SCI datasets. The experimental results showed that the proposed methods outperform existing splicing localization methods on these datasets, with the edge-enhanced MFCN performing the best.

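Personal Summary

The main contributions of this paper:

  1. Several FCN models, all of which reuse architectures from other people's work. Not impressed!
  2. A comparison of two inference approaches: one thresholds the surface prediction directly, while the other also uses the edge probability map.
  3. The small architectural modifications are worth borrowing, including batch normalization layers and class weighting; the code deserves a closer look.
  4. The training data requires edge labels, which you have to generate yourself.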
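
Since the edge labels have to be generated by hand, here is one possible way to derive them from the surface masks. This is a minimal sketch under my own assumptions, not the procedure used in the paper: an edge pixel is taken to be a spliced pixel that disappears after `width` rounds of 4-neighbour erosion, and the image border is padded as foreground so that it does not count as an edge. The name `edge_label_from_mask` is hypothetical.

```python
import numpy as np

def edge_label_from_mask(mask, width=1):
    """Return the inner boundary of a binary mask as a boolean edge label."""
    mask = mask.astype(bool)
    eroded = mask.copy()
    for _ in range(width):
        # Pad with True so pixels on the image border are not eroded
        # merely because they touch the edge of the image.
        padded = np.pad(eroded, 1, mode="constant", constant_values=True)
        eroded = (padded[1:-1, 1:-1]      # the pixel itself
                  & padded[:-2, 1:-1]     # neighbour above
                  & padded[2:, 1:-1]      # neighbour below
                  & padded[1:-1, :-2]     # neighbour to the left
                  & padded[1:-1, 2:])     # neighbour to the right
    # Edge = spliced pixels removed by the erosion.
    return mask & ~eroded

# Toy surface mask: a 4x4 spliced square inside a 6x6 image.
surface = np.zeros((6, 6), dtype=np.uint8)
surface[1:5, 1:5] = 1
edge = edge_label_from_mask(surface)
print(edge.astype(int))
```

A larger `width` gives a thicker boundary band, which may be easier for the edge branch to learn given how few edge pixels there are.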
posted @ 2022-04-06 15:16  梁君牧