Paper Reading: MA-Net: A Multi-Scale Attention Network for Liver and Tumor Segmentation
MA-Net: A Multi-Scale Attention Network for Liver and Tumor Segmentation
-
We propose a novel network named Multi-scale Attention Net (MA-Net) that introduces a self-attention mechanism to adaptively integrate local features with their global dependencies. MA-Net can capture rich contextual dependencies based on the attention mechanism. We design two blocks: the Position-wise Attention Block (PAB) and the Multi-scale Fusion Attention Block (MFAB). The PAB models feature interdependencies in the spatial dimension, capturing the spatial dependencies between pixels in a global view. In addition, the MFAB captures the channel dependencies between any feature maps through multi-scale semantic feature fusion.
I. INTRODUCTION
-
Although the proposed variants of skip connections help to capture rich multi-level semantic features, they cannot describe the spatial and channel-wise relationships between image pixels, which are essential for medical image segmentation.
-
In addition to designing skip connections to fuse multi-level semantic features, other state-of-the-art methods based on the FCN architecture have been proposed to capture multi-scale contextual feature information by using dilated convolutions with different sampling rates and pooling operations [11]-[14]. For example, [14] designed a Residual Multi-kernel Pooling (RMP) strategy that fuses multi-scale contextual feature information with pooling kernels of different sizes. Dilated convolutions with different sampling rates and pooling operations are used to obtain rich multi-scale context information of images, which further improves segmentation performance. However, dilated convolutions and pooling operations cannot leverage the spatial and channel-wise relationships between pixels in a global view. Moreover, pooling operations easily lose details of the feature maps.
-
In order to address the above problems, we propose a novel network architecture named Multi-scale Attention Net (MA-Net) for liver and tumor segmentation, shown in Fig. 1. The self-attention mechanism is used throughout MA-Net. Specifically, we use two blocks based on the self-attention mechanism to capture the spatial and channel dependencies of feature maps. One is the Position-wise Attention Block (PAB), and the other is the Multi-scale Fusion Attention Block (MFAB). The PAB obtains the spatial dependencies between pixels in feature maps in a self-attention manner. The MFAB captures the channel dependencies between any feature maps by applying an attention mechanism. Besides the channel dependencies of the high-level feature maps, the channel dependencies of the low-level feature maps are also considered in the MFAB. The channel dependencies of the high-level and low-level feature maps are fused by summation, which aims to obtain rich multi-scale semantic information from the feature maps through the attention mechanism and thereby improve network performance.
Main contributions:
-
We propose a novel network named Multi-scale Attention Net (MA-Net) with a dual attention mechanism to enhance the ability of feature representation for liver and tumor segmentation.
-
We design two blocks with a self-attention mechanism: the Position-wise Attention Block (PAB) and the Multi-scale Fusion Attention Block (MFAB). We use the PAB and MFAB to capture attention feature maps at the spatial and channel levels, respectively. The PAB obtains the spatial dependencies between pixels in a global view, and the MFAB captures the channel dependencies between any feature maps by fusing high-level and low-level semantic features.
III. METHODS
-
In this section, we describe the proposed method in detail, including the Res-block, the Position-wise Attention Block, and the Multi-scale Fusion Attention Block. We adopt an improved encoder-decoder architecture based on U-Net for liver and tumor segmentation. The Res-block consists of three 3×3 convolution blocks and a residual connection to extract high-dimensional feature information. The Position-wise Attention Block captures the spatial dependencies of feature maps, and the Multi-scale Fusion Attention Block aggregates the channel dependencies between any feature maps by fusing high-level and low-level feature information.
A. RES-BLOCK
-
As the number of network layers increases, gradients can vanish; [5] designed a novel skip connection named the residual connection to address this problem. Inspired by residual connections, we use three 3×3 Conv blocks and one residual connection to capture high-dimensional feature information of CT images in the encoder path. A 1×1 Conv controls the number of input channels. Because the memory of the experimental platform is limited, the batch size is usually small in image segmentation, and a small batch size can degrade model performance. Hence, [36] proposed group normalization to alleviate this problem. We replace batch normalization with group normalization in MA-Net and use group normalization in the Res-block. The structure of the Res-block is shown in Fig. 2.
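For reference, a minimal PyTorch sketch of such a Res-block follows. The exact layer ordering, activation placement, and the number of normalization groups are our assumptions; Fig. 2 of the paper is authoritative.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Sketch of the Res-block: three 3x3 Conv blocks with GroupNorm plus a
    residual connection; a 1x1 Conv matches the channel count on the shortcut.
    Layer ordering and groups=8 are assumptions, not taken from the paper."""
    def __init__(self, in_ch, out_ch, groups=8):
        super().__init__()
        # 1x1 Conv controls the number of channels on the residual path
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),  # group normalization instead of BatchNorm
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection: add the shortcut to the convolutional path
        return self.act(self.body(x) + self.shortcut(x))

# usage sketch: ResBlock(64, 128)(torch.randn(2, 64, 64, 64))
```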
B. POSITION-WISE ATTENTION BLOCK
-
Previous works [11], [37] have suggested that the local feature information captured by a traditional convolutional network can lead to the misclassification of objects. In order to capture rich contextual relationships over local feature maps, [26] designed a position attention module. Inspired by the position attention module, we use the PAB to capture the spatial dependencies between any two positions in the feature maps. The PAB can model a wider range of rich spatial contextual information over local feature maps.
-
The final output O at each position is a weighted sum of the feature maps across all positions and the original feature maps. Therefore, the final output O has a global contextual view and selectively aggregates rich contextual information over local feature maps according to the spatial attention map. It considers the long-range spatial dependencies between features in a global view, which improves intra-class correlation and semantic consistency.
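A minimal sketch of a position-wise attention block in this style, following the position attention module of [26]: the channel-reduction ratio and the zero-initialized learnable scale gamma are common choices we assume here, not details confirmed by the excerpt.

```python
import torch
import torch.nn as nn

class PAB(nn.Module):
    """Sketch of position-wise attention: an (HW x HW) spatial attention map
    weights the value features at every position, and the result is added
    back to the input. reduction=8 and zero-initialized gamma are assumed."""
    def __init__(self, in_ch, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(in_ch, in_ch // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_ch, in_ch // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_ch, in_ch, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # B x HW x C'
        k = self.key(x).flatten(2)                    # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)           # B x HW x HW spatial attention map
        v = self.value(x).flatten(2)                  # B x C  x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        # output O: weighted sum over all positions, plus the original features
        return self.gamma * out + x
```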
C. MULTI-SCALE FUSION ATTENTION BLOCK
-
The attention mechanism in deep learning is similar to the human visual system: it aims to select the information that is important for the current task from a variety of information. SENet models the channel-wise dependencies among feature channels and automatically learns the importance of each feature channel; the purpose is to enhance the helpful feature maps and suppress the feature maps that are useless for the current task. Each channel of a high-dimensional feature map can be seen as a class-specific response. The liver and tumor regions are relatively small compared to the whole CT image. Hence, we try to imitate how physicians review CT images by introducing the attention mechanism into MA-Net. By capturing the channel-wise dependencies among feature maps, the model can improve its ability of feature representation. Moreover, many previous works suggest that multi-scale information helps to improve segmentation accuracy.
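For reference, a minimal SE-style channel attention block of the kind this paragraph describes (reduction=16 is the default from the SENet paper; everything else is a standard sketch, not MA-Net's exact layer list):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: global average pooling squeezes
    each channel to a scalar, a small bottleneck predicts a per-channel
    importance in (0, 1), and the input is reweighted by it."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),             # squeeze: B x C x 1 x 1
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),                        # excitation: channel weights
        )

    def forward(self, x):
        return x * self.gate(x)  # enhance helpful channels, suppress useless ones
```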
-
Inspired by [25], we design a novel Multi-scale Fusion Attention Block (MFAB) to extract the interdependencies among feature channels by combining high-level and low-level feature maps. Like the human visual system, the MFAB automatically selects the information that is important for liver and tumor segmentation from a variety of information. Our main idea in designing the MFAB is that it learns the importance of each feature channel from multi-level feature maps without extra spatial dimensions, and, according to this importance, enhances the helpful feature maps and suppresses those that contribute less to the liver and tumor segmentation task.
-
Specifically, we model the interdependence of feature channels from both low-level and high-level feature maps. The high-level features carry rich semantic information about the image, while the low-level features from the skip connection carry more edge information; the low-level features are used to recover image details. The MFAB is shown in Fig. 4. We apply channel-wise attention to the high-level and low-level features, respectively. The purpose is to increase the weight of the information that is important to the segmentation task in each feature channel, while useless feature information is suppressed.
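Putting the pieces together, a hedged sketch of the MFAB: channel attention is applied to the high-level (decoder) and low-level (skip-connection) features separately, and the two recalibrated maps are fused by summation. The 1×1 Conv and bilinear upsampling used to align the high-level branch with the skip features are our assumptions for shape compatibility; Fig. 4 of the paper is authoritative.

```python
import torch.nn as nn

class MFAB(nn.Module):
    """Sketch of the Multi-scale Fusion Attention Block: SE-style channel
    attention on each branch, then sum fusion. Reuses SEBlock from the sketch
    above; the alignment of the high-level branch is an assumption."""
    def __init__(self, high_ch, low_ch, reduction=16):
        super().__init__()
        self.match = nn.Conv2d(high_ch, low_ch, kernel_size=1)  # match channel counts
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.se_high = SEBlock(low_ch, reduction)  # channel attention, high-level branch
        self.se_low = SEBlock(low_ch, reduction)   # channel attention, low-level branch

    def forward(self, high, low):
        high = self.up(self.match(high))              # bring decoder features to skip resolution
        return self.se_high(high) + self.se_low(low)  # fuse the two branches by summation
```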