神经网络中concatenate和add层的不同

在网络结构的设计上，经常说DenseNet和Inception中更多采用的是concatenate操作，而ResNet更多采用的add操作，那么这两个操作有什么异同呢？

concatenate操作是网络结构设计中很重要的一种操作，经常用于将特征联合，多个卷积特征提取框架提取的特征融合或者是将输出层的信息进行融合，而add层更像是信息之间的叠加。

This reveals that both DenseNets and ResNets densely aggregate features from prior layers and their essential difference is how features are aggregated: ResNets aggregate features by summation and DenseNets aggregate them by concatenation.

Resnet是做值的叠加，通道数是不变的，DenseNet是做通道的合并。你可以这么理解，add是描述图像的特征下的信息量增多了，但是描述图像的维度本身并没有增加，只是每一维下的信息量在增加，这显然是对最终的图像的分类是有益的。而concatenate是通道数的合并，也就是说描述图像本身的特征增加了，而每一特征下的信息是没有增加。

在代码层面就是ResNet使用的都是add操作，而DenseNet使用的是concatenate。

这些对我们设计网络结构其实有很大的启发。

通过看keras的源码，发现add操作，

def _merge_function(self, inputs):
    output = inputs[0]
    for i in range(1, len(inputs)):
        output += inputs[i]
    return output

执行的就是加和操作，举个例子

import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
model.summary()

打印出来模型结构就是：

__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 16) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, 32) 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 8) 136 input_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 8) 264 input_2[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 8) 0 dense_1[0][0]
dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 4) 36 add_1[0][0]
==================================================================================================
Total params: 436
Trainable params: 436
Non-trainable params: 0

__________________________________________________________________________________________________

这个比较好理解，add层就是接在dense_1,dense_2后面的是一个连接操作，并没有训练参数。

相对来说，concatenate操作比较难理解一点。

if py_all([is_sparse(x) for x in tensors]):
    return tf.sparse_concat(axis, tensors)
else:
    return tf.concat([to_dense(x) for x in tensors], axis)

通过keras源码发现，一个返回sparse_concate，一个返回concate，这个就比较明朗了，

concate操作，举个例子

t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 0) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 1) ==> [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

# tensor t3 with shape [2, 3]
# tensor t4 with shape [2, 3]
tf.shape(tf.concat([t3, t4], 0)) ==> [4, 3]
tf.shape(tf.concat([t3, t4], 1)) ==> [2, 6]

事实上，是关于维度的一个联合，axis=0表示列维，1表示行维，沿着通道维度连接两个张量。另一个sparse_concate则是关于稀疏矩阵的级联，也比较好理解。

posted @ 2020-10-25 14:54 瘋耔阅读(1345) 评论(0) 编辑收藏举报

刷新页面返回顶部