The difference between the SAME and VALID padding modes in TensorFlow
TensorFlow offers two padding modes. SAME is the special one: it can produce asymmetric padding, a pitfall I ran into while verifying a TensorFlow network.
TensorFlow's formulas#
The 2-D convolution API#
tf.nn.conv2d(
input, filters, strides, padding, data_format='NHWC', dilations=None, name=None
)
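A minimal call (the shapes are chosen arbitrarily here, matching the larger example later in the post) already shows that only the padding argument changes between the two string modes, yet the output shape differs:

```python
import numpy as np
import tensorflow as tf

x = np.ones([1, 7, 8, 16], dtype=np.float32)   # NHWC input
w = np.ones([4, 4, 16, 3], dtype=np.float32)   # filter as [kH, kW, inC, outC]

# The calls are identical except for the padding argument.
print(tf.nn.conv2d(x, w, strides=(1, 3, 3, 1), padding='VALID').shape)  # (1, 2, 2, 3)
print(tf.nn.conv2d(x, w, strides=(1, 3, 3, 1), padding='SAME').shape)   # (1, 3, 3, 3)
```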
How padding is computed#
Pay attention to the padding argument: as a string it has the two options SAME and VALID; as a list of integers it spells out explicitly how much padding each dimension receives.
First, when padding is given as explicit numbers, it holds twice as many values as the input has dimensions, because every dimension is padded on both sides (left and right of the width, top and bottom of the height). TensorFlow, however, never pads the N or C dimension, only H and W, so for NHWC the value is padding = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]], and for NCHW the H/W pairs simply move to the last two slots: padding = [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]].
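As a quick illustration (the pad amounts below are arbitrary example values), the same four pad values occupy different slots depending on the data format; tf.pad accepts the same per-axis [before, after] pairs, so it is an easy way to check how much each axis grows:

```python
import numpy as np
import tensorflow as tf

pad_top, pad_bottom, pad_left, pad_right = 1, 2, 1, 1   # arbitrary example values

# N and C never receive padding; only the position of the H/W pairs changes.
padding_nhwc = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]
padding_nchw = [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]]

x_nhwc = np.ones([1, 7, 8, 16], dtype=np.float32)
print(tf.pad(x_nhwc, padding_nhwc).shape)   # (1, 10, 10, 16): only H and W grow
```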
Second, when one of the string options is passed, the padding it implies can still be expressed through the same padding parameter:

- VALID means no padding on any dimension, equivalent to padding = [[0, 0], [0, 0], [0, 0], [0, 0]];
- SAME means that, at the scale of the stride, the output size tracks the input size (taking the width dimension as an example), i.e. the sizes must satisfy

  $$W_o = \left\lceil \frac{W_i}{S} \right\rceil$$

  We know that with dilation = 1 and no padding, the output width, input width, kernel width and stride on a given dimension satisfy

  $$W_o = \left\lfloor \frac{W_i - W_k}{S} \right\rfloor + 1$$

  Taking the smallest padding that achieves the SAME output size as the baseline, the last window must end exactly at the padded border, which gives

  $$(W_o - 1)\,S + W_k = W_i + P$$

  Solving backwards for the padding,

  $$P = \left(\left\lceil \frac{W_i}{S} \right\rceil - 1\right) S + W_k - W_i$$

  (clamped to 0 if negative). This $P$ is the total amount of padding to add. TensorFlow then distributes it as symmetrically as possible:

  - if $P$ is even, both sides receive $P/2$;
  - if $P$ is odd, the left/top receives $\lfloor P/2 \rfloor$ and the right/bottom receives $\lceil P/2 \rceil$.
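These relations fit in a small helper. The sketch below is mine (not a TensorFlow API), but it reproduces the split TensorFlow performs for the example that follows:

```python
import math

def tf_same_pad_1d(in_size, kernel, stride, dilation=1):
    """Total SAME padding for one spatial axis, split the way TensorFlow splits it."""
    eff_kernel = (kernel - 1) * dilation + 1          # effective kernel size
    out_size = math.ceil(in_size / stride)            # SAME keeps Wo = ceil(Wi / S)
    total = max((out_size - 1) * stride + eff_kernel - in_size, 0)
    pad_before = total // 2                           # smaller half goes first (top / left)
    pad_after = total - pad_before                    # the extra element goes last
    return pad_before, pad_after

print(tf_same_pad_1d(7, 4, 3))   # (1, 2) -> pad_top, pad_bottom in the code below
print(tf_same_pad_1d(8, 4, 3))   # (1, 1) -> pad_left, pad_right in the code below
```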
The following code verifies this end to end:
import numpy as np
import tensorflow as tf

inputH, inputW = 7, 8
strideH, strideW = 3, 3
filterH = 4
filterW = 4
inputC = 16
outputC = 3
inputData = np.ones([1,inputH,inputW,inputC]) # format [N, H, W, C]
filterData = np.float16(np.ones([filterH, filterW, inputC, outputC]) - 0.33)
strides = (1, strideH, strideW, 1)
convOutputSame = tf.nn.conv2d(inputData, filterData, strides, padding='SAME')
convOutput = tf.nn.conv2d(inputData, filterData, strides, padding=[[0,0],[1,2],[1,1],[0,0]]) # explicit pads equivalent to SAME for H=7, W=8, k=4, s=3
print("output1, ", convOutputSame)
print("output2, ", convOutput)
print("Sum of a - b is ", np.sum(np.square(convOutputSame - convOutput)))
The output is:
output1, tf.Tensor(
[[[[ 96.46875 96.46875 96.46875]
[128.625 128.625 128.625 ]
[ 96.46875 96.46875 96.46875]]
[[128.625 128.625 128.625 ]
[171.5 171.5 171.5 ]
[128.625 128.625 128.625 ]]
[[ 64.3125 64.3125 64.3125 ]
[ 85.75 85.75 85.75 ]
[ 64.3125 64.3125 64.3125 ]]]], shape=(1, 3, 3, 3), dtype=float64)
output2, tf.Tensor(
[[[[ 96.46875 96.46875 96.46875]
[128.625 128.625 128.625 ]
[ 96.46875 96.46875 96.46875]]
[[128.625 128.625 128.625 ]
[171.5 171.5 171.5 ]
[128.625 128.625 128.625 ]]
[[ 64.3125 64.3125 64.3125 ]
[ 85.75 85.75 85.75 ]
[ 64.3125 64.3125 64.3125 ]]]], shape=(1, 3, 3, 3), dtype=float64)
Sum of squared differences is 0.0
ONNX's formulas#
ONNX's interface, taken from the operator schema (IR) definition, is the following:
std::function<void(OpSchema&)> ConvOpSchemaGenerator(const char* filter_desc) {
return [=](OpSchema& schema) {
std::string doc = R"DOC(
The convolution operator consumes an input tensor and {filter_desc}, and
computes the output.)DOC";
ReplaceAll(doc, "{filter_desc}", filter_desc);
schema.SetDoc(doc);
schema.Input(
0,
"X",
"Input data tensor from previous layer; "
"has size (N x C x H x W), where N is the batch size, "
"C is the number of channels, and H and W are the "
"height and width. Note that this is for the 2D image. "
"Otherwise the size is (N x C x D1 x D2 ... x Dn). "
"Optionally, if dimension denotation is "
"in effect, the operation expects input data tensor "
"to arrive with the dimension denotation of [DATA_BATCH, "
"DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].",
"T");
schema.Input(
1,
"W",
"The weight tensor that will be used in the "
"convolutions; has size (M x C/group x kH x kW), where C "
"is the number of channels, and kH and kW are the "
"height and width of the kernel, and M is the number "
"of feature maps. For more than 2 dimensions, the "
"kernel shape will be (M x C/group x k1 x k2 x ... x kn), "
"where (k1 x k2 x ... kn) is the dimension of the kernel. "
"Optionally, if dimension denotation is in effect, "
"the operation expects the weight tensor to arrive "
"with the dimension denotation of [FILTER_OUT_CHANNEL, "
"FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL ...]. "
"X.shape[1] == (W.shape[1] * group) == C "
"(assuming zero based indices for the shape array). "
"Or in other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL. ",
"T");
schema.Input(
2,
"B",
"Optional 1D bias to be added to the convolution, has size of M.",
"T",
OpSchema::Optional);
schema.Output(
0,
"Y",
"Output data tensor that contains the result of the "
"convolution. The output dimensions are functions "
"of the kernel size, stride size, and pad lengths.",
"T");
schema.TypeConstraint(
"T",
{"tensor(float16)", "tensor(float)", "tensor(double)"},
"Constrain input and output types to float tensors.");
schema.Attr(
"kernel_shape",
"The shape of the convolution kernel. If not present, should be inferred from input W.",
AttributeProto::INTS,
OPTIONAL);
schema.Attr(
"dilations",
"dilation value along each spatial axis of the filter. If not present, the dilation defaults is 1 along each spatial axis.",
AttributeProto::INTS,
OPTIONAL);
schema.Attr(
"strides",
"Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.",
AttributeProto::INTS,
OPTIONAL);
schema.Attr(
"auto_pad",
auto_pad_doc,
AttributeProto::STRING,
std::string("NOTSET"));
schema.Attr(
"pads",
pads_doc,
AttributeProto::INTS,
OPTIONAL);
schema.Attr(
"group",
"number of groups input channels and output channels are divided into.",
AttributeProto::INT,
static_cast<int64_t>(1));
schema.TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
propagateElemTypeFromInputToOutput(ctx, 0, 0);
convPoolShapeInference(ctx, true, false, 0, 1);
});
};
}
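For reference, this is roughly how these attributes are set on a Conv node when building a model with the onnx Python helpers (a minimal sketch; the tensor names are made up):

```python
from onnx import helper

# Conv node that lets shape inference derive the pads on its own.
conv_same = helper.make_node(
    "Conv", inputs=["X", "W"], outputs=["Y"],
    kernel_shape=[4, 4], strides=[3, 3],
    auto_pad="SAME_UPPER",          # alternatives: "SAME_LOWER", "VALID"; default "NOTSET"
)

# Equivalent node with the pads written out (auto_pad left at its default NOTSET).
conv_explicit = helper.make_node(
    "Conv", inputs=["X", "W"], outputs=["Y"],
    kernel_shape=[4, 4], strides=[3, 3],
    pads=[1, 1, 2, 1],              # [top, left, bottom, right] for the 7x8 input used above
)
print(conv_same)
```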
Note the auto_pad attribute above. Much like TensorFlow's string modes, it offers several options:

- NOTSET (the default) means the explicit pads attribute is used; since pads defaults to all zeros, this behaves like TensorFlow's VALID (ONNX also accepts an explicit VALID value with the same no-padding behaviour);
- SAME_UPPER or SAME_LOWER correspond to TensorFlow's SAME; the details can be found in the shape-inference code of ONNX's defs.cc:
中的代码std::vector<int64_t> pads; if (getRepeatedAttribute(ctx, "pads", pads)) { if (pads.size() != n_input_dims * 2) { fail_shape_inference("Attribute pads has incorrect size"); } } else { pads.assign(n_input_dims * 2, 0); // pads的size是输入维度的2倍 const auto* auto_pad_attr = ctx.getAttribute("auto_pad"); if ((nullptr != auto_pad_attr) && (auto_pad_attr->s() != "VALID")) { // 如果pad mode是SAME_UPPER或者SAME_LOWER则进入该分支 int input_dims_size = static_cast<int>(n_input_dims); for (int i = 0; i < input_dims_size; ++i) { // 计算每个axis的pads int64_t residual = 0; int64_t stride = strides[i]; if (stride > 1) { // 如果stride == 1,那么total_pad就是Wk - Stride = Wk - 1 if (!input_shape.dim(2 + i).has_dim_value()) { continue; } residual = input_shape.dim(2 + i).dim_value(); while (residual >= stride) { residual -= stride; } } int64_t total_pad = residual == 0 ? effective_kernel_shape[i] - stride : effective_kernel_shape[i] - residual; // effective_kernel_shape = Wk if (total_pad < 0) total_pad = 0; int64_t half_pad_small = total_pad >> 1; int64_t half_pad_big = total_pad - half_pad_small; if (auto_pad_attr->s() == "SAME_UPPER") { // pad mode is here pads[i] = half_pad_small; pads[i + input_dims_size] = half_pad_big; } else if (auto_pad_attr->s() == "SAME_LOWER") { pads[i] = half_pad_big; pads[i + input_dims_size] = half_pad_small; } } } }
The hardest part of the code above is the loop body that computes residual and total_pad. The total_pad it produces is exactly TensorFlow's

$$P = \left(\left\lceil \frac{W_i}{S} \right\rceil - 1\right) S + W_k - W_i$$

just rewritten a little. Pushing that formula one step further, write $W_i = n\,S + r$ with $0 \le r < S$. Two cases follow, matching the two branches of the conditional expression that assigns total_pad:

- if $W_i$ is an integer multiple of $S$, then $r = 0$ and $\lceil W_i / S \rceil = n$, so the formula gives $P = W_k - S$;
- if $W_i$ is not an integer multiple of $S$, then $\lceil W_i / S \rceil = n + 1$, so the formula gives $P = W_k - r$, where $r$ is the remainder of $W_i$ divided by the stride, i.e. the residual variable in the code.
SAME_UPPER and SAME_LOWER only differ when the total pad is odd. If it is even, the two modes give the same result; if it is odd, SAME_UPPER places the smaller half at the beginning (the extra element goes to the end), while SAME_LOWER places the larger half at the beginning.
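The same logic can be sketched in Python (the function name is mine; it simply mirrors the defs.cc branch above for one spatial axis):

```python
def onnx_same_pads_1d(in_size, kernel, stride, mode="SAME_UPPER"):
    """Begin/end pads for one spatial axis, following the defs.cc logic (dilation = 1)."""
    residual = in_size % stride if stride > 1 else 0   # what the while-loop computes
    total = kernel - stride if residual == 0 else kernel - residual
    total = max(total, 0)
    small, big = total // 2, total - total // 2
    # SAME_UPPER puts the extra element at the end, SAME_LOWER at the beginning.
    return (small, big) if mode == "SAME_UPPER" else (big, small)

print(onnx_same_pads_1d(7, 4, 3, "SAME_UPPER"))   # (1, 2) -- same split as TensorFlow's SAME
print(onnx_same_pads_1d(7, 4, 3, "SAME_LOWER"))   # (2, 1)
```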
An example#
There is a concrete example in python - What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? - Stack Overflow. It shows how SAME mode pads just enough for the stride to sweep the whole input with nothing left over and nothing missing.
- VALID mode:

      inputs:  1  2  3  4  5  6  7  8  9  10 11 (12 13)
               |________________|                dropped
                              |_________________|
- SAME mode:

                  pad|                                       |pad
      inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13 |0  0
                  |________________|
                                 |_________________|
                                                |________________|
In this example $W_i = 13$, $W_k = 6$ and $S = 5$: VALID gives $W_o = \lfloor (13 - 6) / 5 \rfloor + 1 = 2$ and drops inputs 12 and 13, while SAME gives $W_o = \lceil 13 / 5 \rceil = 3$ with a total pad of $(3 - 1) \cdot 5 + 6 - 13 = 3$, split as 1 on the left and 2 on the right.
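Those numbers are easy to reproduce (a minimal sketch using tf.nn.max_pool1d, since the linked question is about max pooling):

```python
import numpy as np
import tensorflow as tf

# 13 inputs, window 6, stride 5 -- the numbers from the Stack Overflow example.
x = tf.constant(np.arange(1, 14, dtype=np.float32).reshape(1, 13, 1))

valid = tf.nn.max_pool1d(x, ksize=6, strides=5, padding='VALID')
same = tf.nn.max_pool1d(x, ksize=6, strides=5, padding='SAME')

print(valid.shape)   # (1, 2, 1): elements 12 and 13 are dropped
print(same.shape)    # (1, 3, 1): ceil(13 / 5) = 3, pad 1 on the left and 2 on the right
```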
References#
- TensorFlow中CNN的两种padding方式“SAME”和“VALID” - wuzqChom的博客 - CSDN博客
- python - What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? - Stack Overflow
- What does the 'same' padding parameter in convolution mean in TensorFlow? - Quora