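Conv1d (one-dimensional) and Conv2d (two-dimensional) in PyTorch
nn.Conv1d in PyTorch, explained
When I was learning to use PyTorch for text classification I needed one-dimensional convolution and spent some time working out how it operates. I could not find a blog post online that explained it in detail, so I am writing it down here.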
Conv1d
class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
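in_channels (int) - number of channels in the input signal. In text classification this is the word-vector dimension.
out_channels (int) - number of channels produced by the convolution. There are as many one-dimensional kernels as there are out_channels.
kernel_size (int or tuple) - size of the convolution kernel. The kernel is specified as (k,); its second dimension is determined by in_channels, so each kernel actually has size kernel_size*in_channels.
stride (int or tuple, optional) - stride of the convolution.
padding (int or tuple, optional) - number of layers of zeros added to each side of the input.
dilation (int or tuple, optional) - spacing between kernel elements.
groups (int, optional) - number of blocked connections from input channels to output channels.
bias (bool, optional) - if bias=True, a learnable bias is added to the output.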
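To make the kernel_size*in_channels point concrete, here is a minimal sketch that inspects the parameter shapes of a Conv1d layer. The sizes 256, 100 and 2 are the ones used in the example in this post; the variable names are only illustrative.

```python
import torch
from torch import nn

# Same sizes as the example in this post: 256-dim word vectors, 100 kernels, window of 2.
conv = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=2)

# One kernel per output channel; each kernel spans all input channels.
weight_shape = tuple(conv.weight.shape)  # (out_channels, in_channels, kernel_size)
bias_shape = tuple(conv.bias.shape)      # (out_channels,)
print(weight_shape, bias_shape)
```

So each of the 100 kernels is a 256*2 block, which is the kernel_size*in_channels size described above.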
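For example:

    import torch
    from torch import nn

    conv1 = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=2)
    input = torch.randn(32, 35, 256)
    # batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
    input = input.permute(0, 2, 1)
    out = conv1(input)
    print(out.size())

Here 32 is the batch_size, 35 is the maximum sentence length, and 256 is the word-vector dimension.
Before feeding the tensor to the one-dimensional convolution, 32*35*256 has to be rearranged to 32*256*35, because the one-dimensional convolution sweeps along the last dimension. The size of out is therefore 32*100*(35-2+1)=32*100*34.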
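The figure below gives an intuitive view of how one-dimensional convolution is used:
In the figure the word-vector dimension is 5 and the input size is 7*5. The one-dimensional convolution kernels have sizes 2, 3 and 4, with two kernels of each size, giving 6 features in total.
For k=4, see the large red matrix in the figure: the kernel size is 4*5 and the stride is 1. The kernel sweeps over the input from top to bottom, producing an output vector of size ((7-4)/1+1)*1=4*1, which a max_pooling with kernel size 4 then reduces to a single value. The 6 values obtained this way are concatenated and passed through a fully connected layer, which outputs the probabilities of the 2 classes.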
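The pipeline in the figure can be sketched roughly as follows. This is only an illustration with random weights: taking the maximum over the whole feature map stands in for the max_pooling step, and the softmax at the end is an assumption about how the probabilities are produced.

```python
import torch
from torch import nn

torch.manual_seed(0)

seq_len, embed_dim = 7, 5   # the 7*5 input from the figure
kernel_sizes = [2, 3, 4]    # window sizes in the figure
kernels_per_size = 2        # two kernels for each window size

# (batch, seq_len, embed_dim) -> (batch, embed_dim, seq_len), as Conv1d expects
x = torch.randn(1, seq_len, embed_dim).permute(0, 2, 1)

pooled = []
for k in kernel_sizes:
    conv = nn.Conv1d(embed_dim, kernels_per_size, kernel_size=k)
    feature_map = conv(x)                          # 1 x 2 x (7 - k + 1)
    # The max over the whole feature map leaves one value per kernel.
    pooled.append(feature_map.max(dim=2).values)   # 1 x 2

features = torch.cat(pooled, dim=1)                # 1 x 6
fc = nn.Linear(features.size(1), 2)
probs = torch.softmax(fc(features), dim=1)         # 1 x 2
print(features.shape, probs.shape)
```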
Here is some code to go through in detail:
Here embedding_size=256, feature_size=100, window_sizes=[3,4,5,6], max_text_len=35
    import os

    import numpy as np
    import torch
    import torch.nn.functional as F
    from torch import nn


    class TextCNN(nn.Module):
        def __init__(self, config):
            super(TextCNN, self).__init__()
            self.is_training = True
            self.dropout_rate = config.dropout_rate
            self.num_class = config.num_class
            self.use_element = config.use_element
            self.config = config

            self.embedding = nn.Embedding(num_embeddings=config.vocab_size,
                                          embedding_dim=config.embedding_size)
            # One Conv1d -> ReLU -> MaxPool1d branch per window size h
            self.convs = nn.ModuleList([
                nn.Sequential(nn.Conv1d(in_channels=config.embedding_size,
                                        out_channels=config.feature_size,
                                        kernel_size=h),
                              # nn.BatchNorm1d(num_features=config.feature_size),
                              nn.ReLU(),
                              nn.MaxPool1d(kernel_size=config.max_text_len - h + 1))
                for h in config.window_sizes
            ])
            self.fc = nn.Linear(in_features=config.feature_size * len(config.window_sizes),
                                out_features=config.num_class)
            if os.path.exists(config.embedding_path) and config.is_training and config.is_pretrain:
                print("Loading pretrain embedding...")
                self.embedding.weight.data.copy_(torch.from_numpy(np.load(config.embedding_path)))

        def forward(self, x):
            embed_x = self.embedding(x)
            # print('embed size 1', embed_x.size())  # 32*35*256
            # batch_size x text_len x embedding_size -> batch_size x embedding_size x text_len
            embed_x = embed_x.permute(0, 2, 1)
            # print('embed size 2', embed_x.size())  # 32*256*35
            out = [conv(embed_x) for conv in self.convs]  # out[i]: batch_size x feature_size x 1
            # for o in out:
            #     print('o', o.size())  # 32*100*1
            # Concatenate along the second dimension, e.g. 5*2*1 and 5*3*1 become 5*5*1
            out = torch.cat(out, dim=1)
            # print(out.size())  # 32*400*1
            out = out.view(-1, out.size(1))
            # print(out.size())  # 32*400
            if not self.use_element:
                # F.dropout defaults to training=True, so pass the module's mode
                # explicitly; otherwise dropout would stay active in eval mode.
                out = F.dropout(input=out, p=self.dropout_rate, training=self.training)
            out = self.fc(out)
            return out
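embed_x initially has size 32*35*256, where 32 is the batch_size. After permute it becomes 32*256*35. Once it has passed through the convolution branches, each element of out has size 32*100*1, and there are 4 such elements. Concatenating them along dim=1 gives 32*400*1, view turns that into 32*400, and the fully connected matrix of size 400*num_class finally maps it to 32*2 (with num_class=2).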
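The concatenate, view and fully connected steps can be checked in isolation. In this minimal sketch, random tensors stand in for the pooled output of each convolution branch, and num_class=2 is assumed to match the 32*2 result mentioned above.

```python
import torch
from torch import nn

batch_size, feature_size, num_class = 32, 100, 2   # num_class=2 is an assumption
window_sizes = [3, 4, 5, 6]

# Stand-ins for the pooled branch outputs: batch_size x feature_size x 1 each
branch_outputs = [torch.randn(batch_size, feature_size, 1) for _ in window_sizes]

merged = torch.cat(branch_outputs, dim=1)    # 32 x 400 x 1
flat = merged.view(-1, merged.size(1))       # 32 x 400
fc = nn.Linear(feature_size * len(window_sizes), num_class)
logits = fc(flat)                            # 32 x 2
print(merged.shape, flat.shape, logits.shape)
```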
===================================================================================================================================
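Differences between the ways of computing a convolution in PyTorch (the two forms of conv2d)
Operations on two-dimensional matrices:
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
Applies a 2D convolution over an input signal composed of several feature planes.
torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 2D convolution over an input image composed of several input planes. The operation is the same as the one above. The difference is that the functional form is mostly used to compute the result of convolving an input with a given set of kernels, whereas the class above is mostly used when defining a network.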
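A small sketch of the relationship between the two forms: the module owns its kernels, and passing those same kernels to the functional call should reproduce the module's output. The tensor sizes here are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)   # batch x channels x height x width

# Module form: weight and bias are learnable parameters held by the layer.
layer = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
out_module = layer(x)

# Functional form: the caller supplies the kernels explicitly.
out_functional = F.conv2d(x, layer.weight, layer.bias, stride=1, padding=1)

same = torch.allclose(out_module, out_functional)
print(out_module.shape, same)
```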
======================================================================================================================
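First, the two-dimensional convolution conv2d. The signature below is TensorFlow's tf.nn.conv2d:
conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None)
"""Computes a 2-D convolution given 4-D `input` and `filter` tensors."""
That is, it takes a 4-D input tensor and a 4-D filter tensor and performs a 2-D convolution.
input: 4-D tensor with shape [batch, in_height, in_width, in_channels]
filter: the filter (convolution kernel), a 4-D tensor with shape [filter_height, filter_width, in_channels, out_channels]
strides: the step by which the sliding window moves along each dimension of input; a 1-D tensor of length 4.
padding: the padding algorithm, one of two values: 'SAME' or 'VALID'. The difference shows up in how the feature map size changes after convolution and pooling. For how to compute the feature map size after convolution and pooling, see https://blog.csdn.net/qq_26552071/article/details/81171161
return: a tensor of the same type as the input tensor.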
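The effect of the two padding values on the feature map size can be sketched with a small helper. The function name conv_output_size is hypothetical; the formulas are the ones TensorFlow documents for 'SAME' and 'VALID' with a dilation of 1.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output size along one spatial dimension for TensorFlow-style padding."""
    if padding == "SAME":
        # Input is padded so the output depends only on the stride.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: only windows that fit entirely inside the input count.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


same_out = conv_output_size(28, 5, 2, "SAME")     # ceil(28 / 2)
valid_out = conv_output_size(28, 5, 2, "VALID")   # ceil(24 / 2)
print(same_out, valid_out)
```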
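Now the one-dimensional convolution conv1d. In TensorFlow, one-dimensional convolution is ultimately implemented through two-dimensional convolution: the input tensor and the filter are first expanded by one dimension, and then conv2d is called.
def conv1d(value, filters, stride, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
"""Computes a 1-D convolution given 3-D input and filter tensors."""
That is, it takes a 3-D input tensor and a 3-D filter and performs a 1-D convolution.
value: the input, a 3-D tensor whose shape depends on data_format:
(1) data_format = "NWC", shape = [batch, in_width, in_channels]
(2) data_format = "NCW", shape = [batch, in_channels, in_width]
filters: 3-D tensor, shape = [filter_width, in_channels, out_channels]
stride: the step by which the filter window moves; an integer.
padding: same as above.
As the conv1d source shows, one-dimensional convolution is implemented by adding one dimension to the input tensor and to the filter and then calling the two-dimensional convolution: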
    value = array_ops.expand_dims(value, spatial_start_dim)  # input tensor
    filters = array_ops.expand_dims(filters, 0)  # filter
    result = gen_nn_ops.conv2d(
        value,
        filters,
        strides,
        padding,
        use_cudnn_on_gpu=use_cudnn_on_gpu,
        data_format=data_format)
    return array_ops.squeeze(result, [spatial_start_dim])
The full source of conv1d follows:

    # TensorFlow internal modules that this function relies on
    from tensorflow.python.framework import ops
    from tensorflow.python.ops import array_ops
    from tensorflow.python.ops import gen_nn_ops


    def conv1d(value,
               filters,
               stride,
               padding,
               use_cudnn_on_gpu=None,
               data_format=None,
               name=None):
        with ops.name_scope(name, "conv1d", [value, filters]) as name:
            # Reshape the input tensor to [batch, 1, in_width, in_channels]
            if data_format is None or data_format == "NHWC" or data_format == "NWC":
                data_format = "NHWC"
                spatial_start_dim = 1
                strides = [1, 1, stride, 1]
            elif data_format == "NCHW" or data_format == "NCW":
                data_format = "NCHW"
                spatial_start_dim = 2
                strides = [1, 1, 1, stride]
            else:
                raise ValueError("data_format must be \"NWC\" or \"NCW\".")
            value = array_ops.expand_dims(value, spatial_start_dim)
            filters = array_ops.expand_dims(filters, 0)
            result = gen_nn_ops.conv2d(
                value,
                filters,
                strides,
                padding,
                use_cudnn_on_gpu=use_cudnn_on_gpu,
                data_format=data_format)
            return array_ops.squeeze(result, [spatial_start_dim])
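The same expand, conv2d, squeeze idea can be tried out in PyTorch. This sketch is not the TensorFlow code above: it uses PyTorch's NCW layout and its kernel layout of out_channels x in_channels x width (TensorFlow's is [filter_width, in_channels, out_channels]), and the tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8, 35)   # batch x in_channels x width
w = torch.randn(6, 8, 2)    # out_channels x in_channels x kernel_width

# Insert a dummy height dimension of size 1 into both the input and the kernel.
x4d = x.unsqueeze(2)        # 4 x 8 x 1 x 35
w4d = w.unsqueeze(2)        # 6 x 8 x 1 x 2

# Run the 2D convolution, then drop the dummy height dimension again.
via_conv2d = F.conv2d(x4d, w4d, stride=(1, 1)).squeeze(2)   # 4 x 6 x 34
direct = F.conv1d(x, w, stride=1)                            # 4 x 6 x 34

match = torch.allclose(via_conv2d, direct, atol=1e-5)
print(via_conv2d.shape, match)
```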
==============================================
Notes
input: w*h
output: wo*ho
filter: F
padding: P
stride: S
wo = floor((w - F + 2*P)/S) + 1, and ho is computed the same way from h
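As a quick check of the formula, here is a minimal helper (the name conv_out_size is hypothetical) applied to the sizes that appear earlier in this post, plus one made-up padded case.

```python
def conv_out_size(w, f, p=0, s=1):
    """Output width for input width w, filter size f, padding p and stride s."""
    return (w - f + 2 * p) // s + 1


text_example = conv_out_size(35, 2)               # Conv1d example: 35 - 2 + 1
figure_example = conv_out_size(7, 4)              # the k=4 kernel on the 7*5 input
padded_example = conv_out_size(32, 3, p=1, s=2)   # hypothetical 32-wide input
print(text_example, figure_example, padded_example)
```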