[Paper] SECOND

1. Introduction

The key contributions of our work are as follows:

  • We propose an improved method of sparse convolution that allows it to run faster.
  • We propose a novel angle loss regression approach that demonstrates better orientation regression performance than other methods do.
  • We introduce a novel data augmentation method for LiDAR-only learning problems that greatly increases the convergence speed and performance.
Overview of the full model config

```
(vfe): MeanVFE()
  (backbone_3d): VoxelBackBone8x(
    (conv_input): SparseSequential(
      (0): SubMConv3d()
      (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (conv1): SparseSequential(
      (0): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv2): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv3): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv4): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv_out): SparseSequential(
      (0): SparseConv3d()
      (1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (map_to_bev_module): HeightCompression()
  (pfe): None
  (backbone_2d): BaseBEVBackbone(
    (blocks): ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
  )
  (dense_head): AnchorHeadSingle(
    (cls_loss_func): SigmoidFocalClassificationLoss()
    (reg_loss_func): WeightedSmoothL1Loss()
    (dir_loss_func): WeightedCrossEntropyLoss()
    (conv_cls): Conv2d(512, 18, kernel_size=(1, 1), stride=(1, 1))
    (conv_box): Conv2d(512, 42, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(512, 12, kernel_size=(1, 1), stride=(1, 1))
  )
  (point_head): None
  (roi_head): None
)
```

3. SECOND Detector

3.1. Network Architecture

The proposed SECOND detector, depicted in Figure 1, consists of three components: (1) a voxelwise feature extractor; (2) a sparse convolutional middle layer; and (3) an RPN.

3.1.1. Point Cloud Grouping

Here, we follow the simple procedure described in VoxelNet to obtain a voxel representation of point cloud data.

Note: this summarizes the data-preparation stage of OpenPCDet, i.e. what VoxelGeneratorV2 does. It confirms my earlier guess: voxels are generated only from the points that actually exist (with the help of a hash table), not over the whole voxel map. Also worth mentioning: VoxelNet and SECOND both use the default KITTI dataset config here, so each voxel holds up to 5 points, whereas PointPillars overrides this in its own config with 32 points per voxel and a voxel size of [0.16, 0.16, 4].

```
- NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.05, 0.05, 0.1]
      MAX_POINTS_PER_VOXEL: 5
      MAX_NUMBER_OF_VOXELS: {
        'train': 16000,
        'test': 40000
      }
```

Thus: point cloud (#, 4) → voxel_features (32000, 5, 4), voxel_num_points (32000).
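To make this grouping concrete, here is a toy sketch of building voxels only at occupied locations via a hash map, with the per-voxel and total-voxel caps from the config above (the function name and signature are hypothetical, not OpenPCDet's actual API):

```python
import numpy as np

def group_points_to_voxels(points, voxel_size, pc_range,
                           max_points=5, max_voxels=16000):
    # points: (N, 4) x/y/z/intensity; voxel_size: (3,); pc_range: (6,)
    coords = ((points[:, :3] - pc_range[:3]) / voxel_size).astype(np.int32)
    voxel_map = {}  # hash table: voxel coordinate -> row in the voxel buffer
    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=points.dtype)
    voxel_coords = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points = np.zeros(max_voxels, dtype=np.int32)
    for p, c in zip(points, map(tuple, coords)):
        if c not in voxel_map:
            if len(voxel_map) == max_voxels:
                continue  # total-voxel budget exhausted: drop the point
            voxel_map[c] = len(voxel_map)
            voxel_coords[voxel_map[c]] = c
        v = voxel_map[c]
        if num_points[v] < max_points:  # per-voxel cap: extra points are dropped
            voxels[v, num_points[v]] = p
            num_points[v] += 1
    n = len(voxel_map)
    return voxels[:n], voxel_coords[:n], num_points[:n]
```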

3.1.2. Voxelwise Feature Extractor

Note: as you would imagine, this is the PointNet-like part, playing the same role as the second half of the pillar VFE: the point-wise features organized in voxels are distilled into voxel-wise features. Worth noting: OpenPCDet replaces this step with a simple mean VFE that just averages the point features within each voxel.

```
VFE:
        NAME: MeanVFE
```

Thus: voxel_features (32000, 5, 4) → voxel_features (32000, 4).
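A minimal sketch of what this mean pooling amounts to (simplified from OpenPCDet's MeanVFE; treat the exact code as an assumption, not a quote):

```python
import torch

def mean_vfe(voxel_features, voxel_num_points):
    # voxel_features: (N, max_points, C); voxel_num_points: (N,)
    points_sum = voxel_features.sum(dim=1)  # (N, C)
    # clamp avoids division by zero for padded/empty slots
    normalizer = voxel_num_points.view(-1, 1).clamp(min=1).type_as(points_sum)
    return points_sum / normalizer          # (N, C) voxel-wise features
```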

3.1.3. Sparse Convolutional Middle Extractor

Our middle extractor is used to learn information about the z-axis and convert the sparse 3D data into a 2D BEV image. Figure 3 shows the structure of the middle extractor. It consists of two phases of sparse convolution. Each phase contains several submanifold convolutional layers and one normal sparse convolution to perform downsampling in the z-axis. After the z-dimensionality has been downsampled to one or two, the sparse data are converted into dense feature maps. Then, the data are simply reshaped into image-like 2D data.


On im2col

https://www.jianshu.com/p/93a1abcc4717

This one is well written. In short, the goal of im2col is to turn the sliding-window computation into a matrix multiplication, so that the highly optimized GEMM routines can be used. Concretely, the patch under each sliding-window position is flattened, and the kernel is flattened too, so the product of the two vectors is exactly the element-wise multiplication plus summation; multiple channels are simply flattened along as well. The idea, in other words, is to express each window's "element-wise multiply + sum" (something like an atomic operation) as one row of a matrix product, stack all such atomic operations up as rows, and multiply them together in one go. Very clever.
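A toy NumPy sketch of the idea (stride 1, no padding; illustrative only):

```python
import numpy as np

def im2col(x, k):
    # x: (C, H, W) input; k: kernel size
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((out_h * out_w, C * k * k))
    for i in range(out_h):
        for j in range(out_w):
            # each sliding-window patch is flattened into one row
            cols[i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

# convolution then becomes a single GEMM:
#   kernels: (num_filters, C, k, k) -> reshape to (num_filters, C*k*k)
#   output = im2col(x, k) @ kernels.reshape(num_filters, -1).T
# and the (out_h*out_w, num_filters) result is reshaped back into a feature map
```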

On GEMM

https://jackwish.net/blog/2019/gemm-optimization.html

This one is also well written. In essence, GEMM optimization reduces repeated loads and stores of the same element at the implementation level. Concretely, the output matrix is split into blocks; within a block, say for results in the same row but different columns, a naive loop would fetch that row's values anew for every computation, whereas now they can be kept in registers (registers live inside the CPU, like a pocket; memory sits outside the CPU, like a backpack; storage is the drawer at home). Applying the same idea to the columns, and to the reduction dimension k over channels, yields the speedup.
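A toy illustration of the blocking idea in NumPy (a real GEMM also tiles for registers and vectorizes; this only shows how blocking improves data reuse):

```python
import numpy as np

def gemm_tiled(A, B, tile=32):
    # C = A @ B with loop tiling: each small block of A and B stays hot in
    # cache/registers while it is reused, instead of being re-fetched per element
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C
```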

Back to sparse convolution

Related links and personal summary

Personal summary: overall, the key to the process is the rulebook, together with the idea of atomic operations on active input sites. Each row of the rulebook is one atomic operation: for a given position inside the kernel (a local coordinate within the kernel), it records the "atomic" step in which that kernel element, applied at one active input site, contributes a value to one output site. A quick way to remember it: the rulebook is organized by kernel position, and each kernel position may yield valid atomic operations on several input sites.

Then, processing one kernel position at a time, we assemble the matrix of all kernel elements acting at that position (down through the input channels, and across multiple kernels, i.e. the output channels), and for each valid input site, i.e. each single atomic operation, run the GEMM matrix multiplication. Finally, guided by the output-site information in the rulebook, the results are scattered back into the spatial grid to form the output.

Much like im2col [5], which turns convolution from its mathematical form into an efficient programmable form, this replaces the sliding window with a Gather-GEMM-Scatter pipeline, with the rulebook orchestrating the process (the gather and the scatter). Concretely, im2col turns each sliding window into one row and multiplies them as a block, whereas here each atomic operation, i.e. the channel-depth vector of one active input site under one relative kernel position, becomes a small row that is multiplied as a block.

Today's 3D sparse convolutions fall into two categories, SubMConv3d and SparseConv3d. The latter is exactly the sparse convolution SECOND proposes: as long as the kernel covers one active input site, an output site is computed. The other is submanifold sparse convolution, whose difference is that the convolution output is computed only when the center of the kernel covers an active input site (in my understanding, only then does a valid atomic operation form).

Clearly, SubMConv3d keeps the active locations of the output feature map exactly the same as the input's, preserving sparsity (here measured as non-empty locations / total locations) so the computation does not blow up. SparseConv3d instead dilates the set of active sites and thereby increases the computation. But with SubMConv3d alone, the kernel's receptive field would stay confined to a limited range, so it is combined with stride-2 SparseConv3d to enlarge the receptive field while preserving sparsity as much as possible. A toy sketch of the whole gather-GEMM-scatter process follows.
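Here is a 2D toy of the rulebook-driven gather-GEMM-scatter described above (an illustration of the concept, not spconv's actual implementation; all names are made up):

```python
import numpy as np

def sparse_conv2d(coords, feats, W, k=3, submanifold=True):
    # coords: (N, 2) active input sites; feats: (N, C_in) their features;
    # W: (k*k, C_in, C_out) kernel weights, one slice per kernel position
    idx = {tuple(c): i for i, c in enumerate(coords)}  # hash table of active sites
    out = {}
    offsets = [(dy, dx) for dy in range(-(k // 2), k // 2 + 1)
                        for dx in range(-(k // 2), k // 2 + 1)]
    for kpos, (dy, dx) in enumerate(offsets):
        # this kernel position's rulebook rows: (input index, output coordinate)
        rules = []
        for (y, x), i in idx.items():
            oy, ox = y + dy, x + dx
            # submanifold: outputs only where the input is already active,
            # i.e. only when the kernel center sits on an active site
            if submanifold and (oy, ox) not in idx:
                continue
            rules.append((i, (oy, ox)))
        if not rules:
            continue
        gathered = feats[np.array([i for i, _ in rules])]  # gather
        contrib = gathered @ W[kpos]                       # GEMM
        for row, (_, oc) in enumerate(rules):              # scatter
            out[oc] = out.get(oc, 0) + contrib[row]
    return out  # dict: output coordinate -> (C_out,) feature
```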

3.1.4. Region Proposal Network

In this work, we use a single shot multibox detector (SSD)-like [32] architecture to construct an RPN architecture.

  • The authors call this entire latter part the RPN; it is essentially the same 2D backbone + head as the corresponding part of PointPillars.

3.1.5. Anchors and Targets

Clearly, PointPillars also inherits this scheme; a sketch of the box-residual encoding is given below.
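For reference, the anchor-to-target residual encoding as defined in the SECOND paper (restated from the paper; the helper function itself is hypothetical):

```python
import numpy as np

def encode_box_residuals(gt, anchor):
    # boxes as (x, y, z, w, l, h, theta); the anchor diagonal
    # d_a = sqrt(l_a^2 + w_a^2) normalizes the x/y offsets and the
    # anchor height h_a normalizes the z offset
    xg, yg, zg, wg, lg, hg, tg = gt
    xa, ya, za, wa, la, ha, ta = anchor
    da = np.sqrt(la ** 2 + wa ** 2)
    return np.array([
        (xg - xa) / da, (yg - ya) / da, (zg - za) / ha,
        np.log(wg / wa), np.log(lg / la), np.log(hg / ha),
        tg - ta,
    ])
```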

3.2. Training and Inference

3.2.1. Loss

Sine-Error Loss for Angle Regression

VoxelNet [14] directly predicts the radian offset but is subject to an adversarial example problem between the cases of 0 and π radians because these two angles correspond to the same box but generate a large loss when one is misidentified as the other. Our architecture solves this problem by introducing a new angle loss regression:

$L_\theta = \mathrm{SmoothL1}\left(\sin(\theta_p - \theta_t)\right)$

where the subscript p indicates the predicted value and the subscript t the ground-truth target.

To address the issue that this loss treats boxes with opposite directions as being the same, we have added a simple direction classifier to the output of the RPN. This direction classifier uses a softmax loss function. We use the following approach to generate the direction classifier target: if the yaw rotation around the z-axis of the ground truth is higher than zero, the result is positive; otherwise, it is negative.
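A minimal PyTorch sketch of this loss and the direction-classifier target (a simplified stand-in, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def angle_loss(theta_p, theta_t):
    # sin(θp − θt) makes 0 and π produce the same loss, removing the
    # adversarial case between opposite orientations of the same box
    diff = torch.sin(theta_p - theta_t)
    return F.smooth_l1_loss(diff, torch.zeros_like(diff))

def direction_target(theta_gt):
    # direction classifier target: positive (1) if yaw > 0, else negative (0)
    return (theta_gt > 0).long()
```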

Focal Loss for Classification

Total Training Loss

Overall, the loss breaks down like this: classification uses focal loss, and the dim/size regression uses smooth L1, as usual. For orientation, the logic in this detection problem is: 1. the ground-truth orientations genuinely differ over the whole 0 to 2π range, and the network should predict exactly that orientation value; 2. even so, while the prediction approaches the target, the loss assigned should sensibly follow whichever coinciding angle is nearer, staying consistent with the true IoU. VoxelNet naively regresses the raw radian value of the target, which causes the adversarial problem. SECOND's approach is to regress the sine, so 0 and π give the same loss; but we still need to predict the one true value, hence an additional classification head predicting the angle's half-range, determined by whether the yaw rotation is greater than zero or not. This way the regression produces a loss that reflects the overlap already achieved, and if the boxes really coincide, the final choice between the two possible angles is made by the angle classification head. As we know, PointPillars later adopted this scheme as well.
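For completeness, a short sketch of the classification term (SECOND adopts the focal loss with α = 0.25 and γ = 2; the function below is a simplified stand-in, not OpenPCDet's SigmoidFocalClassificationLoss):

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # the (1 − p_t)^γ factor down-weights easy examples so the many
    # background anchors do not dominate the loss
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()
```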

3.2.2. Data Augmentation

Sample Ground Truths from the Database

Object Noise

Global Rotation and Scaling

3.2.4. Network Details

Clearly, PointPillars does this the same way.

Sparse Convolutional Middle Extractor: code walkthrough

In OpenPCDet, this module is built on the spconv library. Below I walk through it alongside the code.

```
BACKBONE_3D:
        NAME: VoxelBackBone8x
```

Note: one point well worth mentioning. At first I thought: the (32000, 4) tensor we obtained earlier is hardly a sparse matrix; it is just the voxels that actually contain points. Do we have to scatter it back first? But isn't the scatter supposed to come later? Then I realized we also carry the voxel coords, which are precisely the positions of these active sites within the whole sparse voxel map, exactly what is needed to build the rulebook; and the map size can be computed at once from the parameters and handed to the module. Beautiful: PointPillars never needed 3D convolution, and it turns out this pipeline fits the voxel backbone perfectly.

Throughout the forward pass, everything stays in the form of a SparseConvTensor. Constructing one mainly takes:

```
features: [num_points, num_features] feature tensor
indices: [num_points, ndim + 1] indice tensor. batch index saved in indices[:, 0]
spatial_shape: spatial shape of your sparse data (z,y,x)
batch_size: batch size of your sparse data
```

On top of that, the network layers attach the rulebook and related bookkeeping as the tensor propagates:

```
# indice_dict{(tuple:5),}:
#   0: output indices, 1: input indices,
#   2: input rulebook indices, 3: output rulebook indices, 4: spatial shape
# sparsity: sparsity ratio
# spatial_size
```

In height_compression.py, the features are then restored to their corresponding positions using batch, spatial_shape, indices and features, and compressed along the height direction into the BEV feature map.

indice_key exists so that, when the indices are the same, the already computed rulebook and hash table can be reused, saving computation. Generally, the first time a layer is built, None is returned; only within a spconv.SparseSequential stack of 3 blocks can the last SubMConv3d reuse the indices of the second SubMConv3d (see the repeated indice_key values in the init code below).
Also, indice_dict appears to be a shared variable: once a new tensor is produced, the indice_dict held by the earlier tensors is updated to the latest state too.

```python
# build the sparse tensor from the voxel features and coordinates,
# plus the spatial shape and batch size
input_sp_tensor = spconv.SparseConvTensor(
    features=voxel_features,          # (32000, 4)
    indices=voxel_coords.int(),       # (32000, 4)
    spatial_shape=self.sparse_shape,  # (41, 1600, 1408)
    batch_size=batch_size             # 4
)
```

The resulting SparseConvTensor is traced below; as it propagates, the quantities it carries keep being updated.

```python
x = self.conv_input(input_sp_tensor)
# features: (32000, 16)
# indice_dict gains 'subm1': ((32000,4), (32000,4), (2,27,32000), (27,), (3,))
#   the last (3,) entry holds the spatial shape [41, 1600, 1408]

x_conv1 = self.conv1(x)
# features: (32000, 16)
# 'subm1' is reused, so indice_dict is unchanged

x_conv2 = self.conv2(x_conv1)
# features: (57850, 32)
# indice_dict gains 'spconv2': ((57850,4), (32000,4), (2,27,32000), (27,), (3,))
#   spatial shape [41, 1600, 1408]
# indice_dict gains 'subm2': ((57850,4), (57850,4), (2,27,57850), (27,), (3,))
#   spatial shape [21, 800, 704]

x_conv3 = self.conv3(x_conv2)
# features: (42021, 64)
# indice_dict gains 'spconv3': ((42021,4), (57850,4), (2,27,57850), (27,), (3,))
#   spatial shape [21, 800, 704]
# indice_dict gains 'subm3': ((42021,4), (42021,4), (2,27,42021), (27,), (3,))
#   spatial shape [11, 400, 352]

x_conv4 = self.conv4(x_conv3)
# features: (18983, 64)
# indice_dict gains 'spconv4': ((18983,4), (42021,4), (2,27,42021), (27,), (3,))
#   spatial shape [11, 400, 352]
# indice_dict gains 'subm4': ((18983,4), (18983,4), (2,27,18983), (27,), (3,))
#   spatial shape [5, 200, 176]

out = self.conv_out(x_conv4)
# features: (14723, 128)
# indice_dict gains 'spconv_down2': ((14723,4), (18983,4), (2,3,18983), (3,), (3,))
#   spatial shape [5, 200, 176]

# store the output feature map and the multi-scale 3D feature maps in batch_dict
batch_dict.update({
    'encoded_spconv_tensor': out,      # output features
    'encoded_spconv_tensor_stride': 8  # downsampling factor
})

# multi-scale features
batch_dict.update({
    'multi_scale_3d_features': {
        'x_conv1': x_conv1,
        'x_conv2': x_conv2,
        'x_conv3': x_conv3,
        'x_conv4': x_conv4,
    }
})

# multi-scale downsampling factors
batch_dict.update({
    'multi_scale_3d_strides': {
        'x_conv1': 1,
        'x_conv2': 2,
        'x_conv3': 4,
        'x_conv4': 8,
    }
})
```

The network construction at initialization is also given here:


```python
def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,
                   conv_type='subm', norm_fn=None):
    """
    Post-activation block: pick the convolution according to conv_type and wrap
    it together with the normalization and activation into a block.
    """
    if conv_type == 'subm':
        conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)
    elif conv_type == 'spconv':
        conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
                                   bias=False, indice_key=indice_key)
    elif conv_type == 'inverseconv':
        conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)
    else:
        raise NotImplementedError

    m = spconv.SparseSequential(
        conv,
        norm_fn(out_channels),
        nn.ReLU(),
    )

    return m
```

```python
# (excerpt from VoxelBackBone8x.__init__; partial is functools.partial)
norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)

self.sparse_shape = grid_size[::-1] + [1, 0, 0]  # [41, 1600, 1408]: one extra cell added along the height direction of the original grid

self.conv_input = spconv.SparseSequential(
    spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
    norm_fn(16),
    nn.ReLU(),
)
block = post_act_block

self.conv1 = spconv.SparseSequential(
    block(16, 16, 3, norm_fn=norm_fn, padding=1, indice_key='subm1'),
)

self.conv2 = spconv.SparseSequential(
    # [1600, 1408, 41] -> [800, 704, 21]
    block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),
    block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
    block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
)

self.conv3 = spconv.SparseSequential(
    # [800, 704, 21] -> [400, 352, 11]
    block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
)

self.conv4 = spconv.SparseSequential(
    # [400, 352, 11] -> [200, 176, 5]
    block(64, 64, 3, norm_fn=norm_fn, stride=2, padding=(0, 1, 1), indice_key='spconv4', conv_type='spconv'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
)

last_pad = 0
last_pad = self.model_cfg.get('last_pad', last_pad)
self.conv_out = spconv.SparseSequential(
    # [200, 176, 5] -> [200, 176, 2]
    spconv.SparseConv3d(64, 128, (3, 1, 1), stride=(2, 1, 1), padding=last_pad,
                        bias=False, indice_key='spconv_down2'),
    norm_fn(128),
    nn.ReLU(),
)
```

Personal summary: overall, the variables are managed through spconv.SparseConvTensor, which can be viewed as a compressed description of your sparse tensor: it stores the features that actually exist, their coords inside the sparse tensor, and the overall size of that tensor. When needed (at the end), the dense result can be restored from this information. During propagation, the wrapped spconv.SparseConv3d layers operate directly on the SparseConvTensor. This is precisely sparse convolution: it works on the SparseConvTensor yet genuinely carries out the 3D convolution, and because the bookkeeping is maintained throughout, the true result can be recovered at the end. Along the way the rulebooks are kept in indice_dict, and via the key names a rulebook built under identical sizes and parameters can be reused.

Of course, the precise low-level operations are still unclear to me, but this gives a basic picture of the process.

So VoxelBackBone8x in fact still returns a SparseConvTensor; nothing has been scattered back yet. Next:

```
MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256
```

The input here is thus the out tensor produced at the end of the previous module, VoxelBackBone8x:

Calling the SparseConvTensor's dense() yields the decoded spatial features, which completes the 3D convolution; concretely, it performs the scatter of the sparse convolution. For this OpenPCDet module, though, that is not the only purpose: as you can imagine, its main job is to map to BEV, implemented here as height compression.

Together with the pillar scatter in PointPillars, this OpenPCDet module can be summarized as doing (scatter +) map-to-BEV.

```python
encoded_spconv_tensor = batch_dict['encoded_spconv_tensor']
# restore the features to their corresponding positions using batch,
# spatial_shape, indices and features
spatial_features = encoded_spconv_tensor.dense()  # (2, 128, 2, 200, 176)
N, C, D, H, W = spatial_features.shape            # 2, 128, 2, 200, 176
spatial_features = spatial_features.view(N, C * D, H, W)
# (2, 256, 200, 176): merge the feature channels along the height axis,
# compressing the 3D voxel feature map into a BEV feature map
# add the features and the downsampling stride to batch_dict
batch_dict['spatial_features'] = spatial_features
batch_dict['spatial_features_stride'] = batch_dict['encoded_spconv_tensor_stride']  # 8
return batch_dict
```