[Paper] SECOND

1. Introduction

The key contributions of our work are as follows:

  • We propose an improved method of sparse convolution that allows it to run faster.
  • We propose a novel angle loss regression approach that demonstrates better orientation regression performance than other methods do.
  • We introduce a novel data augmentation method for LiDAR-only learning problems that greatly increases the convergence speed and performance.
Overview of the full model config

```
(vfe): MeanVFE()
  (backbone_3d): VoxelBackBone8x(
    (conv_input): SparseSequential(
      (0): SubMConv3d()
      (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (conv1): SparseSequential(
      (0): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv2): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv3): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv4): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): SparseSequential(
        (0): SubMConv3d()
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (conv_out): SparseSequential(
      (0): SparseConv3d()
      (1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (map_to_bev_module): HeightCompression()
  (pfe): None
  (backbone_2d): BaseBEVBackbone(
    (blocks): ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
  )
  (dense_head): AnchorHeadSingle(
    (cls_loss_func): SigmoidFocalClassificationLoss()
    (reg_loss_func): WeightedSmoothL1Loss()
    (dir_loss_func): WeightedCrossEntropyLoss()
    (conv_cls): Conv2d(512, 18, kernel_size=(1, 1), stride=(1, 1))
    (conv_box): Conv2d(512, 42, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(512, 12, kernel_size=(1, 1), stride=(1, 1))
  )
  (point_head): None
  (roi_head): None
)
```

3. SECOND Detector

3.1. Network Architecture

The proposed SECOND detector, depicted in Figure 1, consists of three components: (1) a voxelwise feature extractor; (2) a sparse convolutional middle layer; and (3) an RPN.

3.1.1. Point Cloud Grouping

Here, we follow the simple procedure described in VoxelNet to obtain a voxel representation of point cloud data.

Note: this summarizes the data-preparation stage of OpenPCDet, i.e. what VoxelGeneratorV2 does. It confirms my earlier guess: voxels are generated only from the points that actually exist (with the help of a hash table), not over the whole voxel map. Also worth mentioning: VoxelNet and SECOND both use the default KITTI dataset config here, so each voxel holds up to 5 points, whereas PointPillars overrides this in its own config with 32 points per voxel and a voxel size of [0.16, 0.16, 4].

```
- NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.05, 0.05, 0.1]
      MAX_POINTS_PER_VOXEL: 5
      MAX_NUMBER_OF_VOXELS: {
        'train': 16000,
        'test': 40000
      }
```

Thus: point cloud (#, 4) → voxel_features (32000, 5, 4), voxel_num_points (32000).
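To make this grouping concrete, here is a toy sketch of building voxels only at occupied locations via a hash map, with the per-voxel and total-voxel caps from the config above (the function name and signature are hypothetical, not OpenPCDet's actual API):

```python
import numpy as np

def group_points_to_voxels(points, voxel_size, pc_range,
                           max_points=5, max_voxels=16000):
    # points: (N, 4) x/y/z/intensity; voxel_size: (3,); pc_range: (6,)
    coords = ((points[:, :3] - pc_range[:3]) / voxel_size).astype(np.int32)
    voxel_map = {}  # hash table: voxel coordinate -> row in the voxel buffer
    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=points.dtype)
    voxel_coords = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points = np.zeros(max_voxels, dtype=np.int32)
    for p, c in zip(points, map(tuple, coords)):
        if c not in voxel_map:
            if len(voxel_map) == max_voxels:
                continue  # total-voxel budget exhausted: drop the point
            voxel_map[c] = len(voxel_map)
            voxel_coords[voxel_map[c]] = c
        v = voxel_map[c]
        if num_points[v] < max_points:  # per-voxel cap: extra points are dropped
            voxels[v, num_points[v]] = p
            num_points[v] += 1
    n = len(voxel_map)
    return voxels[:n], voxel_coords[:n], num_points[:n]
```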

3.1.2. Voxelwise Feature Extractor

Note: as you would imagine, this is the PointNet-like part, playing the same role as the second half of the pillar VFE: the point-wise features organized in voxels are distilled into voxel-wise features. Worth noting: OpenPCDet replaces this step with a simple mean VFE that just averages the point features within each voxel.

```
VFE:
        NAME: MeanVFE
```

Thus: voxel_features (32000, 5, 4) → voxel_features (32000, 4).
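A minimal sketch of what this mean pooling amounts to (simplified from OpenPCDet's MeanVFE; treat the exact code as an assumption, not a quote):

```python
import torch

def mean_vfe(voxel_features, voxel_num_points):
    # voxel_features: (N, max_points, C); voxel_num_points: (N,)
    points_sum = voxel_features.sum(dim=1)  # (N, C)
    # clamp avoids division by zero for padded/empty slots
    normalizer = voxel_num_points.view(-1, 1).clamp(min=1).type_as(points_sum)
    return points_sum / normalizer          # (N, C) voxel-wise features
```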

3.1.3. Sparse Convolutional Middle Extractor

Our middle extractor is used to learn information about the z-axis and convert the sparse 3D data into a 2D BEV image. Figure 3 shows the structure of the middle extractor. It consists of two phases of sparse convolution. Each phase contains several submanifold convolutional layers and one normal sparse convolution to perform downsampling in the z-axis. After the z-dimensionality has been downsampled to one or two, the sparse data are converted into dense feature maps. Then, the data are simply reshaped into image-like 2D data.


On im2col

https://www.jianshu.com/p/93a1abcc4717

This one is well written. In short, the goal of im2col is to turn the sliding-window computation into a matrix multiplication, so that the highly optimized GEMM routines can be used. Concretely, the patch under each sliding-window position is flattened, and the kernel is flattened too, so the product of the two vectors is exactly the element-wise multiplication plus summation; multiple channels are simply flattened along as well. The idea, in other words, is to express each window's "element-wise multiply + sum" (something like an atomic operation) as one row of a matrix product, stack all such atomic operations up as rows, and multiply them together in one go. Very clever.
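A toy NumPy sketch of the idea (stride 1, no padding; illustrative only):

```python
import numpy as np

def im2col(x, k):
    # x: (C, H, W) input; k: kernel size
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((out_h * out_w, C * k * k))
    for i in range(out_h):
        for j in range(out_w):
            # each sliding-window patch is flattened into one row
            cols[i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

# convolution then becomes a single GEMM:
#   kernels: (num_filters, C, k, k) -> reshape to (num_filters, C*k*k)
#   output = im2col(x, k) @ kernels.reshape(num_filters, -1).T
# and the (out_h*out_w, num_filters) result is reshaped back into a feature map
```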

On GEMM

https://jackwish.net/blog/2019/gemm-optimization.html

This one is also well written. In essence, GEMM optimization reduces repeated loads and stores of the same element at the implementation level. Concretely, the output matrix is split into blocks; within a block, say for results in the same row but different columns, a naive loop would fetch that row's values anew for every computation, whereas now they can be kept in registers (registers live inside the CPU, like a pocket; memory sits outside the CPU, like a backpack; storage is the drawer at home). Applying the same idea to the columns, and to the reduction dimension k over channels, yields the speedup.
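A toy illustration of the blocking idea in NumPy (a real GEMM also tiles for registers and vectorizes; this only shows how blocking improves data reuse):

```python
import numpy as np

def gemm_tiled(A, B, tile=32):
    # C = A @ B with loop tiling: each small block of A and B stays hot in
    # cache/registers while it is reused, instead of being re-fetched per element
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C
```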

Back to sparse convolution

Related links and personal summary

Personal summary: overall, the key to the process is the rulebook, together with the idea of atomic operations on active input sites. Each row of the rulebook is one atomic operation: for a given position inside the kernel (a local coordinate within the kernel), it records the "atomic" step in which that kernel element, applied at one active input site, contributes a value to one output site. A quick way to remember it: the rulebook is organized by kernel position, and each kernel position may yield valid atomic operations on several input sites.

Then, processing one kernel position at a time, we assemble the matrix of all kernel elements acting at that position (down through the input channels, and across multiple kernels, i.e. the output channels), and for each valid input site, i.e. each single atomic operation, run the GEMM matrix multiplication. Finally, guided by the output-site information in the rulebook, the results are scattered back into the spatial grid to form the output.

Much like im2col [5], which turns convolution from its mathematical form into an efficient programmable form, this replaces the sliding window with a Gather-GEMM-Scatter pipeline, with the rulebook orchestrating the process (the gather and the scatter). Concretely, im2col turns each sliding window into one row and multiplies them as a block, whereas here each atomic operation, i.e. the channel-depth vector of one active input site under one relative kernel position, becomes a small row that is multiplied as a block.

Today's 3D sparse convolutions fall into two categories, SubMConv3d and SparseConv3d. The latter is exactly the sparse convolution SECOND proposes: as long as the kernel covers one active input site, an output site is computed. The other is submanifold sparse convolution, whose difference is that the convolution output is computed only when the center of the kernel covers an active input site (in my understanding, only then does a valid atomic operation form).

Clearly, SubMConv3d keeps the active locations of the output feature map exactly the same as the input's, preserving sparsity (here measured as non-empty locations / total locations) so the computation does not blow up. SparseConv3d instead dilates the set of active sites and thereby increases the computation. But with SubMConv3d alone, the kernel's receptive field would stay confined to a limited range, so it is combined with stride-2 SparseConv3d to enlarge the receptive field while preserving sparsity as much as possible. A toy sketch of the whole gather-GEMM-scatter process follows.
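Here is a 2D toy of the rulebook-driven gather-GEMM-scatter described above (an illustration of the concept, not spconv's actual implementation; all names are made up):

```python
import numpy as np

def sparse_conv2d(coords, feats, W, k=3, submanifold=True):
    # coords: (N, 2) active input sites; feats: (N, C_in) their features;
    # W: (k*k, C_in, C_out) kernel weights, one slice per kernel position
    idx = {tuple(c): i for i, c in enumerate(coords)}  # hash table of active sites
    out = {}
    offsets = [(dy, dx) for dy in range(-(k // 2), k // 2 + 1)
                        for dx in range(-(k // 2), k // 2 + 1)]
    for kpos, (dy, dx) in enumerate(offsets):
        # this kernel position's rulebook rows: (input index, output coordinate)
        rules = []
        for (y, x), i in idx.items():
            oy, ox = y + dy, x + dx
            # submanifold: outputs only where the input is already active,
            # i.e. only when the kernel center sits on an active site
            if submanifold and (oy, ox) not in idx:
                continue
            rules.append((i, (oy, ox)))
        if not rules:
            continue
        gathered = feats[np.array([i for i, _ in rules])]  # gather
        contrib = gathered @ W[kpos]                       # GEMM
        for row, (_, oc) in enumerate(rules):              # scatter
            out[oc] = out.get(oc, 0) + contrib[row]
    return out  # dict: output coordinate -> (C_out,) feature
```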

3.1.4. Region Proposal Network

In this work, we use a single shot multibox detector (SSD)-like [32] architecture to construct an RPN architecture.

  • The authors call this entire latter part the RPN; it is essentially the same 2D backbone + head as the corresponding part of PointPillars.

3.1.5. Anchors and Targets

Clearly, PointPillars also inherits this scheme; a sketch of the box-residual encoding is given below.
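For reference, the anchor-to-target residual encoding as defined in the SECOND paper (restated from the paper; the helper function itself is hypothetical):

```python
import numpy as np

def encode_box_residuals(gt, anchor):
    # boxes as (x, y, z, w, l, h, theta); the anchor diagonal
    # d_a = sqrt(l_a^2 + w_a^2) normalizes the x/y offsets and the
    # anchor height h_a normalizes the z offset
    xg, yg, zg, wg, lg, hg, tg = gt
    xa, ya, za, wa, la, ha, ta = anchor
    da = np.sqrt(la ** 2 + wa ** 2)
    return np.array([
        (xg - xa) / da, (yg - ya) / da, (zg - za) / ha,
        np.log(wg / wa), np.log(lg / la), np.log(hg / ha),
        tg - ta,
    ])
```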

3.2. Training and Inference

3.2.1. Loss

Sine-Error Loss for Angle Regression

VoxelNet [14] directly predicts the radian offset but is subject to an adversarial example problem between the cases of 0 and π radians because these two angles correspond to the same box but generate a large loss when one is misidentified as the other. Our architecture solves this problem by introducing a new angle loss regression:

$L_\theta = \mathrm{SmoothL1}\left(\sin(\theta_p - \theta_t)\right)$

where the subscript p indicates the predicted value and the subscript t the ground-truth target.

To address the issue that this loss treats boxes with opposite directions as being the same, we have added a simple direction classifier to the output of the RPN. This direction classifier uses a softmax loss function. We use the following approach to generate the direction classifier target: if the yaw rotation around the z-axis of the ground truth is higher than zero, the result is positive; otherwise, it is negative.
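A minimal PyTorch sketch of this loss and the direction-classifier target (a simplified stand-in, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def angle_loss(theta_p, theta_t):
    # sin(θp − θt) makes 0 and π produce the same loss, removing the
    # adversarial case between opposite orientations of the same box
    diff = torch.sin(theta_p - theta_t)
    return F.smooth_l1_loss(diff, torch.zeros_like(diff))

def direction_target(theta_gt):
    # direction classifier target: positive (1) if yaw > 0, else negative (0)
    return (theta_gt > 0).long()
```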

Focal Loss for Classification

Total Training Loss

Overall, the loss breaks down like this: classification uses focal loss, and the dim/size regression uses smooth L1, as usual. For orientation, the logic in this detection problem is: 1. the ground-truth orientations genuinely differ over the whole 0 to 2π range, and the network should predict exactly that orientation value; 2. even so, while the prediction approaches the target, the loss assigned should sensibly follow whichever coinciding angle is nearer, staying consistent with the true IoU. VoxelNet naively regresses the raw radian value of the target, which causes the adversarial problem. SECOND's approach is to regress the sine, so 0 and π give the same loss; but we still need to predict the one true value, hence an additional classification head predicting the angle's half-range, determined by whether the yaw rotation is greater than zero or not. This way the regression produces a loss that reflects the overlap already achieved, and if the boxes really coincide, the final choice between the two possible angles is made by the angle classification head. As we know, PointPillars later adopted this scheme as well.
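For completeness, a short sketch of the classification term (SECOND adopts the focal loss with α = 0.25 and γ = 2; the function below is a simplified stand-in, not OpenPCDet's SigmoidFocalClassificationLoss):

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # the (1 − p_t)^γ factor down-weights easy examples so the many
    # background anchors do not dominate the loss
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()
```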

3.2.2. Data Augmentation

Sample Ground Truths from the Database

Object Noise

Global Rotation and Scaling

3.2.4. Network Details

Clearly, PointPillars does this the same way.

Sparse Convolutional Middle Extractor: code walkthrough

In OpenPCDet, this module is built on the spconv library. Below I walk through it alongside the code.

```
BACKBONE_3D:
        NAME: VoxelBackBone8x
```

Note: one point well worth mentioning. At first I thought: the (32000, 4) tensor we obtained earlier is hardly a sparse matrix; it is just the voxels that actually contain points. Do we have to scatter it back first? But isn't the scatter supposed to come later? Then I realized we also carry the voxel coords, which are precisely the positions of these active sites within the whole sparse voxel map, exactly what is needed to build the rulebook; and the map size can be computed at once from the parameters and handed to the module. Beautiful: PointPillars never needed 3D convolution, and it turns out this pipeline fits the voxel backbone perfectly.

Throughout the forward pass, everything stays in the form of a SparseConvTensor. Constructing one mainly takes:

```
features: [num_points, num_features] feature tensor
indices: [num_points, ndim + 1] indice tensor. batch index saved in indices[:, 0]
spatial_shape: spatial shape of your sparse data (z,y,x)
batch_size: batch size of your sparse data
```

On top of that, the network layers attach the rulebook and related bookkeeping as the tensor propagates:

```
# indice_dict{(tuple:5),}:
#   0: output indices, 1: input indices,
#   2: input rulebook indices, 3: output rulebook indices, 4: spatial shape
# sparsity: sparsity ratio
# spatial_size
```

In height_compression.py, the features are then restored to their corresponding positions using batch, spatial_shape, indices and features, and compressed along the height direction into the BEV feature map.

indice_key exists so that, when the indices are the same, the already computed rulebook and hash table can be reused, saving computation. Generally, the first time a layer is built, None is returned; only within a spconv.SparseSequential stack of 3 blocks can the last SubMConv3d reuse the indices of the second SubMConv3d (see the repeated indice_key values in the init code below).
Also, indice_dict appears to be a shared variable: once a new tensor is produced, the indice_dict held by the earlier tensors is updated to the latest state too.

```python
# build the sparse tensor from the voxel features and coordinates,
# plus the spatial shape and batch size
input_sp_tensor = spconv.SparseConvTensor(
    features=voxel_features,          # (32000, 4)
    indices=voxel_coords.int(),       # (32000, 4)
    spatial_shape=self.sparse_shape,  # (41, 1600, 1408)
    batch_size=batch_size             # 4
)
```

The resulting SparseConvTensor is traced below; as it propagates, the quantities it carries keep being updated.

```python
x = self.conv_input(input_sp_tensor)
# features: (32000, 16)
# indice_dict gains 'subm1': ((32000,4), (32000,4), (2,27,32000), (27,), (3,))
#   the last (3,) entry holds the spatial shape [41, 1600, 1408]

x_conv1 = self.conv1(x)
# features: (32000, 16)
# 'subm1' is reused, so indice_dict is unchanged

x_conv2 = self.conv2(x_conv1)
# features: (57850, 32)
# indice_dict gains 'spconv2': ((57850,4), (32000,4), (2,27,32000), (27,), (3,))
#   spatial shape [41, 1600, 1408]
# indice_dict gains 'subm2': ((57850,4), (57850,4), (2,27,57850), (27,), (3,))
#   spatial shape [21, 800, 704]

x_conv3 = self.conv3(x_conv2)
# features: (42021, 64)
# indice_dict gains 'spconv3': ((42021,4), (57850,4), (2,27,57850), (27,), (3,))
#   spatial shape [21, 800, 704]
# indice_dict gains 'subm3': ((42021,4), (42021,4), (2,27,42021), (27,), (3,))
#   spatial shape [11, 400, 352]

x_conv4 = self.conv4(x_conv3)
# features: (18983, 64)
# indice_dict gains 'spconv4': ((18983,4), (42021,4), (2,27,42021), (27,), (3,))
#   spatial shape [11, 400, 352]
# indice_dict gains 'subm4': ((18983,4), (18983,4), (2,27,18983), (27,), (3,))
#   spatial shape [5, 200, 176]

out = self.conv_out(x_conv4)
# features: (14723, 128)
# indice_dict gains 'spconv_down2': ((14723,4), (18983,4), (2,3,18983), (3,), (3,))
#   spatial shape [5, 200, 176]

# store the output feature map and the multi-scale 3D feature maps in batch_dict
batch_dict.update({
    'encoded_spconv_tensor': out,      # output features
    'encoded_spconv_tensor_stride': 8  # downsampling factor
})

# multi-scale features
batch_dict.update({
    'multi_scale_3d_features': {
        'x_conv1': x_conv1,
        'x_conv2': x_conv2,
        'x_conv3': x_conv3,
        'x_conv4': x_conv4,
    }
})

# multi-scale downsampling factors
batch_dict.update({
    'multi_scale_3d_strides': {
        'x_conv1': 1,
        'x_conv2': 2,
        'x_conv3': 4,
        'x_conv4': 8,
    }
})
```

The network construction at initialization is also given here:


```python
def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,
                   conv_type='subm', norm_fn=None):
    """
    Post-activation block: pick the convolution according to conv_type and wrap
    it together with the normalization and activation into a block.
    """
    if conv_type == 'subm':
        conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)
    elif conv_type == 'spconv':
        conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
                                   bias=False, indice_key=indice_key)
    elif conv_type == 'inverseconv':
        conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)
    else:
        raise NotImplementedError

    m = spconv.SparseSequential(
        conv,
        norm_fn(out_channels),
        nn.ReLU(),
    )

    return m
```

```python
# (excerpt from VoxelBackBone8x.__init__; partial is functools.partial)
norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)

self.sparse_shape = grid_size[::-1] + [1, 0, 0]  # [41, 1600, 1408]: one extra cell added along the height direction of the original grid

self.conv_input = spconv.SparseSequential(
    spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
    norm_fn(16),
    nn.ReLU(),
)
block = post_act_block

self.conv1 = spconv.SparseSequential(
    block(16, 16, 3, norm_fn=norm_fn, padding=1, indice_key='subm1'),
)

self.conv2 = spconv.SparseSequential(
    # [1600, 1408, 41] -> [800, 704, 21]
    block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),
    block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
    block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
)

self.conv3 = spconv.SparseSequential(
    # [800, 704, 21] -> [400, 352, 11]
    block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
)

self.conv4 = spconv.SparseSequential(
    # [400, 352, 11] -> [200, 176, 5]
    block(64, 64, 3, norm_fn=norm_fn, stride=2, padding=(0, 1, 1), indice_key='spconv4', conv_type='spconv'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
    block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
)

last_pad = 0
last_pad = self.model_cfg.get('last_pad', last_pad)
self.conv_out = spconv.SparseSequential(
    # [200, 176, 5] -> [200, 176, 2]
    spconv.SparseConv3d(64, 128, (3, 1, 1), stride=(2, 1, 1), padding=last_pad,
                        bias=False, indice_key='spconv_down2'),
    norm_fn(128),
    nn.ReLU(),
)
```

Personal summary: overall, the variables are managed through spconv.SparseConvTensor, which can be viewed as a compressed description of your sparse tensor: it stores the features that actually exist, their coords inside the sparse tensor, and the overall size of that tensor. When needed (at the end), the dense result can be restored from this information. During propagation, the wrapped spconv.SparseConv3d layers operate directly on the SparseConvTensor. This is precisely sparse convolution: it works on the SparseConvTensor yet genuinely carries out the 3D convolution, and because the bookkeeping is maintained throughout, the true result can be recovered at the end. Along the way the rulebooks are kept in indice_dict, and via the key names a rulebook built under identical sizes and parameters can be reused.

Of course, the precise low-level operations are still unclear to me, but this gives a basic picture of the process.

So VoxelBackBone8x in fact still returns a SparseConvTensor; nothing has been scattered back yet. Next:

```
MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256
```

The input here is thus the out tensor produced at the end of the previous module, VoxelBackBone8x:

Calling the SparseConvTensor's dense() yields the decoded spatial features, which completes the 3D convolution; concretely, it performs the scatter of the sparse convolution. For this OpenPCDet module, though, that is not the only purpose: as you can imagine, its main job is to map to BEV, implemented here as height compression.

Together with the pillar scatter in PointPillars, this OpenPCDet module can be summarized as doing (scatter +) map-to-BEV.

```python
encoded_spconv_tensor = batch_dict['encoded_spconv_tensor']
# restore the features to their corresponding positions using batch,
# spatial_shape, indices and features
spatial_features = encoded_spconv_tensor.dense()  # (2, 128, 2, 200, 176)
N, C, D, H, W = spatial_features.shape            # 2, 128, 2, 200, 176
spatial_features = spatial_features.view(N, C * D, H, W)
# (2, 256, 200, 176): merge the feature channels along the height axis,
# compressing the 3D voxel feature map into a BEV feature map
# add the features and the downsampling stride to batch_dict
batch_dict['spatial_features'] = spatial_features
batch_dict['spatial_features_stride'] = batch_dict['encoded_spconv_tensor_stride']  # 8
return batch_dict
```