What does Resize in the mmdetection preprocessing pipeline actually do?

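1. Preprocessing configuration

Anyone who has used mmdetection has configured the image preprocessing parameters for training and inference. The inference configuration below serves as the example: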

img_scale = (640,640)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]

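This configuration starts with a Resize step whose keep_ratio parameter is set to True, meaning the aspect ratio is preserved. But with the aspect ratio preserved, what do the long and short edges end up as? Is it the long edge that becomes 640, or the short edge?

2. Reading the source

In the official source, the Resize class performs the actual resize as follows: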

def _resize_img(self, results):
    """Resize images with ``results['scale']``."""
    for key in results.get('img_fields', ['img']):
        if self.keep_ratio:
            img, scale_factor = mmcv.imrescale(
                results[key],
                results['scale'],
                return_scale=True,
                backend=self.backend)
            # the w_scale and h_scale has minor difference
            # a real fix should be done in the mmcv.imrescale in the future
            new_h, new_w = img.shape[:2]
            h, w = results[key].shape[:2]
            w_scale = new_w / w
            h_scale = new_h / h
        else:
            ...  # remaining branch code omitted

So when keep_ratio is enabled, the work is ultimately delegated to mmcv.imrescale.
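The comment in the snippet above about w_scale and h_scale having a "minor difference" can be illustrated with a small sketch. It restates, in plain Python, the scale-factor and rounding formulas from mmcv's rescale_size and _scale_size (both quoted later in this post). The helper name rescaled_size and the 500x375 input size are assumptions made up for illustration, not part of mmdetection or mmcv.

```python
def rescaled_size(w, h, scale):
    """Plain-Python restatement of the tuple branch of mmcv's rescale_size."""
    factor = min(max(scale) / max(h, w), min(scale) / min(h, w))
    # Same rounding as mmcv's _scale_size: int(x + 0.5)
    return int(w * factor + 0.5), int(h * factor + 0.5), factor


w, h = 500, 375  # hypothetical input image size (w, h)
new_w, new_h, factor = rescaled_size(w, h, (1333, 800))

# Resize recomputes the per-axis scales from the rounded output size,
# so they no longer match each other (or `factor`) exactly.
w_scale = new_w / w
h_scale = new_h / h
print((new_w, new_h))    # (1067, 800)
print(w_scale, h_scale)  # 2.134 vs. roughly 2.1333
```

Because each side is rounded to an integer separately, the width here is scaled by 2.134 while the height is scaled by about 2.1333.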

Looking at the mmcv source:

def imrescale(img,
              scale,
              return_scale=False,
              interpolation='bilinear',
              backend=None):
    """Resize image while keeping the aspect ratio.

    Args:
        img (ndarray): The input image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image.
        interpolation (str): Same as :func:`resize`.
        backend (str | None): Same as :func:`resize`.

    Returns:
        ndarray: The rescaled image.
    """
    h, w = img.shape[:2]
    new_size, scale_factor = rescale_size((w, h), scale, return_scale=True)
    rescaled_img = imresize(
        img, new_size, interpolation=interpolation, backend=backend)
    if return_scale:
        return rescaled_img, scale_factor
    else:
        return rescaled_img

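This is fairly clear: from the original image size and the target size, it computes a new size and a scale factor (the rescale_size function), then performs the final resize with the specified backend (OpenCV or PIL).

So the key question is how the new size is computed: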

def rescale_size(old_size, scale, return_scale=False):
    """Calculate the new size to be rescaled to.

    Args:
        old_size (tuple[int]): The old size (w, h) of image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image size.

    Returns:
        tuple[int]: The new rescaled image size.
    """
    w, h = old_size
    if isinstance(scale, (float, int)):
        if scale <= 0:
            raise ValueError(f'Invalid scale {scale}, must be positive.')
        scale_factor = scale
    elif isinstance(scale, tuple):
        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))
    else:
        raise TypeError(
            f'Scale must be a number or tuple of int, but got {type(scale)}')

    new_size = _scale_size((w, h), scale_factor)

    if return_scale:
        return new_size, scale_factor
    else:
        return new_size
    
def _scale_size(size, scale):
    """Rescale a size by a ratio.

    Args:
        size (tuple[int]): (w, h).
        scale (float | tuple(float)): Scaling factor.

    Returns:
        tuple[int]: scaled size.
    """
    if isinstance(scale, (float, int)):
        scale = (scale, scale)
    w, h = size
    return int(w * float(scale[0]) + 0.5), int(h * float(scale[1]) + 0.5)

The trickiest part is how the scale factor is chosen:

        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))

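In short, the long edge of the target size is divided by the long edge of the original size, the short edge by the short edge, and the smaller of the two ratios is used as the scale factor.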
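A minimal sketch of that selection rule, written as a standalone function so both candidate ratios are visible. The name scale_factor_for and the sample sizes are hypothetical, chosen only for illustration; the arithmetic mirrors the four lines quoted above.

```python
def scale_factor_for(old_size, scale):
    """Return (chosen factor, long-edge ratio, short-edge ratio)."""
    w, h = old_size
    long_ratio = max(scale) / max(h, w)   # target long edge / image long edge
    short_ratio = min(scale) / min(h, w)  # target short edge / image short edge
    return min(long_ratio, short_ratio), long_ratio, short_ratio


# The order of the target tuple does not matter: max/min sort it out.
a = scale_factor_for((224, 320), (640, 512))
b = scale_factor_for((224, 320), (512, 640))
print(a == b, a[0])  # True 2.0

# For a very wide image the long-edge ratio is the binding constraint.
c = scale_factor_for((1920, 600), (1333, 800))
print(c[0] == c[1])  # True
```

Taking the minimum guarantees that neither edge exceeds its limit: the output's long edge is at most max(scale) and its short edge at most min(scale), up to rounding.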

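For example, with an original size of (224, 320) and a target size of (640, 512), long edge over long edge is 640/320 = 2 and short edge over short edge is 512/224 ≈ 2.286, so the scale factor is 2 and the resulting size is (448, 640).

Running the code in a Jupyter notebook gives: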

In [26]: rescale_size((224, 320), (640,512), return_scale=True)
Out[26]: ((448, 640), 2.0)

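3. Summary

When keep_ratio is enabled, mmdetection's Resize preprocessing step divides the target long edge by the image's long edge and the target short edge by the image's short edge, takes the smaller ratio as the final scale factor, and then rescales with the specified backend.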
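To tie this back to the opening question about img_scale = (640, 640), here is a rough sketch of what the Resize and Pad steps of that test pipeline do to the image size. The function resize_then_pad is hypothetical. It assumes that Pad(size_divisor=32) rounds each side up to the next multiple of 32, and it only tracks sizes, not pixels.

```python
import math


def resize_then_pad(w, h, img_scale=(640, 640), size_divisor=32):
    """Sizes after Resize(keep_ratio=True) and after Pad(size_divisor=...)."""
    factor = min(max(img_scale) / max(h, w), min(img_scale) / min(h, w))
    new_w, new_h = int(w * factor + 0.5), int(h * factor + 0.5)
    # Assumption: padding rounds each side up to a multiple of size_divisor.
    pad_w = math.ceil(new_w / size_divisor) * size_divisor
    pad_h = math.ceil(new_h / size_divisor) * size_divisor
    return (new_w, new_h), (pad_w, pad_h)


print(resize_then_pad(1280, 720))  # ((640, 360), (640, 384))
print(resize_then_pad(720, 1280))  # ((360, 640), (384, 640))
```

With a square target such as (640, 640), it is therefore the long edge that ends up at 640. The short edge shrinks proportionally and is then padded.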

An answer on their GitHub confirms this: Confusing Resizing in the Pipelines.

(End)

posted @ 2024-06-28 18:02 大师兄啊哈
