What does Resize actually do in the mmdetection preprocessing pipeline?

1. Preprocessing configuration

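Anyone who has used mmdetection has configured the image preprocessing parameters for training and inference. The inference configuration below serves as the example: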

img_scale = (640, 640)
# img_norm_cfg is assumed to be defined earlier in the same config file.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]

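The first transform inside MultiScaleFlipAug is Resize, with keep_ratio set to True, which means the aspect ratio is preserved. But with the aspect ratio preserved, what do the long and short sides end up being? Is it the long side that becomes 640, or the short side?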

2. Reading the source

In the official source code, the Resize class performs the actual resize as follows:

def _resize_img(self, results):
    """Resize images with ``results['scale']``."""
    for key in results.get('img_fields', ['img']):
        if self.keep_ratio:
            img, scale_factor = mmcv.imrescale(
                results[key],
                results['scale'],
                return_scale=True,
                backend=self.backend)
            # the w_scale and h_scale has minor difference
            # a real fix should be done in the mmcv.imrescale in the future
            new_h, new_w = img.shape[:2]
            h, w = results[key].shape[:2]
            w_scale = new_w / w
            h_scale = new_h / h
        else:
            ...  # (code for the remaining branch omitted)

So when keep_ratio is enabled, the work is delegated to mmcv.imrescale.

Now look at the mmcv source:

def imrescale(img,
              scale,
              return_scale=False,
              interpolation='bilinear',
              backend=None):
    """Resize image while keeping the aspect ratio.

    Args:
        img (ndarray): The input image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image.
        interpolation (str): Same as :func:`resize`.
        backend (str | None): Same as :func:`resize`.

    Returns:
        ndarray: The rescaled image.
    """
    h, w = img.shape[:2]
    new_size, scale_factor = rescale_size((w, h), scale, return_scale=True)
    rescaled_img = imresize(
        img, new_size, interpolation=interpolation, backend=backend)
    if return_scale:
        return rescaled_img, scale_factor
    else:
        return rescaled_img

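This is fairly clear: from the original image size and the target size, it computes a new size and a scale factor (the rescale_size function), then performs the actual resize with the specified backend (OpenCV or PIL).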
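The two-step flow (compute the new size, then resample) can be illustrated with a toy sketch. This is not mmcv code: rescale_by_factor is a hypothetical helper, the image is a plain nested list, only a float scale is handled, and nearest-neighbour sampling stands in for the real backend (mmcv defaults to bilinear). The rounding mirrors the _scale_size helper quoted later in this post.

```python
def rescale_by_factor(img, factor):
    """Toy stand-in for the imrescale flow with a float scale factor."""
    old_h, old_w = len(img), len(img[0])
    # Step 1: compute the new (w, h), rounding half up like _scale_size.
    new_w = int(old_w * factor + 0.5)
    new_h = int(old_h * factor + 0.5)
    # Step 2: resample. Nearest-neighbour is used here only for brevity.
    out = [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
    return out, factor


img = [[1, 2, 3],
       [4, 5, 6]]          # height 2, width 3
out, factor = rescale_by_factor(img, 2.0)
print(len(out[0]), len(out))  # 6 4  (new width, new height)
print(out[0])                 # [1, 1, 2, 2, 3, 3]
```

Both sides are multiplied by the same factor, so the aspect ratio is preserved up to rounding.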

So the key question is how the new size is computed:

def rescale_size(old_size, scale, return_scale=False):
    """Calculate the new size to be rescaled to.

    Args:
        old_size (tuple[int]): The old size (w, h) of image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image size.

    Returns:
        tuple[int]: The new rescaled image size.
    """
    w, h = old_size
    if isinstance(scale, (float, int)):
        if scale <= 0:
            raise ValueError(f'Invalid scale {scale}, must be positive.')
        scale_factor = scale
    elif isinstance(scale, tuple):
        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))
    else:
        raise TypeError(
            f'Scale must be a number or tuple of int, but got {type(scale)}')

    new_size = _scale_size((w, h), scale_factor)

    if return_scale:
        return new_size, scale_factor
    else:
        return new_size
    
def _scale_size(size, scale):
    """Rescale a size by a ratio.

    Args:
        size (tuple[int]): (w, h).
        scale (float | tuple(float)): Scaling factor.

    Returns:
        tuple[int]: scaled size.
    """
    if isinstance(scale, (float, int)):
        scale = (scale, scale)
    w, h = size
    return int(w * float(scale[0]) + 0.5), int(h * float(scale[1]) + 0.5)
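One detail of _scale_size worth noting is the int(x + 0.5) rounding. The sketch below reimplements that rounding in a standalone helper (scale_size is a name chosen here for illustration, handling only a single float factor) to show two consequences: half-way values round up, unlike Python's built-in round(), and after rounding the per-axis ratios can differ slightly. The latter appears to be what the "w_scale and h_scale has minor difference" comment in _resize_img refers to.

```python
def scale_size(size, factor):
    """Same rounding as _scale_size above, for a single float factor."""
    w, h = size
    return int(w * factor + 0.5), int(h * factor + 0.5)


# Half-way values round up here; round() rounds half to even instead.
print(scale_size((5, 3), 0.5))          # (3, 2)
print(round(5 * 0.5), round(3 * 0.5))   # 2 2

# Because each side is rounded to an integer, the effective per-axis
# ratios can drift away from the requested factor.
w, h = 1000, 667
new_w, new_h = scale_size((w, h), 0.64)
w_scale = new_w / w
h_scale = new_h / h
print(new_w, new_h)                     # 640 427
print(w_scale == h_scale)               # False
```

Here the width ratio is exactly 0.64 while the height ratio is 427/667, slightly larger, which is why Resize recomputes w_scale and h_scale from the actual output shape.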

The trickiest part is how the scale factor is chosen:

        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))

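In other words, it divides the long side of the target size by the long side of the original size, divides the short side by the short side, and takes the smaller of the two ratios as the scale factor.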

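For example, with an original size of (224, 320) and a target size of (640, 512): long side over long side is 640/320 = 2; short side over short side is 512/224 ≈ 2.286; the smaller value, 2, becomes the scale factor, so the final size is (448, 640).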

Running the code in a Jupyter notebook gives:

In [26]: rescale_size((224, 320), (640,512), return_scale=True)
Out[26]: ((448, 640), 2.0)
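To tie this back to the img_scale = (640, 640) config from section 1, here is a minimal standalone sketch of the tuple branch of rescale_size. The helper name keep_ratio_size and the sample image sizes are made up for illustration; only the formula comes from the mmcv source quoted above.

```python
def keep_ratio_size(old_size, scale):
    """Tuple-scale branch of rescale_size: returns ((new_w, new_h), factor)."""
    w, h = old_size
    factor = min(max(scale) / max(w, h),   # target long side / original long side
                 min(scale) / min(w, h))   # target short side / original short side
    return (int(w * factor + 0.5), int(h * factor + 0.5)), factor


# Square target: the long-side ratio is always the smaller one,
# so the long side ends up at 640.
print(keep_ratio_size((1920, 1080), (640, 640))[0])  # (640, 360)
print(keep_ratio_size((1080, 1920), (640, 640))[0])  # (360, 640)

# Non-square target: either side can be the limiting one.
print(keep_ratio_size((224, 320), (640, 512))[0])    # (448, 640), long side limits
print(keep_ratio_size((300, 320), (640, 512))[0])    # (512, 546), short side limits
```

In every case the result fits inside the target: the long side never exceeds max(scale) and the short side never exceeds min(scale).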

3. Summary

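When keep_ratio is enabled, mmdetection's Resize transform divides the long side of the target size by the original long side, divides the short side by the original short side, and uses the smaller of the two ratios as the final scale factor. The image is therefore scaled as large as possible while still fitting inside the target size, with the aspect ratio preserved: whichever side gives the smaller ratio reaches its target, and the other side ends up at or below its target. For a square target such as img_scale = (640, 640), that means the long side becomes 640.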

An answer on their GitHub confirms this as well: Confusing Resizing in the Pipelines

(End)

 

posted @ 2024-06-28 18:02  大师兄啊哈