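What does Resize in the mmdetection preprocessing pipeline actually do?
1. Preprocessing configuration
Anyone who has used mmdetection has configured the image preprocessing parameters for training and inference. The inference-time configuration below serves as the example: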
    img_scale = (640, 640)
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=img_scale,
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img']),
            ])
    ]
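As you can see, this configuration first performs a Resize with keep_ratio set to True, which means the aspect ratio is preserved. But with the aspect ratio preserved, what do the long and short sides end up being? Is the long side 640, or the short side?
2. Reading the source code
In the official source code, the Resize class does the actual resizing as follows: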
    def _resize_img(self, results):
        """Resize images with ``results['scale']``."""
        for key in results.get('img_fields', ['img']):
            if self.keep_ratio:
                img, scale_factor = mmcv.imrescale(
                    results[key],
                    results['scale'],
                    return_scale=True,
                    backend=self.backend)
                # the w_scale and h_scale has minor difference
                # a real fix should be done in the mmcv.imrescale in the future
                new_h, new_w = img.shape[:2]
                h, w = results[key].shape[:2]
                w_scale = new_w / w
                h_scale = new_h / h
            else:
                ...  # (the remaining branch is omitted here)
So when keep_ratio is enabled, the work is delegated to mmcv.imrescale.
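The comment in the snippet about w_scale and h_scale having a "minor difference" is easier to see with numbers. The following is a minimal sketch, not mmdetection code: the helper name sketch_scaled_size and the 333x500 input are made up for illustration, and the rounding rule is assumed to match the _scale_size function shown later in this post.

```python
def sketch_scaled_size(w, h, scale_factor):
    """Round each side to the nearest integer (assumed to mirror mmcv's _scale_size)."""
    return int(w * scale_factor + 0.5), int(h * scale_factor + 0.5)


# Hypothetical input: an image with (w, h) = (333, 500) and target scale (640, 640).
w, h = 333, 500
scale_factor = min(640 / max(h, w), 640 / min(h, w))
new_w, new_h = sketch_scaled_size(w, h, scale_factor)

# Recompute the per-axis ratios from the rounded size, as _resize_img does.
w_scale = new_w / w
h_scale = new_h / h
print((new_w, new_h), w_scale, h_scale)
```

Because each side is rounded to an integer independently, the width ratio recomputed from the rounded size is slightly smaller than the height ratio in this case, which is why _resize_img derives w_scale and h_scale from the actual output shape.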
Now look at the mmcv source:
    def imrescale(img,
                  scale,
                  return_scale=False,
                  interpolation='bilinear',
                  backend=None):
        """Resize image while keeping the aspect ratio.

        Args:
            img (ndarray): The input image.
            scale (float | tuple[int]): The scaling factor or maximum size.
                If it is a float number, then the image will be rescaled by
                this factor, else if it is a tuple of 2 integers, then the
                image will be rescaled as large as possible within the scale.
            return_scale (bool): Whether to return the scaling factor besides
                the rescaled image.
            interpolation (str): Same as :func:`resize`.
            backend (str | None): Same as :func:`resize`.

        Returns:
            ndarray: The rescaled image.
        """
        h, w = img.shape[:2]
        new_size, scale_factor = rescale_size((w, h), scale, return_scale=True)
        rescaled_img = imresize(
            img, new_size, interpolation=interpolation, backend=backend)
        if return_scale:
            return rescaled_img, scale_factor
        else:
            return rescaled_img
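This is fairly clear: from the original image size and the target size, the rescale_size function computes a new size and a scale factor, and then the specified backend (OpenCV or PIL) performs the actual resize.
So the key question is how the new size is computed: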
    def rescale_size(old_size, scale, return_scale=False):
        """Calculate the new size to be rescaled to.

        Args:
            old_size (tuple[int]): The old size (w, h) of image.
            scale (float | tuple[int]): The scaling factor or maximum size.
                If it is a float number, then the image will be rescaled by
                this factor, else if it is a tuple of 2 integers, then the
                image will be rescaled as large as possible within the scale.
            return_scale (bool): Whether to return the scaling factor besides
                the rescaled image size.

        Returns:
            tuple[int]: The new rescaled image size.
        """
        w, h = old_size
        if isinstance(scale, (float, int)):
            if scale <= 0:
                raise ValueError(f'Invalid scale {scale}, must be positive.')
            scale_factor = scale
        elif isinstance(scale, tuple):
            max_long_edge = max(scale)
            max_short_edge = min(scale)
            scale_factor = min(max_long_edge / max(h, w),
                               max_short_edge / min(h, w))
        else:
            raise TypeError(
                f'Scale must be a number or tuple of int, but got {type(scale)}')

        new_size = _scale_size((w, h), scale_factor)

        if return_scale:
            return new_size, scale_factor
        else:
            return new_size


    def _scale_size(size, scale):
        """Rescale a size by a ratio.

        Args:
            size (tuple[int]): (w, h).
            scale (float | tuple(float)): Scaling factor.

        Returns:
            tuple[int]: scaled size.
        """
        if isinstance(scale, (float, int)):
            scale = (scale, scale)
        w, h = size
        return int(w * float(scale[0]) + 0.5), int(h * float(scale[1]) + 0.5)
The least obvious part is how the scale factor is chosen:
    max_long_edge = max(scale)
    max_short_edge = min(scale)
    scale_factor = min(max_long_edge / max(h, w),
                       max_short_edge / min(h, w))
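In other words, the long edge of the target size is divided by the long edge of the original size, the short edge by the short edge, and the smaller of the two ratios is used as the scale factor.
For example, with an original size of (224, 320) and a target size of (640, 512), both given as (w, h), long edge over long edge is 640/320 = 2 and short edge over short edge is 512/224 ≈ 2.286. The scale factor is therefore 2, and the resulting size is (448, 640).
Running rescale_size in a Jupyter notebook gives: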
In [26]: rescale_size((224, 320), (640,512), return_scale=True)
Out[26]: ((448, 640), 2.0)
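For readers without mmcv installed, the tuple branch can be reproduced in a few lines of plain Python. This is only a sketch for checking the arithmetic: sketch_rescale_size is a made-up name, and it covers just the tuple case of rescale_size as quoted above.

```python
def sketch_rescale_size(old_size, scale):
    """Illustrative re-implementation of the tuple branch of rescale_size."""
    w, h = old_size
    long_ratio = max(scale) / max(h, w)    # target long edge / original long edge
    short_ratio = min(scale) / min(h, w)   # target short edge / original short edge
    scale_factor = min(long_ratio, short_ratio)
    new_size = (int(w * scale_factor + 0.5), int(h * scale_factor + 0.5))
    return new_size, scale_factor, long_ratio, short_ratio


new_size, factor, long_ratio, short_ratio = sketch_rescale_size((224, 320), (640, 512))
print(new_size, factor)                 # (448, 640) 2.0
print(long_ratio, round(short_ratio, 3))

# The order of the target tuple does not matter, because only max() and min() are used.
swapped_size, swapped_factor, _, _ = sketch_rescale_size((224, 320), (512, 640))
print(swapped_size, swapped_factor)     # (448, 640) 2.0
```

Here the long edge is the binding constraint: it reaches 640 exactly, while the short edge ends at 448, below its limit of 512.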
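3. Summary
When keep_ratio is enabled, mmdetection's Resize divides the target long edge by the original long edge and the target short edge by the original short edge, then uses the smaller of the two ratios as the final scale factor. In other words, the image is scaled as large as possible while still fitting within the target size and keeping its aspect ratio: whichever side reaches its limit first determines the scale, and the other side may end up smaller than its target. With img_scale = (640, 640) both limits are 640, so it is the long side that ends up at 640.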
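To make the answer to the question from section 1 concrete, the sketch below applies the same rule to a few image sizes with img_scale = (640, 640). The helper sketch_keep_ratio_size and the sample sizes are hypothetical and chosen only for illustration; the formula is the one quoted from rescale_size.

```python
def sketch_keep_ratio_size(w, h, img_scale):
    """Size after a keep_ratio resize, following the rescale_size formula."""
    factor = min(max(img_scale) / max(w, h), min(img_scale) / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


# Hypothetical (w, h) inputs: landscape, portrait, 4:3 and square images.
samples = [(1280, 720), (720, 1280), (500, 375), (320, 320)]
results = [sketch_keep_ratio_size(w, h, (640, 640)) for w, h in samples]

for old, new in zip(samples, results):
    print(old, '->', new)
```

In every case the long side comes out at 640 and the short side is at most 640. Note that the Pad step with size_divisor=32 in the pipeline runs afterwards and may enlarge the array further.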
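An answer on their GitHub confirms this as well: Confusing Resizing in the Pipelines.
(End)
Copyright of this article is shared by the author (https://www.cnblogs.com/harrymore/) and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be displayed prominently on the page. For questions, contact the author by email (harrymore@126.com).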