Useful YOLOv3 code snippets

Image normalization (letterbox resize)

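The original image is 1280×720 (width × height).
After the letterbox function it is 640×384 (32*12=384).
With auto=True, letterbox scales the image so that its longest side is 640, then pads the shortest side up to the next multiple of 32.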

import cv2
import numpy as np

def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = img.shape[:2]  # current shape [height, width] [720,1280]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape) #[640,640]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) #0.5
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) #[640,360]
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding  0  280
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding   0  24
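        # Why take the remainder here?
        # At first, dw and dh are the gaps between the scaled image and the 640 target.
        # Scaling makes the longest side exactly 640, so the shortest side is at most 640:
        # one of dw, dh is 0 and the other is the gap between the shortest side and 640.
        # The goal is for the shortest side to also be divisible by stride=32 after padding.
        # 640 is divisible by 32, so the shortest side is a multiple of 32 exactly when its gap to 640 is.
        # So pad only offset = gap % 32: the remaining gap is a multiple of 32, and so is the padded side.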
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides   0
    dh /= 2  # 12

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)

The returned image is 640×384 with 3 channels (NumPy shape (384, 640, 3)); gray borders have been added along the height.
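The arithmetic annotated in the comments above can be checked without OpenCV. The following is a minimal sketch that reproduces only the size and padding computation of the auto=True branch; `letterbox_geometry` is a hypothetical helper written for this note, not part of the YOLOv3 code.

```python
def letterbox_geometry(height, width, target=640, stride=32):
    """Return (output (h, w), per-side padding (dh, dw), scale ratio).

    Hypothetical helper: mirrors only the arithmetic of letterbox with auto=True.
    """
    # Scale so that the image fits inside target x target
    r = min(target / height, target / width)
    new_w, new_h = int(round(width * r)), int(round(height * r))
    # Pad only the remainder so each side becomes a multiple of stride
    dw = (target - new_w) % stride
    dh = (target - new_h) % stride
    return (new_h + dh, new_w + dw), (dh / 2, dw / 2), r


out_shape, pad, ratio = letterbox_geometry(720, 1280)
print(out_shape, pad, ratio)  # (384, 640) (12.0, 0.0) 0.5
```

For the 1280×720 input, the gap on the height is 640 - 360 = 280, and 280 % 32 = 24, so 12 pixels of gray go on the top and 12 on the bottom.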

Building a coordinate grid with stack

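What it is for: stack preserves two kinds of information, [1. the order of the sequence] and [2. the contents of each tensor]. It is an "expand, then concatenate" operation: each input gains a new dimension and the inputs are joined along it. You can think of it as packing a sequence of matrices, for example one per time step, into a single tensor. It shows up often in natural language processing (NLP) and in convolutional networks for computer vision (CV).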

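stack()
Official description: concatenates a sequence of tensors along a new dimension. All tensors in the sequence must have the same shape.
In plain terms: several 2-D tensors are combined into one 3-D tensor, several 3-D tensors into one 4-D tensor, and so on; the tensors are stacked along a newly added dimension.
outputs = torch.stack(inputs, dim=0) → Tensor
Parameters
inputs: the sequence of tensors to join, usually a Python list or tuple.
dim: the position of the new dimension. It must be between 0 and the number of dimensions of the input tensors (inclusive), that is, at most outputs.dim() - 1.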

import torch

T1 = torch.tensor([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

T2 = torch.tensor([[10, 20, 30],
                 [40, 50, 60],
                 [70, 80, 90]])

stack_0 = torch.stack((T1,T2),dim=0)
stack_1 = torch.stack((T1,T2),dim=1)
stack_2 = torch.stack((T1,T2),dim=2)

print("\n=======stack_0 value====")
print(stack_0)

print("======stack_0  shape===")
print(stack_0.shape)

print("******************************************\n")

print("\n=======stack_1 value====")
print(stack_1)

print("======stack_1  shape===")
print(stack_1.shape)


print("******************************************\n")

print("\n=======stack_2 value====")
print(stack_2)

print("======stack_2  shape===")
print(stack_2.shape)

Output:

=======stack_0 value====
tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]])
======stack_0  shape===
torch.Size([2, 3, 3])
******************************************


=======stack_1 value====
tensor([[[ 1,  2,  3],
         [10, 20, 30]],

        [[ 4,  5,  6],
         [40, 50, 60]],

        [[ 7,  8,  9],
         [70, 80, 90]]])
======stack_1  shape===
torch.Size([3, 2, 3])
******************************************


=======stack_2 value====
tensor([[[ 1, 10],
         [ 2, 20],
         [ 3, 30]],

        [[ 4, 40],
         [ 5, 50],
         [ 6, 60]],

        [[ 7, 70],
         [ 8, 80],
         [ 9, 90]]])
======stack_2  shape===
torch.Size([3, 3, 2])
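The "expand, then concatenate" description can be checked directly. The sketch below assumes the same T1 and T2 values as above and compares torch.stack with an explicit unsqueeze followed by torch.cat for each valid dim.

```python
import torch

T1 = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
T2 = T1 * 10  # same values as T2 above

results = {}
for d in range(3):  # valid dims for 2-D inputs: 0, 1, 2
    stacked = torch.stack((T1, T2), dim=d)
    # Expand each input with a new axis at position d, then join along it
    expanded = torch.cat((T1.unsqueeze(d), T2.unsqueeze(d)), dim=d)
    results[d] = (tuple(stacked.shape), torch.equal(stacked, expanded))
    print(d, results[d])
```

Each dim gives the same result both ways, with shapes (2, 3, 3), (3, 2, 3) and (3, 3, 2), matching the printed output above.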

YOLOv3 uses it to generate the grid of coordinate points.

    @staticmethod
    def _make_grid(nx=20, ny=20):
        # Overridden to 5x5 for this experiment; remove these two lines to use the arguments
        nx = 5
        ny = 5
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        # yv: [ny, nx], xv: [ny, nx]  ([20, 20] with the defaults)
        tmp_0 = torch.stack((xv, yv), 2)  # [ny, nx, 2]  ([20, 20, 2] with the defaults)
        tmp_1 = torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()  # [1, 1, ny, nx, 2]

        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

I changed the size to 5 so the result is easier to inspect:
xv has shape [5, 5]

tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])

yv has shape [5, 5]

tensor([[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]])

tmp_0 has shape [5, 5, 2]

tensor([[[0, 0],
         [1, 0],
         [2, 0],
         [3, 0],
         [4, 0]],

        [[0, 1],
         [1, 1],
         [2, 1],
         [3, 1],
         [4, 1]],

        [[0, 2],
         [1, 2],
         [2, 2],
         [3, 2],
         [4, 2]],

        [[0, 3],
         [1, 3],
         [2, 3],
         [3, 3],
         [4, 3]],

        [[0, 4],
         [1, 4],
         [2, 4],
         [3, 4],
         [4, 4]]])

As shown, this produces the matrix of grid-point coordinates: the entry at row y, column x holds the pair (x, y).

torch.meshgrid

YOLOv3 uses it like this:
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])

tmp= torch.meshgrid([torch.arange(3), torch.arange(3)])

The value of tmp is:
<class 'tuple'>: (tensor([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]), 

tensor([[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]))
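With a square grid it is hard to tell which output follows which input. The sketch below uses a hypothetical non-square grid (ny=2, nx=3) and passes indexing="ij" explicitly, which assumes PyTorch 1.10 or later; that setting matches the row-major behavior shown above, and newer PyTorch versions warn when the argument is omitted.

```python
import torch

ny, nx = 2, 3  # hypothetical small, non-square grid
yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing="ij")

print(yv.tolist())  # [[0, 0, 0], [1, 1, 1]]  row index, constant along each row
print(xv.tolist())  # [[0, 1, 2], [0, 1, 2]]  column index, constant along each column

grid = torch.stack((xv, yv), 2)  # shape [ny, nx, 2], last axis is (x, y)
print(tuple(grid.shape))    # (2, 3, 2)
print(grid[1, 2].tolist())  # [2, 1]  cell at row y=1, column x=2
```

Both outputs have shape [ny, nx]: the first varies with the first input and the second with the second input.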

Reshaping arr from [3, 2] to [3, 1, 2]

import numpy as np

arr = np.random.randn(3,2)
print(arr.shape)

arr1 = arr[:,None]
print(arr1.shape)
#(3, 2)
#(3, 1, 2)
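Indexing with None inserts a length-1 axis at that position. As a rough illustration, the sketch below shows three equivalent spellings and one common reason to do this: broadcasting every row against every other row. The array values are made up for the example.

```python
import numpy as np

arr = np.arange(6).reshape(3, 2)  # made-up values: [[0, 1], [2, 3], [4, 5]]

a = arr[:, None]                 # None inserts a new axis at position 1
b = arr[:, np.newaxis]           # np.newaxis is an alias for None
c = np.expand_dims(arr, axis=1)  # same result via a function call
d = arr.reshape(3, 1, 2)         # same result with an explicit shape
same = all(np.array_equal(a, x) for x in (b, c, d))
print(a.shape, same)  # (3, 1, 2) True

# Broadcasting [3, 1, 2] against [1, 3, 2] gives all pairwise row differences
diff = arr[:, None] - arr[None]
print(diff.shape)           # (3, 3, 2)
print(diff[2, 0].tolist())  # [4, 4], i.e. arr[2] - arr[0]
```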

NumPy array filtering with a mask: t = t[j]

t = t[j]  # filter: t is [3, 72, 7], j is a boolean mask [3, 72]  --->>  [95, 7], where 95 is the number of True entries in j
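The shape change is easier to see on a small case. The sketch below assumes j is a boolean mask over the first two axes of t, and uses made-up shapes (3, 4, 7) and (3, 4) in place of the real ones.

```python
import numpy as np

t = np.arange(3 * 4 * 7).reshape(3, 4, 7)  # made-up data, shape (3, 4, 7)
j = np.zeros((3, 4), dtype=bool)           # mask over the first two axes
j[0, 1] = j[2, 0] = j[2, 3] = True         # keep three of the twelve rows

filtered = t[j]        # the two masked axes collapse into one
print(filtered.shape)  # (3, 7)
print(int(j.sum()))    # 3, the number of rows kept

# Rows come out in row-major order of the True positions
print(np.array_equal(filtered[0], t[0, 1]))  # True
print(np.array_equal(filtered[2], t[2, 3]))  # True
```

The first output dimension always equals the number of True entries in the mask, which is where the 95 in the comment above comes from.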

posted @ 2021-09-26 16:11 无左无右
