A Code Walkthrough of LSS (Lift, Splat, Shoot), the Seminal BEV Work. Part 1: Data Loading
compile_data:
1) Initialize the nuScenes API.
2) The SegmentationData class (its __getitem__ yields the per-sample data) is instantiated as traindata and valdata; this mainly runs NuscData's initialization, which:
a. Calls get_scenes, which calls create_splits_scenes to obtain the train/val scene names (a list of scene-xxxx strings), assigned to scenes.
b. Calls prepro to collect the samples (abbreviated sle in this post) and assigns them to ixes; a sle is one keyframe sampled every 0.5 s within a scene clip.
c. Calls gen_dx_bx: dx = [0.5, 0.5, 20] is the cell size along each axis, bx = [-49.75, -49.75, 0] is the center of the first grid cell, and nx = [200, 200, 1] is the number of cells per axis (see the sketch after this list).
Both datasets are then wrapped in DataLoaders as trainloader and valloader.
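For reference, gen_dx_bx in the LSS repo derives these three vectors directly from the grid config; a minimal sketch consistent with the values above, assuming xbound = ybound = [-50.0, 50.0, 0.5] and zbound = [-10.0, 10.0, 20.0]:

```python
import torch

def gen_dx_bx(xbound, ybound, zbound):
    # each bound is [min, max, step]
    dx = torch.Tensor([row[2] for row in [xbound, ybound, zbound]])                 # cell size:         [0.5, 0.5, 20]
    bx = torch.Tensor([row[0] + row[2] / 2.0 for row in [xbound, ybound, zbound]])  # first cell center: [-49.75, -49.75, 0]
    nx = torch.LongTensor([(row[1] - row[0]) / row[2]
                           for row in [xbound, ybound, zbound]])                    # cell count:        [200, 200, 1]
    return dx, bx, nx
```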
3) Reading a training sample
a. Fetch one sle from ixes and assign it to rec.
b. Call choose_cams, which picks 5 of the 6 cameras (a sketch follows).
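Assuming the usual data_aug_conf keys ('cams' and 'Ncams'), the selection logic is roughly:

```python
import numpy as np

def choose_cams(self):
    # during training, randomly sub-sample Ncams of the available cameras;
    # at eval time, keep all of them
    if self.is_train and self.data_aug_conf['Ncams'] < len(self.data_aug_conf['cams']):
        cams = np.random.choice(self.data_aug_conf['cams'],
                                self.data_aug_conf['Ncams'],
                                replace=False)
    else:
        cams = self.data_aug_conf['cams']
    return cams
```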
c. Call get_image_data with rec and cams; for each camera it reads the captured image together with the camera intrinsics and extrinsics, then calls sle_augmentation (named sample_augmentation in the upstream LSS repo):
```python
import numpy as np

def sle_augmentation(self):
    H, W = self.data_aug_conf['H'], self.data_aug_conf['W']  # 900, 1600
    fH, fW = self.data_aug_conf['final_dim']                 # 128, 352
    if self.is_train:
        # uniformly sample a resize ratio from resize_lim, e.g. (0.193, 0.225)
        resize = np.random.uniform(*self.data_aug_conf['resize_lim'])
        resize_dims = (int(W*resize), int(H*resize))  # size after resizing
        newW, newH = resize_dims
        # compute the crop box
        crop_h = int((1 - np.random.uniform(*self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(np.random.uniform(0, max(0, newW - fW)))
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        if self.data_aug_conf['rand_flip'] and np.random.choice([0, 1]):
            flip = True
        rotate = np.random.uniform(*self.data_aug_conf['rot_lim'])
    else:
        resize = max(fH/H, fW/W)
        resize_dims = (int(W*resize), int(H*resize))
        newW, newH = resize_dims
        crop_h = int((1 - np.mean(self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(max(0, newW - fW) / 2)
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        rotate = 0
    return resize, resize_dims, crop, flip, rotate
```
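To make the numbers concrete, here is the deterministic (eval) branch worked through, assuming the common LSS config value bot_pct_lim = (0.0, 0.22):

```python
H, W = 900, 1600          # source image size
fH, fW = 128, 352         # final_dim
resize = max(fH/H, fW/W)                    # max(0.1422, 0.22) = 0.22
newW, newH = int(W*resize), int(H*resize)   # (352, 198)
crop_h = int((1 - 0.11) * newH) - fH        # 0.11 = mean(bot_pct_lim); 176 - 128 = 48
crop_w = int(max(0, newW - fW) / 2)         # 0
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)  # (0, 48, 352, 176)
```

So the 1600x900 frame is shrunk to 352x198, then a 352x128 window is kept that drops the top 48 rows (mostly sky) and a thin 22-row strip at the bottom.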
d. Call img_transform with these augmentation parameters; the math behind the augmentations is explained here: https://www.cnblogs.com/jimchen1218/p/17940326
```python
import numpy as np
import torch
from PIL import Image

def img_transform(img, post_rot, post_tran,
                  resize, resize_dims, crop,
                  flip, rotate):
    # adjust image
    img = img.resize(resize_dims)  # resize/warp
    img = img.crop(crop)
    if flip:
        img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
    img = img.rotate(rotate)

    # post-homography transformation
    post_rot *= resize
    post_tran -= torch.Tensor(crop[:2])
    if flip:
        A = torch.Tensor([[-1, 0], [0, 1]])
        b = torch.Tensor([crop[2] - crop[0], 0])
        post_rot = A.matmul(post_rot)
        post_tran = A.matmul(post_tran) + b
    A = get_rot(rotate/180*np.pi)
    b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
    b = A.matmul(-b) + b
    post_rot = A.matmul(post_rot)
    post_tran = A.matmul(post_tran) + b

    return img, post_rot, post_tran
```
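img_transform relies on the small helper get_rot; in the LSS repo it builds the 2D rotation matrix for an angle h in radians:

```python
import numpy as np
import torch

def get_rot(h):
    # 2D rotation matrix for angle h (radians)
    return torch.Tensor([
        [np.cos(h), np.sin(h)],
        [-np.sin(h), np.cos(h)],
    ])
```

The point of returning post_rot and post_tran is that they record the net pixel-space affine introduced by resize/crop/flip/rotate, so the geometry module can later undo the augmentation when mapping pixels back to 3D rays.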
e. Call get_binimg with rec to build the target of the "shoot" stage: the ground-truth segmentation map.
1) Read the ego pose associated with rec's LiDAR sample_data, take its translation and rotation, and set trans = -translation and rot = the inverse of rotation; these are the parameters of the inverse (global-to-ego) transform.
2) Build the BEV grid: 200x200.
3) Iterate over rec's anns field, i.e., over every annotated instance. Three frames are involved: the ego frame, the sensor frame, and the global (world) frame. Instance boxes are annotated in the global frame and must be brought into the ego frame by applying the inverse ego-pose transform above; then take the xy coordinates of the box's bottom corners.
The corners are computed from the box's w/l/h: their coordinates are first laid out with the origin at the box center, then left-multiplied by the rotation matrix and translated to the box center; the corner ordering follows the nuScenes devkit convention (see the sketch below).
bottom_corners picks corners 2, 3, 7, 6 in that order and returns a 3x4 matrix; since we work in BEV, the Z row is dropped, leaving a 2x4 matrix, which is then transposed to 4x2. Units are meters.
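A condensed, self-contained sketch of the relevant devkit logic (adapted from nuscenes.utils.data_classes.Box; the function names box_corners/box_bottom_corners are mine, the corner layout is the devkit's):

```python
import numpy as np

def box_corners(center, wlh, rot_mat):
    # adapted from Box.corners(): 8 corners, laid out with the origin at the box center
    w, l, h = wlh
    x_corners = l / 2 * np.array([1,  1,  1,  1, -1, -1, -1, -1])
    y_corners = w / 2 * np.array([1, -1, -1,  1,  1, -1, -1,  1])
    z_corners = h / 2 * np.array([1,  1, -1, -1,  1,  1, -1, -1])
    corners = np.vstack((x_corners, y_corners, z_corners))
    corners = rot_mat @ corners                   # rotate around the center
    corners += np.asarray(center).reshape(3, 1)   # translate to the center
    return corners                                # 3x8

def box_bottom_corners(center, wlh, rot_mat):
    # Box.bottom_corners(): the four corners with z = -h/2, i.e. indices 2, 3, 7, 6
    return box_corners(center, wlh, rot_mat)[:, [2, 3, 7, 6]]  # 3x4
```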
```python
# convert the corner xy (meters, ego frame) to BEV grid coordinates;
# bx[:2] is the center of the first grid cell
pts = np.round((pts - self.bx[:2] + self.dx[:2]/2.) / self.dx[:2]).astype(np.int32)
```
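Putting steps 1) to 3) together, get_binimg in the LSS repo is essentially the following (condensed; note the original also filters annotations to the vehicle category):

```python
import cv2
import numpy as np
import torch
from pyquaternion import Quaternion
from nuscenes.utils.data_classes import Box

def get_binimg(self, rec):
    egopose = self.nusc.get('ego_pose', self.nusc.get(
        'sample_data', rec['data']['LIDAR_TOP'])['ego_pose_token'])
    trans = -np.array(egopose['translation'])      # inverse translation
    rot = Quaternion(egopose['rotation']).inverse  # inverse rotation
    img = np.zeros((self.nx[0], self.nx[1]))       # 200x200 BEV canvas
    for tok in rec['anns']:
        inst = self.nusc.get('sample_annotation', tok)
        if not inst['category_name'].split('.')[0] == 'vehicle':
            continue  # only vehicles are rasterized
        box = Box(inst['translation'], inst['size'], Quaternion(inst['rotation']))
        box.translate(trans)  # global -> ego
        box.rotate(rot)
        pts = box.bottom_corners()[:2].T  # 4x2, meters in the ego frame
        pts = np.round((pts - self.bx[:2] + self.dx[:2]/2.) / self.dx[:2]).astype(np.int32)
        pts[:, [1, 0]] = pts[:, [0, 1]]   # swap so fillPoly's (x=col, y=row) matches img's (x=row, y=col) layout
        cv2.fillPoly(img, [pts], 1.0)     # rasterize the box footprint
    return torch.Tensor(img).unsqueeze(0)  # [1, 200, 200]
```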
```python
# shapes returned for one sample (Ncams = 5; H, W = final_dim = 128, 352):
# imgs:       [5, 3, H, W]
# rots:       [5, 3, 3]
# trans:      [5, 3]
# intrins:    [5, 3, 3]
# post_rots:  [5, 3, 3]
# post_trans: [5, 3]
# binimg:     [1, 200, 200]
```
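These pieces are tied together in SegmentationData.__getitem__, which (condensed from the LSS repo) stacks the per-camera tensors and appends the BEV target:

```python
def __getitem__(self, index):
    rec = self.ixes[index]                      # one sample / sle (step a)
    cams = self.choose_cams()                   # 5 of the 6 cameras (step b)
    imgs, rots, trans, intrins, post_rots, post_trans = \
        self.get_image_data(rec, cams)          # steps c-d; each output is torch.stack'ed over cams
    binimg = self.get_binimg(rec)               # step e: the BEV segmentation target
    return imgs, rots, trans, intrins, post_rots, post_trans, binimg
```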