[Dataset Basics] How to Convert labelme Annotation Files to coco128 Format

Preface

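This post shows how to convert labelme annotation files and the accompanying dataset to coco128 format, and how to prepare the dataset for training and testing with darknet-yolov3.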

Procedure

Annotate the dataset with labelme, then convert the annotations to coco128 format:

labelme2coco128.py

'''
********************************************************************************
* @file  labelme2coco128.py
* @brief convert labelme json file to coco128 format.
********************************************************************************
* @author        xxx.zzz@yyy.com
* @date          2022.06.21
*
* @customer{     TBD}
* @project{      TFL}
* @processor{    TX2}
* @copyright     (C) Copyright ABC Technologies Co., Ltd
*
*                Contents and presentations are protected world-wide.
*                Any kind of using, copying etc. is prohibited without prior permission.
*                All rights - incl. industrial property rights - are reserved.
*
* @starthistory
* @revision{     1.0.0, AMY, Initial version.}
* @endhistory
********************************************************************************
* parent
* ├── images
* │   ├── xxx.png
* │   ├── yyy.png
* ├── labels
* │   ├── xxx.txt
* │   ├── yyy.txt
* ├── jsons
* │   ├── xxx.json
* │   ├── yyy.json
* ├── labelme2coco128.py
********************************************************************************
'''
import json
import os
 
# lisalabels = {'go':0, 'goForward':1, 'goLeft':2, 'stop':3, 'stopLeft':4, 'warning':5, 'warningLeft':6}
tfl_label = {'circle_green':0, 'circle_red':1, 'circle_yellow':2, 'circle_off':3, 'left_green':4, 'left_red':5, 'left_yellow':6, 'left_off':7, 'nomotor_green':8, 'nomotor_red':9, 'nomotor_yellow':10, 'nomotor_off':11}
dicts = {"version":"4.2.7", "flags":{}, "shapes":[], "imagePath":"","imageData":"","imageHeight": 720, "imageWidth":1280}

def get_bbox(size, box):
    # Convert xyxy box to YOLOv5 xywh box
    dw = 1. / size[0]
    dh = 1. / size[1]
    xc = (box[0] + box[2])*0.5*dw
    yc = (box[1] + box[3])*0.5*dh
    w = (box[2]-box[0])*dw
    h = (box[3]-box[1])*dh
    return xc, yc, w, h

def get_minrect(points, size):
    len_pts = len(points)
    x_min = size[0] # image width
    y_min = size[1] # image height
    x_max = 0
    y_max = 0
    for i in range(len_pts):
        if points[i][0] < x_min:
            x_min = points[i][0]
        if points[i][1] < y_min:
            y_min = points[i][1]
        if points[i][0] > x_max:
            x_max = points[i][0]
        if points[i][1] > y_max:
            y_max = points[i][1]
    return x_min, y_min, x_max, y_max

def labelme2coco128(path):
    jsonpath = os.path.join(path, 'jsons')
    labelpath = os.path.join(path, 'labels')
    for name in sorted(os.listdir(jsonpath)):
        if not name.endswith('.json'):
            continue  # skip anything that is not a labelme annotation file
        with open(os.path.join(jsonpath, name), 'r') as jsonfile:
            jsondata = json.load(jsonfile)
        shapes = jsondata['shapes']
        imgsz = jsondata['imageWidth'], jsondata['imageHeight']
        # Swap only the extension; name.replace('json', 'txt') would also
        # rewrite the text 'json' if it appeared inside the file stem.
        labelname = os.path.join(labelpath, os.path.splitext(name)[0] + '.txt')
        # 'w+' truncates an existing label file, so rerunning the script does not
        # duplicate lines (the first version used 'a+'; see the 20220805 update).
        with open(labelname, 'w+') as labelfile:
            for shape in shapes:
                label = shape['label']
                classid = tfl_label[label]
                points = shape['points']
                rect = get_minrect(points, imgsz)
                bbox = get_bbox(imgsz, rect)
                if bbox[2]*imgsz[0] < 5:
                    print('this image tfl width less than 5:\n', name)
                if bbox[3]*imgsz[1] < 15:
                    print('this image tfl height less than 15:\n', name)
                info = f"{classid} {' '.join(f'{x:.6f}' for x in bbox)}\n"
                labelfile.write(info)

if __name__ == "__main__":
    path = os.path.dirname(os.path.realpath(__file__))
    labelme2coco128(path)
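
To make the coordinate conversion concrete, here is a small worked example. It is only a sketch: the two helpers are restated compactly so the snippet runs on its own, and the polygon, image size and class id are made-up values chosen for illustration, not data from the project.

```python
def get_minrect(points):
    # Axis-aligned bounding rectangle of a polygon: (x_min, y_min, x_max, y_max)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def get_bbox(size, box):
    # size = (image width, image height); box = (x_min, y_min, x_max, y_max)
    w_img, h_img = size
    xc = (box[0] + box[2]) * 0.5 / w_img   # normalized box center x
    yc = (box[1] + box[3]) * 0.5 / h_img   # normalized box center y
    w = (box[2] - box[0]) / w_img          # normalized box width
    h = (box[3] - box[1]) / h_img          # normalized box height
    return xc, yc, w, h

# Hypothetical polygon on a 1280x720 image
points = [[600.0, 300.0], [680.0, 300.0], [680.0, 420.0], [600.0, 420.0]]
size = (1280, 720)
rect = get_minrect(points)
bbox = get_bbox(size, rect)
line = f"1 {' '.join(f'{x:.6f}' for x in bbox)}"
print(line)  # 1 0.500000 0.500000 0.062500 0.166667
```

An 80x120 pixel box centered in the image therefore becomes one label line: the class id followed by the normalized center and size.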


Pay close attention to the mode passed to open!

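labelfile = open(labelname, 'a+') # if the file exists, append to it; otherwise create a new file for reading and writing
labelfile = open(labelname, 'w+') # if the file exists, discard its contents and start over; otherwise create a new file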

Once the dataset is ready, it has to be split for training; darknet-yolov3 is used for training here.

Splitting the dataset

'''
********************************************************************************
* @file  genpath.py
* @brief generate train/valid dataset and path.
********************************************************************************
* @author        xxx.zzz@yyy.com
* @date          2022.07.06
*
* @customer{     TBD}
* @project{      TFL}
* @processor{    TX2}
* @copyright     (C) Copyright ABC Technologies Co., Ltd
*
*                Contents and presentations are protected world-wide.
*                Any kind of using, copying etc. is prohibited without prior permission.
*                All rights - incl. industrial property rights - are reserved.
*
* @starthistory
* @revision{     1.0.0, AMY, Initial version.}
* @endhistory
********************************************************************************
* parent
*├── genpath.py
*├── images
*├── image.txt
*├── jsons
*├── labels
*├── label.txt
*├── tfl
*    ├── images
*    │   ├── train
*    │   └── valid
*    └── labels
*        ├── train
*        └── valid
********************************************************************************
'''

import os
import random
import shutil

with open('image.txt', 'rt') as f:
    f_imgs = f.read().splitlines()
with open('label.txt', 'rt') as f:
    f_labels = f.read().splitlines()

print(len(f_imgs))
print(len(f_labels))
path = './dataset/tfl'
image_train = os.path.join(path, 'images/train')
image_valid = os.path.join(path, 'images/valid')
label_train = os.path.join(path, 'labels/train')
label_valid = os.path.join(path, 'labels/valid')
# Create the output folders up front; shutil.copyfile fails if they are missing.
for folder in (image_train, image_valid, label_train, label_valid):
    os.makedirs(folder, exist_ok=True)

n = len(f_imgs)
val_percent = 0.1
random.shuffle(f_imgs)
with open('train.txt', 'w') as f_train, open('valid.txt', 'w') as f_valid:
    for i, image_name in enumerate(f_imgs):
        # Derive the label path from the image path (images/xxx.png -> labels/xxx.txt)
        label_name = image_name.replace('.png', '.txt').replace('images', 'labels')

        # The last val_percent of the shuffled list goes to validation
        if i > n*(1-val_percent):
            image_dir, label_dir, f_list = image_valid, label_valid, f_valid
        else:
            image_dir, label_dir, f_list = image_train, label_train, f_train

        # copy image and record its new path, one path per line
        new_image_name = os.path.join(image_dir, os.path.basename(image_name))
        shutil.copyfile(image_name, new_image_name)
        f_list.write(new_image_name + '\n')
        # copy label
        new_label_name = os.path.join(label_dir, os.path.basename(label_name))
        shutil.copyfile(label_name, new_label_name)


Note that every line written to a file must end with a newline character. Following the coco128 layout, the dataset directories are images and labels.
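
A quick sanity check on the generated label files can catch formatting mistakes early. The sketch below assumes each label line has the layout written by labelme2coco128.py (a class id followed by four values normalized to [0, 1]) and that there are 12 classes, as in tfl_label. The function name check_label_lines and the sample lines are hypothetical.

```python
def check_label_lines(lines, num_classes=12):
    """Return (line_number, message) tuples for lines that break the expected layout."""
    problems = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) != 5:
            problems.append((lineno, 'expected 5 fields'))
            continue
        try:
            classid = int(fields[0])
            values = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append((lineno, 'non-numeric field'))
            continue
        if not 0 <= classid < num_classes:
            problems.append((lineno, 'class id out of range'))
        if any(v < 0.0 or v > 1.0 for v in values):
            problems.append((lineno, 'coordinate outside [0, 1]'))
    return problems

# Made-up sample: one valid line followed by three broken ones
sample = ['1 0.500000 0.500000 0.062500 0.166667\n',
          '12 0.5 0.5 0.1 0.1\n',
          '3 0.5 1.2 0.1 0.1\n',
          '0 0.5 0.5 0.1\n']
problems = check_label_lines(sample)
print(problems)
```

Running it over the lines of each file in labels before training reports the line numbers that need a closer look.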

In addition, image.txt and label.txt list the paths of the files in images and labels respectively:

ls ./dataset/images/*.png > image.txt
ls ./dataset/labels/*.txt > label.txt
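
The same two lists can also be produced from Python, which makes it easy to skip images that have no label file. This is a minimal sketch, not part of the original workflow: the helper names list_pairs and write_lists are hypothetical, and the .png extension follows the commands above.

```python
import glob
import os

def list_pairs(image_dir, label_dir, ext='.png'):
    """Return sorted (image_path, label_path) pairs; images without a label are skipped."""
    pairs = []
    for image_path in sorted(glob.glob(os.path.join(image_dir, '*' + ext))):
        stem = os.path.splitext(os.path.basename(image_path))[0]
        label_path = os.path.join(label_dir, stem + '.txt')
        if os.path.isfile(label_path):
            pairs.append((image_path, label_path))
        else:
            print('no label for', image_path)
    return pairs

def write_lists(pairs, image_list='image.txt', label_list='label.txt'):
    # One path per line, each line terminated by a newline
    with open(image_list, 'w') as f_img, open(label_list, 'w') as f_lab:
        for image_path, label_path in pairs:
            f_img.write(image_path + '\n')
            f_lab.write(label_path + '\n')

# Example call, using the directories from the ls commands:
# write_lists(list_pairs('./dataset/images', './dataset/labels'))
```

Because both lists are written from the same pairs, line k of image.txt always corresponds to line k of label.txt.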

Be aware that file contents can end up corrupted or compressed while files are being copied.
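
One way to guard against a bad copy is to compare each copied file with its source right after copying. The sketch below uses shutil.copyfile and filecmp.cmp from the standard library; the wrapper name copy_and_verify is hypothetical and is not part of genpath.py.

```python
import filecmp
import shutil

def copy_and_verify(src, dst):
    """Copy src to dst, then compare the two files byte by byte."""
    shutil.copyfile(src, dst)
    # shallow=False forces a content comparison instead of trusting os.stat() metadata
    if not filecmp.cmp(src, dst, shallow=False):
        raise IOError(f'copy of {src} to {dst} does not match the source')
    return dst
```

Swapping this in for the plain shutil.copyfile calls makes the split script stop at the first file whose copy differs from the original.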

In short, dataset processing demands attention to detail, and it pays to be comfortable with scripting in both shell and Python.

update 20220805

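During project development, the files generated by labelme2coco128.py turned out to contain the same content repeated several times. The likely trigger is that labelme2coco128.py was run more than once; the root cause is the open statement.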

labelfile = open(labelname, 'a+')

should be changed to

labelfile = open(labelname, 'w+')

The relevant modes are:

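w	Opens a file for writing only. If the file already exists, it is opened and written from the beginning, so the existing content is deleted. If the file does not exist, a new file is created.
wb	Opens a file in binary format for writing only. If the file already exists, it is opened and written from the beginning, so the existing content is deleted. If the file does not exist, a new file is created. Typically used for non-text files such as images.
w+	Opens a file for reading and writing. If the file already exists, it is opened and written from the beginning, so the existing content is deleted. If the file does not exist, a new file is created.
wb+	Opens a file in binary format for reading and writing. If the file already exists, it is opened and written from the beginning, so the existing content is deleted. If the file does not exist, a new file is created. Typically used for non-text files such as images.
a	Opens a file for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
ab	Opens a file in binary format for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
a+	Opens a file for reading and writing. If the file already exists, the file pointer is placed at the end of the file and the file is opened in append mode. If the file does not exist, a new file is created for reading and writing.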
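
The difference between the two modes is easy to reproduce in isolation. The sketch below simulates three runs of a converter that writes a single label line. The helper names, file names and the label line itself are made up for the demonstration.

```python
import os
import tempfile

def write_label(path, mode):
    # Stand-in for one run of the converter on a single annotation file
    with open(path, mode) as f:
        f.write('1 0.500000 0.500000 0.062500 0.166667\n')

def count_lines(path):
    with open(path) as f:
        return len(f.readlines())

with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, 'appended.txt')
    truncated = os.path.join(tmp, 'truncated.txt')
    for _ in range(3):  # simulate running the script three times
        write_label(appended, 'a+')
        write_label(truncated, 'w+')
    n_appended = count_lines(appended)
    n_truncated = count_lines(truncated)

print(n_appended, n_truncated)  # 3 1
```

With 'a+' every run adds another copy of the line, which matches the duplicated content seen in the label files; with 'w+' the file holds exactly one copy no matter how often the script runs.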

update 20231213

Reading the json files in a folder one by one: python_python读取多个json文件-CSDN博客

To read only the files with a particular extension from a directory:

import os
import json

folder_path = "/path/to/folder"
for filename in os.listdir(folder_path):
    if filename.endswith(".json"):
        file_path = os.path.join(folder_path, filename)
        with open(file_path, "r") as f:
            json_data = json.load(f)

References

1. Converting labelme json annotation files to coco128 format;

2. python_func_open;

End

posted on 2022-06-20 23:11 by 鹅要长大
