YOLOv8 Source Code Analysis (Part 2)


comments: true
description: Explore the MNIST dataset, a cornerstone in machine learning for handwritten digit recognition. Learn about its structure, features, and applications.
keywords: MNIST, dataset, handwritten digits, image classification, deep learning, machine learning, training set, testing set, NIST

MNIST Dataset

The MNIST (Modified National Institute of Standards and Technology) dataset is a large database of handwritten digits that is commonly used for training various image processing systems and machine learning models. It was created by "re-mixing" the samples from NIST's original datasets and has become a benchmark for evaluating the performance of image classification algorithms.

Key Features

  • MNIST contains 60,000 training images and 10,000 testing images of handwritten digits.
  • The dataset comprises grayscale images of size 28x28 pixels.
  • The images are normalized to fit into a 28x28 pixel bounding box and anti-aliased, introducing grayscale levels.
  • MNIST is widely used for training and testing in the field of machine learning, especially for image classification tasks.

Dataset Structure

The MNIST dataset is split into two subsets:

  1. Training Set: This subset contains 60,000 images of handwritten digits used for training machine learning models.
  2. Testing Set: This subset consists of 10,000 images used for testing and benchmarking the trained models.

Extended MNIST (EMNIST)

Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the successor to MNIST. While MNIST included images only of handwritten digits, EMNIST includes all the images from NIST Special Database 19, which is a large database of handwritten uppercase and lowercase letters as well as digits. The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as were the MNIST images. Accordingly, tools that work with the older, smaller MNIST dataset will likely work unmodified with EMNIST.

Applications

The MNIST dataset is widely used for training and evaluating deep learning models in image classification tasks, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. The dataset's simple and well-structured format makes it an essential resource for researchers and practitioners in the field of machine learning and computer vision.
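
Beyond benchmarking, a trained MNIST classifier can be applied directly to new digit images. The following is a minimal, hypothetical inference sketch; the checkpoint and image paths are placeholders, and training itself is covered in the Usage section below.

```py
from ultralytics import YOLO

# Load a trained classification checkpoint (placeholder path)
model = YOLO("runs/classify/train/weights/best.pt")

# Classify a single digit image (placeholder path)
results = model.predict("path/to/digit.png", imgsz=32)
print(results[0].probs.top1)  # index of the most likely digit class
```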

Usage

To train a CNN model on the MNIST dataset for 100 epochs with an image size of 32x32, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n-cls.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="mnist", epochs=100, imgsz=32)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo classify train data=mnist model=yolov8n-cls.pt epochs=100 imgsz=32
    ```

Sample Images and Annotations

The MNIST dataset contains grayscale images of handwritten digits, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset:

Dataset sample image

The example showcases the variety and complexity of the handwritten digits in the MNIST dataset, highlighting the importance of a diverse dataset for training robust image classification models.

Citations and Acknowledgments

If you use the MNIST dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```bibtex
    @article{lecun2010mnist,
             title={MNIST handwritten digit database},
             author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
             journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
             volume={2},
             year={2010}
    }
    ```

We would like to acknowledge Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for creating and maintaining the MNIST dataset as a valuable resource for the machine learning and computer vision research community. For more information about the MNIST dataset and its creators, visit the MNIST dataset website.

FAQ

What is the MNIST dataset, and why is it important in machine learning?

The MNIST dataset, or Modified National Institute of Standards and Technology dataset, is a widely-used collection of handwritten digits designed for training and testing image classification systems. It includes 60,000 training images and 10,000 testing images, all of which are grayscale and 28x28 pixels in size. The dataset's importance lies in its role as a standard benchmark for evaluating image classification algorithms, helping researchers and engineers to compare methods and track progress in the field.

How can I use Ultralytics YOLO to train a model on the MNIST dataset?

To train a model on the MNIST dataset using Ultralytics YOLO, you can follow these steps:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n-cls.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="mnist", epochs=100, imgsz=32)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo classify train data=mnist model=yolov8n-cls.pt epochs=100 imgsz=32
    ```

For a detailed list of available training arguments, refer to the Training page.

What is the difference between the MNIST and EMNIST datasets?

The MNIST dataset contains only handwritten digits, whereas the Extended MNIST (EMNIST) dataset includes both digits and uppercase and lowercase letters. EMNIST was developed as a successor to MNIST and utilizes the same 28x28 pixel format for the images, making it compatible with tools and models designed for the original MNIST dataset. This broader range of characters in EMNIST makes it useful for a wider variety of machine learning applications.

Can I use Ultralytics HUB to train models on custom datasets like MNIST?

Yes, you can use Ultralytics HUB to train models on custom datasets like MNIST. Ultralytics HUB offers a user-friendly interface for uploading datasets, training models, and managing projects without needing extensive coding knowledge. For more details on how to get started, check out the Ultralytics HUB Quickstart page.


comments: true
description: Explore our African Wildlife Dataset featuring images of buffalo, elephant, rhino, and zebra for training computer vision models. Ideal for research and conservation.
keywords: African Wildlife Dataset, South African animals, object detection, computer vision, YOLOv8, wildlife research, conservation, dataset

African Wildlife Dataset

This dataset showcases four common animal classes typically found in South African nature reserves. It includes images of African wildlife such as buffalo, elephant, rhino, and zebra, providing valuable insights into their characteristics. Essential for training computer vision algorithms, this dataset aids in identifying animals in various habitats, from zoos to forests, and supports wildlife research.



Watch: African Wildlife Animals Detection using Ultralytics YOLOv8

Dataset Structure

The African wildlife object detection dataset is split into three subsets:

  • Training set: Contains 1052 images, each with corresponding annotations.
  • Validation set: Includes 225 images, each with paired annotations.
  • Testing set: Comprises 227 images, each with paired annotations.

Applications

This dataset can be applied in various computer vision tasks such as object detection, object tracking, and research. Specifically, it can be used to train and evaluate models for identifying African wildlife objects in images, which can have applications in wildlife conservation, ecological research, and monitoring efforts in natural reserves and protected areas. Additionally, it can serve as a valuable resource for educational purposes, enabling students and researchers to study and understand the characteristics and behaviors of different animal species.

Dataset YAML

A YAML (Yet Another Markup Language) file defines the dataset configuration, including paths, classes, and other pertinent details. For the African wildlife dataset, the african-wildlife.yaml file is located at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/african-wildlife.yaml.

!!! Example "ultralytics/cfg/datasets/african-wildlife.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/african-wildlife.yaml"
```

Usage

To train a YOLOv8n model on the African wildlife dataset for 100 epochs with an image size of 640, use the provided code samples. For a comprehensive list of available parameters, refer to the model's Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="african-wildlife.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=african-wildlife.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

!!! Example "Inference Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("path/to/best.pt")  # load a brain-tumor fine-tuned model

    # Inference using the model
    results = model.predict("https://ultralytics.com/assets/african-wildlife-sample.jpg")
    ```

=== "CLI"

    ```bash
    # Start prediction with a finetuned *.pt model
    yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/african-wildlife-sample.jpg"
    ```

Sample Images and Annotations

The African wildlife dataset comprises a wide variety of images showcasing diverse animal species and their natural habitats. Below are examples of images from the dataset, each accompanied by its corresponding annotations.

African wildlife dataset sample image

  • Mosaiced Image: Here, we present a training batch consisting of mosaiced dataset images. Mosaicing, a training technique, combines multiple images into one, enriching batch diversity. This method helps enhance the model's ability to generalize across different object sizes, aspect ratios, and contexts.

This example illustrates the variety and complexity of images in the African wildlife dataset, emphasizing the benefits of including mosaicing during the training process.

Citations and Acknowledgments

This dataset has been released under the AGPL-3.0 License.

FAQ

What is the African Wildlife Dataset, and how can it be used in computer vision projects?

The African Wildlife Dataset includes images of four common animal species found in South African nature reserves: buffalo, elephant, rhino, and zebra. It is a valuable resource for training computer vision algorithms in object detection and animal identification. The dataset supports various tasks like object tracking, research, and conservation efforts. For more information on its structure and applications, refer to the Dataset Structure section and Applications of the dataset.

How do I train a YOLOv8 model using the African Wildlife Dataset?

You can train a YOLOv8 model on the African Wildlife Dataset by using the african-wildlife.yaml configuration file. Below is an example of how to train the YOLOv8n model for 100 epochs with an image size of 640:

!!! Example

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="african-wildlife.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=african-wildlife.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For additional training parameters and options, refer to the Training documentation.

Where can I find the YAML configuration file for the African Wildlife Dataset?

The YAML configuration file for the African Wildlife Dataset, named african-wildlife.yaml, can be found at this GitHub link. This file defines the dataset configuration, including paths, classes, and other details crucial for training machine learning models. See the Dataset YAML section for more details.

Can I see sample images and annotations from the African Wildlife Dataset?

Yes, the African Wildlife Dataset includes a wide variety of images showcasing diverse animal species in their natural habitats. You can view sample images and their corresponding annotations in the Sample Images and Annotations section. This section also illustrates the use of mosaicing technique to combine multiple images into one for enriched batch diversity, enhancing the model's generalization ability.

How can the African Wildlife Dataset be used to support wildlife conservation and research?

The African Wildlife Dataset is ideal for supporting wildlife conservation and research by enabling the training and evaluation of models to identify African wildlife in different habitats. These models can assist in monitoring animal populations, studying their behavior, and recognizing conservation needs. Additionally, the dataset can be utilized for educational purposes, helping students and researchers understand the characteristics and behaviors of different animal species. More details can be found in the Applications section.


comments: true
description: Explore the comprehensive Argoverse dataset by Argo AI for 3D tracking, motion forecasting, and stereo depth estimation in autonomous driving research.
keywords: Argoverse dataset, autonomous driving, 3D tracking, motion forecasting, stereo depth estimation, Argo AI, LiDAR point clouds, high-resolution images, HD maps

Argoverse Dataset

The Argoverse dataset is a collection of data designed to support research in autonomous driving tasks, such as 3D tracking, motion forecasting, and stereo depth estimation. Developed by Argo AI, the dataset provides a wide range of high-quality sensor data, including high-resolution images, LiDAR point clouds, and map data.

!!! Note

The Argoverse dataset `*.zip` file required for training was removed from Amazon S3 after the shutdown of Argo AI by Ford, but we have made it available for manual download on [Google Drive](https://drive.google.com/file/d/1st9qW3BeIwQsnR0t8mRpvbsSWIo16ACi/view?usp=drive_link).

Key Features

  • Argoverse contains over 290K labeled 3D object tracks and 5 million object instances across 1,263 distinct scenes.
  • The dataset includes high-resolution camera images, LiDAR point clouds, and richly annotated HD maps.
  • Annotations include 3D bounding boxes for objects, object tracks, and trajectory information.
  • Argoverse provides multiple subsets for different tasks, such as 3D tracking, motion forecasting, and stereo depth estimation.

Dataset Structure

The Argoverse dataset is organized into three main subsets:

  1. Argoverse 3D Tracking: This subset contains 113 scenes with over 290K labeled 3D object tracks, focusing on 3D object tracking tasks. It includes LiDAR point clouds, camera images, and sensor calibration information.
  2. Argoverse Motion Forecasting: This subset consists of 324K vehicle trajectories collected from 60 hours of driving data, suitable for motion forecasting tasks.
  3. Argoverse Stereo Depth Estimation: This subset is designed for stereo depth estimation tasks and includes over 10K stereo image pairs with corresponding LiDAR point clouds for ground truth depth estimation.

Applications

The Argoverse dataset is widely used for training and evaluating deep learning models in autonomous driving tasks such as 3D object tracking, motion forecasting, and stereo depth estimation. The dataset's diverse set of sensor data, object annotations, and map information makes it a valuable resource for researchers and practitioners in the field of autonomous driving.

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. For the case of the Argoverse dataset, the Argoverse.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml.

!!! Example "ultralytics/cfg/datasets/Argoverse.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/Argoverse.yaml"
```

Usage

To train a YOLOv8n model on the Argoverse dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="Argoverse.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=Argoverse.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Data and Annotations

The Argoverse dataset contains a diverse set of sensor data, including camera images, LiDAR point clouds, and HD map information, providing rich context for autonomous driving tasks. Here are some examples of data from the dataset, along with their corresponding annotations:

Dataset sample image

  • Argoverse 3D Tracking: This image demonstrates an example of 3D object tracking, where objects are annotated with 3D bounding boxes. The dataset provides LiDAR point clouds and camera images to facilitate the development of models for this task.

The example showcases the variety and complexity of the data in the Argoverse dataset and highlights the importance of high-quality sensor data for autonomous driving tasks.

Citations and Acknowledgments

If you use the Argoverse dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```bibtex
    @inproceedings{chang2019argoverse,
      title={Argoverse: 3D Tracking and Forecasting with Rich Maps},
      author={Chang, Ming-Fang and Lambert, John and Sangkloy, Patsorn and Singh, Jagjeet and Bak, Slawomir and Hartnett, Andrew and Wang, Dequan and Carr, Peter and Lucey, Simon and Ramanan, Deva and others},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={8748--8757},
      year={2019}
    }
    ```

We would like to acknowledge Argo AI for creating and maintaining the Argoverse dataset as a valuable resource for the autonomous driving research community. For more information about the Argoverse dataset and its creators, visit the Argoverse dataset website.

FAQ

What is the Argoverse dataset and its key features?

The Argoverse dataset, developed by Argo AI, supports autonomous driving research. It includes over 290K labeled 3D object tracks and 5 million object instances across 1,263 distinct scenes. The dataset provides high-resolution camera images, LiDAR point clouds, and annotated HD maps, making it valuable for tasks like 3D tracking, motion forecasting, and stereo depth estimation.

How can I train an Ultralytics YOLO model using the Argoverse dataset?

To train a YOLOv8 model with the Argoverse dataset, use the provided YAML configuration file and the following code:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="Argoverse.yaml", epochs=100, imgsz=640)
    ```


=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=Argoverse.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For a detailed explanation of the arguments, refer to the model Training page.

What types of data and annotations are available in the Argoverse dataset?

The Argoverse dataset includes various sensor data types such as high-resolution camera images, LiDAR point clouds, and HD map data. Annotations include 3D bounding boxes, object tracks, and trajectory information. These comprehensive annotations are essential for accurate model training in tasks like 3D object tracking, motion forecasting, and stereo depth estimation.

How is the Argoverse dataset structured?

The dataset is divided into three main subsets:

  1. Argoverse 3D Tracking: Contains 113 scenes with over 290K labeled 3D object tracks, focusing on 3D object tracking tasks. It includes LiDAR point clouds, camera images, and sensor calibration information.
  2. Argoverse Motion Forecasting: Consists of 324K vehicle trajectories collected from 60 hours of driving data, suitable for motion forecasting tasks.
  3. Argoverse Stereo Depth Estimation: Includes over 10K stereo image pairs with corresponding LiDAR point clouds for ground truth depth estimation.

Where can I download the Argoverse dataset now that it has been removed from Amazon S3?

The Argoverse dataset *.zip file, previously available on Amazon S3, can now be manually downloaded from Google Drive.

What is the YAML configuration file used for with the Argoverse dataset?

A YAML file contains the dataset's paths, classes, and other essential information. For the Argoverse dataset, the configuration file, Argoverse.yaml, can be found at the following link: Argoverse.yaml.

For more information about YAML configurations, see our datasets guide.


comments: true
description: Explore the brain tumor detection dataset with MRI/CT images. Essential for training AI models for early diagnosis and treatment planning.
keywords: brain tumor dataset, MRI scans, CT scans, brain tumor detection, medical imaging, AI in healthcare, computer vision, early diagnosis, treatment planning

Brain Tumor Dataset

A brain tumor detection dataset consists of medical images from MRI or CT scans, containing information about brain tumor presence, location, and characteristics. This dataset is essential for training computer vision algorithms to automate brain tumor identification, aiding in early diagnosis and treatment planning.



Watch: Brain Tumor Detection using Ultralytics HUB

Dataset Structure

The brain tumor dataset is divided into two subsets:

  • Training set: Consisting of 893 images, each accompanied by corresponding annotations.
  • Testing set: Comprising 223 images, with annotations paired for each one.

Applications

The application of brain tumor detection using computer vision enables early diagnosis, treatment planning, and monitoring of tumor progression. By analyzing medical imaging data like MRI or CT scans, computer vision systems assist in accurately identifying brain tumors, aiding in timely medical intervention and personalized treatment strategies.

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the brain tumor dataset, the brain-tumor.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/brain-tumor.yaml.

!!! Example "ultralytics/cfg/datasets/brain-tumor.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/brain-tumor.yaml"
```

Usage

To train a YOLOv8n model on the brain tumor dataset for 100 epochs with an image size of 640, utilize the provided code snippets. For a detailed list of available arguments, consult the model's Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="brain-tumor.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=brain-tumor.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

!!! Example "Inference Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("path/to/best.pt")  # load a brain-tumor fine-tuned model

    # Inference using the model
    results = model.predict("https://ultralytics.com/assets/brain-tumor-sample.jpg")
    ```

=== "CLI"

    ```bash
    # Start prediction with a finetuned *.pt model
    yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/brain-tumor-sample.jpg"
    ```

Sample Images and Annotations

The brain tumor dataset encompasses a wide array of images featuring diverse object categories and intricate scenes. Presented below are examples of images from the dataset, accompanied by their respective annotations.

Brain tumor dataset sample image

  • Mosaiced Image: Displayed here is a training batch comprising mosaiced dataset images. Mosaicing, a training technique, consolidates multiple images into one, enhancing batch diversity. This approach aids in improving the model's capacity to generalize across various object sizes, aspect ratios, and contexts.

This example highlights the diversity and intricacy of images within the brain tumor dataset, underscoring the advantages of incorporating mosaicing during the training phase.

Citations and Acknowledgments

This dataset has been released under the AGPL-3.0 License.

FAQ

What is the structure of the brain tumor dataset available in Ultralytics documentation?

The brain tumor dataset is divided into two subsets: the training set consists of 893 images with corresponding annotations, while the testing set comprises 223 images with paired annotations. This structured division aids in developing robust and accurate computer vision models for detecting brain tumors. For more information on the dataset structure, visit the Dataset Structure section.

How can I train a YOLOv8 model on the brain tumor dataset using Ultralytics?

You can train a YOLOv8 model on the brain tumor dataset for 100 epochs with an image size of 640px using both Python and CLI methods. Below are the examples for both:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="brain-tumor.yaml", epochs=100, imgsz=640)
    ```


=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=brain-tumor.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For a detailed list of available arguments, refer to the Training page.

What are the benefits of using the brain tumor dataset for AI in healthcare?

Using the brain tumor dataset in AI projects enables early diagnosis and treatment planning for brain tumors. It helps in automating brain tumor identification through computer vision, facilitating accurate and timely medical interventions, and supporting personalized treatment strategies. This application holds significant potential in improving patient outcomes and medical efficiencies.

How do I perform inference using a fine-tuned YOLOv8 model on the brain tumor dataset?

Inference using a fine-tuned YOLOv8 model can be performed with either Python or CLI approaches. Here are the examples:

!!! Example "Inference Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("path/to/best.pt")  # load a brain-tumor fine-tuned model

    # Inference using the model
    results = model.predict("https://ultralytics.com/assets/brain-tumor-sample.jpg")
    ```

=== "CLI"

    ```bash
    # Start prediction with a finetuned *.pt model
    yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/brain-tumor-sample.jpg"
    ```

Where can I find the YAML configuration for the brain tumor dataset?

The YAML configuration file for the brain tumor dataset can be found at brain-tumor.yaml. This file includes paths, classes, and additional relevant information necessary for training and evaluating models on this dataset.


comments: true
description: Explore the COCO dataset for object detection and segmentation. Learn about its structure, usage, pretrained models, and key features.
keywords: COCO dataset, object detection, segmentation, benchmarking, computer vision, pose estimation, YOLO models, COCO annotations

COCO Dataset

The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.



Watch: Ultralytics COCO Dataset Overview

COCO Pretrained Models

| Model   | size (pixels) | mAP<sup>val</sup> 50-95 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
| ------- | ------------- | ----------------------- | ------------------- | ------------------------ | ---------- | --------- |
| YOLOv8n | 640           | 37.3                    | 80.4                | 0.99                     | 3.2        | 8.7       |
| YOLOv8s | 640           | 44.9                    | 128.4               | 1.20                     | 11.2       | 28.6      |
| YOLOv8m | 640           | 50.2                    | 234.7               | 1.83                     | 25.9       | 78.9      |
| YOLOv8l | 640           | 52.9                    | 375.2               | 2.39                     | 43.7       | 165.2     |
| YOLOv8x | 640           | 53.9                    | 479.1               | 3.53                     | 68.2       | 257.8     |

Key Features

  • COCO contains 330K images, with 200K images having annotations for object detection, segmentation, and captioning tasks.
  • The dataset comprises 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment.
  • Annotations include object bounding boxes, segmentation masks, and captions for each image.
  • COCO provides standardized evaluation metrics like mean Average Precision (mAP) for object detection, and mean Average Recall (mAR) for segmentation tasks, making it suitable for comparing model performance.

Dataset Structure

The COCO dataset is split into three subsets:

  1. Train2017: This subset contains 118K images for training object detection, segmentation, and captioning models.
  2. Val2017: This subset has 5K images used for validation purposes during model training.
  3. Test2017: This subset consists of 20K images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the COCO evaluation server for performance evaluation.

Applications

The COCO dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD), instance segmentation (such as Mask R-CNN), and keypoint detection (such as OpenPose). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.
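
To reproduce the standardized evaluation metrics mentioned above, a pretrained detection model can be validated against the COCO validation split. The snippet below is a minimal sketch using the Ultralytics Python API; it assumes the coco.yaml configuration described in the next section and will download the dataset on first use.

```py
from ultralytics import YOLO

# Load a COCO-pretrained detection model
model = YOLO("yolov8n.pt")

# Validate on the val split defined in coco.yaml
metrics = model.val(data="coco.yaml", imgsz=640)

print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
```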

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the COCO dataset, the coco.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml.

!!! Example "ultralytics/cfg/datasets/coco.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/coco.yaml"
```

Usage

To train a YOLOv8n model on the COCO dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="coco.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=coco.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Images and Annotations

The COCO dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:

Dataset sample image

  • Mosaiced Image: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the COCO dataset and the benefits of using mosaicing during the training process.

Citations and Acknowledgments

If you use the COCO dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```bibtex
    @misc{lin2015microsoft,
          title={Microsoft COCO: Common Objects in Context},
          author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
          year={2015},
          eprint={1405.0312},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
    ```

We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the COCO dataset website.

FAQ

What is the COCO dataset and why is it important for computer vision?

The COCO dataset (Common Objects in Context) is a large-scale dataset used for object detection, segmentation, and captioning. It contains 330K images with detailed annotations for 80 object categories, making it essential for benchmarking and training computer vision models. Researchers use COCO due to its diverse categories and standardized evaluation metrics like mean Average Precision (mAP).

How can I train a YOLO model using the COCO dataset?

To train a YOLOv8 model using the COCO dataset, you can use the following code snippets:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="coco.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=coco.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Refer to the Training page for more details on available arguments.

What are the key features of the COCO dataset?

The COCO dataset includes:

  • 330K images, with 200K annotated for object detection, segmentation, and captioning.
  • 80 object categories ranging from common items like cars and animals to specific ones like handbags and sports equipment.
  • Standardized evaluation metrics for object detection (mAP) and segmentation (mean Average Recall, mAR).
  • Mosaicing technique in training batches to enhance model generalization across various object sizes and contexts.

Where can I find pretrained YOLOv8 models trained on the COCO dataset?

Pretrained YOLOv8 models on the COCO dataset, such as YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x (see the COCO Pretrained Models table above), can be downloaded from the links provided in the documentation.

These models vary in size, mAP, and inference speed, providing options for different performance and resource requirements.

How is the COCO dataset structured and how do I use it?

The COCO dataset is split into three subsets:

  1. Train2017: 118K images for training.
  2. Val2017: 5K images for validation during training.
  3. Test2017: 20K images for benchmarking trained models. Results need to be submitted to the COCO evaluation server for performance evaluation.

The dataset's YAML configuration file is available at coco.yaml, which defines paths, classes, and dataset details.


comments: true
description: Explore the Ultralytics COCO8 dataset, a versatile and manageable set of 8 images perfect for testing object detection models and training pipelines.
keywords: COCO8, Ultralytics, dataset, object detection, YOLOv8, training, validation, machine learning, computer vision

COCO8 Dataset

Introduction

Ultralytics COCO8 is a small, but versatile object detection dataset composed of the first 8 images of the COCO train 2017 set, 4 for training and 4 for validation. This dataset is ideal for testing and debugging object detection models, or for experimenting with new detection approaches. With 8 images, it is small enough to be easily manageable, yet diverse enough to test training pipelines for errors and act as a sanity check before training larger datasets.



Watch: Ultralytics COCO Dataset Overview

This dataset is intended for use with Ultralytics HUB and YOLOv8.

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the COCO8 dataset, the coco8.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml.

!!! Example "ultralytics/cfg/datasets/coco8.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/coco8.yaml"
```

Usage

To train a YOLOv8n model on the COCO8 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Images and Annotations

Here are some examples of images from the COCO8 dataset, along with their corresponding annotations:

Dataset sample image
  • Mosaiced Image: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the COCO8 dataset and the benefits of using mosaicing during the training process.

Citations and Acknowledgments

If you use the COCO dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```bibtex
    @misc{lin2015microsoft,
          title={Microsoft COCO: Common Objects in Context},
          author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
          year={2015},
          eprint={1405.0312},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
    ```

We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the COCO dataset website.

FAQ

What is the Ultralytics COCO8 dataset used for?

The Ultralytics COCO8 dataset is a compact yet versatile object detection dataset consisting of the first 8 images from the COCO train 2017 set, with 4 images for training and 4 for validation. It is designed for testing and debugging object detection models and experimentation with new detection approaches. Despite its small size, COCO8 offers enough diversity to act as a sanity check for your training pipelines before deploying larger datasets. For more details, view the COCO8 dataset.

How do I train a YOLOv8 model using the COCO8 dataset?

To train a YOLOv8 model using the COCO8 dataset, you can employ either Python or CLI commands. Here's how you can start:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For a comprehensive list of available arguments, refer to the model Training page.

Why should I use Ultralytics HUB for managing my COCO8 training?

Ultralytics HUB is an all-in-one web tool designed to simplify the training and deployment of YOLO models, including the Ultralytics YOLOv8 models on the COCO8 dataset. It offers cloud training, real-time tracking, and seamless dataset management. HUB allows you to start training with a single click and avoids the complexities of manual setups. Discover more about Ultralytics HUB and its benefits.

What are the benefits of using mosaic augmentation in training with the COCO8 dataset?

Mosaic augmentation, demonstrated in the COCO8 dataset, combines multiple images into a single image during training. This technique increases the variety of objects and scenes in each training batch, improving the model's ability to generalize across different object sizes, aspect ratios, and contexts. This results in a more robust object detection model. For more details, refer to the training guide.
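
As a minimal sketch, mosaic augmentation is controlled through the `mosaic` training argument (a probability between 0.0 and 1.0); the value shown here is illustrative rather than a recommendation.

```py
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# mosaic=1.0 applies mosaic augmentation to every training batch; set mosaic=0.0 to disable it
results = model.train(data="coco8.yaml", epochs=100, imgsz=640, mosaic=1.0)
```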

How can I validate my YOLOv8 model trained on the COCO8 dataset?

Validation of your YOLOv8 model trained on the COCO8 dataset can be performed using the model's validation commands. You can invoke the validation mode via CLI or Python script to evaluate the model's performance using precise metrics. For detailed instructions, visit the Validation page.
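
As a minimal sketch of the Python route (assuming your best checkpoint was saved to the default location, `runs/detect/train/weights/best.pt`):

```py
from ultralytics import YOLO

# Load the checkpoint produced by training on COCO8 (placeholder path)
model = YOLO("runs/detect/train/weights/best.pt")

# Evaluate on the validation split defined in coco8.yaml
metrics = model.val(data="coco8.yaml", imgsz=640)
print(metrics.box.map50)  # mAP at IoU 0.50
```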


comments: true
description: Explore the Global Wheat Head Dataset to develop accurate wheat head detection models. Includes training images, annotations, and usage for crop management.
keywords: Global Wheat Head Dataset, wheat head detection, wheat phenotyping, crop management, deep learning, object detection, training datasets

Global Wheat Head Dataset

The Global Wheat Head Dataset is a collection of images designed to support the development of accurate wheat head detection models for applications in wheat phenotyping and crop management. Wheat heads, also known as spikes, are the grain-bearing parts of the wheat plant. Accurate estimation of wheat head density and size is essential for assessing crop health, maturity, and yield potential. The dataset, created by a collaboration of nine research institutes from seven countries, covers multiple growing regions to ensure models generalize well across different environments.

Key Features

  • The dataset contains over 3,000 training images from Europe (France, UK, Switzerland) and North America (Canada).
  • It includes approximately 1,000 test images from Australia, Japan, and China.
  • Images are outdoor field images, capturing the natural variability in wheat head appearances.
  • Annotations include wheat head bounding boxes to support object detection tasks.

Dataset Structure

The Global Wheat Head Dataset is organized into two main subsets:

  1. Training Set: This subset contains over 3,000 images from Europe and North America. The images are labeled with wheat head bounding boxes, providing ground truth for training object detection models.
  2. Test Set: This subset consists of approximately 1,000 images from Australia, Japan, and China. These images are used for evaluating the performance of trained models on unseen genotypes, environments, and observational conditions.

Applications

The Global Wheat Head Dataset is widely used for training and evaluating deep learning models in wheat head detection tasks. The dataset's diverse set of images, capturing a wide range of appearances, environments, and conditions, makes it a valuable resource for researchers and practitioners in the field of plant phenotyping and crop management.

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. For the case of the Global Wheat Head Dataset, the GlobalWheat2020.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml.

!!! Example "ultralytics/cfg/datasets/GlobalWheat2020.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/GlobalWheat2020.yaml"
```

Usage

To train a YOLOv8n model on the Global Wheat Head Dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="GlobalWheat2020.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=GlobalWheat2020.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Data and Annotations

The Global Wheat Head Dataset contains a diverse set of outdoor field images, capturing the natural variability in wheat head appearances, environments, and conditions. Here are some examples of data from the dataset, along with their corresponding annotations:

Dataset sample image

  • Wheat Head Detection: This image demonstrates an example of wheat head detection, where wheat heads are annotated with bounding boxes. The dataset provides a variety of images to facilitate the development of models for this task.

The example showcases the variety and complexity of the data in the Global Wheat Head Dataset and highlights the importance of accurate wheat head detection for applications in wheat phenotyping and crop management.

Citations and Acknowledgments

If you use the Global Wheat Head Dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```bibtex
    @article{david2020global,
             title={Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods},
             author={David, Etienne and Madec, Simon and Sadeghi-Tehran, Pouria and Aasen, Helge and Zheng, Bangyou and Liu, Shouyang and Kirchgessner, Norbert and Ishikawa, Goro and Nagasawa, Koichi and Badhon, Minhajul and others},
             journal={arXiv preprint arXiv:2005.02162},
             year={2020}
    }
    ```

We would like to acknowledge the researchers and institutions that contributed to the creation and maintenance of the Global Wheat Head Dataset as a valuable resource for the plant phenotyping and crop management research community. For more information about the dataset and its creators, visit the Global Wheat Head Dataset website.

FAQ

What is the Global Wheat Head Dataset used for?

The Global Wheat Head Dataset is primarily used for developing and training deep learning models aimed at wheat head detection. This is crucial for applications in wheat phenotyping and crop management, allowing for more accurate estimations of wheat head density, size, and overall crop yield potential. Accurate detection methods help in assessing crop health and maturity, essential for efficient crop management.

How do I train a YOLOv8n model on the Global Wheat Head Dataset?

To train a YOLOv8n model on the Global Wheat Head Dataset, you can use the following code snippets. Make sure you have the GlobalWheat2020.yaml configuration file specifying dataset paths and classes:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a pre-trained model (recommended for training)
    model = YOLO("yolov8n.pt")

    # Train the model
    results = model.train(data="GlobalWheat2020.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=GlobalWheat2020.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For a comprehensive list of available arguments, refer to the model Training page.

What are the key features of the Global Wheat Head Dataset?

Key features of the Global Wheat Head Dataset include:

  • Over 3,000 training images from Europe (France, UK, Switzerland) and North America (Canada).
  • Approximately 1,000 test images from Australia, Japan, and China.
  • High variability in wheat head appearances due to different growing environments.
  • Detailed annotations with wheat head bounding boxes to aid object detection models.

These features facilitate the development of robust models capable of generalization across multiple regions.

Where can I find the configuration YAML file for the Global Wheat Head Dataset?

The configuration YAML file for the Global Wheat Head Dataset, named GlobalWheat2020.yaml, is available on GitHub. You can access it at this link. This file contains necessary information about dataset paths, classes, and other configuration details needed for model training in Ultralytics YOLO.

Why is wheat head detection important in crop management?

Wheat head detection is critical in crop management because it enables accurate estimation of wheat head density and size, which are essential for evaluating crop health, maturity, and yield potential. By leveraging deep learning models trained on datasets like the Global Wheat Head Dataset, farmers and researchers can better monitor and manage crops, leading to improved productivity and optimized resource use in agricultural practices. This technological advancement supports sustainable agriculture and food security initiatives.

For more information on applications of AI in agriculture, visit AI in Agriculture.


comments: true
description: Learn about dataset formats compatible with Ultralytics YOLO for robust object detection. Explore supported datasets and learn how to convert formats.
keywords: Ultralytics, YOLO, object detection datasets, dataset formats, COCO, dataset conversion, training datasets

Object Detection Datasets Overview

Training a robust and accurate object detection model requires a comprehensive dataset. This guide introduces various formats of datasets that are compatible with the Ultralytics YOLO model and provides insights into their structure, usage, and how to convert between different formats.

Supported Dataset Formats

Ultralytics YOLO format

The Ultralytics YOLO format is a dataset configuration format that allows you to define the dataset root directory, the relative paths to training/validation/testing image directories or *.txt files containing image paths, and a dictionary of class names. Here is an example:

```yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8  # dataset root dir
train: images/train  # train images (relative to 'path') 4 images
val: images/val  # val images (relative to 'path') 4 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  # ...
  77: teddy bear
  78: hair drier
  79: toothbrush
```

Labels for this format should be exported to YOLO format with one *.txt file per image. If there are no objects in an image, no *.txt file is required. The *.txt file should be formatted with one row per object in class x_center y_center width height format. Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, you should divide x_center and width by image width, and y_center and height by image height. Class numbers should be zero-indexed (start with 0).
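
As an illustration of that conversion, the sketch below turns a hypothetical pixel-space box (x_min, y_min, x_max, y_max) into one normalized YOLO label row; the image size and coordinates are made up for demonstration.

```py
# Hypothetical example: a 'person' (class 0) box given in pixel coordinates
img_w, img_h = 640, 480                        # image size in pixels
x_min, y_min, x_max, y_max = 98, 50, 310, 430  # illustrative box

# Convert to normalized x_center, y_center, width, height (values between 0 and 1)
x_center = ((x_min + x_max) / 2) / img_w
y_center = ((y_min + y_max) / 2) / img_h
width = (x_max - x_min) / img_w
height = (y_max - y_min) / img_h

# One row per object in the image's *.txt label file
print(f"0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
```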

Example labelled image

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

Example label file

When using the Ultralytics YOLO format, organize your training and validation images and labels as shown in the COCO8 dataset example below.

Example dataset directory structure

Usage

Here's how you can use these formats to train your model:

!!! Example

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```bash
    # Start training from a pretrained *.pt model
    yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Supported Datasets

Here is a list of the supported datasets and a brief description for each:

  • Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
  • COCO: Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
  • LVIS: A large-scale object detection, segmentation, and captioning dataset with 1203 object categories.
  • COCO8: A smaller subset of the first 8 images from the COCO train 2017 set (4 for training and 4 for validation), suitable for quick tests.
  • Global Wheat 2020: A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
  • Objects365: A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
  • OpenImagesV7: A comprehensive dataset by Google with 1.7M train images and 42k validation images.
  • SKU-110K: A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
  • VisDrone: A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
  • VOC: The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
  • xView: A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
  • Roboflow 100: A diverse object detection benchmark with 100 datasets spanning seven imagery domains for comprehensive model evaluation.
  • Brain-tumor: A dataset for detecting brain tumors includes MRI or CT scan images with details on tumor presence, location, and characteristics.
  • African-wildlife: A dataset featuring images of African wildlife, including buffalo, elephant, rhino, and zebras.
  • Signature: A dataset featuring images of various documents with annotated signatures, supporting document verification and fraud detection research.

Adding your own dataset

If you have your own dataset and would like to use it for training detection models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
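
The sketch below shows one way to put that together from Python; the file name, dataset path, and class names (my-dataset.yaml, ../datasets/my-dataset, cat, dog) are placeholders for your own data, not a prescribed layout.

```py
from pathlib import Path

from ultralytics import YOLO

# Write a minimal, hypothetical dataset configuration file
yaml_text = """
path: ../datasets/my-dataset  # dataset root dir
train: images/train           # training images (relative to 'path')
val: images/val               # validation images (relative to 'path')
names:
  0: cat
  1: dog
"""
Path("my-dataset.yaml").write_text(yaml_text)

# Train a pretrained model on the custom dataset
model = YOLO("yolov8n.pt")
results = model.train(data="my-dataset.yaml", epochs=100, imgsz=640)
```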

Port or Convert Label Formats

COCO Dataset Format to YOLO Format

You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:

!!! Example

=== "Python"

    ```py
    from ultralytics.data.converter import convert_coco

    convert_coco(labels_dir="path/to/coco/annotations/")
    ```

This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.

Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.

FAQ

What is the Ultralytics YOLO dataset format and how to structure it?

The Ultralytics YOLO format is a structured configuration for defining datasets in your training projects. It involves setting paths to your training, validation, and testing images and corresponding labels. For example:

```yaml
path: ../datasets/coco8  # dataset root directory
train: images/train  # training images (relative to 'path')
val: images/val  # validation images (relative to 'path')
test:  # optional test images
names:
  0: person
  1: bicycle
  2: car
  # ...
```
Labels are saved in `*.txt` files with one file per image, formatted as `class x_center y_center width height` with normalized coordinates. For a detailed guide, see the COCO8 dataset example.
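For instance, a label file for an image containing a person and a car might hold the two lines below; the coordinate values are illustrative only and are normalized to the 0-1 range:

```
0 0.481719 0.634028 0.690625 0.713278
2 0.736516 0.247188 0.498875 0.476417
```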

How do I convert a COCO dataset to the YOLO format?

You can convert a COCO dataset to the YOLO format using the Ultralytics conversion tools. Here's a quick method:

```py
from ultralytics.data.converter import convert_coco

convert_coco(labels_dir="path/to/coco/annotations/")
```

This code will convert your COCO annotations to YOLO format, enabling seamless integration with Ultralytics YOLO models. For additional details, visit the Port or Convert Label Formats section.

Which datasets are supported by Ultralytics YOLO for object detection?

Ultralytics YOLO supports a wide range of datasets, including Argoverse, COCO, LVIS, COCO8, Global Wheat 2020, Objects365, OpenImagesV7, SKU-110K, VisDrone, VOC, xView, Roboflow 100, Brain-tumor, African-wildlife, and Signature.

Each dataset page provides detailed information on the structure and usage tailored for efficient YOLOv8 training. Explore the full list in the Supported Datasets section.

How do I start training a YOLOv8 model using my dataset?

To start training a YOLOv8 model, ensure your dataset is formatted correctly and the paths are defined in a YAML file. Use the following script to begin training:

!!! Example

=== "Python"

    ```py
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # Load a pretrained model
    results = model.train(data="path/to/your_dataset.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    yolo detect train data=path/to/your_dataset.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Refer to the Usage section for more details on utilizing different modes, including CLI commands.

Where can I find practical examples of using Ultralytics YOLO for object detection?

Ultralytics provides numerous examples and practical guides for using YOLOv8 in diverse applications. For a comprehensive overview, visit the Ultralytics Blog where you can find case studies, detailed tutorials, and community stories showcasing object detection, segmentation, and more with YOLOv8. For specific examples, check the Usage section in the documentation.


comments: true
description: Discover the LVIS dataset by Facebook AI Research, a benchmark for object detection and instance segmentation with a large, diverse vocabulary. Learn how to utilize it.
keywords: LVIS dataset, object detection, instance segmentation, Facebook AI Research, YOLO, computer vision, model training, LVIS examples

LVIS Dataset

The LVIS dataset is a large-scale, fine-grained vocabulary-level annotation dataset developed and released by Facebook AI Research (FAIR). It is primarily used as a research benchmark for object detection and instance segmentation with a large vocabulary of categories, aiming to drive further advancements in the field of computer vision.



Watch: YOLO World training workflow with LVIS dataset

LVIS Dataset example images

Key Features

  • LVIS contains 160k images and 2M instance annotations for object detection, segmentation, and captioning tasks.
  • The dataset comprises 1203 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment.
  • Annotations include object bounding boxes, segmentation masks, and captions for each image.
  • LVIS provides standardized evaluation metrics like mean Average Precision (mAP) for object detection and mean Average Recall (mAR) for segmentation tasks, making it suitable for comparing model performance (see the validation sketch after this list).
  • LVIS uses exactly the same images as the COCO dataset, but with different splits and different annotations.
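Once a model has been trained on LVIS, these metrics can be computed with the Ultralytics val mode. A minimal sketch, assuming a locally available checkpoint path:

```py
from ultralytics import YOLO

# Load a model trained on LVIS (path is a placeholder)
model = YOLO("path/to/best.pt")

# Validate on the LVIS validation split defined in lvis.yaml
metrics = model.val(data="lvis.yaml")
print(metrics.box.map)  # box mAP50-95
```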

Dataset Structure

The LVIS dataset is split into four subsets:

  1. Train: This subset contains 100k images for training object detection, segmentation, and captioning models.
  2. Val: This subset has 20k images used for validation purposes during model training.
  3. Minival: This subset is exactly the same as the COCO val2017 set, which has 5k images used for validation purposes during model training.
  4. Test: This subset consists of 20k images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the LVIS evaluation server for performance evaluation.

Applications

The LVIS dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD) and instance segmentation (such as Mask R-CNN). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.

Dataset YAML

A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the LVIS dataset, the lvis.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml.

!!! Example "ultralytics/cfg/datasets/lvis.yaml"

```py
--8<-- "ultralytics/cfg/datasets/lvis.yaml"
```

Usage

To train a YOLOv8n model on the LVIS dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="lvis.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    # Start training from a pretrained *.pt model
    yolo detect train data=lvis.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Images and Annotations

The LVIS dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:

LVIS Dataset sample image

  • Mosaiced Image: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the LVIS dataset and the benefits of using mosaicing during the training process.
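Mosaicing is applied automatically by the Ultralytics trainer and can be tuned through its augmentation hyperparameters. The sketch below assumes the `mosaic` (probability) and `close_mosaic` (number of final epochs with mosaicing disabled) training arguments:

```py
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Train on LVIS with mosaic augmentation enabled, turning it off for the final 10 epochs
results = model.train(data="lvis.yaml", epochs=100, imgsz=640, mosaic=1.0, close_mosaic=10)
```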

Citations and Acknowledgments

If you use the LVIS dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```py
    @inproceedings{gupta2019lvis,
      title={{LVIS}: A Dataset for Large Vocabulary Instance Segmentation},
      author={Gupta, Agrim and Dollar, Piotr and Girshick, Ross},
      booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
      year={2019}
    }
    ```

We would like to acknowledge the LVIS Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the LVIS dataset and its creators, visit the LVIS dataset website.

FAQ

What is the LVIS dataset, and how is it used in computer vision?

The LVIS dataset is a large-scale dataset with fine-grained vocabulary-level annotations developed by Facebook AI Research (FAIR). It is primarily used for object detection and instance segmentation, featuring 1203 object categories and over 2 million instance annotations. Researchers and practitioners use it to train and benchmark models like Ultralytics YOLO for advanced computer vision tasks. The dataset's extensive size and diversity make it an essential resource for pushing the boundaries of model performance in detection and segmentation.

How can I train a YOLOv8n model using the LVIS dataset?

To train a YOLOv8n model on the LVIS dataset for 100 epochs with an image size of 640, follow the example below. This process utilizes Ultralytics' framework, which offers comprehensive training features.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="lvis.yaml", epochs=100, imgsz=640)
    ```


=== "CLI"

    ```py
    # Start training from a pretrained *.pt model
    yolo detect train data=lvis.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For detailed training configurations, refer to the Training documentation.

How does the LVIS dataset differ from the COCO dataset?

The images in the LVIS dataset are the same as those in the COCO dataset, but the two differ in terms of splitting and annotations. LVIS provides a larger and more detailed vocabulary with 1203 object categories compared to COCO's 80 categories. Additionally, LVIS focuses on annotation completeness and diversity, aiming to push the limits of object detection and instance segmentation models by offering more nuanced and comprehensive data.

Why should I use Ultralytics YOLO for training on the LVIS dataset?

Ultralytics YOLO models, including the latest YOLOv8, are optimized for real-time object detection with state-of-the-art accuracy and speed. They support a wide range of annotations, such as the fine-grained ones provided by the LVIS dataset, making them ideal for advanced computer vision applications. Moreover, Ultralytics offers seamless integration with various training, validation, and prediction modes, ensuring efficient model development and deployment.

Can I see some sample annotations from the LVIS dataset?

Yes, the LVIS dataset includes a variety of images with diverse object categories and complex scenes. Here is an example of a sample image along with its annotations:

LVIS Dataset sample image

This mosaiced image demonstrates a training batch composed of multiple dataset images combined into one. Mosaicing increases the variety of objects and scenes within each training batch, enhancing the model's ability to generalize across different contexts. For more details on the LVIS dataset, explore the LVIS dataset documentation.


comments: true
description: Explore the Objects365 Dataset with 2M images and 30M bounding boxes across 365 categories. Enhance your object detection models with diverse, high-quality data.
keywords: Objects365 dataset, object detection, machine learning, deep learning, computer vision, annotated images, bounding boxes, YOLOv8, high-resolution images, dataset configuration

Objects365 Dataset

The Objects365 dataset is a large-scale, high-quality dataset designed to foster object detection research with a focus on diverse objects in the wild. Created by a team of Megvii researchers, the dataset offers a wide range of high-resolution images with a comprehensive set of annotated bounding boxes covering 365 object categories.

Key Features

  • Objects365 contains 365 object categories, with 2 million images and over 30 million bounding boxes.
  • The dataset includes diverse objects in various scenarios, providing a rich and challenging benchmark for object detection tasks.
  • Annotations include bounding boxes for objects, making it suitable for training and evaluating object detection models.
  • Objects365 pre-trained models significantly outperform ImageNet pre-trained models, leading to better generalization on various tasks.

Dataset Structure

The Objects365 dataset is organized into a single set of images with corresponding annotations:

  • Images: The dataset includes 2 million high-resolution images, each containing a variety of objects across 365 categories.
  • Annotations: The images are annotated with over 30 million bounding boxes, providing comprehensive ground truth information for object detection tasks.

Applications

The Objects365 dataset is widely used for training and evaluating deep learning models in object detection tasks. The dataset's diverse set of object categories and high-quality annotations make it a valuable resource for researchers and practitioners in the field of computer vision.

Dataset YAML

A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. For the Objects365 dataset, the Objects365.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml.

!!! Example "ultralytics/cfg/datasets/Objects365.yaml"

```py
--8<-- "ultralytics/cfg/datasets/Objects365.yaml"
```

Usage

To train a YOLOv8n model on the Objects365 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="Objects365.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    # Start training from a pretrained *.pt model
    yolo detect train data=Objects365.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Data and Annotations

The Objects365 dataset contains a diverse set of high-resolution images with objects from 365 categories, providing rich context for object detection tasks. Here are some examples of the images in the dataset:

Dataset sample image

  • Objects365: This image demonstrates an example of object detection, where objects are annotated with bounding boxes. The dataset provides a wide range of images to facilitate the development of models for this task.

The example showcases the variety and complexity of the data in the Objects365 dataset and highlights the importance of accurate object detection for computer vision applications.

Citations and Acknowledgments

If you use the Objects365 dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```py
    @inproceedings{shao2019objects365,
      title={Objects365: A Large-scale, High-quality Dataset for Object Detection},
      author={Shao, Shuai and Li, Zeming and Zhang, Tianyuan and Peng, Chao and Yu, Gang and Li, Jing and Zhang, Xiangyu and Sun, Jian},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      pages={8425--8434},
      year={2019}
    }
    ```

We would like to acknowledge the team of researchers who created and maintain the Objects365 dataset as a valuable resource for the computer vision research community. For more information about the Objects365 dataset and its creators, visit the Objects365 dataset website.

FAQ

What is the Objects365 dataset used for?

The Objects365 dataset is designed for object detection tasks in machine learning and computer vision. It provides a large-scale, high-quality dataset with 2 million annotated images and 30 million bounding boxes across 365 categories. Leveraging such a diverse dataset helps improve the performance and generalization of object detection models, making it invaluable for research and development in the field.

How can I train a YOLOv8 model on the Objects365 dataset?

To train a YOLOv8n model using the Objects365 dataset for 100 epochs with an image size of 640, follow these instructions:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="Objects365.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    # Start training from a pretrained *.pt model
    yolo detect train data=Objects365.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Refer to the Training page for a comprehensive list of available arguments.

Why should I use the Objects365 dataset for my object detection projects?

The Objects365 dataset offers several advantages for object detection tasks:

  1. Diversity: It includes 2 million images with objects in diverse scenarios, covering 365 categories.
  2. High-quality Annotations: Over 30 million bounding boxes provide comprehensive ground truth data.
  3. Performance: Models pre-trained on Objects365 significantly outperform those trained on datasets like ImageNet, leading to better generalization.

Where can I find the YAML configuration file for the Objects365 dataset?

The YAML configuration file for the Objects365 dataset is available at Objects365.yaml. This file contains essential information such as dataset paths and class labels, crucial for setting up your training environment.

How does the dataset structure of Objects365 enhance object detection modeling?

The Objects365 dataset is organized with 2 million high-resolution images and comprehensive annotations of over 30 million bounding boxes. This structure ensures a robust dataset for training deep learning models in object detection, offering a wide variety of objects and scenarios. Such diversity and volume help in developing models that are more accurate and capable of generalizing well to real-world applications. For more details on the dataset structure, refer to the Dataset YAML section.


comments: true
description: Explore the comprehensive Open Images V7 dataset by Google. Learn about its annotations, applications, and use YOLOv8 pretrained models for computer vision tasks.
keywords: Open Images V7, Google dataset, computer vision, YOLOv8 models, object detection, image segmentation, visual relationships, AI research, Ultralytics

Open Images V7 Dataset

Open Images V7 is a versatile and expansive dataset championed by Google. Aimed at propelling research in the realm of computer vision, it boasts a vast collection of images annotated with a plethora of data, including image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives.



Watch: Object Detection using OpenImagesV7 Pretrained Model

Open Images V7 Pretrained Models

| Model   | size<br>(pixels) | mAP<sup>val</sup><br>50-95 | Speed<br>CPU ONNX<br>(ms) | Speed<br>A100 TensorRT<br>(ms) | params<br>(M) | FLOPs<br>(B) |
| ------- | ---------------- | -------------------------- | ------------------------- | ------------------------------ | ------------- | ------------ |
| YOLOv8n | 640              | 18.4                       | 142.4                     | 1.21                           | 3.5           | 10.5         |
| YOLOv8s | 640              | 27.7                       | 183.1                     | 1.40                           | 11.4          | 29.7         |
| YOLOv8m | 640              | 33.6                       | 408.5                     | 2.26                           | 26.2          | 80.6         |
| YOLOv8l | 640              | 34.9                       | 596.9                     | 2.43                           | 44.1          | 167.4        |
| YOLOv8x | 640              | 36.3                       | 860.6                     | 3.56                           | 68.7          | 260.6        |

Open Images V7 classes visual
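These checkpoints can be used directly for inference. The sketch below assumes the Open Images V7 weights are published under the `yolov8n-oiv7.pt` naming convention:

```py
from ultralytics import YOLO

# Load an Open Images V7 pretrained model (assumed checkpoint name)
model = YOLO("yolov8n-oiv7.pt")

# Run prediction; results cover the 600 Open Images V7 box classes
results = model.predict("https://ultralytics.com/images/bus.jpg")
results[0].show()
```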

Key Features

  • Encompasses ~9M images annotated in various ways to suit multiple computer vision tasks.
  • Houses a staggering 16M bounding boxes across 600 object classes in 1.9M images. These boxes are primarily hand-drawn by experts ensuring high precision.
  • Visual relationship annotations totaling 3.3M are available, detailing 1,466 unique relationship triplets, object properties, and human activities.
  • V5 introduced segmentation masks for 2.8M objects across 350 classes.
  • V6 introduced 675k localized narratives that amalgamate voice, text, and mouse traces highlighting described objects.
  • V7 introduced 66.4M point-level labels on 1.4M images, spanning 5,827 classes.
  • Encompasses 61.4M image-level labels across a diverse set of 20,638 classes.
  • Provides a unified platform for image classification, object detection, relationship detection, instance segmentation, and multimodal image descriptions.

Dataset Structure

Open Images V7 is structured in multiple components catering to varied computer vision challenges:

  • Images: About 9 million images, often showcasing intricate scenes with an average of 8.3 objects per image.
  • Bounding Boxes: Over 16 million boxes that demarcate objects across 600 categories.
  • Segmentation Masks: These detail the exact boundary of 2.8M objects across 350 classes.
  • Visual Relationships: 3.3M annotations indicating object relationships, properties, and actions.
  • Localized Narratives: 675k descriptions combining voice, text, and mouse traces.
  • Point-Level Labels: 66.4M labels across 1.4M images, suitable for zero/few-shot semantic segmentation.

Applications

Open Images V7 is a cornerstone for training and evaluating state-of-the-art models in various computer vision tasks. The dataset's broad scope and high-quality annotations make it indispensable for researchers and developers specializing in computer vision.

Dataset YAML

Typically, datasets come with a YAML (YAML Ain't Markup Language) file that delineates the dataset's configuration. For Open Images V7, the open-images-v7.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/open-images-v7.yaml.

!!! Example "OpenImagesV7.yaml"

```py
--8<-- "ultralytics/cfg/datasets/open-images-v7.yaml"
```

Usage

To train a YOLOv8n model on the Open Images V7 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

!!! Warning

The complete Open Images V7 dataset comprises 1,743,042 training images and 41,620 validation images, requiring approximately **561 GB of storage space** upon download.

Executing the commands provided below will trigger an automatic download of the full dataset if it's not already present locally. Before running the below example it's crucial to:

- Verify that your device has enough storage capacity.
- Ensure a robust and speedy internet connection.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a COCO-pretrained YOLOv8n model
    model = YOLO("yolov8n.pt")

    # Train the model on the Open Images V7 dataset
    results = model.train(data="open-images-v7.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    # Train a COCO-pretrained YOLOv8n model on the Open Images V7 dataset
    yolo detect train data=open-images-v7.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

Sample Data and Annotations

Illustrations of the dataset help provide insights into its richness:

Dataset sample image

  • Open Images V7: This image exemplifies the depth and detail of annotations available, including bounding boxes, relationships, and segmentation masks.

Researchers can gain invaluable insights into the array of computer vision challenges that the dataset addresses, from basic object detection to intricate relationship identification.

Citations and Acknowledgments

For those employing Open Images V7 in their work, it's prudent to cite the relevant papers and acknowledge the creators:

!!! Quote ""

=== "BibTeX"

    ```py
    @article{OpenImages,
      author = {Alina Kuznetsova and Hassan Rom and Neil Alldrin and Jasper Uijlings and Ivan Krasin and Jordi Pont-Tuset and Shahab Kamali and Stefan Popov and Matteo Malloci and Alexander Kolesnikov and Tom Duerig and Vittorio Ferrari},
      title = {The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale},
      year = {2020},
      journal = {IJCV}
    }
    ```

A heartfelt acknowledgment goes out to the Google AI team for creating and maintaining the Open Images V7 dataset. For a deep dive into the dataset and its offerings, navigate to the official Open Images V7 website.

FAQ

What is the Open Images V7 dataset?

Open Images V7 is an extensive and versatile dataset created by Google, designed to advance research in computer vision. It includes image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives, making it ideal for various computer vision tasks such as object detection, segmentation, and relationship detection.

How do I train a YOLOv8 model on the Open Images V7 dataset?

To train a YOLOv8 model on the Open Images V7 dataset, you can use both Python and CLI commands. Here's an example of training the YOLOv8n model for 100 epochs with an image size of 640:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a COCO-pretrained YOLOv8n model
    model = YOLO("yolov8n.pt")

    # Train the model on the Open Images V7 dataset
    results = model.train(data="open-images-v7.yaml", epochs=100, imgsz=640)
    ```


=== "CLI"

    ```py
    # Train a COCO-pretrained YOLOv8n model on the Open Images V7 dataset
    yolo detect train data=open-images-v7.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For more details on arguments and settings, refer to the Training page.

What are some key features of the Open Images V7 dataset?

The Open Images V7 dataset includes approximately 9 million images with various annotations:

  • Bounding Boxes: 16 million bounding boxes across 600 object classes.
  • Segmentation Masks: Masks for 2.8 million objects across 350 classes.
  • Visual Relationships: 3.3 million annotations indicating relationships, properties, and actions.
  • Localized Narratives: 675,000 descriptions combining voice, text, and mouse traces.
  • Point-Level Labels: 66.4 million labels across 1.4 million images.
  • Image-Level Labels: 61.4 million labels across 20,638 classes.

What pretrained models are available for the Open Images V7 dataset?

Ultralytics provides several YOLOv8 pretrained models for the Open Images V7 dataset, each with different sizes and performance metrics:

| Model   | size<br>(pixels) | mAP<sup>val</sup><br>50-95 | Speed<br>CPU ONNX<br>(ms) | Speed<br>A100 TensorRT<br>(ms) | params<br>(M) | FLOPs<br>(B) |
| ------- | ---------------- | -------------------------- | ------------------------- | ------------------------------ | ------------- | ------------ |
| YOLOv8n | 640              | 18.4                       | 142.4                     | 1.21                           | 3.5           | 10.5         |
| YOLOv8s | 640              | 27.7                       | 183.1                     | 1.40                           | 11.4          | 29.7         |
| YOLOv8m | 640              | 33.6                       | 408.5                     | 2.26                           | 26.2          | 80.6         |
| YOLOv8l | 640              | 34.9                       | 596.9                     | 2.43                           | 44.1          | 167.4        |
| YOLOv8x | 640              | 36.3                       | 860.6                     | 3.56                           | 68.7          | 260.6        |

What applications can the Open Images V7 dataset be used for?

The Open Images V7 dataset supports a variety of computer vision tasks including:

  • Image Classification
  • Object Detection
  • Instance Segmentation
  • Visual Relationship Detection
  • Multimodal Image Descriptions

Its comprehensive annotations and broad scope make it suitable for training and evaluating advanced machine learning models, as highlighted in practical use cases detailed in our applications section.


comments: true
description: Explore the Roboflow 100 dataset featuring 100 diverse datasets designed to test object detection models across various domains, from healthcare to video games.
keywords: Roboflow 100, Ultralytics, object detection, dataset, benchmarking, machine learning, computer vision, diverse datasets, model evaluation

Roboflow 100 Dataset

Roboflow 100, developed by Roboflow and sponsored by Intel, is a groundbreaking object detection benchmark. It includes 100 diverse datasets sampled from over 90,000 public datasets. This benchmark is designed to test the adaptability of models to various domains, including healthcare, aerial imagery, and video games.

Roboflow 100 Overview

Key Features

  • Includes 100 datasets across seven domains: Aerial, Video games, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
  • The benchmark comprises 224,714 images across 805 classes, thanks to over 11,170 hours of labeling efforts.
  • All images are resized to 640x640 pixels, with a focus on eliminating class ambiguity and filtering out underrepresented classes.
  • Annotations include bounding boxes for objects, making it suitable for training and evaluating object detection models.

Dataset Structure

The Roboflow 100 dataset is organized into seven categories, each with a distinct set of datasets, images, and classes:

  • Aerial: Consists of 7 datasets with a total of 9,683 images, covering 24 distinct classes.
  • Video Games: Includes 7 datasets, featuring 11,579 images across 88 classes.
  • Microscopic: Comprises 11 datasets with 13,378 images, spanning 28 classes.
  • Underwater: Contains 5 datasets, encompassing 18,003 images in 39 classes.
  • Documents: Consists of 8 datasets with 24,813 images, divided into 90 classes.
  • Electromagnetic: Made up of 12 datasets, totaling 36,381 images in 41 classes.
  • Real World: The largest category with 50 datasets, offering 110,615 images across 495 classes.

This structure enables a diverse and extensive testing ground for object detection models, reflecting real-world application scenarios.

Benchmarking

Dataset benchmarking evaluates machine learning model performance on specific datasets using standardized metrics such as accuracy, mean Average Precision (mAP), and F1-score.

!!! Tip "Benchmarking"

Benchmarking results will be stored in "ultralytics-benchmarks/evaluation.txt"

!!! Example "Benchmarking example"

=== "Python"

    ```py
    import os
    import shutil
    from pathlib import Path

    from ultralytics.utils.benchmarks import RF100Benchmark

    # Initialize RF100Benchmark and set API key
    benchmark = RF100Benchmark()
    benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")

    # Parse dataset and define file paths
    names, cfg_yamls = benchmark.parse_dataset()
    val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
    eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"

    # Run benchmarks on each dataset in RF100
    for ind, path in enumerate(cfg_yamls):
        path = Path(path)
        if path.exists():
            # Fix YAML file and run training
            benchmark.fix_yaml(str(path))
            os.system(f"yolo detect train data={path} model=yolov8s.pt epochs=1 batch=16")

            # Run validation and evaluate
            os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
            benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)

            # Remove the 'runs' directory
            runs_dir = Path.cwd() / "runs"
            shutil.rmtree(runs_dir)
        else:
            print("YAML file path does not exist")
            continue

    print("RF100 Benchmarking completed!")
    ```

Applications

Roboflow 100 is invaluable for various applications related to computer vision and deep learning. Researchers and engineers can use this benchmark to:

  • Evaluate the performance of object detection models in a multi-domain context.
  • Test the adaptability of models to real-world scenarios beyond common object recognition.
  • Benchmark the capabilities of object detection models across diverse datasets, including those in healthcare, aerial imagery, and video games.

For more ideas and inspiration on real-world applications, be sure to check out our guides on real-world projects.

Usage

The Roboflow 100 dataset is available on both GitHub and Roboflow Universe.

You can access it directly from the Roboflow 100 GitHub repository. In addition, on Roboflow Universe, you have the flexibility to download individual datasets by simply clicking the export button within each dataset.

Sample Data and Annotations

Roboflow 100 consists of datasets with diverse images and videos captured from various angles and domains. Here's a look at examples of annotated images in the RF100 benchmark.

Sample Data and Annotations

The diversity seen above in the Roboflow 100 benchmark is a significant advancement over traditional benchmarks, which often focus on optimizing a single metric within a limited domain.

Citations and Acknowledgments

If you use the Roboflow 100 dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

    ```py
    @misc{2211.13523,
        Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
        Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
        Eprint = {arXiv:2211.13523},
    }
    ```

Our thanks go to the Roboflow team and all the contributors for their hard work in creating and sustaining the Roboflow 100 dataset.

If you are interested in exploring more datasets to enhance your object detection and machine learning projects, feel free to visit our comprehensive dataset collection.

FAQ

What is the Roboflow 100 dataset, and why is it significant for object detection?

The Roboflow 100 dataset, developed by Roboflow and sponsored by Intel, is a crucial object detection benchmark. It features 100 diverse datasets from over 90,000 public datasets, covering domains such as healthcare, aerial imagery, and video games. This diversity ensures that models can adapt to various real-world scenarios, enhancing their robustness and performance.

How can I use the Roboflow 100 dataset for benchmarking my object detection models?

To use the Roboflow 100 dataset for benchmarking, you can implement the RF100Benchmark class from the Ultralytics library. Here's a brief example:

!!! Example "Benchmarking example"

=== "Python"

    ```py
    import os
    import shutil
    from pathlib import Path

    from ultralytics.utils.benchmarks import RF100Benchmark

    # Initialize RF100Benchmark and set API key
    benchmark = RF100Benchmark()
    benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")

    # Parse dataset and define file paths
    names, cfg_yamls = benchmark.parse_dataset()
    val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
    eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"

    # Run benchmarks on each dataset in RF100
    for ind, path in enumerate(cfg_yamls):
        path = Path(path)
        if path.exists():
            # Fix YAML file and run training
            benchmark.fix_yaml(str(path))
            os.system(f"yolo detect train data={path} model=yolov8s.pt epochs=1 batch=16")

            # Run validation and evaluate
            os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
            benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)

            # Remove 'runs' directory
            runs_dir = Path.cwd() / "runs"
            shutil.rmtree(runs_dir)
        else:
            print("YAML file path does not exist")
            continue

    print("RF100 Benchmarking completed!")
    ```

Which domains are covered by the Roboflow 100 dataset?

The Roboflow 100 dataset spans seven domains, each providing unique challenges and applications for object detection models:

  1. Aerial: 7 datasets, 9,683 images, 24 classes
  2. Video Games: 7 datasets, 11,579 images, 88 classes
  3. Microscopic: 11 datasets, 13,378 images, 28 classes
  4. Underwater: 5 datasets, 18,003 images, 39 classes
  5. Documents: 8 datasets, 24,813 images, 90 classes
  6. Electromagnetic: 12 datasets, 36,381 images, 41 classes
  7. Real World: 50 datasets, 110,615 images, 495 classes

This setup allows for extensive and varied testing of models across different real-world applications.

How do I access and download the Roboflow 100 dataset?

The Roboflow 100 dataset is accessible on GitHub and Roboflow Universe. You can download the entire dataset from GitHub or select individual datasets on Roboflow Universe using the export button.

What should I include when citing the Roboflow 100 dataset in my research?

When using the Roboflow 100 dataset in your research, ensure to properly cite it. Here is the recommended citation:

!!! Quote

=== "BibTeX"

    ```py
    @misc{2211.13523,
        Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
        Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
        Eprint = {arXiv:2211.13523},
    }
    ```

For more details, you can refer to our comprehensive dataset collection.


comments: true
description: Discover the Signature Detection Dataset for training models to identify and verify human signatures in various documents. Perfect for document verification and fraud prevention.
keywords: Signature Detection Dataset, document verification, fraud detection, computer vision, YOLOv8, Ultralytics, annotated signatures, training dataset

Signature Detection Dataset

This dataset focuses on detecting human written signatures within documents. It includes a variety of document types with annotated signatures, providing valuable insights for applications in document verification and fraud detection. Essential for training computer vision algorithms, this dataset aids in identifying signatures in various document formats, supporting research and practical applications in document analysis.

Dataset Structure

The signature detection dataset is split into two subsets:

  • Training set: Contains 143 images, each with corresponding annotations.
  • Validation set: Includes 35 images, each with paired annotations.

Applications

This dataset can be applied in various computer vision tasks such as object detection, object tracking, and document analysis. Specifically, it can be used to train and evaluate models for identifying signatures in documents, which can have applications in document verification, fraud detection, and archival research. Additionally, it can serve as a valuable resource for educational purposes, enabling students and researchers to study and understand the characteristics and behaviors of signatures in different document types.

Dataset YAML

A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including paths and classes information. For the signature detection dataset, the signature.yaml file is located at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/signature.yaml.

!!! Example "ultralytics/cfg/datasets/signature.yaml"

```py
--8<-- "ultralytics/cfg/datasets/signature.yaml"
```

Usage

To train a YOLOv8n model on the signature detection dataset for 100 epochs with an image size of 640, use the provided code samples. For a comprehensive list of available parameters, refer to the model's Training page.

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data="signature.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    # Start training from a pretrained *.pt model
    yolo detect train data=signature.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

!!! Example "Inference Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a model
    model = YOLO("path/to/best.pt")  # load a signature-detection fine-tuned model

    # Inference using the model
    results = model.predict("https://ultralytics.com/assets/signature-s.mp4", conf=0.75)
    ```

=== "CLI"

    ```py
    # Start prediction with a finetuned *.pt model
    yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/signature-s.mp4" conf=0.75
    ```

Sample Images and Annotations

The signature detection dataset comprises a wide variety of images showcasing different document types and annotated signatures. Below are examples of images from the dataset, each accompanied by its corresponding annotations.

Signature detection dataset sample image

  • Mosaiced Image: Here, we present a training batch consisting of mosaiced dataset images. Mosaicing, a training technique, combines multiple images into one, enriching batch diversity. This method helps enhance the model's ability to generalize across different signature sizes, aspect ratios, and contexts.

This example illustrates the variety and complexity of images in the Signature Detection Dataset, emphasizing the benefits of including mosaicing during the training process.

Citations and Acknowledgments

The dataset has been released under the AGPL-3.0 License.

FAQ

What is the Signature Detection Dataset, and how can it be used?

The Signature Detection Dataset is a collection of annotated images aimed at detecting human signatures within various document types. It can be applied in computer vision tasks such as object detection and tracking, primarily for document verification, fraud detection, and archival research. This dataset helps train models to recognize signatures in different contexts, making it valuable for both research and practical applications.

How do I train a YOLOv8n model on the Signature Detection Dataset?

To train a YOLOv8n model on the Signature Detection Dataset, follow these steps:

  1. Download the signature.yaml dataset configuration file from signature.yaml.
  2. Use the following Python script or CLI command to start training:

!!! Example "Train Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load a pretrained model
    model = YOLO("yolov8n.pt")

    # Train the model
    results = model.train(data="signature.yaml", epochs=100, imgsz=640)
    ```

=== "CLI"

    ```py
    yolo detect train data=signature.yaml model=yolov8n.pt epochs=100 imgsz=640
    ```

For more details, refer to the Training page.

What are the main applications of the Signature Detection Dataset?

The Signature Detection Dataset can be used for:

  1. Document Verification: Automatically verifying the presence and authenticity of human signatures in documents.
  2. Fraud Detection: Identifying forged or fraudulent signatures in legal and financial documents.
  3. Archival Research: Assisting historians and archivists in the digital analysis and cataloging of historical documents.
  4. Education: Supporting academic research and teaching in the fields of computer vision and machine learning.

How can I perform inference using a model trained on the Signature Detection Dataset?

To perform inference using a model trained on the Signature Detection Dataset, follow these steps:

  1. Load your fine-tuned model.
  2. Use the below Python script or CLI command to perform inference:

!!! Example "Inference Example"

=== "Python"

    ```py
    from ultralytics import YOLO

    # Load the fine-tuned model
    model = YOLO("path/to/best.pt")

    # Perform inference
    results = model.predict("https://ultralytics.com/assets/signature-s.mp4", conf=0.75)
    ```

=== "CLI"

    ```py
    yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/signature-s.mp4" conf=0.75
    ```

What is the structure of the Signature Detection Dataset, and where can I find more information?

The Signature Detection Dataset is divided into two subsets:

  • Training Set: Contains 143 images with annotations.
  • Validation Set: Includes 35 images with annotations.

For detailed information, you can refer to the Dataset Structure section. Additionally, view the complete dataset configuration in the signature.yaml file located at signature.yaml.
