This repository contains two Python scripts for managing the export and training of YOLO datasets using data from labelstudio.nina.no:
- exportlabelstudio.py: Automates the export of labeled data from LabelStudio, downloads the associated images, and extracts the YOLO-formatted data for use in training.
- trainlabelstudio.py: Prepares the YOLO dataset for training by splitting it into training and validation sets, creating a configuration file, and starting training with the YOLOv8 framework.
The scripts are made for projects containing images and bounding boxes, but can also be used to export video files by replacing .jpg with .mp4 in exportlabelstudio.py. Bounding boxes from videos are not included in the YOLO download, but they can be exported manually as .json from the website.
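Ensure the following Python libraries are installed:
- requests
- scikit-learn (imported as sklearn)
- pyyaml
- ultralytics
Install them using pip:
pip install requests scikit-learn pyyaml ultralytics
zipfile is also used, but it is part of the Python standard library and does not need to be installed.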
Both scripts use an API token for LabelStudio authentication. Ensure you have done the following:
- LabelStudio Authentication: Create a file authenticate.py in the root of this repo. The content of the file should be API_TOKEN = "[your-api-key]".
- Project ID: Replace the PROJECT_ID placeholder with your LabelStudio project ID.
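As a rough illustration of how the token might be used, the sketch below builds the request headers from API_TOKEN. The helper name build_auth_headers is hypothetical, and the "Authorization: Token ..." header scheme is an assumption about how the LabelStudio instance authenticates; check it against your server before relying on it.

```python
def build_auth_headers(api_token: str) -> dict:
    """Return HTTP headers for a token-authenticated LabelStudio request.

    Assumption: the server accepts an "Authorization: Token <key>" header.
    """
    return {"Authorization": f"Token {api_token}"}


# In the scripts the token would normally come from authenticate.py:
#   from authenticate import API_TOKEN
# A placeholder is used here so the example runs on its own.
API_TOKEN = "[your-api-key]"

headers = build_auth_headers(API_TOKEN)
print(headers)
```

Keeping the token in authenticate.py rather than in the scripts themselves makes it easier to keep the key out of version control.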
Create a directory structure for storing the exported and processed data:
mkdir -p exported_data/yolo/images/train exported_data/yolo/images/val
mkdir -p exported_data/yolo/labels/train exported_data/yolo/labels/val
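On systems where mkdir -p is not available, the same layout could be created from Python. This is a minimal sketch; the function name create_yolo_dirs is hypothetical and not part of the repository's scripts.

```python
from pathlib import Path


def create_yolo_dirs(root: str = "exported_data/yolo") -> list:
    """Create images/{train,val} and labels/{train,val} under root."""
    created = []
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            directory = Path(root) / kind / split
            # parents=True mirrors "mkdir -p"; exist_ok avoids errors on reruns.
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory)
    return created


if __name__ == "__main__":
    for directory in create_yolo_dirs():
        print(directory)
```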
exportlabelstudio.py performs the following steps:
- Export labeled data from LabelStudio in YOLO format.
- Download images corresponding to the annotations.
- Unzip and extract the YOLO data.
- Configure the script with:
  - BASE_URL: The base URL of your LabelStudio instance.
  - PROJECT_ID: The ID of the LabelStudio project to export.
  - EXPORT_FORMAT: Export format (default is YOLO).
  - EXPORT_PATH: Directory to save the exported data.
- Run the script:
  python exportlabelstudio.py
- Results:
  - Labeled data is saved in exported_data/yolo.
  - Associated images are downloaded and saved in exported_data/yolo/images.
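The export and unzip steps can be sketched roughly as follows. This is not the code from exportlabelstudio.py: the helper names are hypothetical, the https scheme and the PROJECT_ID value are placeholders, and the /api/projects/<id>/export?exportType=... path is an assumption about the LabelStudio export endpoint. The download itself (for example with requests.get and the token header) is left out so the example runs offline.

```python
import io
import zipfile
from pathlib import Path

BASE_URL = "https://labelstudio.nina.no"  # assumption: https scheme
PROJECT_ID = 1                            # placeholder project ID
EXPORT_FORMAT = "YOLO"
EXPORT_PATH = "exported_data/yolo"


def build_export_url(base_url: str, project_id: int, export_format: str) -> str:
    """Build the export URL (assumed endpoint layout)."""
    return (
        f"{base_url.rstrip('/')}/api/projects/{project_id}"
        f"/export?exportType={export_format}"
    )


def extract_export(zip_bytes: bytes, export_path: str) -> list:
    """Unzip a downloaded export archive into export_path.

    Returns the names of the archive members that were extracted.
    """
    target = Path(export_path)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target)
        return archive.namelist()


print(build_export_url(BASE_URL, PROJECT_ID, EXPORT_FORMAT))
```

In a real run, the bytes passed to extract_export would be the body of the HTTP response returned for the export URL.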
trainlabelstudio.py performs the following steps:
- Split the YOLO dataset into training and validation sets.
- Generate a config.yaml file for YOLO training.
- Train a YOLOv8 model using the prepared dataset.
- Ensure the exported data from exportlabelstudio.py is in exported_data/yolo.
- Run the script:
  python trainlabelstudio.py
- Results:
  - Data is split into train and val folders under exported_data/yolo/images and exported_data/yolo/labels.
  - A config.yaml file is created for YOLO training.
  - Training starts using YOLOv8 with the defined parameters.
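The split and configuration steps might look roughly like the sketch below. It is an illustration under assumptions, not the code from trainlabelstudio.py: the dependency list suggests the script uses scikit-learn and pyyaml, whereas this example swaps in random.shuffle and plain string formatting so that it runs with the standard library only. The function names, the 20% validation fraction, and the seed are all hypothetical. The config keys (path, train, val, names) follow the Ultralytics dataset YAML layout.

```python
import random
import shutil
from pathlib import Path


def split_dataset(root: str, val_fraction: float = 0.2, seed: int = 42) -> dict:
    """Move images (and matching label files) into train/ and val/ folders."""
    root_dir = Path(root)
    # Only loose files are split; existing train/ and val/ folders are skipped.
    images = sorted(p for p in (root_dir / "images").iterdir() if p.is_file())
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_fraction)
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for split, files in splits.items():
        (root_dir / "images" / split).mkdir(parents=True, exist_ok=True)
        (root_dir / "labels" / split).mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.move(str(image), str(root_dir / "images" / split / image.name))
            label = root_dir / "labels" / f"{image.stem}.txt"
            if label.exists():
                shutil.move(str(label), str(root_dir / "labels" / split / label.name))
    return {split: len(files) for split, files in splits.items()}


def write_config(root: str, class_names: list) -> Path:
    """Write a minimal config.yaml pointing at the split folders."""
    root_dir = Path(root)
    lines = [
        f"path: {root_dir.resolve()}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    config = root_dir / "config.yaml"
    config.write_text("\n".join(lines) + "\n")
    return config
```

Class names are written unquoted here, so names containing YAML special characters would need a proper YAML writer such as pyyaml.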
- Model: yolov8n.pt (a pretrained YOLOv8 model).
- Training parameters:
  - Epochs: 100
  - Image size: 640
  - Batch size: 1 (adjust as needed).
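With the ultralytics package, these settings would typically be passed to the model's train method as shown in this sketch. The wrapper function train and the default config path are assumptions made for illustration; the parameter values are the ones listed above.

```python
# Training settings as listed above; imgsz and batch are the Ultralytics
# argument names for image size and batch size.
TRAIN_PARAMS = {"epochs": 100, "imgsz": 640, "batch": 1}


def train(config_path: str = "exported_data/yolo/config.yaml",
          weights: str = "yolov8n.pt"):
    """Start YOLOv8 training with the parameters above."""
    # Imported inside the function so this file can be loaded even when
    # ultralytics is not installed; calling train() still requires it.
    from ultralytics import YOLO

    model = YOLO(weights)  # loads the pretrained yolov8n.pt weights
    return model.train(data=config_path, **TRAIN_PARAMS)
```

A batch size of 1 keeps memory use low but makes training slow; larger values are usually worth trying if the GPU allows it.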
- API Rate Limiting: The exportlabelstudio.py script includes a delay (time.sleep(2)) between image download requests to avoid overloading the server.
- Customization: Modify EXPORT_FORMAT, PROJECT_ID, and other configuration variables in the scripts to suit your project.
- Error Handling: If the export fails, the script prints the error response from the LabelStudio server for debugging.
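The rate-limiting and error-handling behaviour described above can be sketched as a small download loop. This is an illustration only: download_all and its parameters are hypothetical names, and the fetch function is passed in so the example does not depend on a running server. In the real script, the fetch step would be an HTTP request made with requests.

```python
import time
from typing import Callable, Iterable


def download_all(urls: Iterable[str],
                 fetch: Callable[[str], bytes],
                 delay_seconds: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Fetch each URL in turn, pausing between requests.

    Failures are printed and skipped so one bad image does not stop the run.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause between requests to avoid overloading the server.
            sleep(delay_seconds)
        try:
            results[url] = fetch(url)
        except Exception as error:
            print(f"Failed to download {url}: {error}")
    return results
```

Passing sleep as a parameter makes the loop easy to test without actually waiting.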
After running both scripts, the folder structure will look like this:
exported_data/
└── yolo/
    ├── images/
    │   ├── train/
    │   └── val/
    ├── labels/
    │   ├── train/
    │   └── val/
    ├── config.yaml
    └── classes.txt
- Export Fails: Check the API_TOKEN, PROJECT_ID, and server URL (BASE_URL).
- Training Issues: Ensure ultralytics is installed and YOLOv8 is correctly configured.
For further assistance, refer to LabelStudio's API Documentation and Ultralytics YOLO.