BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models

Introduction

This repo contains a collection of pluggable state-of-the-art multi-object trackers for segmentation, object detection and pose estimation models. For the methods using appearance description, both heavy (CLIPReID) and lightweight state-of-the-art ReID models (LightMBN, OSNet and more) are available for automatic download. I have tested these models on one video, people.mp4, to show how to use this package together with popular object detection models such as YOLOv8, YOLO-NAS and YOLOX. Furthermore, I downloaded the MOT17 dataset for benchmarking tracking results.

Tracker	HOTA↑	MOTA↑	IDF1↑
BoTSORT	77.8	78.9	88.9
DeepOCSORT	77.4	78.4	89.0
OCSORT	77.4	78.4	89.0
HybridSORT	77.3	77.9	88.8
ByteTrack	75.6	74.6	86.0
StrongSORT

NOTES: performed on the first 10 frames of each MOT17 sequence. The detector used is ByteTrack's YoloXm, trained on CrowdHuman, MOT17, Cityperson and ETHZ. Each tracker is configured with the original parameters found in its official repository.

Tutorials

Experiments

In inverse chronological order:

News

HybridSORT available (August 2023)
SOTA CLIP-ReID people and vehicle models available (August 2023)

Why BOXMOT?

Today's multi-object tracking options are heavily dependent on the computation capabilities of the underlying hardware. BOXMOT provides a great variety of setup options that meet different hardware limitations, such as CPU-only machines and low-memory GPUs. Everything is designed with simplicity and flexibility in mind. If tracking results are not good on your custom dataset with the out-of-the-box tracker configurations, use the examples/evolve.py script for tracker hyperparameter tuning.

Installation

Start with a Python>=3.8 environment.

If you want to run the YOLOv8, YOLO-NAS or YOLOX examples:

git clone https://github.com/mikel-brostrom/yolo_tracking.git
cd yolo_tracking
pip install -v -e .

If you only want to import the tracking modules, you can simply run:

pip install boxmot

YOLOv8 | YOLO-NAS | YOLOX examples

Tracking

Yolo models

# Tracking

usage: track.py [-h] [--yolo-model YOLO_MODEL] [--reid-model REID_MODEL] [--tracking-method TRACKING_METHOD] [--source SOURCE]
                [--imgsz IMGSZ [IMGSZ ...]] [--conf CONF] [--iou IOU] [--device DEVICE] [--show] [--save]
                [--classes CLASSES [CLASSES ...]] [--project PROJECT] [--name NAME] [--exist-ok] [--half] [--vid-stride VID_STRIDE]
                [--show-labels] [--show-conf] [--save-txt] [--save-id-crops] [--save-mot] [--line-width LINE_WIDTH] [--per-class]
                [--verbose] [--vid_stride VID_STRIDE]

For the time being, the command above can be used as follows:

$ python track.py --yolo-model yolov8n       --source people.mp4                    # bboxes only
                               yolo_nas_s    --source people.mp4                    # bboxes only
                               yolox_n       --source people.mp4                    # bboxes only
                               yolov8n-seg   --source people.mp4                    # bboxes + segmentation masks
                               yolov8n-pose  --source path/to/your/video/file.mp4   # bboxes + pose estimation

Results: with different detectors, for object detection as well as segmentation and pose estimation models, the following speeds per image at shape (1, 3, 384, 640) were obtained:

Detector	Preprocess	Inference	Postprocess	Tracking
yolov8n	1.1ms	6.1ms	0.9ms	46.1ms
yolo_nas_s	1.4ms	47.8ms	0.2ms	32.7ms
yolox_n	1.8ms	10.5ms	10.5ms	127.8ms
yolov8n-seg	1.0ms	7.0ms	1.3ms	44.6ms
yolov8n-pose	1.0ms	6.8ms	0.9ms	22.9ms
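
The four stage timings in each row add up to the per-frame latency, which gives a rough upper bound on throughput. A minimal sketch of that arithmetic, using the numbers from the table above and assuming the stages run one after another on every frame (the `fps` helper is a name introduced here for illustration):

```python
# Per-stage timings in ms, copied from the table above:
# (preprocess, inference, postprocess, tracking)
timings_ms = {
    "yolov8n": (1.1, 6.1, 0.9, 46.1),
    "yolo_nas_s": (1.4, 47.8, 0.2, 32.7),
    "yolox_n": (1.8, 10.5, 10.5, 127.8),
    "yolov8n-seg": (1.0, 7.0, 1.3, 44.6),
    "yolov8n-pose": (1.0, 6.8, 0.9, 22.9),
}


def fps(stages_ms):
    """Frames per second if all stages run sequentially on every frame."""
    return 1000.0 / sum(stages_ms)


for name, stages in timings_ms.items():
    print(f"{name:13s} total {sum(stages):6.1f} ms  ~{fps(stages):5.1f} FPS")
```

In these runs the tracking stage, not the detector, accounts for most of the per-frame time for four of the five detectors.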

Tracking methods

$ python track.py --tracking-method deepocsort --source people.mp4 
                                             strongsort
                                             ocsort
                                             bytetrack
                                             botsort

Results: with different trackers, the following speeds per image at shape (1, 3, 384, 640) were obtained:

Tracker	Preprocess	Inference	Postprocess	Tracking
deepocsort	1.0ms	5.9ms	0.8ms	45.8ms
strongsort	1.3ms	12.4ms	1.5ms	123.4ms
ocsort	1.0ms	5.8ms	0.8ms	4.5ms
bytetrack	1.0ms	5.7ms	0.8ms	5.2ms
botsort	1.0ms	5.9ms	0.8ms	44.3ms

Tracking sources

Tracking can be run on most video formats:

$ python examples/track.py --source 0                               # webcam
                                    img.jpg                         # image
                                    vid.mp4                         # video
                                    path/                           # directory
                                    path/*.jpg                      # glob
                                    'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                    'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
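
The source kinds listed above can be told apart from the string alone. The sketch below is a hypothetical helper written for this README, not the logic used by track.py; the function name `describe_source` and the extension sets are assumptions chosen for illustration.

```python
from pathlib import Path

# Illustrative extension sets (assumptions, not the list track.py supports).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def describe_source(source):
    """Classify a --source value into one of the kinds listed above."""
    s = str(source)
    if s.isdigit():
        return "webcam"  # e.g. 0
    if s.lower().startswith(("rtsp://", "rtmp://", "http://", "https://")):
        return "stream"  # includes YouTube links
    if "*" in s:
        return "glob"
    suffix = Path(s).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    return "directory"


for src in ["0", "img.jpg", "vid.mp4", "path/", "path/*.jpg",
            "rtsp://example.com/media.mp4"]:
    print(f"{src:32s} -> {describe_source(src)}")
```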

Select ReID model

Some tracking methods combine appearance description and motion in the process of tracking. For those which use appearance, you can choose a ReID model based on your needs from this ReID model zoo. These models can be further optimized for your needs with the reid_export.py script.

$ python examples/track.py --source 0 --reid-model lmbn_n_cuhk03_d.pt               # lightweight
                                                   osnet_x0_25_market1501.pt
                                                   mobilenetv2_x1_4_msmt17.engine
                                                   resnet50_msmt17.onnx
                                                   osnet_x1_0_msmt17.pt
                                                   clip_market1501.pt               # heavy
                                                   clip_vehicleid.pt
                                                   ...
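
To make "combining appearance and motion" concrete, here is a generic sketch of one common approach: blend a motion cue (box IoU) with an appearance cue (cosine similarity of ReID embeddings) into a single association score. This is not BoxMOT's implementation; the function names, the linear weighting and the `alpha` parameter are assumptions for illustration, and each tracker defines its own rule.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def association_score(track_box, det_box, track_emb, det_emb, alpha=0.5):
    """Weighted blend of motion (IoU) and appearance (cosine) similarity."""
    motion = iou(track_box, det_box)
    appearance = cosine_similarity(track_emb, det_emb)
    return alpha * motion + (1.0 - alpha) * appearance


# Toy values: partly overlapping boxes with similar embeddings.
score = association_score(
    (0, 0, 10, 10), (5, 0, 15, 10), [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]
)
print(f"association score: {score:.3f}")
```

A heavier ReID model generally produces more discriminative embeddings at a higher per-frame cost, which is the trade-off behind the lightweight and heavy options above.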

Filter tracked classes

By default the tracker tracks all MS COCO classes.

If you want to track a subset of the classes that your model predicts, add their corresponding indices after the --classes flag:

python track.py --source anyvideoofyourchoice.mp4 --yolo-model yolov8s.pt --classes 15 16  # COCO yolov8 model. Track cats and dogs only
python track.py --source people.mp4 --yolo-model yolov8s.pt --classes 0  # COCO yolov8 model. Track people only

Results : Speed: 1.0ms preprocess, 6.2ms inference, 0.9ms postprocess, 42.8ms tracking per image at shape (1, 3, 384, 640)

Here is a list of all the possible objects that a YOLOv8 model trained on MS COCO can detect. Notice that the indexing for the classes in this repo starts at zero.
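
If you drive a tracker from Python with your own detector, the same class filter can be applied to the N x (x, y, x, y, conf, cls) detection rows before they reach the tracker. A small sketch with plain Python lists; `keep_classes` is a hypothetical helper, and the name mapping lists only three of the 80 zero-based COCO indices:

```python
# A few zero-based COCO class indices (the full list has 80 entries).
COCO_NAMES = {0: "person", 15: "cat", 16: "dog"}


def keep_classes(dets, classes):
    """Return only the rows whose class index (sixth column) is in `classes`."""
    wanted = set(classes)
    return [row for row in dets if int(row[5]) in wanted]


# Made-up detections in (x, y, x, y, conf, cls) format.
dets = [
    [144, 212, 578, 480, 0.82, 0],   # person
    [425, 281, 576, 472, 0.56, 16],  # dog
    [30, 40, 120, 200, 0.71, 15],    # cat
]

pets = keep_classes(dets, [15, 16])
print([COCO_NAMES[int(row[5])] for row in pets])
```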

MOT compliant results

MOT compliant results can be saved to your experiment folder runs/track/exp*/ with the --save-mot flag:

python track.py --source people.mp4 --yolo-model yolov8s.pt --save-mot
python track.py --source people.mp4 --yolo-model yolov8s.pt --save-mot --show  --save

Results for command 1 : Speed: 1.0ms preprocess, 6.2ms inference, 0.8ms postprocess, 43.1ms tracking per image at shape (1, 3, 384, 640)

MOT results saved to /home/caic/Downloads/yolo_tracking-master/runs/track/exp/mot/people.mp4.txt

Results for command 2 : Speed: 1.0ms preprocess, 6.2ms inference, 0.8ms postprocess, 43.4ms tracking per image at shape (1, 3, 384, 640)

Results saved to /home/caic/Downloads/yolo_tracking-master/runs/track/exp2
MOT results saved to /home/caic/Downloads/yolo_tracking-master/runs/track/exp2/mot/people.mp4.txt
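
For reference, the MOTChallenge text format stores one box per line as frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z, with the last three set to -1 for 2D tracking. The sketch below converts one tracker output row (x, y, x, y, id, conf, cls, ind) into such a line. It is an illustration only: `to_mot_line` is a hypothetical helper, and the exact delimiter and columns that --save-mot writes are not shown in this README, so the comma-separated layout here is an assumption.

```python
def to_mot_line(frame_idx, track):
    """Format one (x1, y1, x2, y2, id, conf, cls, ind) row as a MOTChallenge line."""
    x1, y1, x2, y2, track_id, conf = track[:6]
    w, h = x2 - x1, y2 - y1  # MOT stores top-left corner plus width and height
    return (
        f"{frame_idx},{int(track_id)},{x1:.2f},{y1:.2f},"
        f"{w:.2f},{h:.2f},{conf:.2f},-1,-1,-1"
    )


# Made-up track row for frame 1.
line = to_mot_line(1, [144.0, 212.0, 578.0, 480.0, 3, 0.82, 0, 0])
print(line)
```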

Evaluation

Evaluate a combination of detector, tracking method and ReID model on a standard MOT dataset or your custom one with the commands below. Person re-identification (ReID) is an intelligent video surveillance technology that retrieves the same person from different cameras. This task is extremely challenging due to changes in person poses, different camera views, and occlusion.

$ python3 examples/val.py --yolo-model yolo_nas_s.pt --reid-model osnetx1_0_dukemtcereid.pt --tracking-method deepocsort --benchmark MOT16
                          --yolo-model yolox_n.pt    --reid-model osnet_ain_x1_0_msmt17.pt  --tracking-method ocsort     --benchmark MOT17
                          --yolo-model yolov8s.pt    --reid-model lmbn_n_market.pt          --tracking-method strongsort --benchmark <your-custom-dataset>
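
Two of the reported metrics have simple closed forms, which helps when reading benchmark numbers. MOTA penalizes misses, false positives and identity switches relative to the number of ground-truth boxes; IDF1 is the F1 score over identity-matched detections. The sketch below uses the standard definitions and is not taken from val.py. The counts in the usage lines are made up for illustration, not benchmark results, and HOTA is omitted because it requires averaging over localization thresholds.

```python
def mota(fn, fp, idsw, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Made-up counts for a short sequence.
print(f"MOTA: {100 * mota(fn=100, fp=50, idsw=10, num_gt=1000):.1f}")
print(f"IDF1: {100 * idf1(idtp=800, idfp=100, idfn=200):.1f}")
```

Note that MOTA can become negative when the number of errors exceeds the number of ground-truth boxes, whereas IDF1 always lies between 0 and 1.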

Evolution

We use a fast and elitist multiobjective genetic algorithm for tracker hyperparameter tuning. By default the objectives are HOTA, MOTA and IDF1. Run it with:

$ python examples/evolve.py --tracking-method strongsort --benchmark MOT17 --n-trials 100  # tune strongsort for MOT17
                            --tracking-method ocsort     --benchmark <your-custom-dataset> --objective HOTA # tune ocsort for maximizing HOTA on your custom tracking dataset

The set of hyperparameters leading to the best HOTA result is written to the tracker's config file.
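
With several objectives there is usually no single best trial, so multiobjective methods compare candidates by Pareto dominance: a candidate is kept if no other candidate is at least as good on every objective and strictly better on one. The sketch below shows only that selection step. It is not the code of examples/evolve.py, the helper names are assumptions, and the (HOTA, MOTA, IDF1) rows from the benchmark table are used purely as sample points.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and better somewhere (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# (HOTA, MOTA, IDF1) sample points taken from the benchmark table.
candidates = {
    "BoTSORT": (77.8, 78.9, 88.9),
    "DeepOCSORT": (77.4, 78.4, 89.0),
    "OCSORT": (77.4, 78.4, 89.0),
    "HybridSORT": (77.3, 77.9, 88.8),
    "ByteTrack": (75.6, 74.6, 86.0),
}

front = pareto_front(candidates)
print(sorted(front))
```

Picking the final configuration by a single metric such as HOTA is then a choice among the non-dominated candidates.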

Custom object detection model tracking example

Minimalistic

import cv2
import numpy as np
from pathlib import Path

from boxmot import DeepOCSORT


tracker = DeepOCSORT(
    model_weights=Path('osnet_x0_25_msmt17.pt'), # which ReID model to use
    device='cuda:0',
    fp16=False,
)

vid = cv2.VideoCapture(0)

while True:
    ret, im = vid.read()
    if not ret:  # stop when no frame could be read
        break

    # substitute by your object detector, output has to be N X (x, y, x, y, conf, cls)
    dets = np.array([[144, 212, 578, 480, 0.82, 0],
                    [425, 281, 576, 472, 0.56, 65]])

    tracks = tracker.update(dets, im) # --> (x, y, x, y, id, conf, cls, ind)

Complete

import cv2
import numpy as np
from pathlib import Path

from boxmot import DeepOCSORT


tracker = DeepOCSORT(
    model_weights=Path('osnet_x0_25_msmt17.pt'), # which ReID model to use
    device='cuda:0',
    fp16=True,
)

vid = cv2.VideoCapture(0)
color = (0, 0, 255)  # BGR
thickness = 2
fontscale = 0.5

while True:
    ret, im = vid.read()
    if not ret:  # stop when no frame could be read
        break

    # substitute by your object detector, input to tracker has to be N X (x, y, x, y, conf, cls)
    dets = np.array([[144, 212, 578, 480, 0.82, 0],
                    [425, 281, 576, 472, 0.56, 65]])

    tracks = tracker.update(dets, im) # --> (x, y, x, y, id, conf, cls, ind)

    xyxys = tracks[:, 0:4].astype('int') # float64 to int
    ids = tracks[:, 4].astype('int') # float64 to int
    confs = tracks[:, 5]
    clss = tracks[:, 6].astype('int') # float64 to int
    inds = tracks[:, 7].astype('int') # float64 to int

    # in case you have segmentations or poses alongside with your detections you can use
    # the ind variable in order to identify which track is associated to each seg or pose by:
    # segs = segs[inds]
    # poses = poses[inds]
    # you can then zip them together: zip(tracks, poses)

    # print bboxes with their associated id, cls and conf
    if tracks.shape[0] != 0:
        for xyxy, id, conf, cls in zip(xyxys, ids, confs, clss):
            im = cv2.rectangle(
                im,
                (xyxy[0], xyxy[1]),
                (xyxy[2], xyxy[3]),
                color,
                thickness
            )
            cv2.putText(
                im,
                f'id: {id}, conf: {conf}, c: {cls}',
                (xyxy[0], xyxy[1]-10),
                cv2.FONT_HERSHEY_SIMPLEX,
                fontscale,
                color,
                thickness
            )

    # show image with bboxes, ids, classes and confidences
    cv2.imshow('frame', im)

    # break on pressing q
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

vid.release()
cv2.destroyAllWindows()
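
The ind column is easiest to understand with a toy example. Assuming, as the comments in the example above describe, that ind holds the row index of the detection each track was matched to, per-detection extras such as poses can be reordered to line up with the tracks. A sketch with plain lists and made-up values (with NumPy arrays the equivalent is poses[inds]):

```python
# Per-detection extras, in the same order as the rows passed to the tracker.
poses = ["pose_of_det_0", "pose_of_det_1", "pose_of_det_2"]

# Made-up tracker output rows: (x, y, x, y, id, conf, cls, ind).
tracks = [
    [425, 281, 576, 472, 7, 0.56, 65, 1],
    [144, 212, 578, 480, 4, 0.82, 0, 0],
]

# The last column says which detection row each track came from.
inds = [int(track[7]) for track in tracks]
track_poses = [poses[i] for i in inds]

for track, pose in zip(tracks, track_poses):
    print(f"track id {track[4]} -> {pose}")
```

Detections that did not produce a track (index 2 here) simply do not appear in the reordered list.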

Tiled inference

from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction
import cv2
import numpy as np
from pathlib import Path
from boxmot import DeepOCSORT


tracker = DeepOCSORT(
    model_weights=Path('osnet_x0_25_msmt17.pt'), # which ReID model to use
    device='cpu',
    fp16=False,
)

detection_model = AutoDetectionModel.from_pretrained(
    model_type='yolov8',
    model_path='yolov8n.pt',
    confidence_threshold=0.5,
    device="cpu",  # or 'cuda:0'
)

vid = cv2.VideoCapture(0)
color = (0, 0, 255)  # BGR
thickness = 2
fontscale = 0.5

while True:
    ret, im = vid.read()
    if not ret:  # stop when no frame could be read
        break

    # get sliced predictions
    result = get_sliced_prediction(
        im,
        detection_model,
        slice_height=256,
        slice_width=256,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2
    )
    num_predictions = len(result.object_prediction_list)
    dets = np.zeros([num_predictions, 6], dtype=np.float32)
    for ind, object_prediction in enumerate(result.object_prediction_list):
        dets[ind, :4] = np.array(object_prediction.bbox.to_xyxy(), dtype=np.float32)
        dets[ind, 4] = object_prediction.score.value
        dets[ind, 5] = object_prediction.category.id

    tracks = tracker.update(dets, im) # --> (x, y, x, y, id, conf, cls, ind)

    if tracks.shape[0] != 0:

        xyxys = tracks[:, 0:4].astype('int') # float64 to int
        ids = tracks[:, 4].astype('int') # float64 to int
        confs = tracks[:, 5].round(decimals=2)
        clss = tracks[:, 6].astype('int') # float64 to int
        inds = tracks[:, 7].astype('int') # float64 to int

        # print bboxes with their associated id, cls and conf
        for xyxy, id, conf, cls in zip(xyxys, ids, confs, clss):
            im = cv2.rectangle(
                im,
                (xyxy[0], xyxy[1]),
                (xyxy[2], xyxy[3]),
                color,
                thickness
            )
            cv2.putText(
                im,
                f'id: {id}, conf: {conf}, c: {cls}',
                (xyxy[0], xyxy[1]-10),
                cv2.FONT_HERSHEY_SIMPLEX,
                fontscale,
                color,
                thickness
            )

    # show image with bboxes, ids, classes and confidences
    cv2.imshow('frame', im)

    # break on pressing q
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

vid.release()
cv2.destroyAllWindows()
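
Tiled (sliced) inference runs the detector on overlapping crops so that small objects occupy more pixels per crop. To see what slice_height=256, slice_width=256 and an overlap ratio of 0.2 imply, the sketch below computes window start offsets along one axis. It is a simplified illustration, not SAHI's actual implementation; `slice_starts` is a hypothetical helper, and the 640 x 384 frame size is an assumption borrowed from the image shape quoted in the speed results.

```python
def slice_starts(length, size, overlap_ratio):
    """Start offsets of windows of `size` pixels that cover [0, length)."""
    if size >= length:
        return [0]  # a single window already covers the axis
    step = size - int(size * overlap_ratio)
    starts = list(range(0, length - size, step))
    starts.append(length - size)  # shift the last window back to end at the border
    return starts


xs = slice_starts(640, 256, 0.2)  # horizontal offsets
ys = slice_starts(384, 256, 0.2)  # vertical offsets
tiles = [(x, y, x + 256, y + 256) for y in ys for x in xs]

print(xs, ys)
print(f"{len(tiles)} tiles per frame")
```

Each tile costs one detector forward pass, so smaller slices or larger overlaps trade speed for small-object recall.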

Contributors

Contact

For Yolo tracking bugs and feature requests please visit GitHub Issues. For business inquiries or professional support requests please send an email to: [email protected]

