feat: skip dataset re-download and ensure safe dataset syncing #220

JSabadin · 2025-01-06T10:36:48Z

Skip Redownload Dataset and Fix Potential Race Conditions

A new parameter, update_mode, has been introduced in the LuxonisLoader. This parameter allows more control over when the dataset from the cloud is already downloaded locally.

If set to UpdateMode.ALWAYS, the loader will always re-download the dataset locally.
If set to UpdateMode.IF_EMPTY, it will only download the dataset if it does not already exist locally.

Example Usage:

from luxonis_loader import LuxonisLoader, UpdateMode

loader = LuxonisLoader(
    dataset,
    update_mode=UpdateMode.IF_EMPTY
)

This ensures that the loader uses the local dataset if one is already present, helping to avoid unnecessary downloads.

FileLock Added for Safe Syncing

To address potential race conditions, a FileLock mechanism has been implemented in the following areas:

sync_from_cloud Method: This prevents multiple processes from attempting to sync the dataset simultaneously in a distributed environment (e.g., DDP on GCP).
_get_metadata in Dataset Initialization: Ensures safe concurrent access when multiple processes initialize the dataset and loader before setting up the DDP environment.

Additionally, the _load_df_offline method has been updated to:

Only access local storage.
Avoid attempting to download dataset annotations from the cloud, as sync_from_cloud already handles this.

These changes help to ensure safe and efficient operation in distributed environments by preventing redundant downloads and race conditions.

codecov · 2025-01-06T10:48:26Z

Codecov Report

Attention: Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.81%. Comparing base (1cbc795) to head (2c49d12).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
luxonis_ml/data/datasets/luxonis_dataset.py	88.23%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   95.84%   95.81%   -0.04%     
==========================================
  Files          88       88              
  Lines        4571     4583      +12     
==========================================
+ Hits         4381     4391      +10     
- Misses        190      192       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…om_cloud)

kozlov721 · 2025-01-07T11:29:16Z

Doesn't the force flag already do what the skip_redownload_dataset does?

JSabadin · 2025-01-07T11:53:39Z

Doesn't the force flag already do what the skip_redownload_dataset does?

They serve different purposes:

force always triggers a fresh download, ignoring what’s on disk.
skip_redownload_dataset simply prevents downloads if data already exists locally.

Probably a string parameter like update_mode could encapsulate all scenarios better:

"always" = force a fresh download
"if_empty" = only download if there’s no local data
"never" = skip the download entirely

kozlov721 · 2025-01-07T11:58:45Z

Probably a string parameter like update_mode could encapsulate all scenarios better:

"always" = force a fresh download

"if_empty" = only download if there’s no local data

"never" = skip the download entirely

Merging those two parameters to one would be better. Imho it's not very clear what's the difference between force=False and skip_redownloading_dataset=True.

luxonis_ml/data/datasets/luxonis_dataset.py

luxonis_ml/data/loaders/luxonis_loader.py

feat: skip dataset re-download

a414b7e

JSabadin requested a review from a team as a code owner January 6, 2025 10:36

JSabadin requested review from kozlov721, klemen1999, tersekmatija and conorsim and removed request for a team January 6, 2025 10:36

github-actions bot assigned JSabadin Jan 6, 2025

github-actions bot added the enhancement New feature or request label Jan 6, 2025

feat: add FileLock

5ded561

JSabadin changed the title ~~feat: skip dataset re-download~~ feat: skip dataset re-download and added FileLock for safe syncing Jan 6, 2025

JSabadin added 3 commits January 6, 2025 14:22

fix bug

7135911

fix: remove redundant annotation download (already handled by sync_fr…

aab2522

…om_cloud)

tracker hotfix

4a745ad

github-actions bot added the tracker Changes affecting luxonis_ml.tracker subpackage label Jan 7, 2025

hotfix comment

f86d8d5

JSabadin added 2 commits January 7, 2025 13:42

refactor sync_from_cloud for improved clarity

f92881d

fix failing tests

05f64c8

github-actions bot added the data Changes affecting luxonis_ml.data subpackage label Jan 7, 2025

klemen1999 approved these changes Jan 7, 2025

View reviewed changes

luxonis_ml/data/datasets/luxonis_dataset.py Outdated Show resolved Hide resolved

luxonis_ml/data/datasets/luxonis_dataset.py Show resolved Hide resolved

luxonis_ml/data/loaders/luxonis_loader.py Show resolved Hide resolved

JSabadin added 2 commits January 7, 2025 17:00

fix docs, address additional potential race conditions

2ec48a9

fix rare race conditions

2c49d12

JSabadin changed the title ~~feat: skip dataset re-download and added FileLock for safe syncing~~ feat: skip dataset re-download and ensure safe dataset syncing Jan 8, 2025

JSabadin merged commit 11ee765 into main Jan 8, 2025
10 of 12 checks passed

JSabadin deleted the feat/dataset-redownload branch January 8, 2025 13:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: skip dataset re-download and ensure safe dataset syncing #220

feat: skip dataset re-download and ensure safe dataset syncing #220

JSabadin commented Jan 6, 2025 •

edited

Loading

codecov bot commented Jan 6, 2025 •

edited

Loading

kozlov721 commented Jan 7, 2025

JSabadin commented Jan 7, 2025

kozlov721 commented Jan 7, 2025

feat: skip dataset re-download and ensure safe dataset syncing #220

feat: skip dataset re-download and ensure safe dataset syncing #220

Conversation

JSabadin commented Jan 6, 2025 • edited Loading

Skip Redownload Dataset and Fix Potential Race Conditions

FileLock Added for Safe Syncing

codecov bot commented Jan 6, 2025 • edited Loading

Codecov Report

kozlov721 commented Jan 7, 2025

JSabadin commented Jan 7, 2025

kozlov721 commented Jan 7, 2025

JSabadin commented Jan 6, 2025 •

edited

Loading

codecov bot commented Jan 6, 2025 •

edited

Loading