feat: skip dataset re-download and ensure safe dataset syncing #220
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   95.84%   95.81%   -0.04%
==========================================
  Files          88       88
  Lines        4571     4583      +12
==========================================
+ Hits         4381     4391      +10
- Misses        190      192       +2
Doesn't the
They serve different purposes:
Probably a string parameter like
Merging those two parameters into one would be better. IMHO it's not very clear what the difference is between
Skip Redownload Dataset and Fix Potential Race Conditions
A new parameter, `update_mode`, has been introduced in the `LuxonisLoader`. It gives more control over what happens when the dataset from the cloud is already downloaded locally.
- `UpdateMode.ALWAYS`: the loader will always re-download the dataset locally.
- `UpdateMode.IF_EMPTY`: the loader will only download the dataset if it does not already exist locally.
Example Usage:
This ensures that the loader uses the local dataset if one is already present, helping to avoid unnecessary downloads.
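The two modes can be sketched as a small decision function. This is a minimal, self-contained illustration and not the `LuxonisLoader` implementation: the `UpdateMode` enum below is a local stand-in with the two members named above, `needs_download` is a hypothetical helper, and treating a missing or empty directory as "does not already exist locally" is an assumption.

```python
from enum import Enum
from pathlib import Path


class UpdateMode(Enum):
    # Local stand-in mirroring the two modes named in the PR description.
    ALWAYS = "always"
    IF_EMPTY = "if_empty"


def needs_download(local_dir: Path, mode: UpdateMode) -> bool:
    """Decide whether the cloud dataset should be downloaded into local_dir.

    Hypothetical helper: the real loader may use a different check.
    """
    if mode is UpdateMode.ALWAYS:
        return True
    # IF_EMPTY (assumed semantics): download only when the directory
    # is missing or contains no entries.
    return not local_dir.is_dir() or not any(local_dir.iterdir())
```

With `UpdateMode.IF_EMPTY`, a populated local directory makes `needs_download` return `False`, which corresponds to reusing the local copy.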
FileLock Added for Safe Syncing
To address potential race conditions, a `FileLock` mechanism has been implemented in the following areas:
- `sync_from_cloud` Method: This prevents multiple processes from attempting to sync the dataset simultaneously in a distributed environment (e.g., DDP on GCP).
- `_get_metadata` in Dataset Initialization: Ensures safe concurrent access when multiple processes initialize the dataset and loader before setting up the DDP environment.
Additionally, the `_load_df_offline` method has been updated to:
`sync_from_cloud` already handles this.
These changes help to ensure safe and efficient operation in distributed environments by preventing redundant downloads and race conditions.
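The locking pattern described in this section can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library `fcntl.flock` (POSIX only) as a stand-in for the `FileLock` mentioned above, and `exclusive_lock`, `sync_once`, the `.sync.lock` file, and the `.synced` marker are hypothetical names that do not come from the PR.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path):
    """Hold an exclusive advisory lock on lock_path for the with-block."""
    with open(lock_path, "w") as handle:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def sync_once(local_dir: Path, download) -> bool:
    """Call download(local_dir) unless an earlier caller already finished it.

    Returns True if this call performed the download, False otherwise.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    marker = local_dir / ".synced"  # hypothetical "sync finished" marker
    with exclusive_lock(local_dir / ".sync.lock"):
        # The check happens inside the lock, so a process that waited
        # for the lock sees the marker written by the process before it.
        if marker.exists():
            return False
        download(local_dir)
        marker.touch()
        return True
```

Checking the marker only after acquiring the lock is what prevents two processes from both deciding to download; a process that waited on the lock finds the marker and skips the work.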