Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/progress #1335

Merged
merged 11 commits into from
Sep 16, 2024
Merged

Feat/progress #1335

merged 11 commits into from
Sep 16, 2024

Conversation

PGijsbers
Copy link
Collaborator

@PGijsbers PGijsbers commented Apr 2, 2024

  • Adds a progress bar to MinIO downloads
  • When download_all_files=True, only download the files which do not yet exist in cache. Overwriting is no longer necessary, as it was a crutch that was built before force_refresh_cache was added to the get_dataset method. This means we now can also avoid re-downloading image datasets over and over again.

WIP:

  • add/update test for cache behavior
  • update release notes

There is now a way to force a cache clear, so always redownloading
is not useful anymore.
@PGijsbers PGijsbers marked this pull request as ready for review September 13, 2024 12:59
@codecov-commenter
Copy link

codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.

Project coverage is 84.28%. Comparing base (b4d038f) to head (dcbdb78).

Files with missing lines Patch % Lines
openml/utils.py 40.00% 9 Missing ⚠️
openml/config.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1335      +/-   ##
===========================================
+ Coverage    84.25%   84.28%   +0.03%     
===========================================
  Files           38       38              
  Lines         5270     5293      +23     
===========================================
+ Hits          4440     4461      +21     
- Misses         830      832       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

openml/_api_calls.py Show resolved Hide resolved
openml/config.py Outdated
@@ -255,6 +259,7 @@ def _setup(config: _Config | None = None) -> None:
avoid_duplicate_runs = config["avoid_duplicate_runs"]
apikey = config["apikey"]
server = config["server"]
show_progress = config.get("show_progress", _defaults["show_progress"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this explicitly cast to bool somewhere? I found this line in the parsing of the Config which seems to suggest it needs explicit handling to convert to booleans.

if isinstance(config["FAKE_SECTION"]["avoid_duplicate_runs"], str):
config["FAKE_SECTION"]["avoid_duplicate_runs"] = config["FAKE_SECTION"].getboolean(
"avoid_duplicate_runs"
) # type: ignore
return dict(config.items("FAKE_SECTION")) # type: ignore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, can't believe I missed that. Thanks.
It turns out that this didn't work either. Setting avoid_duplicate_runs would actually give me a TypeError (wasn't allowed to set a boolean value to the Config object), so I rewrote this and added a regression test.

@eddiebergman
Copy link
Collaborator

Did a brief look at configparser and get_boolean() for common methods of specifying True/False in raw text. Covered my concerns, never knew about it, seems super useful!

@PGijsbers
Copy link
Collaborator Author

Honestly if we did it again I would probably just use toml files, which are typed.

@PGijsbers
Copy link
Collaborator Author

@eddiebergman the review request is still open. It needs explicit approval.

@eddiebergman
Copy link
Collaborator

Ahh sorry, thought it was just a thing of getting some feedback, not explicit approval, my bad

@PGijsbers PGijsbers merged commit 1d707e6 into develop Sep 16, 2024
11 of 14 checks passed
@PGijsbers PGijsbers deleted the feat/progress branch September 16, 2024 16:22
@PGijsbers PGijsbers mentioned this pull request Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants