-
-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/progress #1335
Feat/progress #1335
Conversation
There is now a way to force a cache clear, so always redownloading is not useful anymore.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1335 +/- ##
===========================================
+ Coverage 84.25% 84.28% +0.03%
===========================================
Files 38 38
Lines 5270 5293 +23
===========================================
+ Hits 4440 4461 +21
- Misses 830 832 +2 ☔ View full report in Codecov by Sentry. |
openml/config.py
Outdated
@@ -255,6 +259,7 @@ def _setup(config: _Config | None = None) -> None: | |||
avoid_duplicate_runs = config["avoid_duplicate_runs"] | |||
apikey = config["apikey"] | |||
server = config["server"] | |||
show_progress = config.get("show_progress", _defaults["show_progress"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this explicitly cast to bool somewhere? I found this line in the parsing of the Config which seems to suggest it needs explicit handling to convert to booleans.
openml-python/openml/config.py
Lines 336 to 340 in 8d03243
if isinstance(config["FAKE_SECTION"]["avoid_duplicate_runs"], str): | |
config["FAKE_SECTION"]["avoid_duplicate_runs"] = config["FAKE_SECTION"].getboolean( | |
"avoid_duplicate_runs" | |
) # type: ignore | |
return dict(config.items("FAKE_SECTION")) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, can't believe I missed that. Thanks.
It turns out that this didn't work either. Setting avoid_duplicate_runs would actually give me a TypeError (wasn't allowed to set a boolean value to the Config object), so I rewrote this and added a regression test.
Did a brief look at |
Honestly if we did it again I would probably just use toml files, which are typed. |
@eddiebergman the review request is still open. It needs explicit approval. |
Ahh sorry, thought it was just a thing of getting some feedback, not explicit approval, my bad |
download_all_files=True
, only download the files which do not yet exist in cache. Overwriting is no longer necessary, as it was a crutch that was built beforeforce_refresh_cache
was added to theget_dataset
method. This means we now can also avoid re-downloading image datasets over and over again.WIP: