-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache dimension snakecase #181
Conversation
It turns out that this piece of code gets run very frequently and accounts for quite a lot of runtime for xbrl extraction. In the test scenario, this improvements seems to produce ~15% reduction in runtime.
For more information, see https://pre-commit.ci
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #181 +/- ##
==========================================
+ Coverage 93.43% 93.44% +0.01%
==========================================
Files 8 8
Lines 609 610 +1
==========================================
+ Hits 569 570 +1
Misses 40 40 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I bet there are other speedups lurking in here too. Thanks for finding this.
Yep, I would like to test this in the full PUDL ETL before committing this PR, so I'm trying to figure out the right (and low effort) way to go about that as I'm assuming that this may not be the last such change. I should probably also do a write-up/wiki doc for how to do profiling of the code to find stuff like this. It turns out that the multi-processing is kind of annoying and use of process executors here definitely got in the way of getting clean profiles. |
For testing the altered
That should replace the version of Once you're satisfied with the changes to |
Thanks, that definitely sounds like the easiest way to go about testing this. |
I have tested this in the context of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we might as well merge it in. It doesn't really add complexity and it'll improve performance in some computational contexts even if it doesn't right now, so why not?
I would like to re-run the analysis and that would be harder once this is merged in. In general, I'm a bit wary of optimizations that actually don't help :-/ |
Haha, okay well I'm fine with just closing it PR too. Splitting off the FERC extractions in our CI reorganizations will be a much much larger performance boost. |
Let's merge. It may not help but it won't hurt and realistically, I don't have enough cycles to dig into this before I free up my plate of other things. You're right that more significant speedups will likely be brought by the sharding of |
It turns out that this piece of code gets run very frequently and accounts for quite a lot of runtime for xbrl extraction. In the test scenario, this improvements seems to produce ~15% reduction in extraction time.