Performance improvements for Regress Out #2781
Comments
@fidelram implemented this, and he’s sadly no longer active in open source software. I personally don’t have any experience with joblib, so I don’t know what could be the reason. 🤔 Generally, Python’s GIL means that Python code can’t run in parallel. I assume @fidelram used it for a good reason and there’s enough parallelizable code in there that it matters, but that’s the only lead I can think of.
There are two spawning approaches in joblib, processes and threads. The default is processes, but sometimes it doesn't work properly (at least that happened to me), so you may want to switch. Also, I see there is now dask support, so that may be worth a try.
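For illustration, here is a minimal sketch of switching the joblib backend from the calling code, assuming scanpy's internal `joblib.Parallel` call does not pin a backend explicitly (in which case the surrounding `parallel_backend` context takes effect). The dataset, the `'total_counts'` key, and the `n_jobs` values are only placeholders.

```python
import scanpy as sc
from joblib import parallel_backend

# Any AnnData works here; pbmc3k and 'total_counts' are just illustrative.
adata = sc.datasets.pbmc3k()
sc.pp.calculate_qc_metrics(adata, inplace=True)  # adds 'total_counts' to .obs

# Process-based backend (joblib's default, 'loky'): workers sidestep the GIL,
# but every chunk has to be pickled and shipped to a worker process.
with parallel_backend("loky", n_jobs=8):
    out_processes = sc.pp.regress_out(adata, ["total_counts"], n_jobs=8, copy=True)

# Thread-based backend: no copying between processes, but pure-Python work
# serializes on the GIL; it only helps if the heavy lifting releases the GIL
# (e.g. inside NumPy/BLAS calls).
with parallel_backend("threading", n_jobs=8):
    out_threads = sc.pp.regress_out(adata, ["total_counts"], n_jobs=8, copy=True)

# A 'dask' backend also exists, but it additionally requires a
# dask.distributed Client to be created and registered before use.
```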
Not sure; according to this page, the function code would have to be changed. It would be handy to add the backend as an option to `regress_out` (a rough sketch of that idea follows below).
Done in #3110
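As a rough sketch of the suggestion above (and not the API that #3110 actually added), a thin wrapper could expose the joblib backend as a keyword argument; the helper name `regress_out_with_backend` is hypothetical.

```python
import scanpy as sc
from joblib import parallel_backend


def regress_out_with_backend(adata, keys, *, backend="loky", n_jobs=None, **kwargs):
    """Run sc.pp.regress_out under an explicit joblib backend (hypothetical helper).

    `backend` can be 'loky', 'threading', or 'multiprocessing'; 'dask'
    additionally needs a dask.distributed Client to be registered.
    """
    with parallel_backend(backend):
        return sc.pp.regress_out(adata, keys, n_jobs=n_jobs, **kwargs)


# Example call (the keys are placeholders for real .obs columns):
# regress_out_with_backend(adata, ["total_counts"], backend="threading", n_jobs=8)
```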
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
I am using `regress_out` and it is painfully slow. Even on a system where I set `n_jobs=36` and `sc.settings.n_jobs = 36`, each core of which has 36 GiB of memory, I find that `regress_out` is practically unusable. At the moment that calculation is at 985 minutes. Looking at `htop`, while the memory is certainly allocated (191 GiB / 1.48 TiB), it feels like setting `n_jobs` at all actually hinders performance.

I've checked the source code, so I know that `n_jobs` should be set correctly.

Looking at `_regress_out_chunk`, there really doesn't seem to be anything necessarily bottlenecking the performance except for either setting the chunk length in `regress_out`, or just the size of the dataset. Am I missing something?
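For concreteness, the setup described above looks roughly like the following; the file name and covariate keys are placeholders, since the report does not state them.

```python
import scanpy as sc

sc.settings.n_jobs = 36                      # global scanpy default for n_jobs

adata = sc.read_h5ad("large_dataset.h5ad")   # placeholder for the large dataset

# Placeholder covariates; the per-call n_jobs mirrors the global setting above.
sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"], n_jobs=36)
```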