Performance improvements for Regress Out #2781
Comments
@fidelram implemented this, and he’s sadly no longer active in open source software. I personally don’t have any experience with joblib, so I don’t know what could be the reason. 🤔 Generally, Python’s GIL means that Python code can’t run in parallel. I assume @fidelram used it for a good reason and there’s enough parallelizable code in there that it matters, but that’s the only lead I can think of.
There are two spawning approaches in joblib, processes and threads. The default is processes, but sometimes it doesn't work properly (at least that happened to me), so you may want to switch. Also, I see there is now dask support, so that may be worth a try.
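For illustration, here is a minimal sketch of switching the joblib backend from the calling code, assuming scanpy's internal `joblib.Parallel` call does not pin a backend explicitly (in which case the surrounding `parallel_backend` context takes effect). The dataset, the `'total_counts'` key, and the `n_jobs` values are only placeholders.

```python
import scanpy as sc
from joblib import parallel_backend

# Any AnnData works here; pbmc3k and 'total_counts' are just illustrative.
adata = sc.datasets.pbmc3k()
sc.pp.calculate_qc_metrics(adata, inplace=True)  # adds 'total_counts' to .obs

# Process-based backend (joblib's default, 'loky'): workers sidestep the GIL,
# but every chunk has to be pickled and shipped to a worker process.
with parallel_backend("loky", n_jobs=8):
    out_processes = sc.pp.regress_out(adata, ["total_counts"], n_jobs=8, copy=True)

# Thread-based backend: no copying between processes, but pure-Python work
# serializes on the GIL; it only helps if the heavy lifting releases the GIL
# (e.g. inside NumPy/BLAS calls).
with parallel_backend("threading", n_jobs=8):
    out_threads = sc.pp.regress_out(adata, ["total_counts"], n_jobs=8, copy=True)

# A 'dask' backend also exists, but it additionally requires a
# dask.distributed Client to be created and registered before use.
```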
Not sure; according to this page, the function code would have to be changed. It would be handy to add the backend as an option to `regress_out` (a rough sketch of that idea follows below).
Done in #3110
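As a rough sketch of the suggestion above (and not the API that #3110 actually added), a thin wrapper could expose the joblib backend as a keyword argument; the helper name `regress_out_with_backend` is hypothetical.

```python
import scanpy as sc
from joblib import parallel_backend


def regress_out_with_backend(adata, keys, *, backend="loky", n_jobs=None, **kwargs):
    """Run sc.pp.regress_out under an explicit joblib backend (hypothetical helper).

    `backend` can be 'loky', 'threading', or 'multiprocessing'; 'dask'
    additionally needs a dask.distributed Client to be registered.
    """
    with parallel_backend(backend):
        return sc.pp.regress_out(adata, keys, n_jobs=n_jobs, **kwargs)


# Example call (the keys are placeholders for real .obs columns):
# regress_out_with_backend(adata, ["total_counts"], backend="threading", n_jobs=8)
```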
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
I am using `regress_out` and it is painfully slow. Even on a system where I set `n_jobs=36` and `sc.settings.n_jobs = 36`, each core of which has 36 GiB of memory, I find that `regress_out` is practically unusable. At the moment that calculation is at 985 minutes. Looking at `htop`, while the memory is certainly allocated (191 GiB / 1.48 TiB), it feels like setting `n_jobs` at all actually hinders performance.

I've checked the source code, so I know that `n_jobs` should be set correctly.

Looking at `_regress_out_chunk`, there really doesn't seem to be anything necessarily bottlenecking the performance except for either setting the chunk length in `regress_out`, or just the size of the dataset. Am I missing something?
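For concreteness, the setup described above looks roughly like the following; the file name and covariate keys are placeholders, since the report does not state them.

```python
import scanpy as sc

sc.settings.n_jobs = 36                      # global scanpy default for n_jobs

adata = sc.read_h5ad("large_dataset.h5ad")   # placeholder for the large dataset

# Placeholder covariates; the per-call n_jobs mirrors the global setting above.
sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"], n_jobs=36)
```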