enhancement: addresses #29 using vectorization instead of multiprocessing #132
From #29:

**TLDR:** `multiprocessing` might be overkill for this task. Vectorisation using `numpy` arrays works. I've swapped the lists for `ndarray` where possible and it works. Run the tests using `pytest` to show the results are the same and that the new `BiaxialBendingResults.get_results_list()` is faster.

### Why `multiprocessing` might not work

I feel the need to explain this because my solution veers away from your original idea of parallelising the operation.

I've worked quite a bit with async in web servers, so I have a good idea of how multiprocessing works, including in Python. Compared to database operations and handling HTTP requests, accessing the contents of a list is a very small task. So small, in fact, that spreading the work across a process pool can make the operation run even slower than it already is.
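To illustrate the overhead argument, here is a minimal sketch (not from the PR; `cheap_task` is a made-up stand-in for a cheap per-item operation). With work this small, pickling arguments and shuttling them between processes usually costs more than the work itself:

```python
import multiprocessing as mp
import time


def cheap_task(x):
    # Stand-in for a cheap per-item operation, e.g. reading a list element.
    return x * x


def run_serial(data):
    # Plain Python loop: no inter-process overhead at all.
    return [cheap_task(x) for x in data]


def run_pool(data):
    # Each item is pickled, sent to a worker process, and the result sent
    # back; for cheap tasks this overhead typically dwarfs the work itself.
    with mp.Pool() as pool:
        return pool.map(cheap_task, data)


if __name__ == "__main__":
    data = list(range(50_000))

    t0 = time.perf_counter()
    serial_result = run_serial(data)
    serial_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    pool_result = run_pool(data)
    pool_time = time.perf_counter() - t0

    # Same answers; on most machines the pool is the slower of the two here.
    assert serial_result == pool_result
    print(f"serial: {serial_time:.4f}s, pool: {pool_time:.4f}s")
```

The results match either way; the pool only pays off once each task is expensive enough to amortise the inter-process communication.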
### Vectorisation with `numpy` arrays

Vector operations will give you that sense of "parallel computing" without the overhead of spreading the tasks over CPU cores. The simplest way to do this is with `numpy` arrays, i.e. changing from this

to this

works well enough because:

- the elementwise arithmetic runs in compiled loops instead of Python bytecode; and
- an `ndarray` stores its data contiguously, avoiding per-element Python object overhead.
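For readers without the diff in front of them, the shape of the change is roughly this (a hedged sketch with made-up numbers, not the actual `BiaxialBendingResults` code): a Python loop over a list becomes a couple of whole-array `numpy` operations.

```python
import numpy as np

# Made-up data standing in for per-point results; the real lists in the
# library hold bending results, not these toy numbers.
n = 100_000
theta = [i * 1e-3 for i in range(n)]   # plain Python list ("from this")
theta_arr = np.array(theta)            # ndarray equivalent ("to this")

# Loop version: one Python-level multiply/add per element.
m_loop = [t * 2.0 + 1.0 for t in theta]

# Vectorised version: the same arithmetic as two whole-array operations
# executed in compiled code.
m_vec = theta_arr * 2.0 + 1.0

# Identical results; the vectorised form is dramatically faster for large n.
assert np.allclose(m_loop, m_vec)
```

The payoff grows with the number of results, since the loop's per-element interpreter overhead scales linearly while the array version stays a fixed handful of compiled operations.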
Here's a GitHub gist of the tests I ran with random results data, from 1 result up to 1,000,000. The benchmark shows speed-ups well into the 10,000x range for thousands of results.
There's a link to the Colab notebook so you can run the tests yourself.
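The equivalence check described above can be sketched like this (a `pytest`-style test with illustrative stand-in functions; these are not the gist's or the library's actual implementations):

```python
import numpy as np


def get_results_list_old(values):
    # Illustrative stand-in for the list-based implementation.
    return [v * 2.0 for v in values]


def get_results_list_new(values):
    # Illustrative stand-in for the ndarray-based implementation.
    return np.asarray(values) * 2.0


def test_results_agree():
    # pytest-style check: the two implementations must agree for sizes
    # from a single result up to many, as in the gist's benchmark.
    rng = np.random.default_rng(0)
    for n in (1, 10, 1_000):
        values = rng.random(n)
        assert np.allclose(get_results_list_old(values),
                           get_results_list_new(values))


if __name__ == "__main__":
    test_results_agree()
    print("ok")
```

Run it with `pytest` (it will pick up the `test_` function) or directly as a script.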
I am in the process of doing something similar for `MomentInteractionResults.get_results_list()`. Now, I would have added the `multiprocessing` implementations that I tried out, but I felt it unnecessary given the explanations above; you're welcome to ask for them, though. I like this library very much, btw.