perf(swordfish): Parallel expression evaluation #3593
base: main
Conversation
CodSpeed Performance Report: merging #3593 will improve performance by 38.13%.
Benchmarks breakdown
Codecov Report. Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3593 +/- ##
==========================================
+ Coverage 77.91% 77.93% +0.01%
==========================================
Files 727 727
Lines 91129 91588 +459
==========================================
+ Hits 71007 71379 +372
- Misses 20122 20209 +87
@@ -36,7 +32,7 @@ pub(crate) fn create_ordering_aware_receiver_channel<T: Clone>(
 ) -> (Vec<Sender<T>>, OrderingAwareReceiver<T>) {
     match ordered {
         true => {
-            let (senders, receiver) = (0..buffer_size).map(|_| create_channel::<T>(1)).unzip();
+            let (senders, receiver) = (0..buffer_size).map(|_| create_channel::<T>(0)).unzip();
Setting channel sizes to 0 can elide heap allocation.
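For intuition, a capacity-0 channel is a rendezvous channel: a send only completes when a receiver is actively waiting, so no value ever needs to sit in a buffer. The PR uses kanal's async channels; the sketch below illustrates the same rendezvous semantics with the standard library's `sync_channel(0)`, purely for self-containment.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Capacity 0 = rendezvous: a send blocks until a receiver is ready,
    // so the channel never buffers (and never heap-allocates) a value.
    let (tx, rx) = sync_channel::<u64>(0);

    let sender = thread::spawn(move || {
        // Cannot complete until the main thread calls recv().
        tx.send(42).unwrap();
    });

    thread::sleep(Duration::from_millis(10)); // receiver arrives "late"
    assert_eq!(rx.recv().unwrap(), 42);
    sender.join().unwrap();
    println!("rendezvous ok");
}
```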
Addresses: #3389. More generally, this PR optimizes for projections with many expressions, particularly memory intensive expressions like UDFs.
Problem:
Currently, swordfish parallelizes projections across morsels, one CPU per morsel. However, if each projection contains many memory-intensive expressions, memory use can balloon because many materialized morsels live in memory at once.
Proposed solution:
Instead, we can parallelize the expressions within the projection (but only the expressions that require compute). This way, we still get good CPU utilization while keeping fewer materialized morsels in memory.
In the linked issue above, we see that a 128-CPU machine will parallelize morsels across the cores, each running multiple UDFs, resulting in "317GB allocations and duration 351 secs".
This PR reduces that to 7.8GB peak memory and runtime of 66 seconds.
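The shape of the change can be sketched as follows: evaluate each compute-heavy expression of one projection on the same morsel concurrently, rather than running whole projections on many morsels at once. This is a minimal, hypothetical illustration (the names `Column`, `morsel`, and `exprs` are invented; it is not the PR's actual code):

```rust
use std::thread;

// Hypothetical column type: one batch (morsel) of values.
type Column = Vec<i64>;

fn main() {
    let morsel: Column = (0..1_000).collect();

    // Hypothetical compute-heavy expressions within a single projection.
    let exprs: Vec<fn(&Column) -> Column> = vec![
        |c| c.iter().map(|x| x * 2).collect(),
        |c| c.iter().map(|x| x + 1).collect(),
        |c| c.iter().map(|x| x * x).collect(),
    ];

    // Parallelize across expressions over the SAME morsel, so only one
    // input morsel needs to be materialized at a time.
    let m = &morsel;
    let outputs: Vec<Column> = thread::scope(|s| {
        let handles: Vec<_> = exprs
            .iter()
            .map(|&f| s.spawn(move || f(m)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    assert_eq!(outputs[0][1], 2); // 1 * 2
    assert_eq!(outputs[2][3], 9); // 3 * 3
    println!("per-expression parallelism ok");
}
```

The design trade-off: instead of N morsels each fully materialized in flight, one morsel is in flight with its expressions fanned out, which bounds peak memory while still using the available cores.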
Notes:
The async send to async receive path was not respecting capacity constraints: sends were allowed even though the corresponding receive had not happened. Moved over to https://github.com/fereidani/kanal, which worked much better.
Todos for next time:
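The capacity bug described above violates the basic contract of a bounded channel: once full, further sends must block (or fail, with a `try_send`) until a receive drains a slot. A standard-library sketch of the expected behavior (the PR itself uses kanal's async channels, but `sync_channel` exposes the same contract):

```rust
use std::sync::mpsc::sync_channel;

fn main() {
    // A bounded channel with capacity 1.
    let (tx, rx) = sync_channel::<i32>(1);

    tx.try_send(1).unwrap();          // fills the single slot
    assert!(tx.try_send(2).is_err()); // capacity enforced: send rejected

    assert_eq!(rx.recv().unwrap(), 1); // a receive drains the slot
    tx.try_send(2).unwrap();           // now the send succeeds
    assert_eq!(rx.recv().unwrap(), 2);
    println!("backpressure ok");
}
```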