Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add fair unified memory pool #1369

Merged
merged 12 commits into from
Feb 10, 2025

Conversation

kazuyukitanimura
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Current Comet unified memory pool is a greedy pool. One thread (consumer) can take a large amount of memory that causes OOM for other threads, especially for aggregation.

What changes are included in this PR?

Added a fair version of unified memory pool similar to DataFusion FairSpilPool that caps the memory usage at pool_size/num
The fair unified memory pool is the default for off-heap mode with this PR

How are these changes tested?

Exisiting tests

@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.06%. Comparing base (f09f8af) to head (05fd0f1).
Report is 22 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1369       +/-   ##
=============================================
- Coverage     56.12%   39.06%   -17.06%     
- Complexity      976     2071     +1095     
=============================================
  Files           119      263      +144     
  Lines         11743    60746    +49003     
  Branches       2251    12909    +10658     
=============================================
+ Hits           6591    23733    +17142     
- Misses         4012    32530    +28518     
- Partials       1140     4483     +3343     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kazuyukitanimura
Copy link
Contributor Author

@@ -48,7 +48,7 @@ Comet provides the following configuration settings.
| spark.comet.exec.hashJoin.enabled | Whether to enable hashJoin by default. | true |
| spark.comet.exec.initCap.enabled | Whether to enable initCap by default. | false |
| spark.comet.exec.localLimit.enabled | Whether to enable localLimit by default. | true |
| spark.comet.exec.memoryPool | The type of memory pool to be used for Comet native execution. Available memory pool types are 'greedy', 'fair_spill', 'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global' and 'fair_spill_global', By default, this config is 'greedy_task_shared'. | greedy_task_shared |
| spark.comet.exec.memoryPool | The type of memory pool to be used for Comet native execution. Available memory pool types are 'greedy', 'fair_spill', 'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global' and 'fair_spill_global'. For off-eap types are 'unified' and `fair_unified`. | greedy_task_shared |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add default is fair_unified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to omit what the default value is. When default value is changed, the option description is often forgotten to be updated because there is no checking for the description. So we deleted default value explanations in the past

impl MemoryPool for CometFairMemoryPool {
fn register(&self, _: &MemoryConsumer) {
let mut state = self.state.lock();
state.num = state.num.checked_add(1).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this unwraps can be super confusing if they happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't happen in reality, num is usize. I wouldn't worry about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to expect() with a message

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kazuyukitanimura nice job
unwraps probably could be wrapped into user friendly messages or its gonna super confusing to unwrap panic when it is supposed to be a memory or memory manager issue.

state.num = state.num.checked_sub(1).unwrap();
}

fn grow(&self, reservation: &MemoryReservation, additional: usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method implementation seems to be different from memory_pool, but I'm not sure which one is reasonable.

https://github.com/apache/datafusion-comet/blob/main/native/core/src/execution/memory_pool.rs#L83-L87

@kazuyukitanimura
Copy link
Contributor Author

@andygrove I would like to make sure to include this for the 0.6.0 release

@andygrove
Copy link
Member

@andygrove I would like to make sure to include this for the 0.6.0 release

I added this PR to #1361

@andygrove
Copy link
Member

I tried running TPC-H locally with this PR but there is excessive spilling and the benchmark is extremely slow. q3 took 3.4 minutes compared to 16 seconds on main branch.

2025-02-08_12-07

@kazuyukitanimura
Copy link
Contributor Author

Thanks @andygrove for trying. You may have to increase the memory because the fair memory pool limits memory usage earlier in order to leave the the rest for other threads.
Or I can keep the default to the original memory pool and make this fair pool optional

@kazuyukitanimura
Copy link
Contributor Author

Went ahead and kept the default pool choice @andygrove

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kazuyukitanimura kazuyukitanimura merged commit f099e6e into apache:main Feb 10, 2025
74 checks passed
@kazuyukitanimura
Copy link
Contributor Author

Merged, thanks @andygrove

EmilyMatt pushed a commit to EmilyMatt/datafusion-comet that referenced this pull request Feb 12, 2025
## Which issue does this PR close?

## Rationale for this change

Current Comet unified memory pool is a greedy pool. One thread (consumer) can take a large amount of memory that causes OOM for other threads, especially for aggregation.

## What changes are included in this PR?

Added a fair version of unified memory pool similar to DataFusion `FairSpilPool` that caps the memory usage at `pool_size/num`
The fair unified memory pool is the default for off-heap mode with this PR

## How are these changes tested?

Exisiting tests
kazuyukitanimura added a commit that referenced this pull request Feb 13, 2025
## Which issue does this PR close?

Closes #1388 

## Rationale for this change

Following up on #1369 and #1386 

## What changes are included in this PR?

Updated the doc

## How are these changes tested?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants