Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Executor memory overhead overriding #1462

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

LukMRVC
Copy link

@LukMRVC LukMRVC commented Mar 2, 2025

Which issue does this PR close?

Closes #1460 .

Rationale for this change

Even when using unified memory, Comet would override spark.executor.memoryOverhead using inconsistent byte units.
This situation should not have happened in the first place.

What changes are included in this PR?

Fixes for

  • shouldOverrideMemoryConf
  • Adds result of getCometShuffleMemorySizeInMiB to executor memoryOverhead

How are these changes tested?

Added CometPluginsNonOverrideUnifiedModeSuite to test that spark.executor.memoryOverhead is not overridden, which failed in previous versions.

@andygrove
Copy link
Member

Thanks @LukMRVC. Could you fix the code formatting issue?

There are instructions in https://datafusion.apache.org/comet/contributor-guide/development.html#submitting-a-pull-request

@LukMRVC
Copy link
Author

LukMRVC commented Mar 2, 2025

Yes, it should be fixed by now.

@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.82%. Comparing base (f09f8af) to head (6115fcc).
Report is 56 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1462      +/-   ##
============================================
+ Coverage     56.12%   57.82%   +1.69%     
- Complexity      976      985       +9     
============================================
  Files           119      122       +3     
  Lines         11743    12178     +435     
  Branches       2251     2298      +47     
============================================
+ Hits           6591     7042     +451     
+ Misses         4012     3958      -54     
- Partials       1140     1178      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wForget
Copy link
Member

wForget commented Mar 3, 2025

Thank you for finding this issue, which was introduced in my previous PR #1379

// comet shuffle unified memory manager is disabled, so we need to add overhead memory
CometSparkSessionExtensions.getCometShuffleMemorySize(sc.getConf)
}
CometSparkSessionExtensions.getCometShuffleMemorySizeInMiB(sc.getConf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCometShuffleMemorySizeInMiB is only used for shuffle memory, we should also consider using getCometMemoryOverheadInMiB when unified memory manager is disabled

@wForget
Copy link
Member

wForget commented Mar 3, 2025

Thank you for finding this issue, which was introduced in my previous PR #1379

My initial proposal was:

  • When unified memroy manager is disabled, we should use getCometMemoryOverheadInMiB override spark executor memory overhead.
  • When unified memroy manager is enabled but shuffle unified memroy manager is disabled, we should use getCometShuffleMemorySizeInMiB override spark executor memory overhead.
  • When unified memroy manager and shuffle unified memroy manager are enabled, we don't need to override spark executor memory overhead.

@LukMRVC
Copy link
Author

LukMRVC commented Mar 3, 2025

Okay, @wForget I reverted some of my changes back to align with your proposal of overriding executor memory.

@andygrove
Copy link
Member

PR builds are currently failing. This should be fixed by #1465

@wForget
Copy link
Member

wForget commented Mar 4, 2025

Okay, @wForget I reverted some of my changes back to align with your proposal of overriding executor memory.

Does newly added test case also need to be changed?

@LukMRVC
Copy link
Author

LukMRVC commented Mar 4, 2025

That one should be fine, they passed with current changes to overriding spark executor memory, but fail without them.

conf
}

/** Since using unified memory, executor memory should not be overridden */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one should be fine, they passed with current changes to overriding spark executor memory, but fail without them.

Indeed, I successfully run this test locally, but this comment seems incorrect

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I overlooked it. I changed the comment.

@LukMRVC LukMRVC force-pushed the executor-memory-overhead-fix branch from 6115fcc to de101cd Compare March 4, 2025 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comet executor memory overriding to absurd numbers (unified mode)
4 participants