Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Renderer Performance #545

Merged
merged 1 commit into from
Aug 27, 2024

Conversation

tophyr
Copy link
Contributor

@tophyr tophyr commented Aug 24, 2024

Pull Request Type

  • GitHub Workflow changes
  • Documentation or Wiki changes
  • Build and Dependency changes
  • Runtime changes
    • Render changes
    • Audio changes
    • Input changes
    • Network changes
    • Other changes

Description

Use orphaned "buffer update streaming" to eliminate synchronization delays, due to CPU->GPU latency, causing framerate slowdowns. This sends vertex data to the GPU via partial updates to a buffer and reallocates the buffer once it fills up, ensuring that no synchronization is ever needed. The buffer is sized to balance memory usage vs allocation rate, and the GL driver ensures that "orphaned" buffers are only destroyed when all GL commands using them are retired.

Related Issues

Fixes #543

Screenshots (if applicable)

Screenshot 2024-08-24 at 12 44 59 AM

Checklist

  • I have tested my changes locally and verified that they work as intended.
  • I have documented any new or modified functionality.
  • I have reviewed the changes to ensure they do not introduce any unnecessary complexity or duplicate code.
  • I understand that by submitting this pull request, I am agreeing to license my contributions under the project's license.

@MaddTheSane
Copy link
Contributor

I can confirm that this fixed whatever was wrong on my M3 MacBook Pro.

@tophyr tophyr force-pushed the pr/improve-render-perf branch from 1d07172 to a693814 Compare August 24, 2024 22:11
Use orphaned "buffer update streaming" to eliminate synchronization
delays, due to CPU->GPU latency, causing framerate slowdowns. This sends
vertex data to the GPU via partial updates to a buffer and reallocates
the buffer once it fills up, ensuring that no synchronization is ever
needed. The buffer is sized to balance memory usage vs allocation rate,
and the GL driver ensures that "orphaned" buffers are only destroyed
when all GL commands using them are retired.
@tophyr tophyr force-pushed the pr/improve-render-perf branch from a693814 to be59b88 Compare August 24, 2024 22:19
@tophyr
Copy link
Contributor Author

tophyr commented Aug 24, 2024

Fixed Windows build.

@winterheart
Copy link
Collaborator

I tested on Linux, and there no issues (mine framerate is capped by 60 FPS by default). I'd request additional testings on Windows though.

@Lgt2x
Copy link
Member

Lgt2x commented Aug 26, 2024

On my windows laptop (no dedicated GPU), FPS goes from ~60 to ~20 on the Bluedevil stage with this commit (debug build)

@MaddTheSane
Copy link
Contributor

What GPU does the laptop have?

@Lgt2x
Copy link
Member

Lgt2x commented Aug 26, 2024

Thinkpad T480s, probably some Intel integrated chip. FPS drop is less dramatic on other campaign missions but still noticeable (60->50). I also get some screen tearing that I didnt have before the renderer rework

@tophyr
Copy link
Contributor Author

tophyr commented Aug 26, 2024

Tearing is interesting. I can't think of anything in this merge that would directly affect that, but maybe the synchronization overhead caused some problems with vsync/frame atomicity at the driver level.

@Lgt2x
Copy link
Member

Lgt2x commented Aug 26, 2024

Tearing already happened after #527, not new here though

@tophyr
Copy link
Contributor Author

tophyr commented Aug 26, 2024

On my windows laptop (no dedicated GPU), FPS goes from ~60 to ~20 on the Bluedevil stage with this commit (debug build)

oh wait wow i thought this comment was on the already-committed PR. this one makes your perf worse? what was your laptop's FPS on 1.5?

@tophyr
Copy link
Contributor Author

tophyr commented Aug 26, 2024

Speculating: Integrated GPU likely means shared VRAM. I wonder if that alters the effectiveness of the buffer streaming approach.

1. FPS before #527: ???
2. FPS  after #527: 60 (?)
3. FPS  after #545: 20

Is that right? I would hope that there would also be a significant drop between 1 and 2, otherwise I'd be pretty confused tbh.

My first thought is maybe to play around with the allocated buffer size - if it's ignoring the GL_MAP_UNSYNCHRONIZED_BIT hint, then a smaller buffer will likely help by reducing contention (up to some local maxima where reallocation overhead might start costing time again).

@Lgt2x
Copy link
Member

Lgt2x commented Aug 27, 2024

Speculating: Integrated GPU likely means shared VRAM. I wonder if that alters the effectiveness of the buffer streaming approach.

1. FPS before #527: ???
2. FPS  after #527: 60 (?)
3. FPS  after #545: 20

Is that right? I would hope that there would also be a significant drop between 1 and 2, otherwise I'd be pretty confused tbh.

My first thought is maybe to play around with the allocated buffer size - if it's ignoring the GL_MAP_UNSYNCHRONIZED_BIT hint, then a smaller buffer will likely help by reducing contention (up to some local maxima where reallocation overhead might start costing time again).

yes, something like that. FPS in 1. was about 60, in 2. much less consistent, and 3. even lower. I'll do more testing and proper benchmark, it's important that we support lower-end devices correctly

@tophyr
Copy link
Contributor Author

tophyr commented Aug 27, 2024

Definitely important, and definitely concerning. This will potentially be a difficult case to test/fix because I don't have any low-end machinery >_<

If you could profile your performance or run it with Renderdoc or another introspection tool, that may give some clues.

@Lgt2x
Copy link
Member

Lgt2x commented Aug 27, 2024

On the same device, both Linux and Windows release build gets 60FPS pretty consistently, so it's only about the Windows debug build. We've had slowdowns on Win debug before, IMO it's acceptable to ignore for the time being.

@Lgt2x Lgt2x merged commit 2db85ca into DescentDevelopers:main Aug 27, 2024
10 checks passed
@tophyr
Copy link
Contributor Author

tophyr commented Aug 28, 2024

Aha! Great find. This PR makes use of iterators instead of bare pointers, and MSVC's std::array<T>::iterator does not decay to T* like clang's and gcc's do. Feels likely that MSVC might be doing a bunch more logic in its iterators on the Debug build than on the Release one.

Thanks for merging!

@tophyr tophyr deleted the pr/improve-render-perf branch August 28, 2024 01:24
@winterheart
Copy link
Collaborator

MSVC uses STL iterators debugging (https://learn.microsoft.com/en-us/cpp/standard-library/debug-iterator-support) which greatly reduces performance on iterating STL containers like std::array in Debug mode.

@Lgt2x
Copy link
Member

Lgt2x commented Aug 28, 2024

MSVC uses STL iterators debugging (https://learn.microsoft.com/en-us/cpp/standard-library/debug-iterator-support) which greatly reduces performance on iterating STL containers like std::array in Debug mode.

I was not aware of that! Is there any way we could kindly ask MSVC to disable iterators debugging for this iterator in particular?

@winterheart
Copy link
Collaborator

#define _ITERATOR_DEBUG_LEVEL 0 should do trick, but I never succeeded in my builds.

@tophyr
Copy link
Contributor Author

tophyr commented Aug 28, 2024

MSVC uses STL iterators debugging (https://learn.microsoft.com/en-us/cpp/standard-library/debug-iterator-support) which greatly reduces performance on iterating STL containers like std::array in Debug mode.

I was not aware of that! Is there any way we could kindly ask MSVC to disable iterators debugging for this iterator in particular?

#550 should hopefully mitigate this. @winterheart's suggestion of _ITERATOR_DEBUG_LEVEL would only work if we did it project-wide, otherwise it'd result in ODR violations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Runtime Issue]: New renderer slowing down the game
4 participants