[Feature Request] Futureproof RetroArch with precision frame pacing presenter thread (for VRR, for BFI, for beamracing, for 120Hz/180Hz/240Hz/VRR BFI, rolling-scan CRT emulators, etc) #11390
For more advanced reading about the presenter thread idea, please read the comments section of the pull request: Pull Request Talk: "Variable BFI" on Presenter Thread Idea. There are many, many ideas there too, including why the presenter thread is a big universal future-proofing move for RetroArch for all non-fixed-60Hz workflows (including VRR and BFI, plus future workflows such as beamracing). However, be warned: they are BIG WALLS OF TEXT. This GitHub item simplifies them into an easier-to-read algorithm. Also, for those unfamiliar, this GitHub item is a useful improvement to, and prerequisite for, all of the following:
However, you don't need to understand those fully in order to implement this GitHub item: the Present Thread concept.
Napkin Exercise: Use Cases of a Universal Precision Frame Pacing Thread
There will be cases where the emulator may need to execute faster or slower than the display, so in theory, the present thread (or another thread) may need to provide synchronization services, governing the speed of the emulator at a ratio higher or lower than the actual display refresh cycle itself, for different reasons. (Note: For RunAhead workflows, the "x speed emu execute" applies only to the final frame of the RunAhead per emulator frame. All other rewound RunAhead frames can run at max speed in all workflows below.)
Many of these workflows already exist (e.g. WinUAE can already hardware-beamrace a VRR refresh cycle); this napkin exercise simply helps the developer think through a universal, future-proof workflow correctly. It shows that we need all of them (emuHz<realHz, emuHz=realHz, emuHz>realHz) for the purposes of precision frame presenting, for different use cases.
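To make the three ratios concrete, here is a small Python sketch (illustration only; RetroArch itself is C) of which emulator frame an ideal presenter would show on each real refresh cycle. It assumes both clocks start together at t=0, run at exactly emuHz and realHz with no drift or stutter, and that emulator frame k occupies the time slot [k/emuHz, (k+1)/emuHz). The function names are hypothetical.

```python
from fractions import Fraction


def frame_for_refresh(refresh_index: int, emu_hz, real_hz) -> int:
    """Index of the emulator frame whose time slot contains the start of
    real refresh cycle `refresh_index` (ideal clocks, both starting at t=0).

    Exact rational arithmetic avoids floating-point rounding at slot boundaries.
    """
    t = Fraction(refresh_index) / Fraction(real_hz)  # start time of this refresh
    return int(t * Fraction(emu_hz))                 # truncation == floor for t >= 0


def new_frame_pattern(emu_hz, real_hz, cycles: int) -> list:
    """True for refresh cycles that show a different emulator frame than the previous cycle."""
    frames = [frame_for_refresh(i, emu_hz, real_hz) for i in range(cycles)]
    return [i == 0 or frames[i] != frames[i - 1] for i in range(cycles)]
```

With emuHz<realHz (60 on 240Hz) each emulator frame spans four refresh cycles; with emuHz>realHz (120 on 60Hz) every other emulator frame is never displayed; with 50 on 60Hz the pattern repeats a frame once every six refreshes.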
Now that #15299 (minor tweak) is solved...
TL;DR Version Of This Feature Request
This solves a hell of a lot of problems with frame pacing and improves many algorithms. It will make perfect BFI flicker possible during CPU-stutter situations (it would just look like a CRT stuttering, with no modification to flicker rate). Even hard cores like bsnes would still BFI perfectly at 240Hz, even if bsnes runs at 57fps and stutters a bit. It may temporarily cease to be beamraceable (Lagless VSYNC #6984, which optionally might run in sync with beam simulators like BFIv3 #10757), but the BFI would still flicker at a constant rate like a CRT, despite the underlying stutters. It would allow a lot of feature additions, including zero-latency emulation (lagless VSYNC like WinUAE) to make RetroArch match the latency of an FPGA (to within one frameslice, à la #6984). This will improve reliability even further and add some magical powers.
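Why BFI keeps a constant flicker rate even when the core stutters can be shown with a small simulation (a Python sketch, not RetroArch code; all names are hypothetical). It assumes the presenter runs at a fixed real refresh rate, keeps only the newest completed emulator frame in a "mailbox", shows that frame on the first refresh of every group of `bfi_ratio` refreshes, and shows black on the rest. Emulator completion times may be irregular.

```python
def bfi_schedule(refresh_count, bfi_ratio, frame_ready_at, refresh_hz):
    """Simulate a presenter thread doing BFI at a fixed real refresh rate.

    bfi_ratio:      refresh cycles per visible frame (4 for 60Hz content at 240Hz:
                    1 visible refresh followed by 3 black refreshes).
    frame_ready_at: sorted times (seconds) at which the emulator finished each frame;
                    may be irregular to model CPU stutter.
    Returns one entry per refresh cycle: the emulator frame index shown, or None
    for an inserted black frame.
    """
    shown = []
    latest = None      # newest completed frame (mailbox contents)
    next_frame = 0
    for cycle in range(refresh_count):
        t = cycle / refresh_hz
        # Move every frame completed by the start of this refresh into the mailbox.
        while next_frame < len(frame_ready_at) and frame_ready_at[next_frame] <= t:
            latest = next_frame
            next_frame += 1
        if cycle % bfi_ratio == 0:
            shown.append(latest)  # visible refresh: newest frame (repeated if the core is late)
        else:
            shown.append(None)    # black refresh, inserted regardless of emulator timing
    return shown
```

In the usage below, frame 2 arrives late (0.040 s instead of about 0.033 s), so frame 1 is shown twice, like a stuttering CRT, while the black-frame cadence never changes.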
Long Version:
Note: Crossposted feature suggestion from semi-related 240Hz BFI pull request at #11342
This was too major an ask to be inside a GitHub pull request, so I am creating a new thread. The goal is to pave the groundwork for improving RetroArch's compatibility with improved display emulation (improving variable refresh rate to reduce lag/stutters, improving BFI with 240Hz BFI to better emulate a CRT, etc).
This is a universal generic algorithm that should eventually become a best practice for emulators over the next ten years.
Goals
Problem: Existing frame pacing algorithm not future proof enough
Currently, G-SYNC operation relies on a software-based frame pacing algorithm in the emulator, but it is not yet optimized in a future-proof way:
The upgrade to existing frame pacing algorithm
I propose a separate thread responsible for frame presenting (e.g. Present() or glxxSwapBuffers() or whichever API) that does the following:
Presents will have no jitter even if emulator modules use darn near 100% CPU
This can be done via busy-waiting instead of (or in addition to) timer events, because some algorithms (beamracing or BFI+VRR) have visible artifacts with sub-millisecond errors. Alternatively, one can use a timer event to wake up 0.5ms early, then busy-wait on a high-precision clock for the rest of the way.
Works with all sync technologies (VSYNC ON, VSYNC OFF, DWM, triple buffer, AMD Enhanced Sync, NVIDIA Fast Sync, FreeSync, G-SYNC, VESA Adaptive-Sync, BFI, etc), making it much easier to combine them (e.g. BFI during VRR)
Future-proofs for new display algorithms (e.g. rolling BFI CRT emulators, or lagless VSYNC beam racing), with little or no modification to emulator rendering
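The "timer-event to 0.5ms prior, then busy-wait" idea can be sketched as follows. This is Python for readability; a C implementation would typically sleep with the OS timer and spin on a high-resolution clock such as `clock_gettime(CLOCK_MONOTONIC)` or `QueryPerformanceCounter`. The function name and the 0.5ms default margin are assumptions taken from the text above, not existing RetroArch code.

```python
import time


def wait_until(deadline: float, spin_margin: float = 0.0005) -> float:
    """Block until time.perf_counter() >= deadline and return the wake-up time.

    Coarse phase: OS sleep until about `spin_margin` seconds (0.5 ms by default)
    before the deadline, since sleeps can overshoot.
    Fine phase: busy-wait on the high-resolution clock for the remainder,
    trading some CPU for sub-millisecond precision.
    """
    while True:
        remaining = deadline - time.perf_counter()
        if remaining <= spin_margin:
            break
        time.sleep(remaining - spin_margin)
    now = time.perf_counter()
    while now < deadline:
        now = time.perf_counter()
    return now
```

The margin is a tuning knob: a larger one burns more CPU but tolerates sloppier OS timers, and a deadline already in the past returns immediately.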
Largely a Streamlining of Existing Workflow
The existing present call would be replaced by a wrapper that passes the frame to a frame presentation thread. The presentation thread will time the presentation itself.
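A minimal sketch of that wrapper plus presentation thread, assuming a single-slot "latest frame wins" mailbox and a caller-supplied target time per frame. It is written in Python for illustration; `PresenterThread`, `submit()` and `present_fn` are hypothetical names, and `present_fn` stands in for the real `Present()`/swap call.

```python
import threading
import time


class PresenterThread:
    """Dedicated frame-presentation thread (illustrative sketch).

    The emulator calls submit() instead of presenting directly; this thread owns
    the timing and calls `present_fn` at the requested time. If a newer frame is
    submitted before the pending one is picked up, it replaces it (latest wins).
    """

    def __init__(self, present_fn):
        self._present_fn = present_fn      # stand-in for the real swap/Present call
        self._cond = threading.Condition()
        self._pending = None               # (frame, target_time) or None
        self._running = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame, target_time):
        """Wrapper that replaces the direct present call; returns immediately."""
        with self._cond:
            self._pending = (frame, target_time)
            self._cond.notify()

    def stop(self):
        """Present any still-pending frame, then shut the thread down."""
        with self._cond:
            self._running = False
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while self._pending is None and self._running:
                    self._cond.wait()
                if self._pending is None:      # stopped with nothing left to present
                    return
                frame, target = self._pending
                self._pending = None
            # Timing happens outside the lock so the emulator is never blocked by it.
            remaining = target - time.perf_counter()
            if remaining > 0.001:
                time.sleep(remaining - 0.001)  # coarse OS sleep...
            while time.perf_counter() < target:
                pass                           # ...then spin for precision
            self._present_fn(frame, time.perf_counter())
```

The emulator thread never waits on presentation timing; it just hands over frames, which is what lets one pacing algorithm sit in front of every sync technology.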
Some of the workflow already exists; it just needs to be re-jigged into an official, unified workflow capable of improved precision.
Suggested Stage 1 Workflow
Metaphorically, this workflow is a software-based VSYNC ON emulator, hiding the quirks of GPU drivers or destination displays from emulator rendering, while simultaneously improving user-friendliness (things just work automatically upon startup), making things less buggy (no VRR stutters, no erratic BFI flicker) and future-proofing (even making BFI VRR-compatible, plus hardware-based beamracing, software-based beamracing, CRT beam emulators, and not-yet-invented display algorithms).
In a 60Hz VSYNC ON scenario, this is just de facto passthrough behavior (the Present Thread will present immediately), while allowing one frame pacing algorithm to work with ALL sync technologies more reliably. And it adds no extra workflow lag.
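The passthrough-versus-paced decision could look roughly like this hypothetical helper (Python sketch; the mode names and the "one emulator frame period after the previous present" rule are assumptions for illustration, not an existing RetroArch setting). With fixed-Hz VSYNC ON at the emulator's rate, the driver already paces, so the target is "now"; with VRR or VSYNC OFF, the presenter does the pacing itself.

```python
def present_target(mode, frame_ready_time, last_present_time, emu_hz):
    """Hypothetical helper: when should the presenter thread present this frame?

    "vsync_fixed": fixed-Hz VSYNC ON matching the emulator rate. Pass through
                   immediately; the driver's VSYNC does the pacing.
    "vrr" / "vsync_off": the presenter paces the frame itself, aiming for one
                   emulator frame period after the previous present, but never
                   earlier than the moment the frame is ready (a late frame is
                   presented as soon as it arrives).
    """
    if mode == "vsync_fixed":
        return frame_ready_time
    if mode in ("vrr", "vsync_off"):
        return max(frame_ready_time, last_present_time + 1.0 / emu_hz)
    raise ValueError(f"unknown mode: {mode}")
```

Because the passthrough branch adds no waiting, the 60Hz VSYNC ON case keeps its existing latency, which matches the "no extra workflow lag" claim above.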
Don't worry about BFI for now (#10754 and/or #10757), and don't worry about beamracing for now (#6984 and/or #10757); those are solvable in the future (e.g. wrappers for PresentScanLine() can be added later to pass one pixel row from the Rendering Thread to the Present Thread). For now, just focus on a generic cross-platform full-frame workflow.
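For the later beamracing stage, the presenter would need to know when the real scanout reaches each frameslice. A hedged Python sketch of that arithmetic follows; `frameslice_deadlines` is a hypothetical name, and it assumes the active area starts scanning at `refresh_start`, scans top to bottom at a constant horizontal scanrate of `refresh_hz * total_lines` rows per second, and that the remaining rows are the blanking interval.

```python
def frameslice_deadlines(refresh_start, refresh_hz, active_lines, total_lines, slices):
    """Times at which the real display's scanout begins each frameslice.

    A beamracing presenter would need frameslice k delivered (e.g. via a
    VSYNC OFF present) shortly before these times. Assumes constant-rate
    top-to-bottom scanout starting at refresh_start.
    """
    row_time = 1.0 / (refresh_hz * total_lines)  # seconds per scanline
    rows_per_slice = active_lines / slices
    return [refresh_start + k * rows_per_slice * row_time for k in range(slices)]
```

For 1080p 60Hz (1125 total lines) split into 4 frameslices, each slice is 270 rows, so slices begin 4ms apart.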
Easy Debugging Tip for 60Hz-Only Developers: VSYNC OFF
Testing without VRR can be done via 60Hz VSYNC OFF while using CPU-heavy emulators or emulation settings. Use 60Hz VSYNC OFF, and use tearline jitter as a timing-precision debugger. If the tearline erratically moves or jitters/vibrates massively, your present timing is not "best-effort microsecond-accurate". If the tearline is stationary or rolls slowly up/down, your present timing is nearly microsecond-accurate.
1080p 60Hz has a horizontal scanrate of 67.5 kilohertz (approximately 67,500 pixel rows per second, including the VBI). So a 1/67500th-second delay moves a VSYNC OFF tearline downwards by 1 pixel. Modern displays still scan from top to bottom (as high-speed videos show), and VSYNC OFF tearlines are a raster artifact.
So if your tearline is vibrating by 50 pixels up/down, that means you've got a 50/67500th-second imprecision in your Present() or glxxSwapBuffers() timing. Thus, VSYNC OFF at 60Hz is an excellent timing debugger, since the VSYNC OFF tearline is a real-display raster where the new real GPU framebuffer splices into the destination display's scanout position. Run a horizontally panning videogame (such as a platformer) to find the tearline.
If you have a high-Hz display, you can also test 60fps at VSYNC OFF 120Hz or VSYNC OFF 240Hz for more sensitive timing-precision debugging (1/135000th second for a 120Hz tearline moving downwards by 1 pixel, for example).
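The tearline arithmetic above, as a small Python sketch. It assumes 1125 total lines (the standard 1080p total, including the VBI) at every refresh rate, which reproduces both the 67,500 and 135,000 rows/second figures; other display timings would change `total_lines`. Function names are hypothetical.

```python
def scanrate_hz(refresh_hz, total_lines=1125):
    """Horizontal scanrate in rows per second (total_lines includes the VBI)."""
    return refresh_hz * total_lines


def timing_error_from_jitter(pixels, refresh_hz, total_lines=1125):
    """Seconds of present-timing error implied by a tearline moving `pixels` rows."""
    return pixels / scanrate_hz(refresh_hz, total_lines)


def jitter_from_timing_error(seconds, refresh_hz, total_lines=1125):
    """Rows a VSYNC OFF tearline moves for a given present-timing error."""
    return seconds * scanrate_hz(refresh_hz, total_lines)
```

For example, 50 pixels of vibration at 60Hz is about 0.74ms of timing error, and a 1ms error moves a 60Hz tearline about 67.5 rows (twice that at 120Hz).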
When you succeed in generating a stable VSYNC OFF tearline, it automatically translates into VRR users getting amazing frame pacing, and BFI users getting artifact-free operation without erratic flicker (even if you never test VRR or BFI). Thus, use 60Hz VSYNC OFF as a clever, easy debugger for frame-present timing precision if you don't have 144Hz or VRR or BFI!
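If you want a number instead of eyeballing the tearline, present timestamps can be turned into estimated tearline rows. This Python sketch assumes the display's refresh cycles are phase-aligned to t=0 and run at exactly `refresh_hz`. The absolute row is only meaningful if that phase is known, but the peak-to-peak spread is what matters for debugging. Over long windows a slow roll (fine, per the text above) also inflates the spread, so measure over short windows. Names are hypothetical.

```python
def tearline_rows(present_times, refresh_hz, total_lines=1125):
    """Estimated scanline where each VSYNC OFF present splices into scanout:
    time since the start of the current refresh cycle times the scanrate."""
    period = 1.0 / refresh_hz
    scanrate = refresh_hz * total_lines
    return [(t % period) * scanrate for t in present_times]


def tearline_jitter(present_times, refresh_hz, total_lines=1125):
    """Peak-to-peak tearline movement in rows (ignores wraparound across the VBI)."""
    rows = tearline_rows(present_times, refresh_hz, total_lines)
    return max(rows) - min(rows)
```

Perfectly periodic presents give a spread near zero (a stationary tearline); alternating presents off by 50/67500 s give a spread of about 50 rows.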