Dolphin benchmark post
kvark committed Mar 22, 2019
1 parent 7c0976c commit 6de2aea
Showing 2 changed files with 47 additions and 0 deletions.
47 changes: 47 additions & 0 deletions _posts/2019-03-22-dolphin-macos-performance.md
@@ -0,0 +1,47 @@
---
layout: post
title: Portability benchmark of Dolphin Emulator on MacOS
---

## Intro

gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It's a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability [implementation](https://github.com/gfx-rs/portability) based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.

Previously, we [benchmarked Dota2](https://gfx-rs.github.io/2018/08/10/dota2-macos-performance.html) and were able to run many other applications and engines successfully, including [Dolphin Emulator](https://gfx-rs.github.io/2018/09/03/rpcs3-dolphin.html). For Dolphin, our earlier work focused on visual correctness. Once games appeared to render correctly, we shifted our focus to performance, to ensure they also render quickly.

## Setup

[@MayImilae proposed a simple benchmark scenario](https://github.com/dolphin-emu/dolphin/pull/7039#issuecomment-473520861): run the game Metroid Prime 2 (US), load into Sanctuary Fortress, wait for the animation to finish, and finally record 20 seconds of frame times (without providing any input to the game). We ensure the game window is on screen and in focus while being benchmarked.

![Metroid Prime 2](/img/dolphin-metroid-prime-2.jpg)

The Dolphin settings used for the benchmark were:
- Store EFB Copies to Texture Only: Enabled
- Speed Limit: Unlimited
- 4x native internal resolution
- Vsync: Off

As with Dota2, gfx's Metal backend was tested in two modes: one with Immediate command recording and one with Deferred. These were configured using the `GFX_METAL_RECORDING` environment variable. gfx-portability itself was selected by pointing the `LIBVULKAN_PATH` environment variable to it. The library was built from [tag 0.5](https://github.com/gfx-rs/portability/tree/0.5) using a simple `make version-release` command. We also experimented with Dolphin's "Backend multi-threading" option (or "MT" for short), because we had doubts about whether this is the right approach when used with a normal Vulkan driver.
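For readers who want to script the different runs, here is a minimal sketch of a launcher that sets the two environment variables named above. The Dolphin binary path, the library path, and the helper name `dolphin_command` are hypothetical, and the recording values `"immediate"` and `"deferred"` are assumed from the mode names rather than confirmed here.

```rust
use std::process::Command;

/// Builds (but does not spawn) a Dolphin launch command for one gfx configuration.
/// All three arguments are supplied by the caller; nothing here is taken from Dolphin itself.
fn dolphin_command(dolphin_bin: &str, portability_lib: &str, recording: &str) -> Command {
    let mut cmd = Command::new(dolphin_bin);
    // Point Dolphin's Vulkan loader at the gfx-portability library.
    cmd.env("LIBVULKAN_PATH", portability_lib)
        // Select the command recording mode of the Metal backend (assumed value format).
        .env("GFX_METAL_RECORDING", recording);
    cmd
}

fn main() {
    // Hypothetical paths, adjust them to the local setup.
    let cmd = dolphin_command(
        "/Applications/Dolphin.app/Contents/MacOS/Dolphin",
        "/path/to/libportability.dylib",
        "deferred",
    );
    // Print the configured command; calling `cmd.status()` instead would launch it.
    println!("{:?}", cmd);
}
```

The "Backend multi-threading" option is a Dolphin setting, so it is not covered by this sketch.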

## Results

| Test/Library | gfx/immediate/-MT | gfx/immediate/+MT | gfx/deferred/+MT | MoltenVK/-MT | MoltenVK/+MT |
| -------------------------------- | ----------------- | ----------------- | ---------------- | ------------ | ------------ |
| _platform A_ (Intel, dual-core) | | | | | |
| frame time average | 14.933781 | 15.989498 | **14.827277** | 15.731309 | 15.492961 |
| frame time variance | 2.3165195 | 2.1808865 | **1.753293** | 3.0022306 | 4.5931387 |
| _platform B_ (AMD, quad-core) | | | | | |
| frame time average | 14.572058 | **14.32026** | 14.479047 | 18.306593 | 18.41038 |
| frame time variance | 17.192923 | **2.0200737** | 2.1380246 | 30.974926 | 29.487541 |

Frame times were gathered using Dolphin's built-in logging, which was manually turned on and off for that 20-second time span. The output was then fed to a simple [analysis tool](https://github.com/kvark/fps-stat/commit/179a6cc16f799a36ac4e5f661bedfb03e3f668f2), which produced the average and variance of the numbers.
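To make the two statistics concrete, here is a minimal sketch of how an average and a variance can be computed from a list of frame times. This is not the code of the linked tool: the function name `mean_and_variance` is made up for illustration, the sample values are invented, and the use of the population variance (dividing by `n` rather than `n - 1`) is an assumption.

```rust
/// Returns (mean, population variance) of the samples, or `None` for an empty slice.
fn mean_and_variance(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Average squared deviation from the mean (population variance, an assumption).
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance))
}

fn main() {
    // Invented frame times, only to exercise the function.
    let frame_times = [14.0, 15.0, 16.0, 15.0];
    if let Some((mean, variance)) = mean_and_variance(&frame_times) {
        println!("average = {}, variance = {}", mean, variance);
    }
}
```

A lower variance means the individual frame times stay closer to the average, which is what the "consistency" comparisons in this post refer to.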

## Conclusions

In Dolphin, gfx-portability provides faster and more consistent frame rates than MoltenVK. The average frame time decreased by 4% on the Intel machine and by 22% on the AMD machine. The difference in consistency is especially visible on AMD, where we produce a rock-solid frame rate. Subjectively, the game also plays much more smoothly in gfx-portability.
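The percentages can be reproduced from the results table. The sketch below assumes they compare the best gfx average against the best MoltenVK average on each platform; the post does not say which columns were used, and the helper name `relative_decrease` is made up for illustration.

```rust
/// Fraction by which `candidate` is lower than `baseline` (0.10 means 10% lower).
fn relative_decrease(baseline: f64, candidate: f64) -> f64 {
    (baseline - candidate) / baseline
}

fn main() {
    // Platform A (Intel): MoltenVK/+MT vs gfx/deferred/+MT averages from the table.
    let intel = relative_decrease(15.492961, 14.827277);
    // Platform B (AMD): MoltenVK/-MT vs gfx/immediate/+MT averages from the table.
    let amd = relative_decrease(18.306593, 14.32026);
    // These come out at roughly 4% and 22% respectively.
    println!("Intel: {:.1}%, AMD: {:.1}%", intel * 100.0, amd * 100.0);
}
```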

Of the gfx configurations tested, Deferred+MT showed the best results overall: it was the fastest and most consistent on Intel, and within roughly 1% of Immediate+MT on AMD. This is similar to the Dota2 results, but we still find it surprising that Immediate did not pull clearly ahead this time. Unlike Dota2, in this case we didn't have many small command buffers that Deferred recording would be able to stitch together. Thus, we conclude that the explanation lies in the Metal implementations/drivers, which work most efficiently when the hardware queue is immediately available (which is not the case for Immediate recording).

Rust is still showing its strength (and potential!), although we are approaching a point where zero-cost abstractions start breaking down (quantum level?). For example, the [copyless](https://github.com/kvark/copyless) crate allows us to use the same standard containers but with fewer `memcpy` instructions generated by LLVM. Hopefully, the optimization story of Rust will keep evolving, and eventually we'll be able to deprecate the crate and programs will run faster out of the box.

Finally, the usual disclaimer: we are not benchmarking specialists, and the results here should be taken with a grain of salt. We'll be happy to assist any party that attempts to reproduce them.
Binary file added img/dolphin-metroid-prime-2.jpg