Bolder captions
kvark committed Aug 10, 2018
1 parent 1891e11 commit 5c2013d
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions _posts/2018-08-10-dota2-macos-performance.md
@@ -3,7 +3,7 @@ layout: post
title: Portability benchmark of Dota2 on MacOS
---

-### The Race
+## The Race

gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It's a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability [implementation](https://github.com/gfx-rs/portability) based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.

@@ -13,7 +13,7 @@ Once Dota2 became functional under our portability implementation, we entered a

![simple test with gfx-portability](/img/dota-simple.jpg)

-#### Modes
+### Modes

Our portability library has been tested in two different modes of the Metal backend:
- "Immediate" - We record the underlying Metal command buffers "live" as the corresponding Vulkan command buffers are recorded. This lets the user parallelize the recording as they see fit and make sure everything is ready to be submitted.
@@ -23,20 +23,20 @@ Please note that technically "Immediate" recording has less CPU overhead. Additi

Another interesting observation is that 71% of the time spent by gfx/immediate (when only recording the commands) is actually spent inside the driver, according to our instrumented profile. The remaining 29% is roughly 40% of the driver time (29/71), which gives an upper bound of about 40% on the Vulkan Portability overhead on Metal, although that figure includes some OS and Metal run-time bits as well.
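
One way to read the arithmetic behind the "about 40%" figure, as a minimal sketch: if a fraction of the recording time is spent in the driver, everything else is at most overhead on top of it. The function name `overhead_upper_bound` is hypothetical and only illustrates the calculation; it is not part of gfx-rs or the portability library.

```rust
/// Upper bound on the relative overhead of a translation layer, given the
/// fraction of recording time that is spent inside the driver.
/// (Hypothetical helper for illustration only.)
fn overhead_upper_bound(driver_fraction: f64) -> f64 {
    assert!(driver_fraction > 0.0 && driver_fraction <= 1.0);
    // Time outside the driver, expressed relative to the driver time.
    (1.0 - driver_fraction) / driver_fraction
}

fn main() {
    // 71% is the driver share quoted in the text above.
    let bound = overhead_upper_bound(0.71);
    println!("overhead upper bound: {:.1}%", bound * 100.0);
}
```

The bound is loose by construction, since the non-driver share also contains OS and Metal run-time work.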

-### Results
+## Results

| Test/Library | gfx/immediate | gfx/deferred | MoltenVK | OpenGL |
| -------------------------------- | ------------- | ------------ | ------------ | ----------- |
| CPU % of Main thread | 35% | 12% | 21% | ? |
| _platform A_ (Intel, dual-core) | | | |
-| fps/variability on low settings | 41.5 / 4.4 | 47.9 / 4.6 | 40.5 / 6.3 | 45.0 / 5.2 |
-| fps/variability on high settings | 33.9 / 3.5 | 41.3 / 4.0 | 35.9 / 5.3 | 34.9 / 6.6 |
+| fps/variability on low settings | 41.5 / 4.4 | **47.9** / 4.6 | 40.5 / 6.3 | 45.0 / 5.2 |
+| fps/variability on high settings | 33.9 / 3.5 | **41.3** / 4.0 | 35.9 / 5.3 | 34.9 / 6.6 |
| _platform B_ (AMD, quad core) | | | |
-| fps/variability on low settings | 58.1 / 11.4 | 74.5 / 11.2 | 71.7 / 12.6 | 77.0 / 12.7 |
-| fps/variability on high settings | 51.1 / 9.3 | 59.2 / 7.4 | 61.4 / 10.0 | 49.0 / 5.8 |
+| fps/variability on low settings | 58.1 / 11.4 | 74.5 / 11.2 | 71.7 / 12.6 | **77.0** / 12.7 |
+| fps/variability on high settings | 51.1 / 9.3 | 59.2 / 7.4 | **61.4** / 10.0 | 49.0 / 5.8 |
| _platform C_ (NV, quad core) | | | |
-| fps/variability on low settings | 54.3 / 10.0 | 66.0 / 7.7 | 64.0 / 7.8 | 56.7 / 7.0 |
-| fps/variability on high settings | 40.6 / 4.4 | 43.1 / 3.8 | 42.1 / 3.9 | 37.6 / 3.2 |
+| fps/variability on low settings | 54.3 / 10.0 | **66.0** / 7.7 | 64.0 / 7.8 | 56.7 / 7.0 |
+| fps/variability on high settings | 40.6 / 4.4 | **43.1** / 3.8 | 42.1 / 3.9 | 37.6 / 3.2 |

The first metric shows how much of the main thread's time is spent inside the portability library, relative to the total execution time (the actual time minus all the sleeping). It was measured with Time Profiler on a simple scene (see screenshot). Note that Dota2 does the submission on a separate thread, so that time isn't taken into account here. Interestingly, the less time we spend on the main thread, the faster our frame rate ends up being. In other words, it's a race of who gets to the submission first :)
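
As a rough sketch of how the first metric is defined, the share can be computed from three timings. The function name `main_thread_share` and the sample numbers below are made up for illustration; the real values came from Time Profiler, not from code like this.

```rust
/// Share of the main thread's non-sleeping time spent inside the library.
/// All arguments are in seconds. (Hypothetical helper for illustration only.)
fn main_thread_share(library_time: f64, wall_time: f64, sleep_time: f64) -> f64 {
    // "Total execution time" in the text: actual time minus all the sleeping.
    let execution_time = wall_time - sleep_time;
    assert!(execution_time > 0.0);
    library_time / execution_time
}

fn main() {
    // Made-up sample: 10 s of wall time, 6 s asleep, 0.5 s inside the library.
    let share = main_thread_share(0.5, 10.0, 6.0);
    println!("main thread share: {:.1}%", share * 100.0);
}
```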

@@ -49,7 +49,7 @@ make dota-bench-orig # for MoltenVK bundled with Dota2
make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Deferred" recording
```

-#### Platforms
+### Platforms

![macbook fleet](/img/macbook-fleet.jpg)

@@ -62,7 +62,7 @@ make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Def
| OS | macOS 10.14 beta | macOS 10.14 beta | macOS 10.13.6 |
| resolution | 1440 x 900 | 1680 x 1050 | 1440 x 900 |

-### Conclusions
+## Conclusions

MoltenVK does a good job translating Vulkan to Metal. Interestingly, though, it can be slower than OpenGL on low settings. This matches neither Phoronix's nor Valve/MoltenVK's numbers. We suspect the difference is explained by Phoronix using a 10x shorter run with pre-loaded pipeline caches. In that case GL would struggle to create all the new pipelines and would not have time to reach full speed, so it's not exactly an Apples to Apples comparison :)

@@ -72,7 +72,7 @@ Either way, our benchmarks show that OpenGL is still fairly good on MacOS, and i

We believe that "Immediate" command recording has great potential that hasn't yet been realized with Dota2 as it is architected today, with its one big submission per frame, which increases latency in an already latency-limited program. Hopefully, we'll see more applications taking advantage of this in the future.

-#### Rust
+### Rust

Rust has proven itself viable in complex high-performance systems. We were able to build solid abstractions and hide the complexity behind tiny interfaces, while still being able to reason about and optimize the low-level performance. Iterating on large architectural changes was a breeze: we would change the single most important piece and then fix all the resulting compile errors.
