WIP: MVKCommandBuffer: Optimize management of command storage. #1678

js6i · 2022-08-11T14:41:32Z

Note: it's a large-ish change and probably would require some more clean up. For example, I think it no longer makes sense to have a separate MVKCmdRenderPassBase. Still, I'm submitting for overall review and suggestions.

The main motivation behind this commit is to reduce the time needed to free or
reset a command buffer. With the current scheme, we need to loop over all
recorded commands and return them to their pools. This can take a significant
amount of time in games, of the order of a millisecond.

With this change, instead of pooling individual commands by their type, we
instead keep chunks of memory in each command buffer, from which we bump
allocate memory for the commands to be recorded. MVKCommandVector is introduced
to use the same memory pool for dynamic arrays of data associated with the
commands.

Pointers to commands that actually have non-trivial destructors (i.e. they keep
references to other objects) are stored separately, so that we can call their
destructors when reseting the command buffer. All the memory simply gets
discarded by setting the current chunk's index and offset to zero.

As a side note, specific applications might opt to not retain pipeline layouts
in bind/push descriptor commands (introduced by 30113fc)
if it's known not to be needed, to further reduce time spent resetting command buffers.

cdavis5e

So much boilerplate... I'm amazed you had the patience to slog through all those code changes!

cdavis5e · 2022-08-12T07:12:59Z

MoltenVK/MoltenVK/Utility/MVKSmallVector.h

-  Allocator  alc;
-
+
 public:
+  Allocator  alc;
+


I wonder if you shouldn't add a constructor to MVKSmallVector<T> and its base to initialize the allocator. Then you wouldn't have to reach into the allocator in order to set the command buffer for the bump allocator.

Maybe, to me that seems to add complication for no clear reason, thus I'm leaving it for others to decide. Are you thinking doing something of the sort:
_bindings = MVKCommandVector(MVKCommandVectorAllocator<BlahBlah>(cmdBuff));
in setContent functions, or passing the command buffer to all MVKCommand constructors and putting
_bindings(MVKCommandVector(MVKCommandVectorAllocator<BlahBlah>(cmdBuff)))
in the inializer list, or something else?

@cdavis5e, @billhollings, any preferences on this? I'm largely indifferent, other than doing the first is less intrusive.

@js6i , I highly discourage this change to MVKSmallVector. The allocator should not be exposed at all and the overall design wasn't ment to be used with a specific constructor as well. The allocator type is a pure template argument.
I think the problem is, that MVKCommandVectorAllocator is already bad by design, it should not have the member cmdBuffer and it should not call mvkPushCommandMemory directly.
There are multiple ways of redesigning this, but since I am not really familiar with MoltenVK, I can't suggest a good solution right now, since I am afraid I will brake other things.
But please do not change MVKSmallVector with the above change, it will introduce all kind of issues in the future.

@aerofly, I don't see what's the argument you're making, other than saying it's bad. Setting the allocator with accompanying data for an expandable array is a very simple thing to do, and I don't understand how that's going to be a source of issues.

One thing that I'd probably be in favor of changing is removing the allocator as a template argument, and making a common interface for it, i.e. some sort of struct Allocator.

Sorry, I didn't mean to be rude here. What I tried to say is, that exposing the allocator and actually accessing allocator elements from another class is basically mixing separate concepts. The allocator as well as the actual vector should not be intertwined like this.

I have added MVKSmallVector to MoltenVK to have a std::vector compatible container that allows the easy use of pre allocated stack elements to be used, to avoid new/delete's.
The way it is used now would not be STL compatible.

Now for this special case, I recommend to write a special class that encapsulates everything that is intended here.

Regarding the first point, I still don't get it. If you want to use some allocator for an array or whatever, you gotta set it and its things. I don't see how that's mixing concepts. If I put the pointer to command buffer directly into the vector class, then I'd agree it would be mixing unrelated things.

I don't know if we need STL compatibility for anything. If we do, that's probably a valid argument.

That said, I also don't mind having dedicated code for this.

billhollings · 2022-08-12T14:50:17Z

Thanks for submitting this. I'll review this next week, after the SDK release this weekend.

italomandara · 2022-08-18T19:12:52Z

Bad news: compiled libMoltenVK.dylib from this branch via xcodebuild build -project MoltenVKPackaging.xcodeproj -scheme "MoltenVK Package (macOS only)" -configuration "Debug" -arch "x86_64", even though it works fine for most games, it causes one particular game to freeze randomly. I wasn't able to trace the error i followed the instructions to enable moltenvk logging but it doesn't work with this build.
I'd love to help but i need to understand why logging doesn't work on crossover. (any help appreciated)

js6i · 2022-08-19T12:54:03Z

Bad news: compiled libMoltenVK.dylib from this branch via xcodebuild build -project MoltenVKPackaging.xcodeproj -scheme "MoltenVK Package (macOS only)" -configuration "Debug" -arch "x86_64", even though it works fine for most games, it causes one particular game to freeze randomly. I wasn't able to trace the error i followed the instructions to enable moltenvk logging but it doesn't work with this build. I'd love to help but i need to understand why logging doesn't work on crossover. (any help appreciated)

What is the game? You can save logs in CrossOver with the "run with options" dialog, which I think should include MoltenVK errors by default, but you can also set MVK_CONFIG_LOG_LEVEL=3 environment variable. It may be useful to also set METAL_DEVICE_WRAPPER_TYPE=1 to enable Metal API validation.

italomandara · 2022-08-19T16:45:02Z

@js6i it worked, i was able to enable the logs, thanks.
Game: "Elite dangerous: Odyssey"

please find the log attached:
edody.7z.txt

js6i · 2022-08-25T12:18:34Z

@italomandara Do you use any workarounds to launch this game with CrossOver? I'm getting an error saying "Unhandled Exception: mono-io-layer-error" on startup.

italomandara · 2022-08-25T13:20:39Z

@italomandara Do you use any workarounds to launch this game with CrossOver? I'm getting an error saying "Unhandled Exception: mono-io-layer-error" on startup.

ok to install Elite dangerous you need to install microsoft .net framework 4.8
Also you need dxvk, it crashes immediately on wined3d (this time not because of this WIP changes)

phush0 · 2022-08-26T08:55:39Z

This WIP also crash Guild Wars 2. The easiest way to reproduce is fast travel. Upon appear on new spot, loading textures and assets will make game crash or freeze entirely.

js6i · 2022-08-26T13:07:48Z

@italomandara, how to trigger this freeze? Can I expect to hit it by playing for a few minutes through the tutorial?

phush0 · 2022-08-26T13:09:19Z

you have to unlock fast travel point and travel from somewhere to it that's it for me triggers every time

italomandara · 2022-08-26T13:16:40Z

@js6i the freeze is random in Elite dangerous odyssey, it frequently happens in the main menu, but also while playing, I'm using M1Max, can't confirm you get the same on x86 machines and discrete GPUs

phush0 · 2022-08-26T13:49:50Z

I am also with m1 max

js6i · 2022-09-15T09:29:13Z

@italomandara I think I'll need more info about your setup. On my M1 Air Elite Dangerous seems to work fine with this change. Which version of CrossOver are you using? Do you use the version of DXVK that ships with it? Did you change any potentially important registry settings or environment variables? Esync? High res/retina mode? Fullscreen, windowed fullscreen, other graphics settings in game? Unfortunately I don't have direct access to M1 Max specifically..

js6i · 2022-09-16T12:30:48Z

I didn't reproduce the freeze in GW2 either, but I spotted a possible problem. This PR is now updated and also rebased against current master. Feel free to retest..
@italomandara @phush0

phush0 · 2022-09-16T13:45:32Z

yes now it will not crash, also seems that helps on fps in group events. I have played with patch like hour - hour and a half and traveled several times in very populated zones where before will hang and is ok

italomandara · 2022-09-16T18:25:06Z

@js6i It works now i also experimented with -Ofast still no crashes at all.
Also, noticeable speed improvements.

phush0 · 2022-09-23T12:37:59Z

Sadly after prolonged gaming Guild Wars 2 hangs again, last time it stall whole laptop. I don't have such problems with current master.

edit: mistake is mine because I have forced MTLEvents for Rosetta if use single queue every think is working as expected

EDIT 2: It seems that problem is triggered if I use async version of DXVK with "DXVK_ASYNC" = "1" it bottle settings

italomandara · 2022-10-01T15:46:52Z

When I tested it all went well, i turned on async in dxvk but never forced MTLEvents, so this might narrow down the issue to MTLEvents?

phush0 · 2022-10-01T16:39:12Z

It happens with single queue also when async is enabled

js6i · 2022-10-05T11:36:46Z

Rebased on current master and some minor clean ups.

@phush0, are you getting any error messages from DXVK or MoltenVK? What about Metal runtime? (run with METAL_DEVICE_WRAPPER_TYPE=1) Isn't dxvk-async just compiling shaders in a thread? That in principle shouldn't care about how commands are stored in buffers.

phush0 · 2022-10-05T12:32:00Z

Will check and will report back. Basically display just hangs. Actually async will skip draw at first and introduce it later as it is compiled in background, so it is more a hack than solution. May be this skip will lead to crash I am not entirely sure.

Gcenx · 2022-10-05T17:52:24Z

I’d applied the first version of this on 1.11.1 and haven’t seen any issue with Witcher3.

Tested in the training grounds where you’d usually get a lockup when hit (need to have MVK_CONFIG_RESUME_LOST_DEVICE=1 set) completed it without problem.

edit:
Also tested this on master and it's still working well for me, needed to also pull-in #1735

Also tested on Lego Starwars the skywalker saga still no issues both Witcher3 and this have async enabled.

CLAassistant · 2022-10-06T09:47:46Z

All committers have signed the CLA.

Gcenx · 2022-10-09T21:21:11Z

@js6i any reason to still consider this WIP?

italomandara · 2022-10-10T20:00:27Z

I tested this on my entire game library and it works

phush0 · 2022-10-12T06:12:23Z

with latest version I don't have hangs also

Gcenx · 2022-10-12T19:06:20Z

While it may be working for everyone it’s possible that @js6i still considering this wip for some issue they’ve encountered, that’s why I was asking.

oscarbg · 2022-10-13T21:25:31Z

sadly seems will miss the next Vulkan SDK feature cutoff (#1736)
still interesting if can be merged soon after, seeing it works now..

Gcenx · 2022-10-13T21:46:39Z

@oscarbg agreed but we don’t know if there’s a reason it’s still marked as WIP

Once the next release is marked I’ll provide the usual stock package along with another that includes this PR and a coupe of other interesting patches/hacks.

oscarbg · 2022-10-13T22:18:58Z

@Gcenx nice.. it's one of these additional patches the UE4 black screen fix: https://github.com/nastys/MoltenVK/releases/tag/ue4-workaround-1?
I just opened an issue (#1741) asking if this workaround makes sense to be integrated into MoltenVK and enabled via some env variable given it fixes lots of games on MacOS..

phush0 · 2022-10-19T12:13:48Z

I still have hangs in Guild Wars 2 even with the last patch. This is what I have in log as errors

[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader


[mvk-error] VK_ERROR_INVALID_SHADER_NV: Vertex shader function could not be compiled into pipeline. See previous logged error.
err:   DxvkGraphicsPipeline: Failed to compile pipeline
err:     vs  : VS_7628e2c6c3fd5510b370e8166966c9f45137cad7
err:     fs  : FS_6dbf94ee014a451d8ba108e51e15e38b602ce07a
err:     attr 0 : location 0, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 0
err:     attr 1 : location 1, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 12
err:     attr 2 : location 2, binding 0, format VK_FORMAT_R8G8B8A8_UINT, offset 0
err:     attr 3 : location 3, binding 0, format VK_FORMAT_R16G16_SFLOAT, offset 48
err:     binding 0 : binding 0, stride 52, rate 0, divisor 0
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader


[mvk-error] VK_ERROR_INVALID_SHADER_NV: Vertex shader function could not be compiled into pipeline. See previous logged error.
err:   DxvkGraphicsPipeline: Failed to compile pipeline
err:     vs  : VS_f89fed9e2085bf1ab8b74645384b6177cf16fa67
err:     fs  : FS_847fb2fbf55b3d81b0d05c9786d0576f485e7be9
err:     attr 0 : location 0, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 0
err:     attr 1 : location 1, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 12
err:     attr 2 : location 2, binding 0, format VK_FORMAT_R8G8B8A8_UINT, offset 0
err:     attr 3 : location 3, binding 0, format VK_FORMAT_R32G32_SFLOAT, offset 24
err:     attr 4 : location 7, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 48
err:     attr 5 : location 6, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 32
err:     attr 6 : location 5, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 16
err:     attr 7 : location 4, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 0
err:     binding 0 : binding 0, stride 32, rate 0, divisor 0
err:     binding 1 : binding 1, stride 64, rate 1, divisor 1
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader

after several such hangs of the app I reach the point where whole laptop freeze and reboots itself

everything other is info - create this create that nothing more, no errors logged or smth

Gcenx · 2022-10-19T12:19:47Z

If your using CrossOver-22.0.1 did you use the shipped DXVK or replace it with a third-party build?

phush0 · 2022-10-19T12:22:34Z

I am using latest build from your page and yes it is CrossOver-22.0.1

Gcenx · 2022-10-19T12:46:28Z

I am using latest build from your page and yes it is CrossOver-22.0.1

The latest DXVK-macOS build would be 1.10.3 that includes DXVK-CX hacks (that are still relevant) along with DXVK-async that’s only auto enabled on select titles.

phush0 · 2022-10-19T13:05:02Z

I don't use async for guild wars. Easiest way to reproduce e port to big city from somewhere. Loading objects will lead to hang.

italomandara · 2022-10-20T11:21:40Z

That seems a dxvk error not a moltenvk one... still strange that you don't get errors with the stock MoltenVK tho...

phush0 · 2022-10-20T12:32:41Z

this don't change the fact that Guild Wars hangs with this pr and not without it

phush0 · 2022-10-27T06:15:01Z

-[MTLDebugBlitCommandEncoder dealloc]:89: failed assertion `Command encoder released without endEncoding.'
0174:02fc:trace:seh:dispatch_exception code=c000001d flags=0 addr=00007FF8004E0D5D ip=00007FF8004E0D5D tid=02fc
code=c000001d flags=0 addr=0x689d768a ip=689d768a tid=02fc

This is what I found when game hang. I can upload detailed log just it is 170 MB

last shader before hang:

Estimated original GLSL:
#version 450

layout(constant_id = 1) const bool cb0_bound = true;
layout(constant_id = 10) const bool t0_bound = true;
layout(constant_id = 11) const bool t1_bound = true;
layout(constant_id = 12) const bool t2_bound = true;
layout(constant_id = 13) const bool t3_bound = true;
layout(constant_id = 14) const bool t4_bound = true;
layout(constant_id = 15) const bool t5_bound = true;
layout(constant_id = 16) const bool t6_bound = true;
layout(constant_id = 17) const bool t12_bound = true;
layout(constant_id = 1216) const uint omap0 = 12816u;

layout(set = 0, binding = 1, std140) uniform cb0_t
{
    vec4 m[21];
} cb0;

layout(set = 0, binding = 2) uniform sampler s0;
layout(set = 0, binding = 3) uniform sampler s1;
layout(set = 0, binding = 4) uniform sampler s2;
layout(set = 0, binding = 5) uniform sampler s3;
layout(set = 0, binding = 6) uniform sampler s4;
layout(set = 0, binding = 7) uniform sampler s5;
layout(set = 0, binding = 8) uniform sampler s6;
layout(set = 0, binding = 9) uniform sampler s12;
layout(set = 0, binding = 10) uniform texture2D t0;
layout(set = 0, binding = 11) uniform texture2D t1;
layout(set = 0, binding = 12) uniform texture2D t2;
layout(set = 0, binding = 13) uniform texture2D t3;
layout(set = 0, binding = 14) uniform texture2D t4;
layout(set = 0, binding = 15) uniform texture2D t5;
layout(set = 0, binding = 16) uniform texture2D t6;
layout(set = 0, binding = 17) uniform texture2D t12;

layout(location = 0) in vec4 v0;
layout(location = 1) in vec4 v1;
layout(location = 2) in vec4 v2;
layout(location = 3) in vec3 v3;
layout(location = 4) in vec4 v4;
layout(location = 0, index = 0) out vec4 o0;
vec4 shader_in[6];
vec4 r0;
vec4 r1;
vec4 r2;
vec4 r3;

void ps_main()
{
    vec2 _80 = vec2(cb0.m[1].x) * vec2(cb0.m[12].y, cb0.m[12].x);
    r0 = vec4(_80.x, _80.y, r0.z, r0.w);
    vec2 _87 = fract(r0.xy);
    r0 = vec4(_87.x, _87.y, r0.z, r0.w);
    vec4 _97 = r0;
    _97.x = fma(r0.x, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r0 = _97;
    vec4 _109 = r0;
    _109.z = (-cb0.m[12].z) + cb0.m[12].w;
    r0 = _109;
    vec4 _120 = r0;
    _120.x = fma(abs(r0.x), r0.z, cb0.m[12].z);
    r0 = _120;
    vec4 _131 = r1;
    _131.z = fma(shader_in[0].y, r0.x, r0.y);
    r1 = _131;
    vec4 _139 = r1;
    _139.x = r0.x * shader_in[0].x;
    r1 = _139;
    vec2 _154 = mix(vec2(0.0), texture(sampler2D(t2, s2), r1.xzxx.xy).xy, bvec2(t2_bound, t2_bound));
    r0 = vec4(_154.x, _154.y, r0.z, r0.w);
    vec2 _164 = fma(r0.xy, uintBitsToFloat(uvec2(1073741824u)), uintBitsToFloat(uvec2(3212836864u)));
    r0 = vec4(_164.x, _164.y, r0.z, r0.w);
    vec2 _178 = vec2(cb0.m[1].x) * vec2(cb0.m[13].y, cb0.m[13].x);
    r0 = vec4(r0.x, r0.y, _178.x, _178.y);
    vec2 _183 = fract(r0.zw);
    r0 = vec4(r0.x, r0.y, _183.x, _183.y);
    vec4 _191 = r0;
    _191.z = fma(r0.z, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r0 = _191;
    vec4 _201 = r1;
    _201.x = (-cb0.m[13].z) + cb0.m[13].w;
    r1 = _201;
    vec4 _212 = r0;
    _212.z = fma(abs(r0.z), r1.x, cb0.m[13].z);
    r0 = _212;
    vec4 _222 = r1;
    _222.z = fma(shader_in[0].y, r0.z, r0.w);
    r1 = _222;
    vec4 _230 = r1;
    _230.x = r0.z * shader_in[0].x;
    r1 = _230;
    vec2 _241 = mix(vec2(0.0), texture(sampler2D(t3, s3), r1.xzxx.xy).xy, bvec2(t3_bound, t3_bound));
    r0 = vec4(r0.x, r0.y, _241.x, _241.y);
    vec2 _248 = fma(r0.zw, uintBitsToFloat(uvec2(1073741824u)), uintBitsToFloat(uvec2(3212836864u)));
    r0 = vec4(r0.x, r0.y, _248.x, _248.y);
    vec2 _255 = r0.zw + r0.xy;
    r0 = vec4(_255.x, _255.y, r0.z, r0.w);
    vec2 _265 = r0.xy * vec2(cb0.m[14].x);
    r0 = vec4(r0.x, r0.y, _265.x, _265.y);
    vec2 _279 = vec2(cb0.m[1].x) * vec2(cb0.m[10].y, cb0.m[10].x);
    r1 = vec4(_279.x, _279.y, r1.z, r1.w);
    vec2 _284 = fract(r1.xy);
    r1 = vec4(_284.x, _284.y, r1.z, r1.w);
    vec4 _292 = r1;
    _292.x = fma(r1.x, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r1 = _292;
    vec4 _302 = r1;
    _302.z = (-cb0.m[10].z) + cb0.m[10].w;
    r1 = _302;
    vec4 _313 = r1;
    _313.x = fma(abs(r1.x), r1.z, cb0.m[10].z);
    r1 = _313;
    vec4 _324 = r2;
    _324.z = fma(shader_in[0].y, r1.x, r1.y);
    r2 = _324;
    vec4 _332 = r2;
    _332.x = r1.x * shader_in[0].x;
    r2 = _332;
    vec4 _344 = r1;
    _344.x = fma(cb0.m[11].x, cb0.m[1].x, uintBitsToFloat(3204448256u));
    r1 = _344;
    vec4 _349 = r1;
    _349.x = fract(r1.x);
    r1 = _349;
    vec2 _357 = fma(r0.zw, r1.xx, r2.xz);
    r1 = vec4(r1.x, _357.x, _357.y, r1.w);
    vec4 _369 = r1;
    _369.y = t1_bound ? texture(sampler2D(t1, s1), r1.yzyy.xy).y : 0.0;
    r1 = _369;
    vec4 _378 = r1;
    _378.z = cb0.m[1].x * cb0.m[11].x;
    r1 = _378;
    vec4 _383 = r1;
    _383.z = fract(r1.z);
    r1 = _383;
    vec2 _391 = fma(r1.zz, r0.zw, r2.xz);
    r0 = vec4(r0.x, r0.y, _391.x, _391.y);
    vec4 _403 = r0;
    _403.z = t1_bound ? texture(sampler2D(t1, s1), r0.zwzz.xy).y : 0.0;
    r0 = _403;
    vec4 _411 = r0;
    _411.w = (-r0.z) + r1.y;
    r0 = _411;
    vec4 _418 = r1;
    _418.y = fma(r1.z, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r1 = _418;
    vec4 _428 = r0;
    _428.z = fma(abs(r1.y), r0.w, r0.z);
    r0 = _428;
    vec3 _446 = mix(vec3(0.0), texture(sampler2D(t6, s6), vec4(cb0.m[4].x, cb0.m[4].y, cb0.m[4].x, cb0.m[4].x).xy).xyz, bvec3(t6_bound, t6_bound, t6_bound));
    r2 = vec4(_446.x, _446.y, _446.z, r2.w);
    vec3 _461 = fma(r2.xyz, vec3(cb0.m[5].x), vec3(cb0.m[6].x));
    r3 = vec4(_461.x, _461.y, _461.z, r3.w);
    vec4 _474 = r0;
    _474.w = dot(r2.xyz, uintBitsToFloat(uvec3(1050253722u, 1058474557u, 1038174126u)));
    r0 = _474;
    vec4 _480 = r1;
    _480.w = dot(r3.xyz, uintBitsToFloat(uvec3(1050253722u, 1058474557u, 1038174126u)));
    r1 = _480;
    vec3 _487 = (-r1.www) + r3.xyz;
    r2 = vec4(_487.x, _487.y, _487.z, r2.w);
    vec3 _499 = fma(vec3(cb0.m[7].x), r2.xyz, r1.www);
    r2 = vec4(_499.x, _499.y, _499.z, r2.w);
    vec3 _506 = r0.www * r2.xyz;
    r2 = vec4(_506.x, _506.y, _506.z, r2.w);
    vec3 _520 = r0.www * vec3(cb0.m[8].x, cb0.m[8].y, cb0.m[8].z);
    r3 = vec4(_520.x, _520.y, _520.z, r3.w);
    vec3 _527 = r3.xyz + r3.xyz;
    r3 = vec4(_527.x, _527.y, _527.z, r3.w);
    vec3 _534 = r2.xyz + r2.xyz;
    r2 = vec4(_534.x, _534.y, _534.z, r2.w);
    vec3 _541 = mix(mix(max(r3.xyz, r2.xyz), r2.xyz, isnan(r3.xyz)), r3.xyz, isnan(r2.xyz));
    r3 = vec4(_541.x, _541.y, _541.z, r3.w);
    vec3 _558 = fma(r2.xyz, vec3(cb0.m[9].x, cb0.m[9].y, cb0.m[9].z), -r3.xyz);
    r2 = vec4(_558.x, _558.y, _558.z, r2.w);
    vec3 _567 = fma(r0.zzz, r2.xyz, r3.xyz);
    r2 = vec4(_567.x, _567.y, _567.z, r2.w);
    vec4 _575 = r0;
    _575.z = r0.z + uintBitsToFloat(1028443341u);
    r0 = _575;
    float _581 = r0.z * uintBitsToFloat(1073892526u);
    float _992 = isnan(0.0) ? _581 : (isnan(_581) ? 0.0 : max(_581, 0.0));
    vec4 _584 = r0;
    _584.z = isnan(1.0) ? _992 : (isnan(_992) ? 1.0 : min(_992, 1.0));
    r0 = _584;
    vec4 _593 = r0;
    _593.w = fma(r0.z, uintBitsToFloat(3221225472u), uintBitsToFloat(1077936128u));
    r0 = _593;
    vec4 _600 = r0;
    _600.z = r0.z * r0.z;
    r0 = _600;
    vec4 _607 = r3;
    _607.w = r0.z * r0.w;
    r3 = _607;
    vec3 _613 = r2.xyz * r3.www;
    r2 = vec4(_613.x, _613.y, _613.z, r2.w);
    vec3 _621 = r2.xyz * uintBitsToFloat(uvec3(1072064102u));
    r3 = vec4(_621.x, _621.y, _621.z, r3.w);
    r2 = r3 * vec4(cb0.m[15].x);
    vec4 _641 = r0;
    _641.z = t4_bound ? texture(sampler2D(t4, s4), shader_in[1].xyxx.xy).x : 0.0;
    r0 = _641;
    vec4 _652 = r0;
    _652.w = (-cb0.m[16].x) + cb0.m[16].y;
    r0 = _652;
    vec4 _662 = r0;
    _662.z = fma(r0.z, r0.w, cb0.m[16].x);
    r0 = _662;
    vec2 _668 = r0.zz * r0.xy;
    r0 = vec4(_668.x, _668.y, r0.z, r0.w);
    vec2 _678 = r0.xy * vec2(cb0.m[17].x);
    r0 = vec4(_678.x, _678.y, r0.z, r0.w);
    vec2 _688 = fma(r0.xy, r1.xx, shader_in[0].zw);
    r0 = vec4(r0.x, r0.y, _688.x, _688.y);
    vec2 _698 = fma(r0.xy, r1.zz, shader_in[0].zw);
    r0 = vec4(_698.x, _698.y, r0.z, r0.w);
    vec4 _710 = r0;
    _710.x = t0_bound ? texture(sampler2D(t0, s0), r0.xyxx.xy).y : 0.0;
    r0 = _710;
    vec4 _721 = r0;
    _721.y = t0_bound ? texture(sampler2D(t0, s0), r0.zwzz.xy).y : 0.0;
    r0 = _721;
    vec4 _729 = r0;
    _729.y = (-r0.x) + r0.y;
    r0 = _729;
    float _738 = fma(abs(r1.y), r0.y, r0.x);
    float _1003 = isnan(0.0) ? _738 : (isnan(_738) ? 0.0 : max(_738, 0.0));
    vec4 _740 = r0;
    _740.x = isnan(1.0) ? _1003 : (isnan(_1003) ? 1.0 : min(_1003, 1.0));
    r0 = _740;
    r0 = r0.xxxx * r2;
    vec4 _755 = r1;
    _755.x = dot(shader_in[3].xyz, shader_in[2].yzw);
    r1 = _755;
    float _759 = abs(r1.x);
    float _761 = uintBitsToFloat(1065353216u);
    vec4 _763 = r1;
    _763.x = isnan(_761) ? _759 : (isnan(_759) ? _761 : min(_759, _761));
    r1 = _763;
    vec4 _770 = r1;
    _770.x = (-r1.x) + uintBitsToFloat(1065353216u);
    r1 = _770;
    vec4 _775 = r1;
    _775.x = log2(r1.x);
    r1 = _775;
    vec4 _784 = r1;
    _784.x = r1.x * cb0.m[18].x;
    r1 = _784;
    vec4 _789 = r1;
    _789.x = exp2(r1.x);
    r1 = _789;
    vec4 _796 = r1;
    _796.x = (-r1.x) + uintBitsToFloat(1065353216u);
    r1 = _796;
    float _800 = uintBitsToFloat(0u);
    vec4 _802 = r1;
    _802.x = isnan(_800) ? r1.x : (isnan(r1.x) ? _800 : max(r1.x, _800));
    r1 = _802;
    r0 *= r1.xxxx;
    vec2 _815 = shader_in[5].xy + vec2(cb0.m[3].x);
    r1 = vec4(_815.x, _815.y, r1.z, r1.w);
    vec2 _826 = r1.xy * vec2(cb0.m[0].x, cb0.m[0].y);
    r1 = vec4(_826.x, _826.y, r1.z, r1.w);
    vec4 _838 = r1;
    _838.x = t12_bound ? texture(sampler2D(t12, s12), r1.xyxx.xy).x : 0.0;
    r1 = _838;
    vec4 _850 = r1;
    _850.x = fma(r1.x, cb0.m[0].w, -cb0.m[0].z);
    r1 = _850;
    vec4 _859 = r1;
    _859.x = r1.x + (-shader_in[2].x);
    r1 = _859;
    float _867 = r1.x / cb0.m[19].x;
    float _1024 = isnan(0.0) ? _867 : (isnan(_867) ? 0.0 : max(_867, 0.0));
    vec4 _869 = r1;
    _869.x = isnan(1.0) ? _1024 : (isnan(_1024) ? 1.0 : min(_1024, 1.0));
    r1 = _869;
    vec4 _876 = r1;
    _876.y = fma(r1.x, uintBitsToFloat(3221225472u), uintBitsToFloat(1077936128u));
    r1 = _876;
    vec4 _883 = r1;
    _883.x = r1.x * r1.x;
    r1 = _883;
    vec4 _890 = r1;
    _890.x = r1.x * r1.y;
    r1 = _890;
    r0 *= r1.xxxx;
    vec4 _906 = r1;
    _906.x = t5_bound ? texture(sampler2D(t5, s5), shader_in[1].zwzz.xy).x : 0.0;
    r1 = _906;
    r0 *= r1.xxxx;
    r0 *= vec4(cb0.m[20].x);
    vec3 _929 = r0.xyz * vec3(cb0.m[2].x, cb0.m[2].y, cb0.m[2].z);
    r0 = vec4(_929.x, _929.y, _929.z, r0.w);
    r0 *= vec4(cb0.m[2].w);
    vec4 _944 = r1;
    _944.x = (-shader_in[4].w) + uintBitsToFloat(1065353216u);
    r1 = _944;
    o0 = r0 * r1.xxxx;
}

void main()
{
    shader_in[0] = v0;
    shader_in[1] = v1;
    shader_in[2] = v2;
    shader_in[3] = vec4(v3.x, v3.y, v3.z, shader_in[3].w);
    shader_in[4] = v4;
    vec4 _964 = gl_FragCoord;
    _964.w = 1.0 / _964.w;
    shader_in[5] = vec4(_964.xy.x, _964.xy.y, shader_in[5].z, shader_in[5].w);
    ps_main();
    o0 = vec4(o0[bitfieldExtract(omap0, int(0u), int(4u))], o0[bitfieldExtract(omap0, int(4u), int(4u))], o0[bitfieldExtract(omap0, int(8u), int(4u))], o0[bitfieldExtract(omap0, int(12u), int(4u))]);
}


End GLSL


[mvk-info] Compiling Metal shader with FastMath enabled.

Gcenx · 2022-10-31T13:45:42Z

@phush0 I noticed at the end of the log [mvk-info] Compiling Metal shader with FastMath enabled. have you tried disabling FastMath?, on discord nas had shown some titles having graphical issues with this enabled so maybe that's a contributing factor.

phush0 · 2022-10-31T13:48:33Z

@phush0 I noticed at the end of the log [mvk-info] Compiling Metal shader with FastMath enabled. have you tried disabling FastMath?, on discord nas had shown some titles having graphical issues with this enabled so maybe that's a contributing factor.

yes no change same assert and hang

Gcenx · 2022-12-10T15:43:05Z

@js6i could you please rebase this

marzent · 2022-12-12T10:41:49Z

Took the liberty of rebasing here https://github.com/marzent/MoltenVK/tree/cmdbuffer-storage2 if anyone needs it

The main motivation behind this commit is to reduce the time needed to free or reset a command buffer. With the current scheme, we need to loop over all recorded commands and return them to their pools. This can take a significant amount of time in games, of the order of a millisecond. With this change, instead of pooling individual commands by their type, we instead keep chunks of memory in each command buffer, from which we bump allocate memory for the commands to be recorded. MVKCommandVector is introduced to use the same memory pool for dynamic arrays of data associated with the commands. Pointers to commands that actually have non-trivial destructors (i.e. they keep references to other objects) are stored separately, so that we can call their destructors when reseting the command buffer. All the memory simply gets discarded by setting the current chunk's index and offset to zero. As a side note, specific applications might opt to not retain pipeline layouts in bind/push descriptor commands (introduced by 30113fc) if it's known not to be needed, to further reduce time spent resetting command buffers.

italomandara · 2022-12-14T00:27:40Z

@js6i this is not from your code changes but maybe your changes revealed some existing issues
in MVKCmdTransfer.mm this line and this line i can see [mtlRendEnc popDebugGroup] comes before [mtlRendEnc endEncoding]
now if the error is failed assertion 'Command encoder released without endEncoding.' could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?
Correct me if I'm wrong.

cdavis5e · 2022-12-14T01:03:44Z

could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?

No. [mtlRendEnc popDebugGroup] doesn't release the command encoder. It undoes the effect of a prior -pushDebugGroup:.

italomandara · 2022-12-14T01:29:51Z

could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?

No. [mtlRendEnc popDebugGroup] doesn't release the command encoder. It undoes the effect of a prior -pushDebugGroup:.

right it's even done in the same order in the apple docs, sorry.

Gcenx · 2022-12-14T15:54:54Z

@js6i is there and issues you’ve encountered when using this that’s caused it to be still marked as WIP?

Gcenx · 2023-03-23T20:46:55Z

@js6i could you rebase this PR to master, also is there any reason this is sitting in limbo?

KhronosGroup#1678

js6i · 2023-05-19T13:17:59Z

I'll most likely rebase on a newer version soon, the problem is that I haven't been able to reproduce the issue @phush0 was getting, and understandably @billhollings is not eager to merge regressive changes. Perhaps it's something that will go away on its own after rebasing.

Gcenx · 2023-05-19T13:46:19Z

@js6i here is an easy example of a game crashing quickly with this applied.

Jedi: Fallen Order, fails to load a save without the game will load the save though the game will eventually crash out when remembering an ability (first one being wall running) when using DXVK not the case when using WineD3D Vulkan though it’s unplayable with that. The issue is probably with spirv-cross.

Though the above may have been resolved via 75b4d26 as something fails to compile causing the pipeline to get locked up.

I suspect this PR was just uncovering these issues much sooner than without.

phush0 · 2023-05-19T18:46:11Z

Just rebase will test again

cdavis5e requested changes Aug 12, 2022

View reviewed changes

billhollings changed the title ~~MVKCommandBuffer: Optimize management of command storage.~~ WIP: MVKCommandBuffer: Optimize management of command storage. Aug 18, 2022

js6i force-pushed the cmdbuffer-storage branch from 05965c7 to daf1323 Compare September 16, 2022 12:27

js6i force-pushed the cmdbuffer-storage branch from daf1323 to 4042693 Compare October 5, 2022 11:33

js6i force-pushed the cmdbuffer-storage branch from f9674d3 to b272684 Compare October 6, 2022 09:54

js6i added 4 commits December 13, 2022 16:21

MVKCmdRenderPass: Removed MVKCmdBeginRenderPassBase.

5556f89

MVKCommandBuffer: Add destroyOnReset() wrapper method. Explain.

4bc8192

MVKCommandPool: Release unused command buffers in trim().

1ba86d9

js6i force-pushed the cmdbuffer-storage branch from b2edd30 to 1ba86d9 Compare December 13, 2022 15:24

Gcenx pushed a commit to Kegworks-App/MoltenVK that referenced this pull request Mar 24, 2023

MVKCommandBuffer: Optimize management of command storage.

cc5e36e

KhronosGroup#1678

WIP: MVKCommandBuffer: Optimize management of command storage. #1678

Are you sure you want to change the base?

WIP: MVKCommandBuffer: Optimize management of command storage. #1678

Conversation

js6i commented Aug 11, 2022

cdavis5e left a comment

Choose a reason for hiding this comment

cdavis5e Aug 12, 2022

Choose a reason for hiding this comment

js6i Sep 16, 2022

Choose a reason for hiding this comment

js6i Oct 14, 2022

Choose a reason for hiding this comment

aerofly Oct 14, 2022

Choose a reason for hiding this comment

js6i Oct 14, 2022 • edited Loading

Choose a reason for hiding this comment

aerofly Oct 14, 2022

Choose a reason for hiding this comment

js6i Oct 14, 2022

Choose a reason for hiding this comment

billhollings commented Aug 12, 2022

italomandara commented Aug 18, 2022 • edited Loading

js6i commented Aug 19, 2022

italomandara commented Aug 19, 2022 • edited Loading

js6i commented Aug 25, 2022

italomandara commented Aug 25, 2022 • edited Loading

phush0 commented Aug 26, 2022

js6i commented Aug 26, 2022

phush0 commented Aug 26, 2022

italomandara commented Aug 26, 2022 • edited Loading

phush0 commented Aug 26, 2022

js6i commented Sep 15, 2022

js6i commented Sep 16, 2022

phush0 commented Sep 16, 2022

italomandara commented Sep 16, 2022

phush0 commented Sep 23, 2022 • edited Loading

italomandara commented Oct 1, 2022

phush0 commented Oct 1, 2022

js6i commented Oct 5, 2022

phush0 commented Oct 5, 2022

Gcenx commented Oct 5, 2022 • edited Loading

CLAassistant commented Oct 6, 2022 • edited Loading

Gcenx commented Oct 9, 2022

italomandara commented Oct 10, 2022

phush0 commented Oct 12, 2022

Gcenx commented Oct 12, 2022

oscarbg commented Oct 13, 2022

Gcenx commented Oct 13, 2022

oscarbg commented Oct 13, 2022

phush0 commented Oct 19, 2022 • edited Loading

Gcenx commented Oct 19, 2022

phush0 commented Oct 19, 2022 • edited Loading

Gcenx commented Oct 19, 2022

phush0 commented Oct 19, 2022

italomandara commented Oct 20, 2022 • edited Loading

phush0 commented Oct 20, 2022

phush0 commented Oct 27, 2022 • edited Loading

Gcenx commented Oct 31, 2022

phush0 commented Oct 31, 2022

Gcenx commented Dec 10, 2022

marzent commented Dec 12, 2022

italomandara commented Dec 14, 2022 • edited Loading

cdavis5e commented Dec 14, 2022

italomandara commented Dec 14, 2022

Gcenx commented Dec 14, 2022

Gcenx commented Mar 23, 2023

js6i commented May 19, 2023

Gcenx commented May 19, 2023

phush0 commented May 19, 2023

js6i Oct 14, 2022 •

edited

Loading

italomandara commented Aug 18, 2022 •

edited

Loading

italomandara commented Aug 19, 2022 •

edited

Loading

italomandara commented Aug 25, 2022 •

edited

Loading

italomandara commented Aug 26, 2022 •

edited

Loading

phush0 commented Sep 23, 2022 •

edited

Loading

Gcenx commented Oct 5, 2022 •

edited

Loading

CLAassistant commented Oct 6, 2022 •

edited

Loading

phush0 commented Oct 19, 2022 •

edited

Loading

phush0 commented Oct 19, 2022 •

edited

Loading

italomandara commented Oct 20, 2022 •

edited

Loading

phush0 commented Oct 27, 2022 •

edited

Loading

italomandara commented Dec 14, 2022 •

edited

Loading