Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add robustness to GPU shaders #537
Add robustness to GPU shaders #537
Changes from 1 commit
1e6cb2b
1fe82bb
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't related to this PR, but as far as I can tell, this is already guaranteed to be zeroed. If this is to work around a driver/naga bug, we should have a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize that was a strong guarantee. In WebGPU world, it's probably worth skipping this explicit zeroing, but in native world it might be worth compiling with zeroing by infrastructure disabled, in which case we would need this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - for us it's not impactful, but e.g. before #363 this would have mattered for the MSL conversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be reasonable to have
new_cmd=cmd_offset
here? I think that would avoid the UB - instead we'd just overwrite in the same location in each loopThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intriguing idea! However, that won't quite avoid UB, as cmd_offset will edge into the allocation following this one. Setting it to
cmd_limit - (PTCL_INCREMENT - PTCL_HEADROOM)
almost works, but only if it's not in its initial segment. I can't think of a good solution in that case, as we still ideally want the limit where it is so it accurately allocates as if there were enough memory, for the purposes of reporting the size back. My gut feeling is that if we were very concerned about the technical UB, we should in fact predicate the writes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because the allocations are variably sized? I can't say I'm that happy about adding UB, but I do agree that it's unlikely to cause a problem in practise.
I wonder how bad the cost of writing to the same location is in terms of memory bandwidth/cache coherency?
I agree that this is probably fine as-is, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this represent? I would think this should be
STAGE_{BEFORE_COARSE}
, and then only if the check onseg_counts
failedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make the change, now that I'm thinking these flags will be interpreted by the client, not just to early-out downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm impressed that this works. Reading the specs suggest it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this works and it's the only way I'm aware of that allows you to "abort" this type of indirect dispatch (there are more sophisticated ways with bindless, see for example: https://developer.apple.com/documentation/metal/indirect_command_encoding/encoding_indirect_command_buffers_on_the_gpu?language=objc).
Interestingly, I couldn't find any explicit wording in the WebGPU, Metal, Vulkan, or D3D12 docs that this is the expected behavior but "0" falls within the accepted range for all of them. See also this past discussion: gpuweb/gpuweb#1045