Adds device options for MPS (Apple GPU) and XPU (Intel GPU), mirroring the existing support for NVIDIA GPUs via CUDA.
In theory there are quite a few additional devices we could add (full list here / here), but from discussions with @jatkinson1000 these two are of most interest.
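For illustration, device selection at model-load time might look like the following (a sketch only: the enum names `torch_kMPS`/`torch_kXPU` and the `device_type` argument are assumed to follow FTorch's existing `torch_kCUDA` convention):

```fortran
use ftorch

type(torch_model) :: model

! Load the TorchScript model onto the Apple GPU; torch_kMPS is assumed to
! mirror the existing torch_kCUDA device enum (torch_kXPU likewise for Intel).
call torch_model_load(model, "saved_resnet18_model.pt", device_type=torch_kMPS)
```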
I haven't been able to test the XPU device, but basic tests with MPS suggest it's working as expected.
In example 2, resnet_infer_fortran, setting the model's device to MPS without also changing the input tensor's device throws an error.
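Roughly, the mismatched configuration looks like this (a sketch assuming call signatures analogous to the existing CUDA example; exact names may differ):

```fortran
! Model loaded onto MPS ...
call torch_model_load(model, args(1), device_type=torch_kMPS)

! ... but the input tensor is still created on the CPU, so the forward
! pass fails with a device-mismatch error.
call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kCPU)
call torch_model_forward(model, in_tensors, out_tensors)
```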
Similarly, setting the input tensor's device but not the model's throws an error.
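That is, the opposite mismatch, sketched with assumed FTorch call names:

```fortran
! Input tensor created on MPS, model left on the CPU: the forward pass
! again fails with a device-mismatch error.
call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
call torch_model_load(model, args(1), device_type=torch_kCPU)
call torch_model_forward(model, in_tensors, out_tensors)
```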
Setting both works, and the expected output is produced.
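For completeness, the working configuration (a sketch; enum and argument names assumed by analogy with FTorch's CUDA support):

```fortran
! Model and input tensor both on MPS: inference succeeds.
call torch_model_load(model, args(1), device_type=torch_kMPS)
call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
call torch_model_forward(model, in_tensors, out_tensors)
```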
I also see spikes in activity on my GPU (for the largest spikes, I added a loop around the example inference).
Note that when running 10,000 iterations of the inference I got an error, which might suggest a problem with cleanup.
I don't think this is specific to MPS, so it might be worth checking on CUDA too (you can reduce the available CUDA memory to make the problem easier to reproduce, if that helps).
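One thing to check is whether the per-iteration tensors are being freed. A minimal sketch of what I mean, assuming FTorch's `torch_tensor_delete`/`torch_model_delete` cleanup routines (names assumed; tensor setup outside the loop omitted):

```fortran
do i = 1, 10000
   ! Recreate the input tensor and run inference each iteration
   call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
   call torch_model_forward(model, in_tensors, out_tensors)

   ! Without explicit deletion, device-side allocations may accumulate
   ! across iterations, which could explain the failure at high counts.
   call torch_tensor_delete(in_tensors(1))
end do
call torch_model_delete(model)
```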