Vision_maskrcnn RuntimeError: roi_align_backward_kernel_xpu does not have a deterministic implementation #1264

mengfei25 · 2025-01-08T07:31:08Z

🐛 Describe the bug

python benchmarks/dynamo/torchbench.py --accuracy --float32 -d xpu -n10 --training  --only vision_maskrcnn --backend=inductor

xpu  train vision_maskrcnn                    
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2751, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 462, in forward_and_backward_pass
    self.grad_scaler.scale(loss).backward()
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/_tensor.py", line 648, in backward
    torch.autograd.backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: roi_align_backward_kernel_xpu does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4886, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 372, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2753, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed

eager_fail_to_run
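The error message suggests two workarounds: turn off determinism for this one operation, or use `warn_only=True`. Below is a minimal sketch that limits the relaxation to the backward pass. It assumes you control the code that calls `loss.backward()`. `deterministic_warn_only` is a hypothetical helper and is not part of PyTorch or the benchmark harness. It relies only on the public `torch.use_deterministic_algorithms`, `torch.are_deterministic_algorithms_enabled` and `torch.is_deterministic_algorithms_warn_only_enabled` APIs.

```python
import contextlib

import torch


@contextlib.contextmanager
def deterministic_warn_only():
    """Hypothetical helper: temporarily downgrade nondeterminism errors to warnings.

    Deterministic mode stays on for every op that supports it. Ops without a
    deterministic kernel (here, roi_align backward on XPU) emit a UserWarning
    instead of raising RuntimeError while the block is active.
    """
    prev_mode = torch.are_deterministic_algorithms_enabled()
    prev_warn_only = torch.is_deterministic_algorithms_warn_only_enabled()
    torch.use_deterministic_algorithms(prev_mode, warn_only=True)
    try:
        yield
    finally:
        # Restore the caller's exact settings, even if backward() raises.
        torch.use_deterministic_algorithms(prev_mode, warn_only=prev_warn_only)


# Intended usage (assumed call site, e.g. a training step):
#     with deterministic_warn_only():
#         loss.backward()
```

The flag is process-wide, not scoped to a thread, so this pattern is not safe when other threads run ops concurrently. It also only relaxes the check and does not make the kernel deterministic. Gradients that pass through roi_align backward may therefore still vary slightly between runs, which matters for `--accuracy` comparisons.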

Versions

Environment:
Device: PVC 1100
torch-xpu-ops: 18bcd9a
python: 3.10
TRITON_COMMIT_ID: e98b6fcb8df5b44eb0d0addb6767c573d37ba024
TORCH_COMMIT_ID: b9fbd65dfd5e703bacbc6c25258d1215108b4faf
TORCHBENCH_COMMIT_ID: 766a5e3a189384659fd35a68c3b17b88c761aaac
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: b6d4675c7aedc53ba04f3f55786aac1de32be6b4
DRIVER_VERSION: 1.23.10.49.231129.50 (803.61)
KERNEL_VERSION: 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113 (DL-Essential 2025.0.1)
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11


Last reference updated is 20240709. Related issues:
- [x] #1216
- [x] #1217
- [x] #1219
- [x] #1220
- [ ] #1221
- [x] #1222
- [ ] #1256
- [ ] #1260
- [ ] #1261
- [ ] #1262
- [ ] #1263
- [ ] #1264
- [ ] #1273
- [ ] #1274
- [ ] #1275
- [ ] #1276
- [ ] #1277
- [ ] #1278
- [ ] #508
- [ ] #509
- [ ] #510

mengfei25 added E2E Accuracy torchbench training float32 labels Jan 8, 2025

This was referenced Jan 8, 2025:
- Vision_maskrcnn RuntimeError got diff tensor dtype #496 (Closed)
- Update weekly accuracy reference #1223 (Merged)

frost-intel mentioned this issue Jan 15, 2025:
- Add RoI torchvision ops #1291 (Merged)
