I am deploying inference on a server with 4× T4 (16 GB) GPUs. Running gradio_demo.py goes out of memory and fails to start.
I adjusted the code:
line 57: model = AutoModel.from_pretrained(model_path, attn_implementation='sdpa', trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
line 58: # model = model.to(device=device)
It now starts, but asking a question raises:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!
Is there a way to run multi-GPU inference, or can GPUs with less than 16 GB simply not be used?
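For reference, a minimal sketch of the adjusted loading code described above, with an explicit per-GPU memory cap added so the "auto" device map stays within each 16 GB T4. The model_path value and the cap sizes are assumptions, not something tested by the maintainers:

```python
# Sketch of the modified gradio_demo.py loading (lines 57-58 above), sharded
# across 4 GPUs. model_path and the per-card memory caps are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "path/to/model"  # hypothetical placeholder

model = AutoModel.from_pretrained(
    model_path,
    attn_implementation="sdpa",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",                           # spread layers over the 4 T4s
    max_memory={i: "14GiB" for i in range(4)},   # leave headroom on each 16 GB card
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

# Note: do NOT call model.to(device) afterwards -- that would try to move the
# whole sharded model onto a single GPU and reintroduce the OOM.
```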
We do not have a development machine with multiple GPUs, so this scenario has not been fully tested. I suspect the issue is that the 'auto' device map can place the visual features and the hyper attention layers on different devices. If that is the case, manually cloning the visual features to the same device as the current layer should resolve the issue.
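A rough illustration of that suggestion is below. The layer and attribute names are made up for illustration only; the real hyper attention implementation lives in the model's trust_remote_code files and will differ:

```python
# Hypothetical sketch: before a hyper attention layer consumes the visual
# features, move them onto the same device as that layer's hidden states.
import torch
import torch.nn as nn

class HyperAttentionLayer(nn.Module):  # stand-in for the real layer
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden_states: torch.Tensor, visual_features: torch.Tensor):
        # With device_map="auto" the text layers and the vision tower can land
        # on different GPUs (e.g. cuda:0 vs cuda:3); align devices first.
        if visual_features.device != hidden_states.device:
            visual_features = visual_features.to(hidden_states.device)
        out, _ = self.cross_attn(hidden_states, visual_features, visual_features)
        return out
```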
Due to the limited GPU memory, it is not possible to run on a single GPU, so multiple GPUs are required. Can you give us an example of multi-GPU inference?