When devicePlugin.passDeviceSpecsEnabled is set to true, GPU pod failed to start (failed to apply OCI options) #776
Comments
Hi @Rei1010, to help us diagnose the issue, could you please:
Check NVIDIA devices: Run
Test PyTorch GPU access: With
This will help us understand the problem better. Thanks!
Hi @Nimbus318, Helm install with passDeviceSpecsEnabled disabled:
GPU Devices:
Pytorch:
Nvidia-smi:
Run CUDA vectorAdd
Thanks for your report! Could you please try setting passDeviceSpecsEnabled to true again and see if the issue can be reproduced?
What happened:
After installing the HAMi Helm chart with default options, the GPU pod fails to start with the error "Error: failed to generate container "180e8893b52ae58994b7db777ee79513fbf6555e10d49cb686f94e8f88666074" spec: failed to apply OCI options: lstat /dev/nvidiactl: no such file or directory"
(HAMi/charts/hami/values.yaml, line 139 in b8548c3)
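The `lstat /dev/nvidiactl: no such file or directory` message suggests the container runtime could not find that device node at the path it was given. As a rough diagnostic sketch (not part of HAMi), the script below can be run on the GPU node to report which device nodes are absent. The list of paths other than `/dev/nvidiactl` and the helper name `missing_device_nodes` are assumptions for illustration; adjust them to the GPUs actually present.

```python
import os
import stat

# Assumed set of device nodes to check. Only /dev/nvidiactl comes from the
# error message; the others are typical NVIDIA nodes and may differ per node.
CANDIDATE_NODES = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"]


def missing_device_nodes(paths):
    """Return the paths that do not exist or are not character devices."""
    missing = []
    for path in paths:
        try:
            # lstat mirrors the call that fails in the reported error.
            mode = os.lstat(path).st_mode
        except OSError:
            missing.append(path)
            continue
        if not stat.S_ISCHR(mode):
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_device_nodes(CANDIDATE_NODES):
        print(f"missing or not a character device: {path}")
```

If `/dev/nvidiactl` is reported on the host itself, the device specs passed to the runtime point at a path that does not exist there, which would be consistent with the failure above.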
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Install HAMi from the latest code; the issue reproduces.
Anything else we need to know?:
Test yaml:
Enable passDeviceSpecsEnabled
![image](https://private-user-images.githubusercontent.com/56469400/399908380-c0126727-193f-4c68-9e5d-0eb16adc8e2a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NjA4MDUsIm5iZiI6MTczODk2MDUwNSwicGF0aCI6Ii81NjQ2OTQwMC8zOTk5MDgzODAtYzAxMjY3MjctMTkzZi00YzY4LTllNWQtMGViMTZhZGM4ZTJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDIwMzUwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU3NDBjNzA4NjNhNTQ4ZDJiODA5NjVlNDcxMDdmMDY1ZWY1ZTA1YzcyM2Q3OTllOTNkMWM4ZjJkOTI4NDk3NjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.pjmlq2shIZB_k8CgoJVwqp3WI36A63VZxhcKdk06bcA)
Disable passDeviceSpecsEnabled
![image](https://private-user-images.githubusercontent.com/56469400/399905341-f5df5c43-7c40-4eb8-8a81-6606e78d7a64.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NjA4MDUsIm5iZiI6MTczODk2MDUwNSwicGF0aCI6Ii81NjQ2OTQwMC8zOTk5MDUzNDEtZjVkZjVjNDMtN2M0MC00ZWI4LThhODEtNjYwNmU3OGQ3YTY0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDIwMzUwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFkZWJjMTdmMTY3YTgzZDA5MWU1YjRkNDhlN2E4OTk4MDkzNTA3ODljNjU4YTE1OWFlY2RmODQwNDc1NTk3ZjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.B-YQAWvbJSlabrXNBtG2XqrcYhc54PmdkiEKqbSt234)
Related PR:
#690
Environment:
docker version
uname -a