
docker: Error response from daemon: unknown or invalid runtime name: nvidia #7

Closed
mahmoodn opened this issue Jun 24, 2024 · 1 comment


mahmoodn commented Jun 24, 2024

With CUDA 11.8 on Ubuntu 22.04 and an RTX 3080, I tried to run make prebuild in the NVIDIA folder of inference_v4. After about two hours of compilation, it ended with an error message that I don't understand:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

$ nvidia-smi 
Mon Jun 24 13:46:50 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+

$ docker images
REPOSITORY                               TAG                                                       IMAGE ID       CREATED         SIZE
mlperf-inference                         mahmood-x86_64                                           3ef82bce4e33   4 minutes ago   46.5GB
mlperf-inference                         mahmood-x86_64-latest                                    6539c6892f0d   5 minutes ago   46.5GB
nvcr.io/nvidia/mlperf/mlperf-inference   mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public   34b056f25fae   4 months ago    14.5GB

$ dpkg -l | grep docker
ii  docker                                     1.5-2                                   all          transitional package
ii  docker-buildx                              0.12.1-0ubuntu1~22.04.1                 amd64        Docker CLI plugin for extended build capabilities with BuildKit
ii  docker.io                                  24.0.7-0ubuntu2~22.04.1                 amd64        Linux container runtime
ii  wmdocker                                   1.5-2                                   amd64        System tray for KDE3/GNOME2 docklet applications

$ dpkg -l | grep nvidia
ii  libnvidia-container-tools                  1.15.0-1                                amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.15.0-1                                amd64        NVIDIA container runtime library
ii  nvidia-container-toolkit                   1.15.0-1                                amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base              1.15.0-1                                amd64        NVIDIA Container Toolkit Base


The output is:

 => [37/40] RUN mkdir -p /opt/fp8/faster-transformer-bert-fp8-weights-scales/     && tar -zxvf /tmp/faster-transformer-bert-f  8.4s 
 => [38/40] RUN apt install -y libgl1-mesa-glx                                                                                 6.6s 
 => [39/40] RUN apt install -y python3.8-venv                                                                                  1.7s 
 => [40/40] WORKDIR /work                                                                                                      0.0s 
 => exporting to image                                                                                                        46.7s 
 => => exporting layers                                                                                                       46.7s
 => => writing image sha256:6539c6892f0d0f4bd49b4234de1fe16cf03fcfdc22ea40f04dbea4509f0c61a7                                   0.0s
 => => naming to docker.io/library/mlperf-inference:mahmood-x86_64-latest                                                     0.0s
make[1]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[1]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[2]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
Adding user account into image
DOCKER_BUILDKIT=1 docker build -t mlperf-inference:mahmood-x86_64 --network host \
	--build-arg BASE_IMAGE=mlperf-inference:mahmood-x86_64-latest \
	--build-arg GID=1000 --build-arg UID=1000 --build-arg GROUP=mahmood --build-arg USER=mahmood \
	- < docker/Dockerfile.user
[+] Building 0.4s (6/6) FINISHED                                                                                     docker:default
 => [internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                0.0s
 => [internal] load build definition from Dockerfile                                                                           0.0s
 => => transferring dockerfile: 1.05kB                                                                                         0.0s
 => [internal] load metadata for docker.io/library/mlperf-inference:mahmood-x86_64-latest                                     0.0s
 => [1/2] FROM docker.io/library/mlperf-inference:mahmood-x86_64-latest                                                       0.2s
 => [2/2] RUN echo root:root | chpasswd  && groupadd -f -g 1000 mahmood  && useradd -G sudo -g 1000 -u 1000 -m mahmood  &&   0.1s
 => exporting to image                                                                                                         0.0s
 => => exporting layers                                                                                                        0.0s
 => => writing image sha256:3ef82bce4e3303961c0d1b896e1c84228f241881c697965a584946443283b6e7                                   0.0s
 => => naming to docker.io/library/mlperf-inference:mahmood-x86_64                                                            0.0s
make[2]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[2]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
/bin/bash: line 1: [: ==: unary operator expected
/bin/bash: line 1: [: !=: unary operator expected
docker run --gpus=all --runtime=nvidia --rm -it -w /work \
	-v /disk1/mahmood/inference_results_v4.0/closed/NVIDIA:/work -v /home/mahmood:/mnt//home/mahmood \
	--cap-add SYS_ADMIN --cap-add SYS_TIME \
	-e NVIDIA_VISIBLE_DEVICES=all \
	-e HISTFILE=/mnt//home/mahmood/.mlperf_bash_history \
	--shm-size=32gb \
	--ulimit memlock=-1 \
	-v /etc/timezone:/etc/timezone:ro -v /etc/localtime:/etc/localtime:ro \
	--security-opt apparmor=unconfined --security-opt seccomp=unconfined \
	--name mlperf-inference-mahmood-x86_64-28459 -h mlperf-inference-mahmood-x86-64-28459 --add-host mlperf-inference-mahmood-x86_64-28459:127.0.0.1 \
	--cpuset-cpus 0-15 \
	--user 1000 --net host --device /dev/fuse \
	-v /disk1/scratch_v4:/disk1/scratch_v4  \
	-e MLPERF_SCRATCH_PATH=/disk1/scratch_v4 \
	-e HOST_HOSTNAME=rtx3080 \
	 \
	mlperf-inference:mahmood-x86_64 
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.
make[2]: *** [Makefile.docker:311: launch_docker] Error 125
make[2]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[1]: *** [Makefile.docker:299: attach_docker] Error 2
make[1]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
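The two "unary operator expected" lines near the end of the log appear to be unrelated to the runtime error itself. Bash prints this message when a test such as [ $VAR == value ] runs with VAR empty and unquoted: the empty expansion disappears and only [ == value ] is left. A minimal reproduction, as a sketch; SOME_VAR is a hypothetical stand-in, since the log does not show which variable the Makefile recipe actually compares:

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: SOME_VAR stands in for the (unknown) variable
# that the Makefile recipe compares; it is not taken from Makefile.docker.

check_unquoted() {
  local SOME_VAR="$1"
  # With SOME_VAR empty this expands to: [ == yes ]
  # bash prints "[: ==: unary operator expected" and the test returns 2.
  [ $SOME_VAR == yes ]
}

check_quoted() {
  local SOME_VAR="$1"
  # Quoting keeps an empty argument, so this becomes [ "" == yes ],
  # a well-formed comparison that simply returns 1 (false).
  [ "$SOME_VAR" == yes ]
}

# Usage:
#   check_unquoted ""   # prints the same kind of error as the log, exit status 2
#   check_quoted ""     # no error, exit status 1
```

Quoting the variable is the usual fix for this class of message; it does not change which runtime Docker is asked to use.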

Any idea on how to fix that?

mahmoodn (Author) commented:

With the following commands, I was able to register and enable the nvidia runtime in Docker:

$ sudo apt install nvidia-container-toolkit nvidia-container-runtime
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker   # Restart the daemon so it picks up the updated /etc/docker/daemon.json
$ docker info      # Verify that nvidia is listed under Runtimes
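The docker info check in the last step can also be scripted, for example before running make again. This is a sketch under two assumptions: that docker info --format '{{json .Runtimes}}' prints a JSON object keyed by runtime name, and that has_nvidia_runtime is a hypothetical helper name, not part of Docker or the NVIDIA Container Toolkit:

```shell
#!/usr/bin/env bash
# Sketch: decide whether the Docker daemon reports a runtime named "nvidia".
# Assumed input: the JSON printed by  docker info --format '{{json .Runtimes}}'
# e.g. {"nvidia":{"path":"nvidia-container-runtime"},"runc":{"path":"runc"}}

has_nvidia_runtime() {
  local runtimes_json="$1"
  # Match the quoted key "nvidia" followed by a colon, so a runtime whose
  # path merely contains the word (nvidia-container-runtime) does not count.
  printf '%s\n' "$runtimes_json" | grep -Eq '"nvidia"[[:space:]]*:'
}

# Example usage on a host with Docker (not executed here):
#   if has_nvidia_runtime "$(docker info --format '{{json .Runtimes}}')"; then
#     echo "nvidia runtime registered"
#   else
#     echo "nvidia runtime missing: run nvidia-ctk runtime configure and restart docker"
#   fi
```

Matching the quoted key rather than any occurrence of "nvidia" keeps a runtime path such as /usr/bin/nvidia-container-runtime from producing a false positive.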
