docker: Error response from daemon: unknown or invalid runtime name: nvidia #7

mahmoodn · 2024-06-24T11:50:55Z

With CUDA 11.8 on Ubuntu 22.04 and an RTX 3080, I tried to run make prebuild inference_v4 in the NVIDIA folder. After about two hours of compilation, it ended with an error message that I don't understand:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

$ nvidia-smi 
Mon Jun 24 13:46:50 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+

$ docker images
REPOSITORY                               TAG                                                       IMAGE ID       CREATED         SIZE
mlperf-inference                         mahmood-x86_64                                           3ef82bce4e33   4 minutes ago   46.5GB
mlperf-inference                         mahmood-x86_64-latest                                    6539c6892f0d   5 minutes ago   46.5GB
nvcr.io/nvidia/mlperf/mlperf-inference   mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public   34b056f25fae   4 months ago    14.5GB

$ dpkg -l | grep docker
ii  docker                                     1.5-2                                   all          transitional package
ii  docker-buildx                              0.12.1-0ubuntu1~22.04.1                 amd64        Docker CLI plugin for extended build capabilities with BuildKit
ii  docker.io                                  24.0.7-0ubuntu2~22.04.1                 amd64        Linux container runtime
ii  wmdocker                                   1.5-2                                   amd64        System tray for KDE3/GNOME2 docklet applications

$ dpkg -l | grep nvidia
ii  libnvidia-container-tools                  1.15.0-1                                amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.15.0-1                                amd64        NVIDIA container runtime library
ii  nvidia-container-toolkit                   1.15.0-1                                amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base              1.15.0-1                                amd64        NVIDIA Container Toolkit Base

The output is:

 => [37/40] RUN mkdir -p /opt/fp8/faster-transformer-bert-fp8-weights-scales/     && tar -zxvf /tmp/faster-transformer-bert-f  8.4s 
 => [38/40] RUN apt install -y libgl1-mesa-glx                                                                                 6.6s 
 => [39/40] RUN apt install -y python3.8-venv                                                                                  1.7s 
 => [40/40] WORKDIR /work                                                                                                      0.0s 
 => exporting to image                                                                                                        46.7s 
 => => exporting layers                                                                                                       46.7s
 => => writing image sha256:6539c6892f0d0f4bd49b4234de1fe16cf03fcfdc22ea40f04dbea4509f0c61a7                                   0.0s
 => => naming to docker.io/library/mlperf-inference:mahmood-x86_64-latest                                                     0.0s
make[1]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[1]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[2]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
Adding user account into image
DOCKER_BUILDKIT=1 docker build -t mlperf-inference:mahmood-x86_64 --network host \
	--build-arg BASE_IMAGE=mlperf-inference:mahmood-x86_64-latest \
	--build-arg GID=1000 --build-arg UID=1000 --build-arg GROUP=mahmood --build-arg USER=mahmood \
	- < docker/Dockerfile.user
[+] Building 0.4s (6/6) FINISHED                                                                                     docker:default
 => [internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                0.0s
 => [internal] load build definition from Dockerfile                                                                           0.0s
 => => transferring dockerfile: 1.05kB                                                                                         0.0s
 => [internal] load metadata for docker.io/library/mlperf-inference:mahmood-x86_64-latest                                     0.0s
 => [1/2] FROM docker.io/library/mlperf-inference:mahmood-x86_64-latest                                                       0.2s
 => [2/2] RUN echo root:root | chpasswd  && groupadd -f -g 1000 mahmood  && useradd -G sudo -g 1000 -u 1000 -m mahmood  &&   0.1s
 => exporting to image                                                                                                         0.0s
 => => exporting layers                                                                                                        0.0s
 => => writing image sha256:3ef82bce4e3303961c0d1b896e1c84228f241881c697965a584946443283b6e7                                   0.0s
 => => naming to docker.io/library/mlperf-inference:mahmood-x86_64                                                            0.0s
make[2]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[2]: Entering directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
/bin/bash: line 1: [: ==: unary operator expected
/bin/bash: line 1: [: !=: unary operator expected
docker run --gpus=all --runtime=nvidia --rm -it -w /work \
	-v /disk1/mahmood/inference_results_v4.0/closed/NVIDIA:/work -v /home/mahmood:/mnt//home/mahmood \
	--cap-add SYS_ADMIN --cap-add SYS_TIME \
	-e NVIDIA_VISIBLE_DEVICES=all \
	-e HISTFILE=/mnt//home/mahmood/.mlperf_bash_history \
	--shm-size=32gb \
	--ulimit memlock=-1 \
	-v /etc/timezone:/etc/timezone:ro -v /etc/localtime:/etc/localtime:ro \
	--security-opt apparmor=unconfined --security-opt seccomp=unconfined \
	--name mlperf-inference-mahmood-x86_64-28459 -h mlperf-inference-mahmood-x86-64-28459 --add-host mlperf-inference-mahmood-x86_64-28459:127.0.0.1 \
	--cpuset-cpus 0-15 \
	--user 1000 --net host --device /dev/fuse \
	-v /disk1/scratch_v4:/disk1/scratch_v4  \
	-e MLPERF_SCRATCH_PATH=/disk1/scratch_v4 \
	-e HOST_HOSTNAME=rtx3080 \
	 \
	mlperf-inference:mahmood-x86_64 
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.
make[2]: *** [Makefile.docker:311: launch_docker] Error 125
make[2]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'
make[1]: *** [Makefile.docker:299: attach_docker] Error 2
make[1]: Leaving directory '/disk1/mahmood/inference_results_v4.0/closed/NVIDIA'

Any idea on how to fix that?

mahmoodn · 2024-06-25T13:53:11Z

With the following commands, I was able to define and enable the nvidia runtime in Docker.

$ sudo apt install nvidia-container-toolkit nvidia-container-runtime
$ sudo nvidia-ctk runtime configure --runtime=docker
$ docker info      # Verify that nvidia is listed in the runtime section
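If nvidia still does not show up after the commands above, the Docker daemon probably has not been restarted yet: nvidia-ctk runtime configure writes the runtime entry to /etc/docker/daemon.json, and the daemon only reads that file at startup. Below is a minimal sketch of a scripted version of the docker info check. The function name has_nvidia_runtime is made up for illustration, and the sketch assumes docker info prints a "Runtimes:" line listing the registered runtime names, as Docker 24 does.

```shell
#!/usr/bin/env bash
# Sketch only: has_nvidia_runtime is a hypothetical helper, not part of
# Docker or the NVIDIA Container Toolkit.

# Reads `docker info` text on stdin and succeeds (exit status 0) only if
# the "Runtimes:" line lists a runtime named nvidia.
# The anchored pattern skips the separate "Default Runtime:" line.
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Typical use on the host (left commented out so that sourcing this file
# changes nothing on the system):
#   sudo systemctl restart docker    # make the daemon reload daemon.json
#   docker info | has_nvidia_runtime && echo "nvidia runtime registered"
```

The function only parses text, so it can be tried against saved docker info output without touching the daemon.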

mahmoodn closed this as completed Jun 25, 2024
