There are 4 ways to use Triton Model Analyzer:
The recommended way to use Model Analyzer is to build Model Analyzer's Docker image yourself. This installation method uses --triton-launch-mode=local by default, as all the dependencies will be available. First, clone the Model Analyzer git repository, then build the Docker image:
$ git clone https://github.com/triton-inference-server/model_analyzer.git -b <rXX.YY>
$ cd ./model_analyzer
$ docker build --pull -t model-analyzer .
The above command will pull all the containers that Model Analyzer needs to run.
Model Analyzer's Dockerfile bases the container on the latest tritonserver containers from NGC. Now you can run the container with:
$ docker run -it --rm --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v <path-to-triton-model-repository>:<path-to-triton-model-repository> \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host model-analyzer
root@hostname:/opt/triton-model-analyzer#
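Once inside the container, you can invoke Model Analyzer directly. The command below is only a minimal sketch: the model name and paths are placeholders, and the exact set of flags can vary between Model Analyzer releases.
$ model-analyzer profile \
    --model-repository <path-to-triton-model-repository> \
    --profile-models <model-name> \
    --output-model-repository-path <path-to-output-model-repo>/output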
If you want to build Model Analyzer on the main branch, you will also need to build the Triton SDK container. To build the SDK container, refer to the Build SDK Image instructions. After you have built the SDK container, you can build Model Analyzer's Docker image with:
$ git clone https://github.com/triton-inference-server/model_analyzer.git -b main
$ cd ./model_analyzer
$ docker build -t model-analyzer --build-arg TRITONSDK_BASE_IMAGE=<name of the built SDK image> .
You can also use the Triton SDK docker container available on the NVIDIA GPU Cloud Catalog. You can pull and run the SDK container with the following commands:
$ docker pull nvcr.io/nvidia/tritonserver:22.04-py3-sdk
If you are not planning to run Model Analyzer with --triton-launch-mode=docker, you can run the SDK container with the following command:
$ docker run -it --gpus all --net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
You will need to build and install the Triton server binary inside the SDK container if you want to use the local mode. See the Triton Installation docs for more details.
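Once a tritonserver binary is available inside the SDK container, you can point Model Analyzer at it when profiling in local mode. The following is only a sketch; the binary location and model name are placeholders, and the --triton-server-path option may differ between releases.
$ model-analyzer profile \
    --model-repository <path-to-triton-model-repository> \
    --profile-models <model-name> \
    --triton-launch-mode=local \
    --triton-server-path <path-to-tritonserver-binary>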
If you intend to use --triton-launch-mode=docker, which is the recommended mode when using Model Analyzer from the SDK container, you will need to mount the following:
- -v /var/run/docker.sock:/var/run/docker.sock allows running docker containers as sibling containers from inside the Triton SDK container. Model Analyzer will require this if run with --triton-launch-mode=docker.
- -v <path-to-output-model-repo>:<path-to-output-model-repo> is the absolute path to the directory where the output model repository will be located (i.e. the parent directory of the output model repository). This is needed so that the launched Triton container has access to the model config variants that Model Analyzer creates.
$ docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
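With the docker socket and output model repository mounted as above, a docker-launch-mode run from inside the SDK container might look like the following sketch (the model name and paths are placeholders, and flags can vary by release):
$ model-analyzer profile \
    --model-repository <path-to-triton-model-repository> \
    --profile-models <model-name> \
    --triton-launch-mode=docker \
    --output-model-repository-path <path-to-output-model-repo>/output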
Model Analyzer uses pdfkit for report generation. If you are running Model Analyzer inside the Triton SDK container, you will need to install wkhtmltopdf:
$ sudo apt-get update && sudo apt-get install wkhtmltopdf
Once you do this, Model Analyzer will be able to use pdfkit to generate reports.
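With wkhtmltopdf in place, detailed reports can be generated with the report subcommand. This is only a rough sketch, assuming a previously profiled model config variant; the variant name and export path below are placeholders and the available report options can vary by version.
$ model-analyzer report \
    --report-model-configs <model-config-variant-name> \
    --export-path <path-to-export-directory>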
You can install pip using:
$ sudo apt-get update && sudo apt-get install python3-pip
Model Analyzer can be installed with:
$ pip3 install triton-model-analyzer
If you encounter any errors installing dependencies like numba, make sure that you have the latest version of pip using:
$ pip3 install --upgrade pip
You can then try installing Model Analyzer again.
If you are using this approach, you will need to install DCGM on your machine.
For installing DCGM on Ubuntu 20.04 you can use the following commands:
$ export DCGM_VERSION=2.0.13
$ wget -q https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
dpkg -i datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb
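After installing the package, you can sanity-check that DCGM is working before running Model Analyzer. The commands below are standard DCGM tools shipped with the package, shown here as a quick check:
$ sudo nv-hostengine          # start the DCGM host engine if it is not already running
$ dcgmi discovery -l          # list the GPUs that DCGM can see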
To build Model Analyzer from source, you'll need to install the same dependencies (tritonclient and DCGM) mentioned in the "Using pip" section. After that, you can use the following commands:
$ git clone https://github.com/triton-inference-server/model_analyzer
$ cd model_analyzer
$ ./build_wheel.sh <path to perf_analyzer> true
In the final command above we are building the triton-model-analyzer wheel. You will need to provide the build_wheel.sh script with two arguments. The first is the path to the perf_analyzer binary that you would like Model Analyzer to use. The second is whether you want this wheel to be Linux-specific. Currently, this argument must be set to true, as perf_analyzer is supported only on Linux. This will create a wheel file in the wheels directory named triton-model-analyzer-<version>-py3-none-manylinux1_x86_64.whl. We can now install this with:
$ pip3 install wheels/triton-model-analyzer-*.whl
After these steps, the model-analyzer executable should be available in $PATH.
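You can verify the installation with a quick check; assuming the wheel installed cleanly, the CLI should respond with its usage text:
$ model-analyzer --help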
Notes:
- Triton Model Analyzer supports all the GPUs supported by the DCGM library. See DCGM Supported GPUs for more information.