Update Neuron DLCs for 2.21.1 release #34

Merged: 2 commits, Jan 16, 2025
8 changes: 4 additions & 4 deletions README.md
@@ -14,23 +14,23 @@ AWS Neuron Deep Learning Containers (DLCs) are a set of Docker images for traini

| Framework | Neuron Packages | Neuron SDK Version | Supported EC2 Instance Types | Python Version Options | ECR Public URL | Other Packages |
|-----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|--------------------|------------------------------|------------------------|--------------------------------------------------------------------------------------------|-------------------|
-| [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.0/docker/pytorch/inference/2.5.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_inference, torch-neuronx, transformers-neuronx | Neuron 2.21.0 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.21.0-ubuntu22.04 | torchserve 0.11.0 |
+| [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.1/docker/pytorch/inference/2.5.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_inference, torch-neuronx, transformers-neuronx | Neuron 2.21.1 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.21.1-ubuntu22.04 | torchserve 0.11.0 |
| [PyTorch 2.1.2](https://github.com/aws-neuron/deep-learning-containers/blob/2.20.2/docker/pytorch/inference/2.1.2/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx | Neuron 2.20.2 | trn1,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-inference-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04 | torchserve 0.11.0 |
| [PyTorch 1.13.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.20.2/docker/pytorch/inference/1.13.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx | Neuron 2.20.2 | trn1,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.20.2-ubuntu20.04 | torchserve 0.11.0 |

### pytorch-training-neuronx

| Framework | Neuron Packages | Neuron SDK Version | Supported EC2 Instance Types | Python Version Options | ECR Public URL |
|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|--------------------|------------------------------|------------------------|-------------------------------------------------------------------------------------------|
-| [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.0/docker/pytorch/training/2.5.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_training, torch-neuronx | Neuron 2.21.0 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-training-neuronx:2.5.1-neuronx-py310-sdk2.21.0-ubuntu22.04 |
+| [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.1/docker/pytorch/training/2.5.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_training, torch-neuronx | Neuron 2.21.1 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-training-neuronx:2.5.1-neuronx-py310-sdk2.21.1-ubuntu22.04 |
| [PyTorch 2.1.2](https://github.com/aws-neuron/deep-learning-containers/blob/2.20.2/docker/pytorch/training/2.1.2/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_training, torch-neuronx | Neuron 2.20.2 | trn1,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04 |
| [PyTorch 1.13.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.20.2/docker/pytorch/training/1.13.1/Dockerfile.neuronx) | aws-neuronx-tools, neuronx_distributed, neuronx_distributed_training, torch-neuronx | Neuron 2.20.2 | trn1,inf2 | 3.10 (py310) | public.ecr.aws/neuron/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.20.2-ubuntu20.04 |

-### jax-training-neuron
+### jax-training-neuronx

| Framework | Neuron Packages | Neuron SDK Version | Supported EC2 Instance Types | Python Version Options | ECR Public URL | Other Packages |
|----------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|--------------------|------------------------------|------------------------|------------------------------------------------------------------------------------------|-------------------|
-| [JAX 0.4](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.0/docker/jax/training/0.4/Dockerfile.neuronx) | jax-neuronx, libneuronxla | Neuron 2.21.0 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/jax-training-neuronx:0.4-neuronx-py310-sdk2.21.0-ubuntu22.04 | jaxlib 0.4 |
+| [JAX 0.4](https://github.com/aws-neuron/deep-learning-containers/blob/2.21.1/docker/jax/training/0.4/Dockerfile.neuronx) | jax-neuronx, libneuronxla | Neuron 2.21.1 | trn1,trn2,inf2 | 3.10 (py310) | public.ecr.aws/neuron/jax-training-neuronx:0.4-neuronx-py310-sdk2.21.1-ubuntu22.04 | jaxlib 0.4 |
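
The ECR Public URLs in these tables all appear to follow one tag pattern: framework version, `neuronx`, `py310`, SDK version, then the Ubuntu release. A minimal sketch of that pattern is below. The helper name `neuron_image_uri` is hypothetical, and the fixed `py310` segment and the `ubuntu22.04` default are assumptions read off the rows above (the older 2.20.2 rows use `ubuntu20.04`).

```shell
# Hypothetical helper: compose a Neuron DLC image URI from the tag pattern
# visible in the README tables. Not part of the repository.
neuron_image_uri() {
  repo="$1"                 # e.g. pytorch-inference-neuronx
  fw_version="$2"           # e.g. 2.5.1
  sdk_version="$3"          # e.g. 2.21.1
  os="${4:-ubuntu22.04}"    # assumed default; pass ubuntu20.04 for older rows
  printf 'public.ecr.aws/neuron/%s:%s-neuronx-py310-sdk%s-%s\n' \
    "$repo" "$fw_version" "$sdk_version" "$os"
}

neuron_image_uri pytorch-inference-neuronx 2.5.1 2.21.1
neuron_image_uri jax-training-neuronx 0.4 2.21.1

# To fetch an image (requires Docker and network access):
#   docker pull "$(neuron_image_uri pytorch-training-neuronx 2.5.1 2.21.1)"
```

Only the SDK segment of the tag changes in this PR, so anything that pins the full tag needs the same `2.21.0` to `2.21.1` bump.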

## Security

6 changes: 3 additions & 3 deletions docker/jax/training/0.4/Dockerfile.neuronx
@@ -1,13 +1,13 @@
FROM public.ecr.aws/docker/library/ubuntu:22.04

LABEL dlc_major_version="1"

Check failure on line 3 (GitHub Actions / dockerfile-linter): DL3048 style: Invalid label key.
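
The DL3048 annotation does not say why the key is rejected. A likely reason is the underscores in `dlc_major_version`: Docker's label guidelines recommend keys that begin and end with a lowercase letter, use only lowercase alphanumerics, `.` and `-`, and contain no consecutive periods or dashes. The sketch below approximates those guidelines. It is not hadolint's implementation, and `is_valid_label_key` is a hypothetical helper.

```shell
# Hypothetical check approximating Docker's label-key guidelines
# (an assumption about why the linter flags the key, not hadolint's own code).
is_valid_label_key() {
  key="$1"
  case "$key" in
    *..*|*--*) return 1 ;;   # consecutive periods or dashes
  esac
  # Must start and end with a lowercase letter; only [a-z0-9.-] in between.
  printf '%s\n' "$key" | grep -Eq '^[a-z]([a-z0-9.-]*[a-z])?$'
}

for key in dlc_major_version dlc-major-version com.example.dlc-major-version; do
  if is_valid_label_key "$key"; then
    echo "ok:      $key"
  else
    echo "invalid: $key"
  fi
done
```

Whether the key can be renamed depends on what reads the `dlc_major_version` label downstream, which this diff does not show.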
LABEL maintainer="Amazon AI"

# Neuron SDK components version numbers
-ARG NEURONX_RUNTIME_LIB_VERSION=2.23.110.0-9b5179492
-ARG NEURONX_COLLECTIVES_LIB_VERSION=2.23.133.0-3e70920f2
+ARG NEURONX_RUNTIME_LIB_VERSION=2.23.112.0-9b5179492
+ARG NEURONX_COLLECTIVES_LIB_VERSION=2.23.135.0-3e70920f2
ARG NEURONX_TOOLS_VERSION=2.20.204.0
-ARG NEURONX_CC_VERSION=2.16.345.0
+ARG NEURONX_CC_VERSION=2.16.372.0
ARG NEURONX_JAX_TRAINING_VERSION=0.1.2

ARG PYTHON=python3.10
@@ -31,7 +31,7 @@
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/amazon/openmpi/lib64"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"

RUN apt-get update \

Check failure on line 34 (GitHub Actions / dockerfile-linter): DL3008 warning: Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
&& apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
build-essential \
@@ -74,7 +74,7 @@
&& apt-get clean

# Install Open MPI
RUN mkdir -p /tmp/openmpi \

Check failure on line 77 (GitHub Actions / dockerfile-linter): DL3003 warning: Use WORKDIR to switch to a directory
Check failure on line 77 (GitHub Actions / dockerfile-linter): SC2046 warning: Quote this to prevent word splitting.
&& cd /tmp/openmpi \
&& wget --quiet https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OMPI_VERSION}.tar.gz \
&& tar zxf openmpi-${OMPI_VERSION}.tar.gz \
@@ -86,7 +86,7 @@
&& rm -rf /tmp/openmpi

# Install packages and configure SSH for MPI operator in k8s
RUN apt-get update && apt-get install -y openmpi-bin openssh-server \

Check failure on line 89 (GitHub Actions / dockerfile-linter): DL3008 warning: Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
Check failure on line 89 (GitHub Actions / dockerfile-linter): DL3015 info: Avoid additional packages by specifying `--no-install-recommends`
&& mkdir -p /var/run/sshd \
&& echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config \
&& echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config \
@@ -95,7 +95,7 @@
&& apt-get clean

# install Python
RUN wget -q https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \

Check failure on line 98 (GitHub Actions / dockerfile-linter): SC2046 warning: Quote this to prevent word splitting.
Check failure on line 98 (GitHub Actions / dockerfile-linter): DL3003 warning: Use WORKDIR to switch to a directory
&& tar -xzf Python-$PYTHON_VERSION.tgz \
&& cd Python-$PYTHON_VERSION \
&& ./configure --enable-shared --prefix=/usr/local \
@@ -114,13 +114,13 @@
# ompi_info to fail. This is only observed in CPU containers
ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"
RUN ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

Check failure on line 117 (GitHub Actions / dockerfile-linter): DL4006 warning: Set the SHELL option -o pipefail before RUN with a pipe in it. If you are using /bin/sh in an alpine image or if your shell is symlinked to busybox then consider explicitly setting your SHELL to /bin/ash, or disable this check

RUN mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

# Install Neuron Driver, Runtime and Tools
RUN echo "deb https://apt.repos.neuron.amazonaws.com focal main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -

Check failure on line 123 (GitHub Actions / dockerfile-linter): DL4006 warning: Set the SHELL option -o pipefail before RUN with a pipe in it. If you are using /bin/sh in an alpine image or if your shell is symlinked to busybox then consider explicitly setting your SHELL to /bin/ash, or disable this check
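
The DL4006 warnings are about how a shell reports the status of a pipeline. By default the exit status is that of the last command, so a failure earlier in the pipe can go unnoticed by the `RUN` step. The sketch below demonstrates this with `false | cat` standing in for the piped commands in the Dockerfile. It assumes `bash` is available, and the variable names are illustrative.

```shell
# Without pipefail: the pipeline reports the status of the last command (cat),
# so the failure of `false` is hidden.
status_without=0
bash -c 'false | cat' || status_without=$?

# With pipefail: any failing stage makes the whole pipeline fail.
status_with=0
bash -c 'set -o pipefail; false | cat' || status_with=$?

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

In a Dockerfile, the usual way to get the second behaviour is a `SHELL ["/bin/bash", "-o", "pipefail", "-c"]` instruction ahead of the piped `RUN` lines. Whether that fits this image is a decision for the maintainers.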

RUN apt-get update \
&& apt-get install -y \