Commit 0.3.3
alpayariyak committed Mar 5, 2024
1 parent d91ccb8 commit db7167d
Showing 7 changed files with 20 additions and 50 deletions.
.gitmodules (3 changes: 3 additions & 0 deletions)
@@ -0,0 +1,3 @@
+[submodule "vllm-base-image/vllm"]
+	path = vllm-base-image/vllm
+	url = /devdisk/inference/worker-vllm/vllm-base-image/vllm
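
Note: the recorded submodule URL is an absolute local path on the committer's machine, so a fresh clone would presumably need to repoint it. A minimal sketch, assuming the public vLLM repository is the intended upstream:

    # Repoint the submodule at a reachable remote; the committed URL is a
    # local path, so the upstream used here is an assumption
    git clone https://github.com/runpod-workers/worker-vllm.git
    cd worker-vllm
    git submodule set-url vllm-base-image/vllm https://github.com/vllm-project/vllm.git
    git submodule update --init vllm-base-image/vllm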
Dockerfile (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
ARG WORKER_CUDA_VERSION=11.8.0
-FROM runpod/worker-vllm:base-0.3.1-cuda${WORKER_CUDA_VERSION} AS vllm-base
+FROM runpod/worker-vllm:base-0.3.2-cuda${WORKER_CUDA_VERSION} AS vllm-base

RUN apt-get update -y \
&& apt-get install -y python3-pip
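
For reference, a sketch of rebuilding the worker image against the bumped base; the tag is illustrative, and WORKER_CUDA_VERSION comes from the ARG above:

    # Rebuild the worker image on top of the 0.3.2 base (tag is illustrative)
    docker build --build-arg WORKER_CUDA_VERSION=11.8.0 -t worker-vllm:dev .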
README.md (4 changes: 2 additions & 2 deletions)
@@ -4,7 +4,7 @@

Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm) on RunPod Serverless in a few clicks.

-<p>Worker Version: 0.3.1 | vLLM Version: 0.3.2</p>
+<p>Worker Version: 0.3.2 | vLLM Version: 0.3.3</p>

[![CD | Docker-Build-Release](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml/badge.svg)](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml)

@@ -88,7 +88,7 @@ This table provides a quick reference to the image tags you should use based on
**LLM Settings**
| `MODEL_NAME`**\*** | - | `str` | Hugging Face Model Repository (e.g., `openchat/openchat-3.5-1210`). |
| `MODEL_REVISION` | `None` | `str` |Model revision(branch) to load. |
-| `MAX_MODEL_LENGTH` | Model's maximum | `int` |Maximum number of tokens for the engine to handle per request. |
+| `MAX_MODEL_LEN` | Model's maximum | `int` |Maximum number of tokens for the engine to handle per request. |
| `BASE_PATH` | `/runpod-volume` | `str` |Storage directory for Huggingface cache and model. Utilizes network storage if attached when pointed at `/runpod-volume`, which will have only one worker download the model once, which all workers will be able to load. If no network volume is present, creates a local directory within each worker. |
| `LOAD_FORMAT` | `auto` | `str` |Format to load model in. |
| `HF_TOKEN` | - | `str` |Hugging Face token for private and gated models. |
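
Deployments that still set the old MAX_MODEL_LENGTH name would silently fall back to the model's maximum after this rename. A usage sketch with illustrative values (the image tag is hypothetical):

    # Pass the renamed variable to the worker
    docker run --gpus all \
        -e MODEL_NAME=openchat/openchat-3.5-1210 \
        -e MAX_MODEL_LEN=4096 \
        worker-vllm:dev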
src/config.py (2 changes: 1 addition & 1 deletion)
@@ -39,7 +39,7 @@ def _initialize_config(self):
"trust_remote_code": bool(int(os.getenv("TRUST_REMOTE_CODE", 0))),
"gpu_memory_utilization": float(os.getenv("GPU_MEMORY_UTILIZATION", 0.95)),
"max_parallel_loading_workers": None if device_count() > 1 or not os.getenv("MAX_PARALLEL_LOADING_WORKERS") else int(os.getenv("MAX_PARALLEL_LOADING_WORKERS")),
"max_model_len": int(os.getenv("MAX_MODEL_LENGTH")) if os.getenv("MAX_MODEL_LENGTH") else None,
"max_model_len": int(os.getenv("MAX_MODEL_LEN")) if os.getenv("MAX_MODEL_LEN") else None,
"tensor_parallel_size": device_count(),
"seed": int(os.getenv("SEED")) if os.getenv("SEED") else None,
"kv_cache_dtype": os.getenv("KV_CACHE_DTYPE"),
vllm-base/Dockerfile → vllm-base-image/Dockerfile (47 changes: 13 additions & 34 deletions)
@@ -17,25 +17,16 @@ ARG WORKER_CUDA_VERSION
RUN apt-get update -y \
&& apt-get install -y python3-pip git

-RUN if [ "${WORKER_CUDA_VERSION}" = "12.1.0" ]; then \
-    ldconfig /usr/local/cuda-12.1/compat/; \
-    fi

# Set working directory
WORKDIR /vllm-installation

# Install build and runtime dependencies
-COPY vllm-${WORKER_CUDA_VERSION}/requirements.txt requirements.txt
+COPY vllm/requirements-${WORKER_CUDA_VERSION}.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt

-RUN --mount=type=cache,target=/root/.cache/pip \
-    if [ "${WORKER_CUDA_VERSION}" = "11.8.0" ]; then \
-    pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; \
-    fi

# Install development dependencies
-COPY vllm-${WORKER_CUDA_VERSION}/requirements-dev.txt requirements-dev.txt
+COPY vllm/requirements-dev.txt requirements-dev.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements-dev.txt
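
The COPY sources now resolve inside the vllm submodule instead of per-CUDA-version directories. A hypothetical listing of the layout these COPY lines imply (file names inferred from the diff, not verified):

    # Layout implied by the COPY lines in this Dockerfile (inferred)
    ls vllm-base-image/vllm
    # csrc/  setup.py  pyproject.toml  vllm/
    # requirements-11.8.0.txt  requirements-12.1.0.txt
    # requirements-build.txt   requirements-dev.txt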

@@ -45,25 +36,15 @@ FROM dev AS build
ARG WORKER_CUDA_VERSION

# Install build dependencies
-COPY vllm-${WORKER_CUDA_VERSION}/requirements-build.txt requirements-build.txt
+COPY vllm/requirements-build.txt requirements-build.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements-build.txt

# Copy necessary files
-COPY vllm-${WORKER_CUDA_VERSION}/csrc csrc
-COPY vllm-${WORKER_CUDA_VERSION}/setup.py setup.py
-COPY vllm-12.1.0/pyproject.toml pyproject.toml
-COPY vllm-${WORKER_CUDA_VERSION}/vllm/__init__.py vllm/__init__.py

-# Conditional installation based on CUDA version
-RUN --mount=type=cache,target=/root/.cache/pip \
-    if [ "${WORKER_CUDA_VERSION}" = "11.8.0" ]; then \
-    pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; \
-    rm pyproject.toml; \
-    elif [ "${WORKER_CUDA_VERSION}" != "12.1.0" ]; then \
-    echo "WORKER_CUDA_VERSION not supported"; \
-    exit 1; \
-    fi
+COPY vllm/csrc csrc
+COPY vllm/setup.py setup.py
+COPY vllm/pyproject.toml pyproject.toml
+COPY vllm/vllm/__init__.py vllm/__init__.py

# Set environment variables for building extensions
ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
@@ -72,8 +53,10 @@ ARG max_jobs=48
ENV MAX_JOBS=${max_jobs}
ARG nvcc_threads=1024
ENV NVCC_THREADS=${nvcc_threads}

+ENV WORKER_CUDA_VERSION=${WORKER_CUDA_VERSION}
ENV VLLM_INSTALL_PUNICA_KERNELS=0
# Build extensions
+RUN ldconfig /usr/local/cuda-$(echo "$WORKER_CUDA_VERSION" | sed 's/\.0$//')/compat/
RUN python3 setup.py build_ext --inplace

FROM nvidia/cuda:${WORKER_CUDA_VERSION}-runtime-ubuntu22.04 AS vllm-base
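
The new ldconfig line in the build stage derives the compat directory from the version string by stripping a trailing ".0". A quick sketch of what the sed expression yields for the two supported versions:

    # sed strips a trailing ".0" to form the CUDA compat path
    echo "11.8.0" | sed 's/\.0$//'   # -> 11.8, giving /usr/local/cuda-11.8/compat/
    echo "12.1.0" | sed 's/\.0$//'   # -> 12.1, giving /usr/local/cuda-12.1/compat/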
@@ -88,19 +71,15 @@ RUN apt-get update -y \
# Set working directory
WORKDIR /vllm-installation


# Install runtime dependencies
-COPY vllm-${WORKER_CUDA_VERSION}/requirements.txt requirements.txt
+COPY vllm/requirements-${WORKER_CUDA_VERSION}.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt

-RUN --mount=type=cache,target=/root/.cache/pip \
-    if [ "${WORKER_CUDA_VERSION}" = "11.8.0" ]; then \
-    pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; \
-    fi

# Copy built files from the build stage
COPY --from=build /vllm-installation/vllm/*.so /vllm-installation/vllm/
-COPY vllm-${WORKER_CUDA_VERSION}/vllm vllm
+COPY vllm/vllm vllm

# Set PYTHONPATH environment variable
ENV PYTHONPATH="/"
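
For completeness, a sketch of building this base image from the renamed directory; the tag and build-arg values are illustrative, and it assumes the vllm submodule is initialized:

    # Build the vLLM base image (tag and args illustrative)
    cd vllm-base-image
    docker build --build-arg WORKER_CUDA_VERSION=12.1.0 -t worker-vllm:base-dev .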
File renamed without changes.
vllm-base/download_required_files.sh (12 changes: 0 additions & 12 deletions)

This file was deleted.
