0.3.1: bug fixes
alpayariyak committed Feb 29, 2024
1 parent 36e9b67 commit d91ccb8
Showing 3 changed files with 13 additions and 10 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,5 +1,5 @@
ARG WORKER_CUDA_VERSION=11.8.0
-FROM runpod/worker-vllm:base-0.3.0-cuda${WORKER_CUDA_VERSION} AS vllm-base
+FROM runpod/worker-vllm:base-0.3.1-cuda${WORKER_CUDA_VERSION} AS vllm-base

RUN apt-get update -y \
&& apt-get install -y python3-pip
13 changes: 7 additions & 6 deletions README.md
@@ -2,13 +2,14 @@

<h1> vLLM Serverless Endpoint Worker </h1>

Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm) on RunPod Serverless in a few clicks.

<p>Worker Version: 0.3.1 | vLLM Version: 0.3.2</p>

[![CD | Docker-Build-Release](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml/badge.svg)](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml)

Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm) on RunPod Serverless in a few clicks.
</div>

> [!IMPORTANT]
> [02.28.2024] When HuggingFace is down: to successfully load models that are downloaded on the image or endpoint network storage, set environment variables `TRANSFORMERS_OFFLINE` and `HF_HUB_OFFLINE` to `1` in the endpoint template.
</div>

### Worker vLLM 0.3.0: What's New since 0.2.0:
- **🚀 Full OpenAI Compatibility 🚀**
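The offline workaround in the note above comes down to two environment variables that both `transformers` and `huggingface_hub` respect. A minimal sketch of applying it in any process that loads a locally stored model (the model path below is purely illustrative, not a path the worker defines):

```python
import os

# Set these before any HuggingFace library attempts a download.
# With both flags set to "1", transformers and huggingface_hub make no
# network calls and only read files that already exist on disk.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

# Illustrative location only: wherever the model was baked into the
# image or placed on the endpoint's network storage.
local_model_path = "/runpod-volume/model"
print(f"Loading {local_model_path} without contacting the HuggingFace Hub")
```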
@@ -69,8 +70,8 @@ Below is a summary of the available RunPod Worker images, categorized by image stability

| CUDA Version | Stable Image Tag | Development Image Tag | Note |
|--------------|-----------------------------------|-----------------------------------|----------------------------------------------------------------------|
-| 11.8.0 | `runpod/worker-vllm:0.3.0-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
-| 12.1.0 | `runpod/worker-vllm:0.3.0-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Version 12.2 and 12.1 in the filter. |
+| 11.8.0 | `runpod/worker-vllm:0.3.1-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
+| 12.1.0 | `runpod/worker-vllm:0.3.1-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Version 12.2 and 12.1 in the filter. |

This table provides a quick reference to the image tags you should use based on the desired CUDA version and image stability (Stable or Development). Be sure to follow the selection note for CUDA 12.1.0 compatibility.
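As a concrete example, the stable CUDA 12.1.0 image for this release is `runpod/worker-vllm:0.3.1-cuda12.1.0`. A small illustrative snippet (not part of the repository) that assembles a tag from the choices in the table:

```python
# Illustrative only: compose a worker image tag from the table above.
WORKER_VERSION = "0.3.1"

def image_tag(cuda_version: str, stable: bool = True) -> str:
    """Return the worker-vllm image tag for a CUDA version and stability."""
    prefix = WORKER_VERSION if stable else "dev"
    return f"runpod/worker-vllm:{prefix}-cuda{cuda_version}"

assert image_tag("12.1.0") == "runpod/worker-vllm:0.3.1-cuda12.1.0"
assert image_tag("11.8.0", stable=False) == "runpod/worker-vllm:dev-cuda11.8.0"
```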

8 changes: 5 additions & 3 deletions src/config.py
@@ -1,7 +1,7 @@
import os
from dotenv import load_dotenv
from utils import count_physical_cores
from torch.cuda import device_count
import os

class EngineConfig:
    def __init__(self):
@@ -14,9 +14,11 @@ def __init__(self):

    def _get_local_or_env(self, local_path, env_var):
        if os.path.exists(local_path):
+            os.environ["TRANSFORMERS_OFFLINE"] = "1"
+            os.environ["HF_HUB_OFFLINE"] = "1"
            with open(local_path, "r") as file:
                return file.read().strip(), None, None
-        return os.getenv(env_var), os.getenv("HF_HOME"), os.getenv(f"{env_var}_REVISION")
+        return os.getenv(env_var), os.getenv("HF_HOME"), os.getenv(f"{env_var.split('_')[0]}_REVISION") or None

    def _get_quantization(self):
        quantization = os.getenv("QUANTIZATION", "").lower()
@@ -48,4 +50,4 @@ def _initialize_config(self):
"enforce_eager": bool(int(os.getenv("ENFORCE_EAGER", 0)))
}

return {k: v for k, v in args.items() if v is not None}
return {k: v for k, v in args.items() if v is not None}
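The `config.py` changes make two fixes: the offline flags are now set as soon as a locally stored model is detected (mirroring the README note), and the revision lookup derives its environment-variable name from the prefix of the main variable rather than the full name. A rough sketch of the corrected lookup, assuming variable names like `MODEL_NAME` / `MODEL_REVISION` (assumed here for illustration; the calling code is not shown in this diff):

```python
import os

# Assumed example values for illustration.
os.environ["MODEL_NAME"] = "org/some-model"
os.environ["MODEL_REVISION"] = "main"

env_var = "MODEL_NAME"

# Before this commit: "_REVISION" was appended to the full variable name,
# yielding MODEL_NAME_REVISION, which is normally unset.
before_fix = os.getenv(f"{env_var}_REVISION")

# After this commit: only the prefix before the first underscore is kept,
# so MODEL_NAME resolves to MODEL_REVISION.
after_fix = os.getenv(f"{env_var.split('_')[0]}_REVISION") or None

print(before_fix)  # None
print(after_fix)   # main
```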
