0.3.1: bug fixes
alpayariyak committed Feb 29, 2024
1 parent 36e9b67 commit d91ccb8
Showing 3 changed files with 13 additions and 10 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,5 +1,5 @@
ARG WORKER_CUDA_VERSION=11.8.0
-FROM runpod/worker-vllm:base-0.3.0-cuda${WORKER_CUDA_VERSION} AS vllm-base
+FROM runpod/worker-vllm:base-0.3.1-cuda${WORKER_CUDA_VERSION} AS vllm-base

RUN apt-get update -y \
&& apt-get install -y python3-pip
13 changes: 7 additions & 6 deletions README.md
@@ -2,13 +2,14 @@

<h1> vLLM Serverless Endpoint Worker </h1>

Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm) on RunPod Serverless in a few clicks.

<p>Worker Version: 0.3.1 | vLLM Version: 0.3.2</p>

[![CD | Docker-Build-Release](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml/badge.svg)](https://github.com/runpod-workers/worker-vllm/actions/workflows/docker-build-release.yml)

Deploy Blazing-fast LLMs powered by [vLLM](https://github.com/vllm-project/vllm) on RunPod Serverless in a few clicks.
</div>

> [!IMPORTANT]
> [02.28.2024] When HuggingFace is down: to successfully load models that are downloaded on the image or endpoint network storage, set environment variables `TRANSFORMERS_OFFLINE` and `HF_HUB_OFFLINE` to `1` in the endpoint template.
</div>

### Worker vLLM 0.3.0: What's New since 0.2.0:
- **🚀 Full OpenAI Compatibility 🚀**
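The offline workaround in the note above comes down to two environment variables that both `transformers` and `huggingface_hub` respect. A minimal sketch of applying it in any process that loads a locally stored model (the model path below is purely illustrative, not a path the worker defines):

```python
import os

# Set these before any HuggingFace library attempts a download.
# With both flags set to "1", transformers and huggingface_hub make no
# network calls and only read files that already exist on disk.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

# Illustrative location only: wherever the model was baked into the
# image or placed on the endpoint's network storage.
local_model_path = "/runpod-volume/model"
print(f"Loading {local_model_path} without contacting the HuggingFace Hub")
```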
@@ -69,8 +70,8 @@ Below is a summary of the available RunPod Worker images, categorized by image stability

| CUDA Version | Stable Image Tag | Development Image Tag | Note |
|--------------|-----------------------------------|-----------------------------------|----------------------------------------------------------------------|
-| 11.8.0 | `runpod/worker-vllm:0.3.0-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
-| 12.1.0 | `runpod/worker-vllm:0.3.0-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Version 12.2 and 12.1 in the filter. |
+| 11.8.0 | `runpod/worker-vllm:0.3.1-cuda11.8.0` | `runpod/worker-vllm:dev-cuda11.8.0` | Available on all RunPod Workers without additional selection needed. |
+| 12.1.0 | `runpod/worker-vllm:0.3.1-cuda12.1.0` | `runpod/worker-vllm:dev-cuda12.1.0` | When creating an Endpoint, select CUDA Version 12.2 and 12.1 in the filter. |

This table provides a quick reference to the image tags you should use based on the desired CUDA version and image stability (Stable or Development). Be sure to follow the selection note for CUDA 12.1.0 compatibility.
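As a concrete example, the stable CUDA 12.1.0 image for this release is `runpod/worker-vllm:0.3.1-cuda12.1.0`. A small illustrative snippet (not part of the repository) that assembles a tag from the choices in the table:

```python
# Illustrative only: compose a worker image tag from the table above.
WORKER_VERSION = "0.3.1"

def image_tag(cuda_version: str, stable: bool = True) -> str:
    """Return the worker-vllm image tag for a CUDA version and stability."""
    prefix = WORKER_VERSION if stable else "dev"
    return f"runpod/worker-vllm:{prefix}-cuda{cuda_version}"

assert image_tag("12.1.0") == "runpod/worker-vllm:0.3.1-cuda12.1.0"
assert image_tag("11.8.0", stable=False) == "runpod/worker-vllm:dev-cuda11.8.0"
```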

8 changes: 5 additions & 3 deletions src/config.py
@@ -1,7 +1,7 @@
import os
from dotenv import load_dotenv
from utils import count_physical_cores
from torch.cuda import device_count
import os

class EngineConfig:
    def __init__(self):
@@ -14,9 +14,11 @@ def __init__(self):

    def _get_local_or_env(self, local_path, env_var):
        if os.path.exists(local_path):
+            os.environ["TRANSFORMERS_OFFLINE"] = "1"
+            os.environ["HF_HUB_OFFLINE"] = "1"
            with open(local_path, "r") as file:
                return file.read().strip(), None, None
-        return os.getenv(env_var), os.getenv("HF_HOME"), os.getenv(f"{env_var}_REVISION")
+        return os.getenv(env_var), os.getenv("HF_HOME"), os.getenv(f"{env_var.split('_')[0]}_REVISION") or None

    def _get_quantization(self):
        quantization = os.getenv("QUANTIZATION", "").lower()
@@ -48,4 +50,4 @@ def _initialize_config(self):
"enforce_eager": bool(int(os.getenv("ENFORCE_EAGER", 0)))
}

return {k: v for k, v in args.items() if v is not None}
return {k: v for k, v in args.items() if v is not None}
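The `config.py` changes make two fixes: the offline flags are now set as soon as a locally stored model is detected (mirroring the README note), and the revision lookup derives its environment-variable name from the prefix of the main variable rather than the full name. A rough sketch of the corrected lookup, assuming variable names like `MODEL_NAME` / `MODEL_REVISION` (assumed here for illustration; the calling code is not shown in this diff):

```python
import os

# Assumed example values for illustration.
os.environ["MODEL_NAME"] = "org/some-model"
os.environ["MODEL_REVISION"] = "main"

env_var = "MODEL_NAME"

# Before this commit: "_REVISION" was appended to the full variable name,
# yielding MODEL_NAME_REVISION, which is normally unset.
before_fix = os.getenv(f"{env_var}_REVISION")

# After this commit: only the prefix before the first underscore is kept,
# so MODEL_NAME resolves to MODEL_REVISION.
after_fix = os.getenv(f"{env_var.split('_')[0]}_REVISION") or None

print(before_fix)  # None
print(after_fix)   # main
```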
