[BUG] Qwen 1.5B Inference Crash on Vertex AI Platform #144

rothn · 2025-02-25T17:48:22Z

I'm trying to deploy Qwen 1.5B on Vertex AI Endpoints. Deploying Qwen 1.5B crashes, while Qwen 7B deploys perfectly fine, even though both were trained with the same Hugging Face TRL configuration (other than the base model). Training and local inference work fine for both 1.5B and 7B. The container I'm using is us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-4.ubuntu2204.py311. The requirements.txt for the training / local-inference setup is as follows:

accelerate==1.4.0
deepspeed==0.16.3
importlib-metadata==8.6.1
transformers==4.49.0
trl @ git+https://github.com/huggingface/[email protected]
protobuf==5.29.3
sentencepiece==0.2.0
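To narrow down the suspected version mismatch, one quick check is to compare the pins above with what is actually installed in a given environment. This is a minimal sketch using only the standard library (`importlib.metadata`); the helpers `parse_pins` and `compare_with_installed` are hypothetical names, and the git-based `trl` line is left out because it has no `==` pin to compare against.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt above. The git-based `trl` line is
# omitted because it carries no `==` version to compare.
REQUIREMENTS = """\
accelerate==1.4.0
deepspeed==0.16.3
importlib-metadata==8.6.1
transformers==4.49.0
protobuf==5.29.3
sentencepiece==0.2.0
"""


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: pinned_version} for every `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and non-`==` specs such as `pkg @ git+...`.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def compare_with_installed(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package to (pinned, installed); installed is None if absent."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


if __name__ == "__main__":
    for name, (pinned, installed) in compare_with_installed(parse_pins(REQUIREMENTS)).items():
        status = "OK" if pinned == installed else "MISMATCH"
        print(f"{name:20} pinned={pinned:10} installed={installed}  {status}")
```

Running this in each environment (training, local inference) gives a side-by-side view of which packages drift from the pins.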

Logs from the container referenced above:
aiplatform_endpoints_crash.log

Container environment variables:

serving_container_environment_variables={
    "NUM_SHARD": "1",
    "MAX_INPUT_TOKENS": "512",
    "MAX_TOTAL_TOKENS": "1024",
    "MAX_BATCH_PREFILL_TOKENS": "1512",
    "CUDA_LAUNCH_BLOCKING": "1",  # Debug for Qwen 1.5B
    "TORCH_USE_CUDA_DSA": "1",    # Debug for Qwen 1.5B
}
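As a quick sanity check on the token limits in this configuration, the sketch below encodes two constraints that, as far as I understand, TGI's launcher enforces at startup (an assumption, not quoted from the TGI docs): `MAX_INPUT_TOKENS` must be smaller than `MAX_TOTAL_TOKENS`, and `MAX_BATCH_PREFILL_TOKENS` must be at least `MAX_INPUT_TOKENS`. `check_token_limits` is a hypothetical helper.

```python
from __future__ import annotations

# Token-limit values copied from serving_container_environment_variables above.
ENV = {
    "NUM_SHARD": "1",
    "MAX_INPUT_TOKENS": "512",
    "MAX_TOTAL_TOKENS": "1024",
    "MAX_BATCH_PREFILL_TOKENS": "1512",
}


def check_token_limits(env: dict[str, str]) -> list[str]:
    """Return the problems found; an empty list means the limits look consistent.

    Both rules are an assumption about TGI's startup validation.
    """
    max_input = int(env["MAX_INPUT_TOKENS"])
    max_total = int(env["MAX_TOTAL_TOKENS"])
    max_prefill = int(env["MAX_BATCH_PREFILL_TOKENS"])
    problems = []
    if max_input >= max_total:
        problems.append("MAX_INPUT_TOKENS must be < MAX_TOTAL_TOKENS")
    if max_prefill < max_input:
        problems.append("MAX_BATCH_PREFILL_TOKENS must be >= MAX_INPUT_TOKENS")
    return problems


if __name__ == "__main__":
    issues = check_token_limits(ENV)
    print(issues or "token limits look consistent")
```

Under that assumption, the values above pass both checks, and a request at the full 512-token input length can still generate up to 1024 - 512 = 512 new tokens.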

I wonder if there's some sort of version mismatch between the training and serving containers, or whether TGI 2.4.0 (the version in this container) is simply too old or buggy, since the latest release of text-generation-inference appears to be 3.1.0. Is there a newer container I can try?
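The image name appears to encode the TGI, CUDA, Ubuntu and Python versions. This sketch decodes it under an assumed naming scheme (`cu124` as CUDA 12.4, `2-4` as TGI 2.4, `ubuntu2204` as Ubuntu 22.04, `py311` as Python 3.11) and compares the TGI version with 3.1.0. `parse_tgi_image` and `is_older` are hypothetical helpers, and the regex will not match images that follow a different naming pattern.

```python
from __future__ import annotations

import re

IMAGE = (
    "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
    "huggingface-text-generation-inference-cu124.2-4.ubuntu2204.py311"
)

# Assumed naming scheme: ...-inference-cu<CUDA>.<MAJOR>-<MINOR>.ubuntu<YYMM>.py<PY>
TAG_RE = re.compile(
    r"huggingface-text-generation-inference-cu(?P<cuda>\d+)\.(?P<major>\d+)-(?P<minor>\d+)"
    r"\.ubuntu(?P<ubuntu>\d{4})\.py(?P<py>\d+)$"
)


def parse_tgi_image(uri: str) -> dict[str, str]:
    """Decode the version fields from a TGI image URI (assumed naming scheme)."""
    m = TAG_RE.search(uri)
    if m is None:
        raise ValueError(f"unrecognised image name: {uri}")
    cuda, py, ubuntu = m["cuda"], m["py"], m["ubuntu"]
    return {
        "tgi": f"{m['major']}.{m['minor']}",
        # Assumes the last CUDA digit is the minor version (124 -> 12.4).
        "cuda": f"{cuda[:-1]}.{cuda[-1]}",
        "ubuntu": f"{ubuntu[:2]}.{ubuntu[2:]}",
        # Assumes a single-digit Python major version (311 -> 3.11).
        "python": f"{py[0]}.{py[1:]}",
    }


def is_older(current: str, latest: str) -> bool:
    """Compare dotted version strings numerically, e.g. '2.4' < '3.1.0'."""
    return tuple(map(int, current.split("."))) < tuple(map(int, latest.split(".")))


if __name__ == "__main__":
    info = parse_tgi_image(IMAGE)
    print(info, "older than 3.1.0:", is_older(info["tgi"], "3.1.0"))
```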
