Lumo 8B Instruct Optimization

  • Optimized configuration for the Lumo 8B Instruct model, with 1.5x to 3x speed gains over Ollama.
  • Benchmarked on an AWS g6e.2xlarge instance (NVIDIA L40S).

Results

  1. Ollama benchmark
  2. vLLM benchmark

How to set up

  • Get an NVIDIA L40S (or better) GPU instance
  • An Ubuntu VM is preferred
  • Install Ollama (optional; for comparison or to reproduce these tests)
  • Build and run the vLLM Docker image:
cd src
# Build the vLLM Docker image and run it
./run.sh

# OR build and run manually:
docker build . -t vllm-gguf

# Run the container
docker run --gpus all \
  --shm-size 16g \
  -p 8000:8000 \
  vllm-gguf
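
Once the container is up, you can sanity-check the endpoint before benchmarking. A minimal sketch, assuming vLLM's OpenAI-compatible API is listening on localhost:8000 (the port mapped above); it reads the served model name from /v1/models rather than hardcoding it:

# smoke_test.py - quick sanity check before benchmarking (stdlib only).
import json
import urllib.request

BASE = "http://localhost:8000/v1"

# Ask the server which model it actually loaded, then use that name.
with urllib.request.urlopen(f"{BASE}/models") as resp:
    model_name = json.load(resp)["data"][0]["id"]
print("Serving model:", model_name)

# Send one short completion request and print the generated text.
req = urllib.request.Request(
    f"{BASE}/completions",
    data=json.dumps({"model": model_name,
                     "prompt": "Hello, Lumo!",
                     "max_tokens": 32}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["text"])
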
  • Run the benchmark:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python src/test_model.py
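
src/test_model.py is the actual benchmark driver. The sketch below is only a rough illustration of how a tokens-per-second comparison between the two servers can be taken, assuming Ollama on its default port 11434, vLLM on the port mapped above, and a placeholder model name; it is not the repository's exact implementation.

# throughput_sketch.py - illustrative only; see src/test_model.py for the real benchmark.
import json
import time
import urllib.request

MODEL = "lumo-8b-instruct"  # placeholder; check /v1/models and `ollama list`
PROMPT = "Explain Solana in one paragraph."

def timed_post(url: str, payload: dict):
    """POST a JSON completion request; return (elapsed seconds, parsed response)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return time.perf_counter() - start, body

# vLLM: token counts come back in the OpenAI-style "usage" field.
elapsed, body = timed_post(
    "http://localhost:8000/v1/completions",
    {"model": MODEL, "prompt": PROMPT, "max_tokens": 256},
)
print(f"vLLM:   {body['usage']['completion_tokens'] / elapsed:.1f} tok/s (wall clock)")

# Ollama: /api/generate reports eval_count and eval_duration (nanoseconds).
elapsed, body = timed_post(
    "http://localhost:11434/api/generate",
    {"model": MODEL, "prompt": PROMPT, "stream": False},
)
print(f"Ollama: {body['eval_count'] / (body['eval_duration'] / 1e9):.1f} tok/s")
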

Roadmap

  • More optimizations to the Docker image
  • Publish the image to Docker Hub
  • Serverless configuration (on RunPod, Koyeb, etc.)
