- Optimized configuration for the Lumo 8B Instruct model, yielding 1.5x to 3x speed gains.
- Benchmarked on an AWS g6e.2xlarge instance (NVIDIA L40S)
- Provision an NVIDIA L40S (or better) GPU instance
- An Ubuntu VM is preferred
- Install Docker, the NVIDIA CUDA drivers, and the NVIDIA Container Toolkit (see the sketch after this list)
- Install Ollama (optional; for comparison or to reproduce these tests)
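As a rough sketch, host setup on Ubuntu might look like the following. The commands mirror the official Docker, NVIDIA Container Toolkit, and Ollama install instructions; they are assumptions about your environment, not scripts shipped in this repo:

```bash
# Install Docker via the official convenience script
curl -fsSL https://get.docker.com | sh

# Verify the NVIDIA driver is visible before continuing
nvidia-smi

# Install the NVIDIA Container Toolkit (per NVIDIA's Ubuntu instructions)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Optional: install Ollama for comparison runs
curl -fsSL https://ollama.com/install.sh | sh
```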
```bash
cd src

# Build the vLLM Docker image and start it via the helper script
./run.sh

# OR, manual mode: build the image yourself
docker build . -t vllm-gguf

# Run the container
docker run --gpus all \
    --shm-size 16g \
    -p 8000:8000 \
    vllm-gguf
```
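Assuming the container exposes vLLM's OpenAI-compatible server on port 8000 (the `-p 8000:8000` mapping above suggests it does), a quick smoke test might look like this; the model id in the request is whatever `/v1/models` reports, not a value defined by this repo:

```bash
# List the models the server is serving
curl -s http://localhost:8000/v1/models

# Send a single completion request (replace MODEL with the id
# reported by /v1/models above)
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MODEL", "prompt": "Hello, world", "max_tokens": 32}'
```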
- Benchmark:

```bash
# From the repository root, set up a Python environment for the benchmark client
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the benchmark against the local vLLM server
python src/test_model.py
```
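For a quick tokens-per-second figure without the Python client, a shell-only measurement could look like the sketch below. It assumes `jq` and `bc` are installed, the server is on port 8000, and `MODEL` is the id from `/v1/models`; it relies on vLLM's completion responses including a `usage.completion_tokens` field:

```bash
# Rough single-request throughput estimate
START=$(date +%s.%N)
TOKENS=$(curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MODEL", "prompt": "Explain quantization in one paragraph.", "max_tokens": 256}' \
  | jq '.usage.completion_tokens')
END=$(date +%s.%N)
echo "scale=1; $TOKENS / ($END - $START)" | bc | xargs -I{} echo "{} tokens/s"
```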
- More optimizations to the Docker image
- Publish the image on Docker Hub
- Serverless configuration (on RunPod, Koyeb, etc.)