This repository has been archived by the owner on Feb 15, 2025. It is now read-only.

feat(vllm)!: upgrade vllm backend and refactor deployment #854

Merged · 350 commits · Oct 3, 2024
Commits
301fd42
fix Dockerfile lint
justinthelaw Sep 16, 2024
2a2c7d6
re-added default tensor size
justinthelaw Sep 16, 2024
98227a6
fix README
justinthelaw Sep 16, 2024
c620efa
cleanup
justinthelaw Sep 16, 2024
6593fbb
3.11.9 python
justinthelaw Sep 16, 2024
79272d1
fix FinishReason, add vLLM E2E
justinthelaw Sep 17, 2024
927ad25
llama completion test, add CompleteStreamChoice
justinthelaw Sep 17, 2024
e9e434f
condense e2e to 1 file, add max_new_tokens
justinthelaw Sep 17, 2024
d8c6767
formatting fix
justinthelaw Sep 17, 2024
29a9785
max_tokens for OpenAI client
justinthelaw Sep 17, 2024
a166c93
fix singular model_name arg
justinthelaw Sep 17, 2024
1c63741
isolate model_name to single test
justinthelaw Sep 17, 2024
2e82a9f
fix e2e-llama-cpp-python.yaml
justinthelaw Sep 17, 2024
807128e
Update e2e-vllm.yaml
justinthelaw Sep 17, 2024
e48331f
model_name fixture
justinthelaw Sep 17, 2024
e88b29f
Merge remote-tracking branch 'origin/main' into 1037-testvllm-impleme…
justinthelaw Sep 17, 2024
b366c5f
Merge remote-tracking branch 'origin/main' into 835-upgrade-vllm-for-…
justinthelaw Sep 17, 2024
ecbd4f7
handle request queue possibly being None
justinthelaw Sep 17, 2024
8552ce0
workaround GPU runner issue
justinthelaw Sep 17, 2024
af4e4ca
workaround GPU runner issue, pt.2
justinthelaw Sep 17, 2024
5b1532a
workaround GPU runner issue, pt.3
justinthelaw Sep 17, 2024
a8551e5
workaround GPU runner issue, pt.4
justinthelaw Sep 17, 2024
5f1b3c1
temp turn on e2e vllm, add nvidia-smi
justinthelaw Sep 17, 2024
1e7e98c
add nvidia setp
justinthelaw Sep 17, 2024
c46731a
fix cluster cmd, play with prompt
justinthelaw Sep 17, 2024
161fb3a
k3d permissions
justinthelaw Sep 17, 2024
84a0388
Update e2e-vllm.yaml
justinthelaw Sep 17, 2024
cb905ff
Update e2e-llama-cpp-python.yaml
justinthelaw Sep 17, 2024
6afb992
e2e-vllm.yaml with lfai-core
justinthelaw Sep 17, 2024
094da70
vllm e2e missing cluster create
justinthelaw Sep 17, 2024
f5d9f82
fix llama e2e steps
justinthelaw Sep 17, 2024
9fb28fa
test GPU cluster health
justinthelaw Sep 17, 2024
c19cec2
test GPU runner deps, pt.1
justinthelaw Sep 17, 2024
8767649
test GPU runner deps, pt.2
justinthelaw Sep 17, 2024
52857c5
test GPU runner deps, pt.3
justinthelaw Sep 17, 2024
287b911
test GPU runner deps, pt.4
justinthelaw Sep 17, 2024
e0b7e18
test GPU runner deps, pt.5
justinthelaw Sep 17, 2024
042248d
add comments
justinthelaw Sep 17, 2024
64079aa
better comments, log test outputs
justinthelaw Sep 17, 2024
0148b92
add wait-for, more comments
justinthelaw Sep 17, 2024
04ab8b2
remove formatting
justinthelaw Sep 17, 2024
635bdaf
fix CUDA pod test
justinthelaw Sep 17, 2024
c85a00c
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 17, 2024
4b73ba9
Merge remote-tracking branch 'origin/main' into 1037-testvllm-impleme…
justinthelaw Sep 17, 2024
f7b2a50
reduced context window
justinthelaw Sep 17, 2024
1bef345
remove pytest cache Make target
justinthelaw Sep 17, 2024
8b2af46
vLLM deployment debugging
justinthelaw Sep 17, 2024
9dae852
revert formatting
justinthelaw Sep 17, 2024
d44a907
fix build, add better debugging steps
justinthelaw Sep 17, 2024
5af2d70
fix Kubectl commands
justinthelaw Sep 17, 2024
8befd3b
nvidia daemonset debug
justinthelaw Sep 17, 2024
32a1c31
set nvidia runtime as default
justinthelaw Sep 17, 2024
1e7aca1
check node issues
justinthelaw Sep 17, 2024
2464cc4
draft, node detailed describe
justinthelaw Sep 17, 2024
c7b4aa3
Update cuda-vector-add.yaml
justinthelaw Sep 17, 2024
2245c7c
Update cuda-vector-add.yaml
justinthelaw Sep 17, 2024
b1933c2
more cluster runner debugging
justinthelaw Sep 17, 2024
325f520
remove erroneous journal to command
justinthelaw Sep 18, 2024
87cc755
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 18, 2024
32ed63c
Merge remote-tracking branch 'origin/main' into 1037-testvllm-impleme…
justinthelaw Sep 18, 2024
5c13861
docker-level debug addition
justinthelaw Sep 18, 2024
e4e4611
downgrade CUDA version
justinthelaw Sep 18, 2024
32dad39
downgrade CUDA version, again
justinthelaw Sep 18, 2024
850100f
try root full
justinthelaw Sep 18, 2024
d1d6e48
try root, pt.2
justinthelaw Sep 18, 2024
df61e46
try root, pt.3
justinthelaw Sep 18, 2024
34926e9
different tests and logs
justinthelaw Sep 18, 2024
547a64b
typo
justinthelaw Sep 18, 2024
59ce6f6
revert to old daemonset version
justinthelaw Sep 18, 2024
284812d
typo
justinthelaw Sep 18, 2024
b222543
add config.toml to k3s image
justinthelaw Sep 18, 2024
76cccbc
get failure reason
justinthelaw Sep 18, 2024
d6aacf0
Merge branch 'main' into 1037-testvllm-implement-e2e-testing-for-vllm
justinthelaw Sep 18, 2024
c9e7840
just see if change in containerd config works
justinthelaw Sep 18, 2024
1514ead
Dockerfile changes, apply both tests
justinthelaw Sep 18, 2024
a437b7b
typo
justinthelaw Sep 18, 2024
66ef462
fix image tag, add NVIDIA capabilities all
justinthelaw Sep 18, 2024
c9d480c
align docker test, add node label
justinthelaw Sep 18, 2024
a32226a
add quotes, increase priv
justinthelaw Sep 18, 2024
db04bd0
Merge remote-tracking branch 'origin/main' into 1037-testvllm-impleme…
justinthelaw Sep 18, 2024
d199203
add nfd
justinthelaw Sep 18, 2024
bd89870
add nfd, pt.1
justinthelaw Sep 18, 2024
2ce805b
remove nfd
justinthelaw Sep 18, 2024
3be3648
remove set-as-default
justinthelaw Sep 18, 2024
9cf7d7f
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 18, 2024
5f777e0
Merge branch 'main' into 1037-testvllm-implement-e2e-testing-for-vllm
justinthelaw Sep 18, 2024
8d28084
refactor, unload drivers
justinthelaw Sep 18, 2024
c12aa82
script typo
justinthelaw Sep 18, 2024
6900dac
fix typos
justinthelaw Sep 18, 2024
3ab9228
slim k3d cluster, permission workaround
justinthelaw Sep 18, 2024
7dd8abf
k3d bootstrap match
justinthelaw Sep 18, 2024
79f8d30
k3d server name
justinthelaw Sep 18, 2024
2811359
nvidia wait-for
justinthelaw Sep 18, 2024
3cf42eb
remove extra stuff
justinthelaw Sep 18, 2024
331584e
pods out first
justinthelaw Sep 18, 2024
e7fdf7c
node out first, whoami
justinthelaw Sep 18, 2024
0662106
which k3d
justinthelaw Sep 18, 2024
6b04c55
sleep!
justinthelaw Sep 18, 2024
6110ec4
root user
justinthelaw Sep 18, 2024
9f9157c
root user, pt.2
justinthelaw Sep 18, 2024
664709b
revert vllm e2e GPU runner changes
justinthelaw Sep 18, 2024
f896e59
revert formatting changes
justinthelaw Sep 18, 2024
ef75a70
e2e tests made easier
justinthelaw Sep 18, 2024
2fcac88
Merge branch 'main' into 1037-testvllm-implement-e2e-testing-for-vllm
justinthelaw Sep 18, 2024
23c008e
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 18, 2024
d1d6540
e2e test Make target typo
justinthelaw Sep 18, 2024
2cfd164
Merge branch '1037-testvllm-implement-e2e-testing-for-vllm' of https:…
justinthelaw Sep 18, 2024
09510b7
zarf-config.yaml changes docs
justinthelaw Sep 18, 2024
1e89fac
add load_format
justinthelaw Sep 18, 2024
0568232
revert format e2e-llama-cpp-python.yaml
justinthelaw Sep 18, 2024
cc7ac6c
fixed Makefile typo
justinthelaw Sep 18, 2024
8a07080
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 18, 2024
f335be7
attempt merge with main
justinthelaw Sep 18, 2024
e0c0ac7
better clean-up
justinthelaw Sep 19, 2024
c90d820
add FinishReason enum back in
justinthelaw Sep 19, 2024
a1a03c1
passing unit tests
justinthelaw Sep 19, 2024
3da388f
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 19, 2024
3387974
Merge branch 'main' into 1037-testvllm-implement-e2e-testing-for-vllm
justinthelaw Sep 19, 2024
620e3b5
fixes GPU_LIMIT
justinthelaw Sep 20, 2024
09dd182
Merge remote-tracking branch 'origin/1037-testvllm-implement-e2e-test…
justinthelaw Sep 20, 2024
331a346
fixes load_format
justinthelaw Sep 20, 2024
6df5ebb
Merge branch 'main' into 1037-testvllm-implement-e2e-testing-for-vllm
justinthelaw Sep 20, 2024
304f659
Merge remote-tracking branch 'origin/1037-testvllm-implement-e2e-test…
justinthelaw Sep 20, 2024
cc46716
adds Docker container-only things
justinthelaw Sep 20, 2024
5ab0b99
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
da1399b
PR review fixes
justinthelaw Sep 20, 2024
59e1830
Merge remote-tracking branch 'origin/1037-testvllm-implement-e2e-test…
justinthelaw Sep 20, 2024
e963293
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
b9545b7
description for PROMPT_FORMAT*
justinthelaw Sep 20, 2024
5a6d59f
makefile clean improvements, add bundle configs
justinthelaw Sep 20, 2024
396370a
variabilize PYTHON_VERSION in vllm Dockerfile
justinthelaw Sep 20, 2024
b023dfa
missing download sub-cmd
justinthelaw Sep 20, 2024
f24180d
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
8ba6dcb
variabilize vllm directory
justinthelaw Sep 20, 2024
0186ad0
Merge branch '835-upgrade-vllm-for-gptq-bfloat16-inferencing' of http…
justinthelaw Sep 20, 2024
9791cb6
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
6e1ca0c
fix release.yaml
justinthelaw Sep 20, 2024
858b64f
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
ced0797
Update e2e-registry1-weekly.yaml
justinthelaw Sep 20, 2024
89d0d69
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 20, 2024
dd9b1bc
Update e2e-registry1-weekly.yaml
justinthelaw Sep 20, 2024
2bf474c
Update e2e-registry1-weekly.yaml
justinthelaw Sep 20, 2024
6effe8c
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 23, 2024
1641379
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 23, 2024
ca0ff03
update to 0.13.0, fix versioning
justinthelaw Sep 23, 2024
d365660
fix registry1 workflow, add prints
justinthelaw Sep 23, 2024
bdda602
merge with registry1 workflow
justinthelaw Sep 23, 2024
2e24a6b
chainguard login, fix registry1 uds setup
justinthelaw Sep 23, 2024
280927a
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 23, 2024
bd3d7ff
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 23, 2024
686e755
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 23, 2024
e109740
fix permissions
justinthelaw Sep 23, 2024
b4b767e
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 23, 2024
7948e33
fix permissions, pt.2
justinthelaw Sep 23, 2024
c468b2c
fix permissions
justinthelaw Sep 23, 2024
44d4a0e
centralize integration llm config, no-cache-dir
justinthelaw Sep 23, 2024
9c1811c
merge with testing branch, pt.1
justinthelaw Sep 23, 2024
c0af7c7
centralize integration llm config, pt.2
justinthelaw Sep 23, 2024
6c24d34
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 23, 2024
332d348
better make clean-all
justinthelaw Sep 23, 2024
14ab833
complete overhaul of registry1 weekly
justinthelaw Sep 23, 2024
3caed3a
revert formatting
justinthelaw Sep 23, 2024
c50e16a
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 23, 2024
cbbdc20
update yq command for zarf.yaml
justinthelaw Sep 23, 2024
8518d71
yq sub typo
justinthelaw Sep 23, 2024
f11dd73
go back to using latest bundle
justinthelaw Sep 23, 2024
4079620
package create modifications
justinthelaw Sep 23, 2024
dd52e03
typo UDS zarf package create
justinthelaw Sep 23, 2024
a4fb386
correct bundle pointers and mutation
justinthelaw Sep 23, 2024
7192692
different zarf package ref location
justinthelaw Sep 23, 2024
d465753
log level debug
justinthelaw Sep 23, 2024
58b67c6
confirm missing C lib, more dynamic API create
justinthelaw Sep 24, 2024
25a1223
README improvement
justinthelaw Sep 24, 2024
9185ebf
README improvement, pt.2
justinthelaw Sep 24, 2024
5ff7f1c
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 24, 2024
ee08217
0.13.0, merge with test branch
justinthelaw Sep 24, 2024
982533f
more FinishReason exception throwing
justinthelaw Sep 24, 2024
4c4b0b6
fix class method on FinishReason
justinthelaw Sep 24, 2024
78efedb
change method name
justinthelaw Sep 24, 2024
55546a7
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 25, 2024
c7ca585
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 25, 2024
072427a
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 25, 2024
5e545f6
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 25, 2024
91da0ce
modify release-please-config
justinthelaw Sep 25, 2024
240e2c1
weekly sunday 12AM pst
justinthelaw Sep 25, 2024
d673244
move install to JIT
justinthelaw Sep 25, 2024
81c598c
remove udsCliVersion
justinthelaw Sep 25, 2024
301e9dd
comment typo
justinthelaw Sep 25, 2024
340414f
add v to registry ref
justinthelaw Sep 25, 2024
8c4e194
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 25, 2024
beb643f
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 25, 2024
3defb55
better sub yq cmd
justinthelaw Sep 25, 2024
4fdec61
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 25, 2024
da1e466
add failure logging
justinthelaw Sep 25, 2024
b2b6905
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 25, 2024
3cfecf0
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 26, 2024
94d2385
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 26, 2024
ccd99e9
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 26, 2024
26932de
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 26, 2024
649406d
Update release-please-config.json
justinthelaw Sep 27, 2024
20a73b7
Update and rename e2e-registry1-weekly.yaml to weekly-registry1-e2e-t…
justinthelaw Sep 27, 2024
a4f4c0f
Update and rename weekly-registry1-e2e-testing.yaml to weekly-registr…
justinthelaw Sep 27, 2024
ab5871d
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 27, 2024
0928698
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 27, 2024
757166e
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Sep 27, 2024
5cca687
0.13.1
justinthelaw Sep 27, 2024
db7193e
Merge remote-tracking branch 'origin/main' into 835-upgrade-vllm-for-…
justinthelaw Sep 27, 2024
c878283
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 27, 2024
be13c59
filename typo
justinthelaw Sep 27, 2024
1264c4c
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 27, 2024
db7e27a
make target typo
justinthelaw Sep 27, 2024
ef2f559
env variabilized
justinthelaw Sep 27, 2024
bcc1287
make target just does not work
justinthelaw Sep 27, 2024
03837c9
image_versions explicit set
justinthelaw Sep 27, 2024
8e4faf3
image_versions explicit set, pt.2
justinthelaw Sep 27, 2024
7a3c365
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 27, 2024
f77bcfe
use version pattern from release.yaml
justinthelaw Sep 27, 2024
37093dd
merge and resolve release conflict
justinthelaw Sep 27, 2024
14351c1
remove the v
justinthelaw Sep 27, 2024
46174ed
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 27, 2024
ce4c30f
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Sep 30, 2024
f502e06
fix lint
justinthelaw Sep 30, 2024
af8c971
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Sep 30, 2024
fd2c153
cutover to utils.client.py
justinthelaw Oct 1, 2024
d22439e
Merge branch 'main' into chore-update-registry1-weekly-bundle-0.13.0
justinthelaw Oct 1, 2024
5c493ea
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
ae68868
cutover to utils.client.py, pt.2
justinthelaw Oct 1, 2024
2acb604
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
ca55f72
cutover to utils.client.py, pt.3
justinthelaw Oct 1, 2024
a42c320
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
807fbdc
fix text embeddings backend full
justinthelaw Oct 1, 2024
abff6bd
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
a9c34fb
remove extraneous env
justinthelaw Oct 1, 2024
8caf64f
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
a6f0af0
add get_supabase_url, default model warnings
justinthelaw Oct 1, 2024
0b291f4
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
d590268
supabase base url incorrect
justinthelaw Oct 1, 2024
54af6dc
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
17e20fa
supabase_url in wrong position
justinthelaw Oct 1, 2024
2c3b7f1
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
76efca3
Merge remote-tracking branch 'origin/main' into chore-update-registry…
justinthelaw Oct 1, 2024
1211e69
fastapi status code usage
justinthelaw Oct 1, 2024
7e6bdb2
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
a4c5ace
FinishReason _missing_ class method
justinthelaw Oct 1, 2024
5ee07cf
new missing JWT
justinthelaw Oct 1, 2024
df60811
Merge remote-tracking branch 'origin/chore-update-registry1-weekly-bu…
justinthelaw Oct 1, 2024
0c12449
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Oct 1, 2024
99c27c9
missing ZARF VAR passthrough to values
justinthelaw Oct 2, 2024
c106e10
more clarity in the README
justinthelaw Oct 3, 2024
d92b572
Merge branch 'main' into 835-upgrade-vllm-for-gptq-bfloat16-inferencing
justinthelaw Oct 3, 2024
2 changes: 1 addition & 1 deletion .github/actions/release/action.yaml
@@ -138,7 +138,7 @@ runs:
run: |
docker buildx build --build-arg LOCAL_VERSION=${{ inputs.releaseTag }} -t ghcr.io/defenseunicorns/leapfrogai/vllm:${{ inputs.releaseTag }} --push -f packages/vllm/Dockerfile .

- zarf package create packages/vllm --set=IMAGE_VERSION=${{ inputs.releaseTag }} --flavor upstream --confirm
+ ZARF_CONFIG=packages/vllm/zarf-config.yaml zarf package create packages/vllm --set=IMAGE_VERSION=${{ inputs.releaseTag }} --flavor upstream --confirm

zarf package publish zarf-package-vllm-amd64-${{ inputs.releaseTag }}.tar.zst oci://ghcr.io/defenseunicorns/packages${{ inputs.subRepository }}leapfrogai

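The `packages/vllm/zarf-config.yaml` that `ZARF_CONFIG` points at is added by this PR but is not shown in this excerpt. A rough sketch of the shape such a config might take, with variable names assumed from the `--set` flags used in the README changes further down (`MODEL_REPO_ID`, `MODEL_REVISION`, `ENFORCE_EAGER`) rather than copied from the real file:

```yaml
# hypothetical sketch of packages/vllm/zarf-config.yaml (not the real file)
package:
  create:
    set:
      # model pulled from HuggingFace at package-create time
      model_repo_id: "TheBloke/SynthIA-7B-v2.0-GPTQ"
      model_revision: "gptq-4bit-32g-actorder_True"
  deploy:
    set:
      # engine defaults, overridable at deploy time with --set ENFORCE_EAGER=True etc.
      enforce_eager: "False"
      tensor_parallel_size: "1"
```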
2 changes: 1 addition & 1 deletion .github/workflows/e2e-vllm.yaml
@@ -88,4 +88,4 @@ jobs:
##########
- name: Build vLLM
run: |
- make build-vllm LOCAL_VERSION=e2e-test
+ make build-vllm LOCAL_VERSION=e2e-test ZARF_CONFIG=packages/vllm/zarf-config.yaml
4 changes: 2 additions & 2 deletions Makefile
@@ -123,7 +123,7 @@ build-vllm: local-registry docker-vllm ## Build the vllm container and Zarf pack
docker push ${DOCKER_FLAGS} localhost:${REG_PORT}/defenseunicorns/leapfrogai/vllm:${LOCAL_VERSION}

## Build the Zarf package
- uds zarf package create packages/vllm --flavor ${FLAVOR} -a ${ARCH} -o packages/vllm --registry-override=ghcr.io=localhost:${REG_PORT} --insecure --set IMAGE_VERSION=${LOCAL_VERSION} ${ZARF_FLAGS} --confirm
+ ZARF_CONFIG=packages/vllm/zarf-config.yaml uds zarf package create packages/vllm --flavor ${FLAVOR} -a ${ARCH} -o packages/vllm --registry-override=ghcr.io=localhost:${REG_PORT} --insecure --set IMAGE_VERSION=${LOCAL_VERSION} ${ZARF_FLAGS} --confirm

docker-text-embeddings: sdk-wheel
## Build the image (and tag it for the local registry)
@@ -263,7 +263,7 @@ silent-deploy-llama-cpp-python-package:
silent-deploy-vllm-package:
@echo "Starting VLLM deployment..."
@mkdir -p .logs
- @uds zarf package deploy packages/vllm/zarf-package-vllm-${ARCH}-${LOCAL_VERSION}.tar.zst ${ZARF_FLAGS} --confirm > .logs/deploy-vllm.log 2>&1
+ @ZARF_CONFIG=packages/vllm/zarf-config.yaml uds zarf package deploy packages/vllm/zarf-package-vllm-${ARCH}-${LOCAL_VERSION}.tar.zst ${ZARF_FLAGS} --confirm > .logs/deploy-vllm.log 2>&1
@echo "VLLM deployment completed"

silent-deploy-text-embeddings-package:
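All three Make paths above now pin `ZARF_CONFIG` to the same file, so package create and deploy read identical variable defaults. A local end-to-end run would then look roughly like this (target names come from this Makefile; the `LOCAL_VERSION`, `FLAVOR`, and port values are illustrative):

```bash
# build the vLLM image and Zarf package; the Make target exports
# ZARF_CONFIG=packages/vllm/zarf-config.yaml internally
make build-vllm LOCAL_VERSION=dev FLAVOR=upstream REG_PORT=5000

# deploy the freshly built package with the same config applied
make silent-deploy-vllm-package LOCAL_VERSION=dev ARCH=amd64
```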
27 changes: 25 additions & 2 deletions bundles/dev/gpu/uds-config.yaml
@@ -9,8 +9,31 @@ variables:
gpu_limit: 0 # runs on CPU until GPU limit is increased

vllm:
- gpu_limit: 1 # if <1, vllm won't work, VLLM is GPU only
- #tensor_parallel_size: 1 # TODO: reintroduce when vllm changes get pulled in
+ trust_remote_code: "True"
+ tensor_parallel_size: "1"
+ enforce_eager: "False"
+ gpu_memory_utilization: "0.90"
+ worker_use_ray: "True"
+ engine_use_ray: "True"
+ quantization: "None"
+ load_format: "auto"
+ # LeapfrogAI SDK runtime configuration (usually influenced by config.yaml in development)
+ max_context_length: "32768"
+ stop_tokens: "</s>, <|im_end|>, <|endoftext|>"
+ prompt_format_chat_system: "SYSTEM: {}\n"
+ prompt_format_chat_user: "USER: {}\n"
+ prompt_format_chat_assistant: "ASSISTANT: {}\n"
+ temperature: "0.1"
+ top_p: "1.0"
+ top_k: "0"
+ repetition_penalty: "1.0"
+ max_new_tokens: "8192"
+ # Pod deployment configuration
+ gpu_limit: "1"
+ gpu_runtime: "nvidia"
+ pvc_size: "15Gi"
+ pvc_access_mode: "ReadWriteOnce"
+ pvc_storage_class: "local-path"

supabase:
domain: "uds.dev"
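Every value above is quoted because UDS bundle variables travel through Zarf to Helm as strings. One plausible way to exercise this file on deploy, assuming the UDS CLI's standard `UDS_CONFIG` lookup (the bundle tarball name is illustrative):

```bash
# point the UDS CLI at the dev GPU config, then deploy the bundle
UDS_CONFIG=bundles/dev/gpu/uds-config.yaml \
  uds deploy uds-bundle-leapfrogai-*.tar.zst --confirm
```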
27 changes: 25 additions & 2 deletions bundles/latest/gpu/uds-config.yaml
@@ -9,8 +9,31 @@ variables:
gpu_limit: 0 # runs on CPU until GPU limit is increased

vllm:
- gpu_limit: 1 # if <1, vllm won't work, VLLM is GPU only
- #tensor_parallel_size: 1 # TODO: reintroduce when vllm changes get pulled in
+ trust_remote_code: "True"
+ tensor_parallel_size: "1"
+ enforce_eager: "False"
+ gpu_memory_utilization: "0.90"
+ worker_use_ray: "True"
+ engine_use_ray: "True"
+ quantization: "None"
+ load_format: "auto"
+ # LeapfrogAI SDK runtime configuration (usually influenced by config.yaml in development)
+ max_context_length: "32768"
+ stop_tokens: "</s>, <|im_end|>, <|endoftext|>"
+ prompt_format_chat_system: "SYSTEM: {}\n"
+ prompt_format_chat_user: "USER: {}\n"
+ prompt_format_chat_assistant: "ASSISTANT: {}\n"
+ temperature: "0.1"
+ top_p: "1.0"
+ top_k: "0"
+ repetition_penalty: "1.0"
+ max_new_tokens: "8192"
+ # Pod deployment configuration
+ gpu_limit: "1"
+ gpu_runtime: "nvidia"
+ pvc_size: "15Gi"
+ pvc_access_mode: "ReadWriteOnce"
+ pvc_storage_class: "local-path"

supabase:
domain: "uds.dev"
8 changes: 4 additions & 4 deletions docs/DEVELOPMENT.md
@@ -13,20 +13,20 @@ Please first see the pre-requisites listed on the LeapfrogAI documentation websi

It is **_HIGHLY RECOMMENDED_** that PyEnv be installed on your machine, and a new virtual environment is created for every new development branch.

- Follow the installation instructions outlined in the [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation) repository to install Python 3.11.6:
+ Follow the installation instructions outlined in the [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation) repository to install Python 3.11.9:

```bash
# install the correct python version
- pyenv install 3.11.6
+ pyenv install 3.11.9

# create a new virtual environment named "leapfrogai"
- pyenv virtualenv 3.11.6 leapfrogai
+ pyenv virtualenv 3.11.9 leapfrogai

# activate the virtual environment
pyenv activate leapfrogai
```

- If your installation process completes successfully but indicates missing packages such as `sqlite3`, execute the following command to install the required packages then proceed with the reinstallation of Python 3.11.6:
+ If your installation process completes successfully but indicates missing packages such as `sqlite3`, execute the following command to install the required packages then proceed with the reinstallation of Python 3.11.9:

```bash
sudo apt-get install build-essential zlib1g-dev libffi-dev \
25 changes: 12 additions & 13 deletions packages/vllm/.env.example
@@ -1,13 +1,12 @@
- export LAI_HF_HUB_ENABLE_HF_TRANSFER="1"
- export LAI_REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ"
- export LAI_REVISION="gptq-4bit-32g-actorder_True"
- export LAI_QUANTIZATION="gptq"
- export LAI_TENSOR_PARALLEL_SIZE=1
- export LAI_MODEL_SOURCE=".model/"
- export LAI_MAX_CONTEXT_LENGTH=32768
- export LAI_STOP_TOKENS='["</s>","<|endoftext|>","<|im_end|>"]'
- export LAI_PROMPT_FORMAT_CHAT_SYSTEM="SYSTEM: {}\n"
- export LAI_PROMPT_FORMAT_CHAT_ASSISTANT="ASSISTANT: {}\n"
- export LAI_PROMPT_FORMAT_CHAT_USER="USER: {}\n"
- export LAI_PROMPT_FORMAT_DEFAULTS_TOP_P=1.0
- export LAI_PROMPT_FORMAT_DEFAULTS_TOP_K=0
+ LFAI_REPO_ID="TheBloke/SynthIA-7B-v2.0-GPTQ"
+ LFAI_REVISION="gptq-4bit-32g-actorder_True"
+
+ VLLM_TENSOR_PARALLEL_SIZE=1
+ VLLM_TRUST_REMOTE_CODE=True
+ VLLM_MAX_CONTEXT_LENGTH=32768
+ VLLM_ENFORCE_EAGER=False
+ VLLM_GPU_MEMORY_UTILIZATION=0.90
+ VLLM_WORKER_USE_RAY=True
+ VLLM_ENGINE_USE_RAY=True
+ VLLM_QUANTIZATION=None
+ VLLM_LOAD_FORMAT=auto
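Unlike the old file, the new `.env.example` has no `export` statements, so a plain `source` leaves the variables invisible to child processes. A small sketch of one way to load it for local runs, mirroring the `env $(cat .env | xargs)` pattern the package Makefile below uses:

```bash
# export everything in .env into the current shell
set -a
source .env
set +a

# sanity-check that a setting actually reached the environment
python3 -c "import os; print(os.environ['VLLM_TENSOR_PARALLEL_SIZE'])"
```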
54 changes: 15 additions & 39 deletions packages/vllm/Dockerfile
@@ -6,8 +6,9 @@ FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS builder
# set SDK location
# set the pyenv and Python versions
ARG SDK_DEST=src/leapfrogai_sdk/build \
- PYTHON_VERSION=3.11.6 \
- PYENV_GIT_TAG=v2.4.8
+ PYTHON_VERSION=3.11.9 \
+ PYENV_GIT_TAG=v2.4.8 \
+ COMPONENT_DIRECTORY="packages/vllm"

# use root user for deps installation and nonroot user creation
USER root
@@ -41,7 +42,7 @@ USER nonroot
# copy-in SDK from sdk stage and vllm source code from host
WORKDIR /home/leapfrogai
COPY --from=sdk --chown=nonroot:nonroot /leapfrogai/${SDK_DEST} ./${SDK_DEST}
- COPY --chown=nonroot:nonroot packages/vllm packages/vllm
+ COPY --chown=nonroot:nonroot ${COMPONENT_DIRECTORY} packages/vllm

# create virtual environment for light-weight portability and minimal libraries
RUN curl https://pyenv.run | bash && \
@@ -54,10 +55,10 @@ RUN curl https://pyenv.run | bash && \
ENV PYENV_ROOT="/home/nonroot/.pyenv" \
PATH="/home/nonroot/.pyenv/bin:$PATH"

- # Install Python 3.11.6, set it as global, and create a venv
+ # Install Python, set it as global, and create a venv
RUN . ~/.bashrc && \
- PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.11.6 && \
- pyenv global 3.11.6 && \
+ PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.11.9 && \
+ pyenv global ${PYTHON_VERSION} && \
pyenv exec python -m venv .venv

# set path to venv python
@@ -67,26 +68,15 @@ RUN rm -f packages/vllm/build/*.whl && \
python -m pip wheel packages/vllm -w packages/vllm/build --find-links=${SDK_DEST} && \
pip install packages/vllm/build/lfai_vllm*.whl --no-index --find-links=packages/vllm/build/

#################
# FINAL CONTAINER
#################

FROM nvidia/cuda:12.2.2-runtime-ubuntu22.04

# set SDK location
ARG SDK_DEST=src/leapfrogai_sdk/build

- # model-specific arguments
- ARG HF_HUB_ENABLE_HF_TRANSFER="1" \
- REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ" \
- REVISION="gptq-4bit-32g-actorder_True" \
- MODEL_SOURCE="/data/.model/" \
- MAX_CONTEXT_LENGTH=32768 \
- STOP_TOKENS='["</s>"]' \
- PROMPT_FORMAT_CHAT_SYSTEM="SYSTEM: {}\n" \
- PROMPT_FORMAT_CHAT_USER="USER: {}\n" \
- PROMPT_FORMAT_CHAT_ASSISTANT="ASSISTANT: {}\n" \
- PROMPT_FORMAT_DEFAULTS_TOP_P=1.0 \
- PROMPT_FORMAT_DEFAULTS_TOP_K=0 \
- TENSOR_PARALLEL_SIZE=1 \
- QUANTIZATION="gptq"

# setup nonroot user and permissions
USER root
RUN groupadd -g 65532 vglusers && \
@@ -101,24 +91,10 @@ COPY --from=sdk --chown=nonroot:nonroot /leapfrogai/${SDK_DEST} ./${SDK_DEST}
COPY --from=builder --chown=nonroot:nonroot /home/leapfrogai/.venv /home/leapfrogai/.venv
COPY --from=builder --chown=nonroot:nonroot /home/leapfrogai/packages/vllm/src /home/leapfrogai/packages/vllm/src
# copy-in python binaries
- COPY --from=builder --chown=nonroot:nonroot /home/nonroot/.pyenv/versions/3.11.6/ /home/nonroot/.pyenv/versions/3.11.6/

- # load ARG values into env variables for pickup by confz
- ENV LAI_HF_HUB_ENABLE_HF_TRANSFER=${HF_HUB_ENABLE_HF_TRANSFER} \
- LAI_REPO_ID=${REPO_ID} \
- LAI_REVISION=${REVISION} \
- LAI_MODEL_SOURCE=${MODEL_SOURCE} \
- LAI_MAX_CONTEXT_LENGTH=${MAX_CONTEXT_LENGTH} \
- LAI_STOP_TOKENS=${STOP_TOKENS} \
- LAI_PROMPT_FORMAT_CHAT_SYSTEM=${PROMPT_FORMAT_CHAT_SYSTEM} \
- LAI_PROMPT_FORMAT_CHAT_USER=${PROMPT_FORMAT_CHAT_USER} \
- LAI_PROMPT_FORMAT_CHAT_ASSISTANT=${PROMPT_FORMAT_CHAT_ASSISTANT} \
- LAI_PROMPT_FORMAT_DEFAULTS_TOP_P=${PROMPT_FORMAT_DEFAULTS_TOP_P} \
- LAI_PROMPT_FORMAT_DEFAULTS_TOP_K=${PROMPT_FORMAT_DEFAULTS_TOP_K} \
- LAI_TENSOR_PARALLEL_SIZE=${TENSOR_PARALLEL_SIZE} \
- LAI_QUANTIZATION=${QUANTIZATION} \
- # remove vLLM callback to stats server
- VLLM_NO_USAGE_STATS=1
+ COPY --from=builder --chown=nonroot:nonroot /home/nonroot/.pyenv/versions/${PYTHON_VERSION}/ /home/nonroot/.pyenv/versions/${PYTHON_VERSION}/

+ # remove vLLM callback to stats server
+ ENV VLLM_NO_USAGE_STATS=1

ENV PATH="/home/leapfrogai/.venv/bin:$PATH"

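With `PYTHON_VERSION` and `COMPONENT_DIRECTORY` promoted to build arguments, the same Dockerfile can be driven from the repository root or from the sub-directory. A hypothetical invocation from the repository root (the image tag is illustrative):

```bash
docker build \
  --build-arg PYTHON_VERSION=3.11.9 \
  --build-arg COMPONENT_DIRECTORY=packages/vllm \
  -t ghcr.io/defenseunicorns/leapfrogai/vllm:dev \
  -f packages/vllm/Dockerfile .
```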
25 changes: 23 additions & 2 deletions packages/vllm/Makefile
@@ -1,6 +1,27 @@
ARCH ?= amd64
LOCAL_VERSION ?= $(shell git rev-parse --short HEAD)
+ DOCKER_FLAGS :=

install:
python -m pip install ../../src/leapfrogai_sdk
python -m pip install -e ".[dev]"

- dev:
- python -m leapfrogai_sdk.cli --app-dir=src/ main:Model
+ download:
+ @env $$(cat .env | xargs) python src/model_download.py
+
+ dev: download
+ @env $$(cat .env | xargs) python -m leapfrogai_sdk.cli --app-dir=src/ main:Model

+ docker: download
+ docker build ${DOCKER_FLAGS} \
+ --platform=linux/${ARCH} \
+ --build-arg LOCAL_VERSION=${LOCAL_VERSION} \
+ --build-arg COMPONENT_DIRECTORY="./" \
+ -t ghcr.io/defenseunicorns/leapfrogai/vllm:${LOCAL_VERSION} \
+ -f ./Dockerfile .
+
+ docker run -it --rm \
+ --env-file ./.env \
+ -v $(PWD)/config.yaml:/home/leapfrogai/config.yaml \
+ -v $(PWD)/.model:/home/leapfrogai/.model \
+ ghcr.io/defenseunicorns/leapfrogai/vllm:${LOCAL_VERSION}
53 changes: 46 additions & 7 deletions packages/vllm/README.md
@@ -16,13 +16,21 @@ See the LeapfrogAI documentation website for [system requirements](https://docs.

The default model that comes with this backend in this repository's officially released images is a [4-bit quantization of the Synthia-7b model](https://huggingface.co/TheBloke/SynthIA-7B-v2.0-GPTQ).

- You can optionally specify different models or quantization types using the following Docker build arguments:
+ All of the commands in this sub-section are executed within this `packages/vllm` sub-directory.

- - `--build-arg HF_HUB_ENABLE_HF_TRANSFER="1"`: Enable or disable HuggingFace Hub transfer (default: 1)
- - `--build-arg REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ"`: HuggingFace repository ID for the model
- - `--build-arg REVISION="gptq-4bit-32g-actorder_True"`: Revision or commit hash for the model
- - `--build-arg QUANTIZATION="gptq"`: Quantization type (e.g., gptq, awq, or empty for un-quantized)
- - `--build-arg TENSOR_PARALLEL_SIZE="1"`: The number of gpus to spread the tensor processing across
+ Optionally, you can specify a different model during Zarf creation:
+
+ ```bash
+ uds zarf package create --confirm --set MODEL_REPO_ID=defenseunicorns/Hermes-2-Pro-Mistral-7B-4bit-32g --set MODEL_REVISION=main
+ ```
+
+ If you decide to use a different model, there will likely be a need to change generation and engine runtime configurations; see the [Zarf Package Config](./zarf-config.yaml) and the [values override file](./values/upstream-values.yaml) for details on what runtime parameters can be modified. These parameters are model-specific, and can be found in the HuggingFace model cards and/or configuration files (e.g., prompt templates).
+
+ For example, during Zarf deployment, you can override the Zarf Package Config defaults by doing the following:
+
+ ```bash
+ uds zarf package deploy zarf-package-vllm-amd64-dev.tar.zst --confirm --set ENFORCE_EAGER=True
+ ```

### Deployment

@@ -39,11 +47,26 @@ uds zarf package deploy packages/vllm/zarf-package-vllm-*-dev.tar.zst --confirm

### Local Development

- To run the vllm backend locally:
+ In local development the [config.yaml](./config.yaml) and [.env.example](./.env.example) must be modified if the model has changed away from the default. The LeapfrogAI SDK picks up the `config.yaml` automatically, and the `.env` must be sourced into the Python environment.

+ > [!IMPORTANT]
+ > Execute the following commands from this sub-directory
+
+ Create a `.env` file based on the [`.env.example`](./.env.example):
+
+ ```bash
+ cp .env.example .env
+ source .env
+ ```
+
+ As necessary, modify the existing [`config.yaml`](./config.yaml):
+
+ ```bash
+ vim config.yaml
+ ```

+ To run the vllm backend locally:

```bash
# Install dev and runtime dependencies
make install
@@ -54,3 +77,19 @@ python src/model_download.py
# Start the model backend
make dev
```

+ #### Local Docker Container
+
+ To run the Docker container, use the following Makefile commands. `LOCAL_VERSION` must be consistent across the two Make commands.
+
+ In the root of the LeapfrogAI repository:
+
+ ```bash
+ LOCAL_VERSION=dev make sdk-wheel
+ ```
+
+ In the root of this vLLM sub-directory:
+
+ ```bash
+ LOCAL_VERSION=dev make docker
+ ```
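The `config.yaml` the README leans on is not included in this diff. Judging from the SDK runtime variables in the bundle configs above (`max_context_length`, `stop_tokens`, the prompt formats, and the sampling defaults), its shape is likely close to the following sketch; the exact key names are assumptions, not copied from the real file:

```yaml
# hypothetical sketch of packages/vllm/config.yaml; keys inferred from
# the uds-config.yaml variables in this PR
model:
  source: ".model/"
max_context_length: 32768
stop_tokens:
  - "</s>"
  - "<|im_end|>"
  - "<|endoftext|>"
prompt_format:
  chat:
    system: "SYSTEM: {}\n"
    user: "USER: {}\n"
    assistant: "ASSISTANT: {}\n"
defaults:
  temperature: 0.1
  top_p: 1.0
  top_k: 0
  repetition_penalty: 1.0
  max_new_tokens: 8192
```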
12 changes: 11 additions & 1 deletion packages/vllm/chart/templates/deployment.yaml
@@ -36,7 +36,7 @@ spec:
[
"sh",
"-c",
- 'while [ ! -f /data/.model/###ZARF_DATA_INJECTION_MARKER### ]; do echo "waiting for zarf data sync" && sleep 1; done; echo "we are done waiting!"',
+ 'while [ ! -f ###ZARF_CONST_MODEL_PATH###/###ZARF_DATA_INJECTION_MARKER### ]; do echo "waiting for zarf data sync" && sleep 1; done; echo "we are done waiting!"',
]
resources:
{{- toYaml .Values.modelInjectionContainer.resources | nindent 12 }}
@@ -46,6 +46,9 @@ spec:
- name: leapfrogai-pv-storage
persistentVolumeClaim:
claimName: lfai-{{ .Values.nameOverride }}-pv-claim
+ - name: leapfrogai-sdk-configmap
+ configMap:
+ name: "{{ .Values.nameOverride }}-sdk-configmap"
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
@@ -58,6 +61,9 @@ env:
env:
{{- toYaml . | nindent 12 }}
{{- end }}
+ envFrom:
+ - configMapRef:
+ name: "{{ .Values.nameOverride }}-engine-configmap"
ports:
- name: http
containerPort: {{ .Values.service.port }}
@@ -67,6 +73,10 @@ volumeMounts:
volumeMounts:
- name: leapfrogai-pv-storage
mountPath: "/data"
+ - name: leapfrogai-sdk-configmap
+ mountPath: "/home/leapfrogai/config.yaml"
+ subPath: "config.yaml"
+ readOnly: true
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
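The deployment now consumes configuration along two paths: the LeapfrogAI SDK reads `config.yaml` from the mounted `-sdk-configmap`, while the vLLM engine settings arrive as environment variables via `envFrom` and the `-engine-configmap`. Neither ConfigMap template appears in this excerpt; a rough sketch of the mounted SDK one, assuming it simply wraps the chart values as YAML:

```yaml
# hypothetical sketch of chart/templates/sdk-configmap.yaml (not shown in this diff)
apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Values.nameOverride }}-sdk-configmap"
data:
  config.yaml: |
    {{- toYaml .Values.sdkConfig | nindent 4 }}
```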