Releases: bentoml/OpenLLM
v0.4.42
Installation
pip install openllm==0.4.42
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.42
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.42 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- docs: Update opt example to ms-phi by @Sherlock113 in #805
- chore(script): run vendored scripts by @aarnphm in #808
- docs: README.md typo by @weibeu in #819
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #818
- chore(deps): bump docker/metadata-action from 5.3.0 to 5.4.0 by @dependabot in #814
- chore(deps): bump taiki-e/install-action from 2.22.5 to 2.23.1 by @dependabot in #813
- chore(deps): bump github/codeql-action from 3.22.11 to 3.22.12 by @dependabot in #815
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #825
- chore(deps): bump crazy-max/ghaction-import-gpg from 6.0.0 to 6.1.0 by @dependabot in #824
- chore(deps): bump taiki-e/install-action from 2.23.1 to 2.23.7 by @dependabot in #823
- docs: Add Llamaindex in freedom to build by @Sherlock113 in #826
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #836
- chore(deps): bump docker/metadata-action from 5.4.0 to 5.5.0 by @dependabot in #834
- chore(deps): bump aquasecurity/trivy-action from 0.16.0 to 0.16.1 by @dependabot in #832
- chore(deps): bump taiki-e/install-action from 2.23.7 to 2.24.1 by @dependabot in #833
- chore(deps): bump vllm to 0.2.7 by @aarnphm in #837
- chore: update discord link by @aarnphm in #838
- improv(package): use python slim base image and let pytorch install cuda by @larme in #807
- fix(dockerfile): conflict deps by @aarnphm in #841
- chore: fix typo in list_models pydoc by @fuzzie360 in #847
- docs: update README.md telemetry code link by @fuzzie360 in #842
- chore(deps): bump taiki-e/install-action from 2.24.1 to 2.25.1 by @dependabot in #846
- chore(deps): bump github/codeql-action from 3.22.12 to 3.23.0 by @dependabot in #844
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #848
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #858
- chore(deps): bump taiki-e/install-action from 2.25.1 to 2.25.9 by @dependabot in #856
- chore(deps): bump github/codeql-action from 3.23.0 to 3.23.1 by @dependabot in #855
- fix: proper SSE handling for vllm by @larme in #877
- chore: set stop to empty list by default by @larme in #878
- fix: all runners sse output by @larme in #880
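The SSE fixes above (#877, #880) concern how streamed tokens are framed over server-sent events. As a rough sketch of what a client on the receiving end does, here is a minimal parser assuming the conventional `data: ` prefix and `[DONE]` sentinel used by OpenAI-style streaming endpoints (the function name and sample payloads are illustrative, not OpenLLM's internal API):

```python
import json

def parse_sse_lines(lines):
    """Extract JSON payloads from an OpenAI-style SSE stream.

    Skips blank keep-alive lines and stops at the [DONE] sentinel.
    """
    events = []
    for raw in lines:
        line = raw.strip()
        if not line or not line.startswith("data:"):
            continue  # ignore comments and keep-alive blanks
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        events.append(json.loads(data))
    return events

# Example stream as it might arrive over HTTP, one line per chunk.
stream = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "data: [DONE]",
]
print("".join(e["text"] for e in parse_sse_lines(stream)))  # Hello world
```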
New Contributors
- @weibeu made their first contribution in #819
- @fuzzie360 made their first contribution in #847
Full Changelog: v0.4.41...v0.4.42
v0.4.41
GPTQ Support
The vLLM backend now supports GPTQ via upstream vLLM:
openllm start TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --backend vllm --quantise gptq
Installation
pip install openllm==0.4.41
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.41
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.41 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- docs: add notes about dtypes usage. by @aarnphm in #786
- chore(deps): bump taiki-e/install-action from 2.22.0 to 2.22.5 by @dependabot in #790
- chore(deps): bump github/codeql-action from 2.22.9 to 3.22.11 by @dependabot in #794
- chore(deps): bump sigstore/cosign-installer from 3.2.0 to 3.3.0 by @dependabot in #793
- chore(deps): bump actions/download-artifact from 3.0.2 to 4.0.0 by @dependabot in #791
- chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0 by @dependabot in #792
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #796
- fix(cli): avoid runtime `__origin__` check for older Python by @aarnphm in #798
- feat(vllm): support GPTQ with 0.2.6 by @aarnphm in #797
- fix(ci): lock to v3 iteration of `actions/artifacts` workflow by @aarnphm in #799
Full Changelog: v0.4.40...v0.4.41
v0.4.40
Installation
pip install openllm==0.4.40
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.40
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.40 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- fix(infra): conform ruff to 150 LL by @aarnphm in #781
- infra: update blame ignore to formatter hash by @aarnphm in #782
- perf: upgrade mixtral to use expert parallelism by @aarnphm in #783
Full Changelog: v0.4.39...v0.4.40
v0.4.39
Installation
pip install openllm==0.4.39
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.39
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.39 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
Full Changelog: v0.4.38...v0.4.39
v0.4.38
Installation
pip install openllm==0.4.38
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.38
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.38 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- fix(mixtral): correct chat templates to remove additional spacing by @aarnphm in #774
- fix(cli): correct set arguments for `openllm import` and `openllm build` by @aarnphm in #775
- fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors by @aarnphm in #776
Full Changelog: v0.4.37...v0.4.38
v0.4.37
Installation
pip install openllm==0.4.37
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.37
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.37 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(mixtral): correct support for mixtral by @aarnphm in #772
- chore: running all script when installation by @aarnphm in #773
Full Changelog: v0.4.36...v0.4.37
v0.4.36
Mixtral support
Mixtral is now supported on BentoCloud with vLLM and all required dependencies.
Bentos built with OpenLLM now default to Python 3.11 for this change to work.
Installation
pip install openllm==0.4.36
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.36
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.36 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(openai): supports echo by @aarnphm in #760
- fix(openai): logprobs when echo is enabled by @aarnphm in #761
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #767
- chore(deps): bump docker/metadata-action from 5.2.0 to 5.3.0 by @dependabot in #766
- chore(deps): bump actions/setup-python from 4.7.1 to 5.0.0 by @dependabot in #765
- chore(deps): bump taiki-e/install-action from 2.21.26 to 2.22.0 by @dependabot in #764
- chore(deps): bump aquasecurity/trivy-action from 0.14.0 to 0.16.0 by @dependabot in #763
- chore(deps): bump github/codeql-action from 2.22.8 to 2.22.9 by @dependabot in #762
- feat: mixtral support by @aarnphm in #770
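The echo and logprobs additions above (#760, #761) follow the OpenAI completions schema that OpenLLM's `/v1/completions` endpoint mirrors. A minimal sketch of building such a request body; the helper name is illustrative, and the model id is just the example used throughout these notes:

```python
def completion_payload(model, prompt, echo=False, logprobs=None):
    """Build an OpenAI-style /v1/completions request body."""
    body = {"model": model, "prompt": prompt, "echo": echo}
    if logprobs is not None:
        body["logprobs"] = logprobs  # number of top log-probabilities per token
    return body

payload = completion_payload(
    "HuggingFaceH4/zephyr-7b-beta",
    "What is OpenLLM?",
    echo=True,    # include the prompt itself in the returned text
    logprobs=5,   # request per-token log-probabilities
)
```

Posting this body to a running server (port 3000 by default) should return the prompt followed by the generated continuation when echo is true.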
Full Changelog: v0.4.35...v0.4.36
v0.4.35
Installation
pip install openllm==0.4.35
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.35
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.35 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(deps): bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 by @dependabot in #749
- chore(deps): bump docker/metadata-action from 5.0.0 to 5.2.0 by @dependabot in #751
- chore(deps): bump taiki-e/install-action from 2.21.19 to 2.21.26 by @dependabot in #750
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #753
- fix(logprobs): explicitly set logprobs=None by @aarnphm in #757
Full Changelog: v0.4.34...v0.4.35
v0.4.34
Installation
pip install openllm==0.4.34
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.34
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.34 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(models): Support qwen by @yansheng105 in #742
New Contributors
- @yansheng105 made their first contribution in #742
Full Changelog: v0.4.33...v0.4.34
v0.4.33
Installation
pip install openllm==0.4.33
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.33
Usage
All available models: openllm models
To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.33 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
Full Changelog: v0.4.32...v0.4.33