
Releases: bentoml/OpenLLM

v0.4.42

02 Feb 12:31

Installation

pip install openllm==0.4.42

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.42

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.42 start HuggingFaceH4/zephyr-7b-beta
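Once the server is running (via openllm start or the container above), it exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default port 3000 and the Zephyr model from the commands above (with docker run -P the host port may differ, so check docker ps):

```python
import json
from urllib.request import Request, urlopen

# Assumption: OpenLLM 0.4.x serves an OpenAI-compatible API on port 3000 by default.
payload = {
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
req = Request(
    "http://localhost:3000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI convention, the official openai client can also be pointed at the same base URL instead of hand-rolling requests.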

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.41...v0.4.42

v0.4.41

18 Dec 18:18

GPTQ Support

The vLLM backend now supports GPTQ through upstream vLLM (0.2.6):

openllm start TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --backend vllm --quantise gptq

Installation

pip install openllm==0.4.41

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.41

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.41 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • docs: add notes about dtypes usage. by @aarnphm in #786
  • chore(deps): bump taiki-e/install-action from 2.22.0 to 2.22.5 by @dependabot in #790
  • chore(deps): bump github/codeql-action from 2.22.9 to 3.22.11 by @dependabot in #794
  • chore(deps): bump sigstore/cosign-installer from 3.2.0 to 3.3.0 by @dependabot in #793
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.0.0 by @dependabot in #791
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0 by @dependabot in #792
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #796
  • fix(cli): avoid runtime __origin__ check for older Python by @aarnphm in #798
  • feat(vllm): support GPTQ with 0.2.6 by @aarnphm in #797
  • fix(ci): lock to v3 iteration of actions/artifacts workflow by @aarnphm in #799

Full Changelog: v0.4.40...v0.4.41

v0.4.40

15 Dec 16:57

Installation

pip install openllm==0.4.40

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.40

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.40 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(infra): conform ruff to 150 LL by @aarnphm in #781
  • infra: update blame ignore to formatter hash by @aarnphm in #782
  • perf: upgrade mixtral to use expert parallelism by @aarnphm in #783

Full Changelog: v0.4.39...v0.4.40

v0.4.39

14 Dec 19:30

Installation

pip install openllm==0.4.39

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.39

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.39 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.38...v0.4.39

v0.4.38

13 Dec 23:36

Installation

pip install openllm==0.4.38

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.38

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.38 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(mixtral): correct chat templates to remove additional spacing by @aarnphm in #774
  • fix(cli): correct set arguments for openllm import and openllm build by @aarnphm in #775
  • fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors by @aarnphm in #776

Full Changelog: v0.4.37...v0.4.38

v0.4.37

13 Dec 14:22

Installation

pip install openllm==0.4.37

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.37

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.37 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(mixtral): correct support for mixtral by @aarnphm in #772
  • chore: running all script when installation by @aarnphm in #773

Full Changelog: v0.4.36...v0.4.37

v0.4.36

12 Dec 06:44

Mixtral Support

Adds support for Mixtral on BentoCloud with vLLM and all required dependencies.

Bentos built with openllm now default to Python 3.11 to support this change.

Installation

pip install openllm==0.4.36

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.36

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.36 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(openai): supports echo by @aarnphm in #760
  • fix(openai): logprobs when echo is enabled by @aarnphm in #761
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #767
  • chore(deps): bump docker/metadata-action from 5.2.0 to 5.3.0 by @dependabot in #766
  • chore(deps): bump actions/setup-python from 4.7.1 to 5.0.0 by @dependabot in #765
  • chore(deps): bump taiki-e/install-action from 2.21.26 to 2.22.0 by @dependabot in #764
  • chore(deps): bump aquasecurity/trivy-action from 0.14.0 to 0.16.0 by @dependabot in #763
  • chore(deps): bump github/codeql-action from 2.22.8 to 2.22.9 by @dependabot in #762
  • feat: mixtral support by @aarnphm in #770

Full Changelog: v0.4.35...v0.4.36

v0.4.35

07 Dec 08:47

Installation

pip install openllm==0.4.35

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.35

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.35 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • chore(deps): bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 by @dependabot in #749
  • chore(deps): bump docker/metadata-action from 5.0.0 to 5.2.0 by @dependabot in #751
  • chore(deps): bump taiki-e/install-action from 2.21.19 to 2.21.26 by @dependabot in #750
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #753
  • fix(logprobs): explicitly set logprobs=None by @aarnphm in #757

Full Changelog: v0.4.34...v0.4.35

v0.4.34

30 Nov 12:28

Installation

pip install openllm==0.4.34

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.34

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.34 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.33...v0.4.34

v0.4.33

29 Nov 18:12

Installation

pip install openllm==0.4.33

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.33

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.33 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.32...v0.4.33