
Releases: meta-llama/llama-stack

v0.1.4

25 Feb 00:02

v0.1.4 Release Notes

Here are the key changes coming as part of this release:

Build and Test Agents

  • Inference: Added support for non-llama models
  • Inference: Added option to list all downloaded models and remove models
  • Agent: Introduced a new API, agents.resume_turn, to include client-side tool execution in the same turn
  • Agent: AgentConfig introduces a new variable, "tool_config", that allows for better tool configuration and system prompt overrides
  • Agent: Added logging for agent step start and completion times
  • Agent: Added support for logging tool execution metadata
  • Embedding: Updated /inference/embeddings to support asymmetric models, truncation, and variable-sized outputs
  • Embedding: Updated embedding models for Ollama, Together, and Fireworks with available defaults
  • VectorIO: Improved performance of sqlite-vec using chunked writes
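As an illustration of the new tool_config block, here is a minimal sketch of what an agent configuration might look like. The field names and accepted values below are assumptions for illustration only, not the exact schema; consult the AgentConfig reference for the real shape.

```python
# Hypothetical sketch of an AgentConfig with the new "tool_config" block.
# Field names and values are illustrative assumptions, not the exact schema.
agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    "tool_config": {
        # Constrain when the model may call tools ("auto", "required", or "none").
        "tool_choice": "auto",
        # Control whether the tool system prompt replaces or extends instructions.
        "system_message_behavior": "append",
    },
}
```

Centralizing tool behavior in one block like this keeps tool choice and system prompt overrides in a single place instead of scattering them across top-level agent fields.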

Agent Evals and Model Customization

  • Deprecated the /eval-tasks API. Use /eval/benchmark instead
  • Added CPU training support for TorchTune
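The deprecation above amounts to a route rename. A minimal sketch of a migration helper, assuming the old and new path prefixes are exactly as written in the note:

```python
# Sketch of rewriting deprecated eval routes; the two prefixes below are taken
# verbatim from the release note and may differ from the final route names.
DEPRECATED = "/eval-tasks"
REPLACEMENT = "/eval/benchmark"

def migrate_path(path: str) -> str:
    """Rewrite a deprecated /eval-tasks route to its /eval/benchmark equivalent."""
    if path.startswith(DEPRECATED):
        return REPLACEMENT + path[len(DEPRECATED):]
    return path

print(migrate_path("/eval-tasks/my-task"))  # -> /eval/benchmark/my-task
```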

Deploy and Monitoring of Agents

  • Consistent view of client and server tool calls in telemetry

Better Engineering

  • Made tests more data-driven for consistent evaluation
  • Fixed documentation links and improved API reference generation
  • Various small fixes for build scripts and system reliability

What's Changed

  • build: resync uv and deps on 0.1.3 by @leseb in #1108
  • style: fix the capitalization issue by @reidliu41 in #1117
  • feat: log start, complete time to Agent steps by @ehhuang in #1116
  • fix: Ensure a tool call can be converted before adding to buffer by @terrytangyuan in #1119
  • docs: Fix incorrect link and command for generating API reference by @terrytangyuan in #1124
  • chore: remove --no-list-templates option by @reidliu41 in #1121
  • style: update verify-download help text by @reidliu41 in #1134
  • style: update download help text by @reidliu41 in #1135
  • fix: modify the model id title for model list by @reidliu41 in #1095
  • fix: direct client pydantic type casting by @yanxi0830 in #1145
  • style: remove prints in codebase by @yanxi0830 in #1146
  • feat: support tool_choice = {required, none, } by @ehhuang in #1059
  • test: Enable test_text_chat_completion_with_tool_choice_required for remote::vllm by @terrytangyuan in #1148
  • fix(rag-example): add provider_id to avoid llama_stack_client 400 error by @fulvius31 in #1114
  • fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks by @bbrowning in #1123
  • chore: remove llama_models.llama3.api imports from providers by @ashwinb in #1107
  • docs: fix Python llama_stack_client SDK links by @leseb in #1150
  • feat: Chunk sqlite-vec writes by @franciscojavierarceo in #1094
  • fix: miscellaneous job management improvements in torchtune by @booxter in #1136
  • feat: add aggregation_functions to llm_as_judge_405b_simpleqa by @SLR722 in #1164
  • feat: inference passthrough provider by @SLR722 in #1166
  • docs: Remove unused python-openapi and json-strong-typing in openapi_generator by @terrytangyuan in #1167
  • docs: improve API contribution guidelines by @leseb in #1137
  • feat: add a option to list the downloaded models by @reidliu41 in #1127
  • fix: Fixing some small issues with the build scripts by @franciscojavierarceo in #1132
  • fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment by @yanxi0830 in #1163
  • build: add missing dev dependencies for unit tests by @leseb in #1004
  • fix: More robust handling of the arguments in tool call response in remote::vllm by @terrytangyuan in #1169
  • Added support for mongoDB KV store by @shrinitg in #543
  • script for running client sdk tests by @sixianyi0721 in #895
  • test: skip model registration for unsupported providers by @leseb in #1030
  • feat: Enable CPU training for torchtune by @booxter in #1140
  • fix: add logging import by @raspawar in #1174
  • docs: Add note about distro_codegen.py and provider dependencies by @bbrowning in #1175
  • chore: slight renaming of model alias stuff by @ashwinb in #1181
  • feat: adding endpoints for files and uploads by @vladimirivic in #1070
  • docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script by @kevincogan in #1129
  • chore!: deprecate eval/tasks by @yanxi0830 in #1186
  • fix: some telemetry APIs don't currently work by @ehhuang in #1188
  • feat: D69478008 [llama-stack] turning tests into data-driven by @LESSuseLESS in #1180
  • feat: register embedding models for ollama, together, fireworks by @ashwinb in #1190
  • feat(providers): add NVIDIA Inference embedding provider and tests by @mattf in #935
  • docs: Add missing uv command for docs generation in contributing guide by @terrytangyuan in #1197
  • docs: Simplify installation guide with uv by @terrytangyuan in #1196
  • fix: BuiltinTool JSON serialization in remote vLLM provider by @bbrowning in #1183
  • ci: improve GitHub Actions workflow for website builds by @leseb in #1151
  • fix: pass tool_prompt_format to chat_formatter by @ehhuang in #1198
  • fix(api): update embeddings signature so inputs and outputs list align by @ashwinb in #1161
  • feat(api): Add options for supporting various embedding models by @ashwinb in #1192
  • fix: update URL import, URL -> ImageContentItemImageURL by @mattf in #1204
  • feat: model remove cmd by @reidliu41 in #1128
  • chore: remove configure subcommand by @reidliu41 in #1202
  • fix: remove list of list tests, no longer relevant after #1161 by @mattf in #1205
  • test(client-sdk): Update embedding test types to use latest imports by @raspawar in #1203
  • fix: convert back to model descriptor for model in list --downloaded by @reidliu41 in #1201
  • docs: Add missing uv command and clarify website rebuild by @terrytangyuan in #1199
  • fix: Updating images so that they are able to run without root access by @jland-redhat in #1208
  • fix: pull ollama embedding model if necessary by @ashwinb in #1209
  • chore: move embedding deps to RAG tool where they are needed by @ashwinb in #1210
  • feat(1/n): api: unify agents for handling server & client tools by @yanxi0830 in #1178
  • feat: tool outputs metadata by @ehhuang in #1155
  • ci: add mypy for static type checking by @leseb in #1101
  • feat(providers): support non-llama models for inference providers by @ashwinb in #1200
  • test: fix test_rag_agent test by @ehhuang in #1215
  • feat: add substring search for model list by @reidliu41 in #1099
  • test: do not overwrite agent_config by @ehhuang in #1216
  • docs: Adding Provider sections to docs by @franciscojavierarceo in #1195
  • fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier by @ashwinb in #1225
  • feat: add --run to llama stack build by @cdoern in #1156
  • docs: Add vLLM to the list o...

v0.1.3

14 Feb 20:24

v0.1.3 Release

Here are some key changes coming as part of this release.

Build and Test Agents

Streamlined the initial development experience

  • Added support for llama stack run --image-type venv
  • Enhanced vector store options with new sqlite-vec provider and improved Qdrant integration
  • vLLM improvements for tool calling and logprobs
  • Better handling of sporadic code_interpreter tool calls

Agent Evals

Better benchmarking and Agent performance assessment

  • Renamed eval API /eval-task to /benchmarks
  • Improved documentation and notebooks for RAG and evals

Deploy and Monitoring of Agents

Improved production readiness

  • Added usage metrics collection for chat completions
  • CLI improvements for provider information
  • Improved error handling and system reliability
  • Better model endpoint handling and accessibility
  • Improved signal handling on distro server

Better Engineering

Infrastructure and code quality improvements

  • Faster text-based chat completion tests
  • Improved testing for non-streaming agent APIs
  • Standardized import formatting with ruff linter
  • Added conventional commits standard
  • Fixed documentation parsing issues

What's Changed

New Contributors

Full Changelog: v0.1.2...v0.1.3

v0.1.2

07 Feb 22:06

TL;DR

  • Several stabilizations to development flows after the switch to uv
  • Migrated CI workflows to new OSS repo - llama-stack-ops
  • Added automated rebuilds for ReadTheDocs
  • Llama Stack server supports HTTPS
  • Added system prompt overrides support
  • Several bug fixes and improvements to documentation (check out the Kubernetes deployment guide by @terrytangyuan)
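With HTTPS support, clients only need to switch the scheme of the server URL. A minimal sketch using the Python standard library; the host, port, and /version endpoint path here are assumptions for illustration:

```python
import urllib.request

# Hypothetical sketch: with TLS enabled on the Llama Stack server, clients
# target an https:// base URL. Host, port, and path are assumed values.
req = urllib.request.Request("https://localhost:8321/v1/version")

# Against a running TLS-enabled server this would send the request:
# resp = urllib.request.urlopen(req)
# print(resp.read())
```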

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.1.2

v0.1.1

02 Feb 02:29

A number of small and large improvements everywhere, including support for Windows, the switch to uv, and many provider improvements.

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0

24 Jan 17:47

We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and agents using tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.

Context

GenAI application developers need more than just an LLM - they need to integrate tools, connect with their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. The result is that developers are spending more time on these integrations rather than focusing on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.

Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.

With Llama Stack, you can easily build a RAG agent that can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces and convert telemetry into eval datasets. And with Llama Stack’s plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv, conda, or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.

Release

After iterating on the APIs for the last 3 months, today we’re launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests that verify all provider implementations, so developers can easily and reliably select distributions or providers based on their specific requirements.

There are example standalone apps in llama-stack-apps.

Key Features of this release

  • Unified API Layer

    • Inference: Run LLMs
    • RAG: Store and retrieve knowledge for RAG
    • Agents: Build multi-step agentic workflows
    • Tools: Register tools that can be called by the agent
    • Safety: Apply content filtering and safety policies
    • Evaluation: Test model and agent quality
    • Telemetry: Collect and analyze usage data and complex agentic traces
    • Post Training (coming soon): Fine-tune models for specific use cases
  • Rich Provider Ecosystem

    • Local Development: Meta's Reference, Ollama
    • Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
    • On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
    • On-device: iOS and Android support
  • Built for Production

    • Pre-packaged distributions for common deployment scenarios
    • Backwards compatibility across model versions
    • Comprehensive evaluation capabilities
    • Full observability and monitoring
  • Multiple developer interfaces

    • CLI: Command line interface
    • Python SDK
    • Swift iOS SDK
    • Kotlin Android SDK
  • Sample llama stack applications

    • Python
    • iOS
    • Android
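To make the unified API layer concrete, here is a hedged sketch of building a chat-completion request against a running distro server using only the standard library. The endpoint path, default port, and payload field names are assumptions for illustration and may differ from the generated API reference:

```python
import json
import urllib.request

# Hypothetical sketch of calling the unified inference API over HTTP.
# Endpoint path, port, and field names are assumed values, not the exact schema.
body = json.dumps({
    "model_id": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Write a haiku about telemetry."}],
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8321/v1/inference/chat-completion",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Against a running distro server this would send the request:
# resp = urllib.request.urlopen(req)
```

The same request shape is what the Python, Swift, and Kotlin SDKs construct under the hood, which is what makes the provider ecosystem swappable behind one API.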

What's Changed

  • [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
  • remove unused telemetry related code for console by @dineshyv in #659
  • Fix Meta reference GPU implementation by @ashwinb in #663
  • Fixed imports for inference by @cdgamarose-nv in #661
  • fix trace starting in library client by @dineshyv in #655
  • Add Llama 70B 3.3 to fireworks by @aidando73 in #654
  • Tools API with brave and MCP providers by @dineshyv in #639
  • [torchtune integration] post training + eval by @SLR722 in #670
  • Fix post training apis broken by torchtune release by @SLR722 in #674
  • Add missing venv option in --image-type by @terrytangyuan in #677
  • Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
  • Add 3.3 70B to Ollama inference provider by @aidando73 in #681
  • docs: update evals_reference/index.md by @eltociear in #675
  • [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
  • [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
  • Minor Quick Start documentation updates. by @derekslager in #692
  • [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
  • [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
  • Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
  • Fix failing flake8 E226 check by @terrytangyuan in #701
  • Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
  • Add JSON structured outputs to Ollama Provider by @aidando73 in #680
  • [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
  • Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
  • [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
  • [Post Training] Fix missing import by @SLR722 in #705
  • Import from the right path by @SLR722 in #708
  • [#432] Add Groq Provider - chat completions by @aidando73 in #609
  • Change post training run.yaml inference config by @SLR722 in #710
  • [Post training] make validation steps configurable by @SLR722 in #715
  • Fix incorrect entrypoint for broken llama stack run by @terrytangyuan in #706
  • Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
  • Fix Groq invalid self.config reference by @aidando73 in #719
  • support llama3.1 8B instruct in post training by @SLR722 in #698
  • remove default logger handlers when using libcli with notebook by @dineshyv in #718
  • move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
  • add 3.3 to together inference provider by @yanxi0830 in #729
  • Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
  • fix links for distro by @yanxi0830 in #733
  • add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
  • agents to use tools api by @dineshyv in #673
  • Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
  • Check version incompatibility by @ashwinb in #738
  • Add persistence for localfs datasets by @VladOS95-cyber in #557
  • Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
  • Consolidating Memory tests under client-sdk by @vladimirivic in #703
  • Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
  • remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
  • rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
  • Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
  • [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
  • Replaced zrangebylex method in the range method by @che...

v0.1.0rc12

22 Jan 22:24
Pre-release

What's Changed


v0.0.63

18 Dec 07:17

A small but important bug-fix release to update the URL datatype for the client-SDKs. The issue affected multimodal agentic turns especially.

Full Changelog: v0.0.62...v0.0.63

v0.0.62

18 Dec 02:39

What's Changed

A few important updates, some of which are backwards incompatible. You must update your run.yamls when upgrading. As always, look to templates/<distro>/run.yaml for reference.

  • Make embedding generation go through inference by @dineshyv in #606
  • [/scoring] add ability to define aggregation functions for scoring functions & refactors by @yanxi0830 in #597
  • Update the "InterleavedTextMedia" type by @ashwinb in #635
  • [NEW!] Experimental post-training APIs! #540, #593, etc.

A variety of fixes and enhancements. Some selected ones:

New Contributors

Full Changelog: v0.0.61...v0.0.62

v0.0.61

10 Dec 20:50
e2054d5

What's Changed

New Contributors

Full Changelog: v0.0.55...v0.0.61

v0.0.55 release

23 Nov 17:14

What's Changed

  • Fix TGI inference adapter
  • Fix llama stack build in 0.0.54 by @dltn in #505
  • Several documentation related improvements
  • Fix opentelemetry adapter by @dineshyv in #510
  • Update Ollama supported llama model list by @hickeyma in #483

Full Changelog: v0.0.54...v0.0.55