
Releases: meta-llama/llama-stack

v0.1.4

25 Feb 00:02

v0.1.4 Release Notes

Here are the key changes coming as part of this release:

Build and Test Agents

  • Inference: Added support for non-llama models
  • Inference: Added option to list all downloaded models and remove models
  • Agent: Introduced a new API, agents.resume_turn, to include client-side tool execution in the same turn
  • Agent: AgentConfig introduces a new variable, "tool_config", that allows for better tool configuration and system prompt overrides
  • Agent: Added logging for agent step start and completion times
  • Agent: Added support for logging tool execution metadata
  • Embedding: Updated /inference/embeddings to support asymmetric models, truncation, and variable-sized outputs
  • Embedding: Updated embedding models for Ollama, Together, and Fireworks with available defaults
  • VectorIO: Improved performance of sqlite-vec using chunked writes
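As an illustration of the new tool_config block, here is a minimal sketch of what an agent configuration might look like. The field names and accepted values below are assumptions for illustration only, not the exact schema; consult the AgentConfig reference for the real shape.

```python
# Hypothetical sketch of an AgentConfig with the new "tool_config" block.
# Field names and values are illustrative assumptions, not the exact schema.
agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    "tool_config": {
        # Constrain when the model may call tools ("auto", "required", or "none").
        "tool_choice": "auto",
        # Control whether the tool system prompt replaces or extends instructions.
        "system_message_behavior": "append",
    },
}
```

Centralizing tool behavior in one block like this keeps tool choice and system prompt overrides in a single place instead of scattering them across top-level agent fields.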

Agent Evals and Model Customization

  • Deprecated the /eval-tasks API. Use /eval/benchmark instead
  • Added CPU training support for TorchTune
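The deprecation above amounts to a route rename. A minimal sketch of a migration helper, assuming the old and new path prefixes are exactly as written in the note:

```python
# Sketch of rewriting deprecated eval routes; the two prefixes below are taken
# verbatim from the release note and may differ from the final route names.
DEPRECATED = "/eval-tasks"
REPLACEMENT = "/eval/benchmark"

def migrate_path(path: str) -> str:
    """Rewrite a deprecated /eval-tasks route to its /eval/benchmark equivalent."""
    if path.startswith(DEPRECATED):
        return REPLACEMENT + path[len(DEPRECATED):]
    return path

print(migrate_path("/eval-tasks/my-task"))  # -> /eval/benchmark/my-task
```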

Deploy and Monitoring of Agents

  • Consistent view of client and server tool calls in telemetry

Better Engineering

  • Made tests more data-driven for consistent evaluation
  • Fixed documentation links and improved API reference generation
  • Various small fixes for build scripts and system reliability

What's Changed

  • build: resync uv and deps on 0.1.3 by @leseb in #1108
  • style: fix the capitalization issue by @reidliu41 in #1117
  • feat: log start, complete time to Agent steps by @ehhuang in #1116
  • fix: Ensure a tool call can be converted before adding to buffer by @terrytangyuan in #1119
  • docs: Fix incorrect link and command for generating API reference by @terrytangyuan in #1124
  • chore: remove --no-list-templates option by @reidliu41 in #1121
  • style: update verify-download help text by @reidliu41 in #1134
  • style: update download help text by @reidliu41 in #1135
  • fix: modify the model id title for model list by @reidliu41 in #1095
  • fix: direct client pydantic type casting by @yanxi0830 in #1145
  • style: remove prints in codebase by @yanxi0830 in #1146
  • feat: support tool_choice = {required, none, } by @ehhuang in #1059
  • test: Enable test_text_chat_completion_with_tool_choice_required for remote::vllm by @terrytangyuan in #1148
  • fix(rag-example): add provider_id to avoid llama_stack_client 400 error by @fulvius31 in #1114
  • fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks by @bbrowning in #1123
  • chore: remove llama_models.llama3.api imports from providers by @ashwinb in #1107
  • docs: fix Python llama_stack_client SDK links by @leseb in #1150
  • feat: Chunk sqlite-vec writes by @franciscojavierarceo in #1094
  • fix: miscellaneous job management improvements in torchtune by @booxter in #1136
  • feat: add aggregation_functions to llm_as_judge_405b_simpleqa by @SLR722 in #1164
  • feat: inference passthrough provider by @SLR722 in #1166
  • docs: Remove unused python-openapi and json-strong-typing in openapi_generator by @terrytangyuan in #1167
  • docs: improve API contribution guidelines by @leseb in #1137
  • feat: add a option to list the downloaded models by @reidliu41 in #1127
  • fix: Fixing some small issues with the build scripts by @franciscojavierarceo in #1132
  • fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment by @yanxi0830 in #1163
  • build: add missing dev dependencies for unit tests by @leseb in #1004
  • fix: More robust handling of the arguments in tool call response in remote::vllm by @terrytangyuan in #1169
  • Added support for mongoDB KV store by @shrinitg in #543
  • script for running client sdk tests by @sixianyi0721 in #895
  • test: skip model registration for unsupported providers by @leseb in #1030
  • feat: Enable CPU training for torchtune by @booxter in #1140
  • fix: add logging import by @raspawar in #1174
  • docs: Add note about distro_codegen.py and provider dependencies by @bbrowning in #1175
  • chore: slight renaming of model alias stuff by @ashwinb in #1181
  • feat: adding endpoints for files and uploads by @vladimirivic in #1070
  • docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script by @kevincogan in #1129
  • chore!: deprecate eval/tasks by @yanxi0830 in #1186
  • fix: some telemetry APIs don't currently work by @ehhuang in #1188
  • feat: D69478008 [llama-stack] turning tests into data-driven by @LESSuseLESS in #1180
  • feat: register embedding models for ollama, together, fireworks by @ashwinb in #1190
  • feat(providers): add NVIDIA Inference embedding provider and tests by @mattf in #935
  • docs: Add missing uv command for docs generation in contributing guide by @terrytangyuan in #1197
  • docs: Simplify installation guide with uv by @terrytangyuan in #1196
  • fix: BuiltinTool JSON serialization in remote vLLM provider by @bbrowning in #1183
  • ci: improve GitHub Actions workflow for website builds by @leseb in #1151
  • fix: pass tool_prompt_format to chat_formatter by @ehhuang in #1198
  • fix(api): update embeddings signature so inputs and outputs list align by @ashwinb in #1161
  • feat(api): Add options for supporting various embedding models by @ashwinb in #1192
  • fix: update URL import, URL -> ImageContentItemImageURL by @mattf in #1204
  • feat: model remove cmd by @reidliu41 in #1128
  • chore: remove configure subcommand by @reidliu41 in #1202
  • fix: remove list of list tests, no longer relevant after #1161 by @mattf in #1205
  • test(client-sdk): Update embedding test types to use latest imports by @raspawar in #1203
  • fix: convert back to model descriptor for model in list --downloaded by @reidliu41 in #1201
  • docs: Add missing uv command and clarify website rebuild by @terrytangyuan in #1199
  • fix: Updating images so that they are able to run without root access by @jland-redhat in #1208
  • fix: pull ollama embedding model if necessary by @ashwinb in #1209
  • chore: move embedding deps to RAG tool where they are needed by @ashwinb in #1210
  • feat(1/n): api: unify agents for handling server & client tools by @yanxi0830 in #1178
  • feat: tool outputs metadata by @ehhuang in #1155
  • ci: add mypy for static type checking by @leseb in #1101
  • feat(providers): support non-llama models for inference providers by @ashwinb in #1200
  • test: fix test_rag_agent test by @ehhuang in #1215
  • feat: add substring search for model list by @reidliu41 in #1099
  • test: do not overwrite agent_config by @ehhuang in #1216
  • docs: Adding Provider sections to docs by @franciscojavierarceo in #1195
  • fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier by @ashwinb in #1225
  • feat: add --run to llama stack build by @cdoern in #1156
  • docs: Add vLLM to the list o...

v0.1.3

14 Feb 20:24

v0.1.3 Release

Here are some key changes coming as part of this release.

Build and Test Agents

Streamlined the initial development experience

  • Added support for llama stack run --image-type venv
  • Enhanced vector store options with new sqlite-vec provider and improved Qdrant integration
  • vLLM improvements for tool calling and logprobs
  • Better handling of sporadic code_interpreter tool calls

Agent Evals

Better benchmarking and Agent performance assessment

  • Renamed eval API /eval-task to /benchmarks
  • Improved documentation and notebooks for RAG and evals

Deploy and Monitoring of Agents

Improved production readiness

  • Added usage metrics collection for chat completions
  • CLI improvements for provider information
  • Improved error handling and system reliability
  • Better model endpoint handling and accessibility
  • Improved signal handling on distro server

Better Engineering

Infrastructure and code quality improvements

  • Faster text-based chat completion tests
  • Improved testing for non-streaming agent APIs
  • Standardized import formatting with ruff linter
  • Added conventional commits standard
  • Fixed documentation parsing issues

What's Changed

New Contributors

Full Changelog: v0.1.2...v0.1.3

v0.1.2

07 Feb 22:06

TL;DR

  • Several stabilizations to development flows after the switch to uv
  • Migrated CI workflows to new OSS repo - llama-stack-ops
  • Added automated rebuilds for ReadTheDocs
  • Llama Stack server supports HTTPS
  • Added system prompt overrides support
  • Several bug fixes and improvements to documentation (check out the Kubernetes deployment guide by @terrytangyuan)
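With HTTPS support, clients only need to switch the scheme of the server URL. A minimal sketch using the Python standard library; the host, port, and /version endpoint path here are assumptions for illustration:

```python
import urllib.request

# Hypothetical sketch: with TLS enabled on the Llama Stack server, clients
# target an https:// base URL. Host, port, and path are assumed values.
req = urllib.request.Request("https://localhost:8321/v1/version")

# Against a running TLS-enabled server this would send the request:
# resp = urllib.request.urlopen(req)
# print(resp.read())
```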

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.1.2

v0.1.1

02 Feb 02:29

A number of small and large improvements everywhere, including support for Windows, the switch to uv, and many provider improvements.

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0

24 Jan 17:47

We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and agents using tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.

Context

GenAI application developers need more than just an LLM - they need to integrate tools, connect with their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. The result is that developers are spending more time on these integrations rather than focusing on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.

Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.

With Llama Stack, you can easily build a RAG agent that can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces and convert telemetry into eval datasets. And with Llama Stack’s plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv, conda, or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.

Release

After iterating on the APIs for the last 3 months, today we’re launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests that verify all provider implementations, so developers can easily and reliably select distributions or providers based on their specific requirements.

There are example standalone apps in llama-stack-apps.

Key Features of this release

  • Unified API Layer

    • Inference: Run LLMs
    • RAG: Store and retrieve knowledge for RAG
    • Agents: Build multi-step agentic workflows
    • Tools: Register tools that can be called by the agent
    • Safety: Apply content filtering and safety policies
    • Evaluation: Test model and agent quality
    • Telemetry: Collect and analyze usage data and complex agentic traces
    • Post Training (coming soon): Fine-tune models for specific use cases
  • Rich Provider Ecosystem

    • Local Development: Meta's Reference, Ollama
    • Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
    • On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
    • On-device: iOS and Android support
  • Built for Production

    • Pre-packaged distributions for common deployment scenarios
    • Backwards compatibility across model versions
    • Comprehensive evaluation capabilities
    • Full observability and monitoring
  • Multiple developer interfaces

    • CLI: Command line interface
    • Python SDK
    • Swift iOS SDK
    • Kotlin Android SDK
  • Sample llama stack applications

    • Python
    • iOS
    • Android
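To make the unified API layer concrete, here is a hedged sketch of building a chat-completion request against a running distro server using only the standard library. The endpoint path, default port, and payload field names are assumptions for illustration and may differ from the generated API reference:

```python
import json
import urllib.request

# Hypothetical sketch of calling the unified inference API over HTTP.
# Endpoint path, port, and field names are assumed values, not the exact schema.
body = json.dumps({
    "model_id": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Write a haiku about telemetry."}],
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8321/v1/inference/chat-completion",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Against a running distro server this would send the request:
# resp = urllib.request.urlopen(req)
```

The same request shape is what the Python, Swift, and Kotlin SDKs construct under the hood, which is what makes the provider ecosystem swappable behind one API.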

What's Changed

  • [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
  • remove unused telemetry related code for console by @dineshyv in #659
  • Fix Meta reference GPU implementation by @ashwinb in #663
  • Fixed imports for inference by @cdgamarose-nv in #661
  • fix trace starting in library client by @dineshyv in #655
  • Add Llama 70B 3.3 to fireworks by @aidando73 in #654
  • Tools API with brave and MCP providers by @dineshyv in #639
  • [torchtune integration] post training + eval by @SLR722 in #670
  • Fix post training apis broken by torchtune release by @SLR722 in #674
  • Add missing venv option in --image-type by @terrytangyuan in #677
  • Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
  • Add 3.3 70B to Ollama inference provider by @aidando73 in #681
  • docs: update evals_reference/index.md by @eltociear in #675
  • [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
  • [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
  • Minor Quick Start documentation updates. by @derekslager in #692
  • [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
  • [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
  • Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
  • Fix failing flake8 E226 check by @terrytangyuan in #701
  • Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
  • Add JSON structured outputs to Ollama Provider by @aidando73 in #680
  • [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
  • Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
  • [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
  • [Post Training] Fix missing import by @SLR722 in #705
  • Import from the right path by @SLR722 in #708
  • [#432] Add Groq Provider - chat completions by @aidando73 in #609
  • Change post training run.yaml inference config by @SLR722 in #710
  • [Post training] make validation steps configurable by @SLR722 in #715
  • Fix incorrect entrypoint for broken llama stack run by @terrytangyuan in #706
  • Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
  • Fix Groq invalid self.config reference by @aidando73 in #719
  • support llama3.1 8B instruct in post training by @SLR722 in #698
  • remove default logger handlers when using libcli with notebook by @dineshyv in #718
  • move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
  • add 3.3 to together inference provider by @yanxi0830 in #729
  • Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
  • fix links for distro by @yanxi0830 in #733
  • add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
  • agents to use tools api by @dineshyv in #673
  • Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
  • Check version incompatibility by @ashwinb in #738
  • Add persistence for localfs datasets by @VladOS95-cyber in #557
  • Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
  • Consolidating Memory tests under client-sdk by @vladimirivic in #703
  • Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
  • remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
  • rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
  • Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
  • [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
  • Replaced zrangebylex method in the range method by @che...

v0.1.0rc12

22 Jan 22:24
Pre-release

What's Changed


v0.0.63

18 Dec 07:17

A small but important bug-fix release to update the URL datatype for the client-SDKs. The issue affected multimodal agentic turns especially.

Full Changelog: v0.0.62...v0.0.63

v0.0.62

18 Dec 02:39

What's Changed

A few important updates, some of which are backwards incompatible. You must update your run.yamls when upgrading. As always, look to templates/<distro>/run.yaml for reference.

  • Make embedding generation go through inference by @dineshyv in #606
  • [/scoring] add ability to define aggregation functions for scoring functions & refactors by @yanxi0830 in #597
  • Update the "InterleavedTextMedia" type by @ashwinb in #635
  • [NEW!] Experimental post-training APIs! #540, #593, etc.

A variety of fixes and enhancements. Some selected ones:

New Contributors

Full Changelog: v0.0.61...v0.0.62

v0.0.61

10 Dec 20:50
e2054d5

What's Changed

New Contributors

Full Changelog: v0.0.55...v0.0.61

v0.0.55 release

23 Nov 17:14

What's Changed

  • Fix TGI inference adapter
  • Fix llama stack build in 0.0.54 by @dltn in #505
  • Several documentation related improvements
  • Fix opentelemetry adapter by @dineshyv in #510
  • Update Ollama supported llama model list by @hickeyma in #483

Full Changelog: v0.0.54...v0.0.55