Releases · meta-llama/llama-stack
Llama Stack 0.0.54 Release
What's Changed
- Bugfix release on top of 0.0.53
- Don't depend on templates.py when printing llama stack build messages by @ashwinb in #496
- Restructure docs by @dineshyv in #494
- Since we are pushing for HF repos, we should accept them in inference configs by @ashwinb in #497
- Fix fp8 quantization script. by @liyunlu0618 in #500
- use logging instead of prints by @dineshyv in #499
New Contributors
- @liyunlu0618 made their first contribution in #500
Full Changelog: v0.0.53...v0.0.54
Llama Stack 0.0.53 Release
🚀 Initial Release Notes for Llama Stack!
Added
- Resource-oriented design for models, shields, memory banks, datasets and eval tasks
- Persistence for registered objects with distribution
- Ability to persist memory banks created for FAISS
- PostgreSQL KVStore implementation
- Environment variable placeholder support in run.yaml files (see the sketch after this list)
- Comprehensive Zero-to-Hero notebooks and quickstart guides
- Support for quantized models in Ollama
- Vision model support for Together, Fireworks, Meta-Reference, Ollama, and vLLM
- Bedrock distribution with safety shields support
- Evals API with task registration and scoring functions
- MMLU and SimpleQA benchmark scoring functions
- Hugging Face dataset provider integration for benchmarks
- Support for custom dataset registration from local paths
- Benchmark evaluation CLI tools with visualization tables
- RAG evaluation scoring functions and metrics
- Local persistence for datasets and eval tasks
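
A minimal sketch of the environment variable placeholder feature, assuming the `${env.VAR}` substitution syntax; the provider and key names below are illustrative, not quoted from a shipped template:

```yaml
# run.yaml (illustrative sketch, not a shipped template)
apis:
  - inference
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama   # hypothetical provider choice
      config:
        url: ${env.OLLAMA_URL}        # resolved from the environment at startup
```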
Changed
- Split safety into distinct providers (llama-guard, prompt-guard, code-scanner)
- Changed provider naming convention (`impls` → `inline`, `adapters` → `remote`); see the sketch after this list
- Updated API signatures for dataset and eval task registration
- Restructured folder organization for providers
- Enhanced Docker build configuration
- Added version prefixing for REST API routes
- Enhanced evaluation task registration workflow
- Improved benchmark evaluation output formatting
- Restructured evals folder organization for better modularity
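
A rough before/after sketch of the renamed provider convention; the old spellings are assumptions for illustration, not quoted from a real config:

```yaml
# Before (hypothetical old spellings using the impls/adapters vocabulary):
#   safety:
#     - provider_type: impls::llama-guard
#     - provider_type: adapters::bedrock
# After the rename (inline = in-process implementation, remote = adapter):
safety:
  - provider_type: inline::llama-guard
  - provider_type: remote::bedrock
```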
Removed
- `llama stack configure` command
What's Changed
- Update download command by @Wauplin in #9
- Update fbgemm version by @jianyuh in #12
- Add CLI reference docs by @dltn in #14
- Added Ollama as an inference impl by @hardikjshah in #20
- Hide older models by @dltn in #23
- Introduce Llama stack distributions by @ashwinb in #22
- Rename inline -> local by @dltn in #24
- Avoid using nearly double the memory needed by @ashwinb in #30
- Updates to prompt for tool calls by @hardikjshah in #29
- RFC-0001-The-Llama-Stack by @raghotham in #8
- Add API keys to AgenticSystemConfig instead of relying on dotenv by @ashwinb in #33
- update cli ref doc by @jeffxtang in #34
- fixed bug in download not enough disk space condition by @sisminnmaw in #35
- Updated CLI instructions with additional details for each subcommand by @varunfb in #36
- Updated URLs and addressed feedback by @varunfb in #37
- Fireworks basic integration by @benjibc in #39
- Together AI basic integration by @Nutlope in #43
- Update LICENSE by @raghotham in #47
- Add patch for SSE event endpoint responses by @dltn in #50
- API Updates: fleshing out RAG APIs, introduce "llama stack" CLI command by @ashwinb in #51
- [inference] Add a TGI adapter by @ashwinb in #52
- upgrade llama_models by @benjibc in #55
- Query generators for RAG query by @hardikjshah in #54
- Add Chroma and PGVector adapters by @ashwinb in #56
- API spec update, client demo with Stainless SDK by @yanxi0830 in #58
- Enable Bing search by @hardikjshah in #59
- add safety to openapi spec by @yanxi0830 in #62
- Add config file based CLI by @yanxi0830 in #60
- Simplified Telemetry API and tying it to logger by @ashwinb in #57
- [Inference] Use huggingface_hub inference client for TGI adapter by @hanouticelina in #53
- Support `data:` in URL for memory. Add ootb support for pdfs by @hardikjshah in #67
- Remove request wrapper migration by @yanxi0830 in #64
- CLI Update: build -> configure -> run by @yanxi0830 in #69
- API Updates by @ashwinb in #73
- Unwrap ChatCompletionRequest for context_retriever by @yanxi0830 in #75
- CLI - add back build wizard, configure with name instead of build.yaml by @yanxi0830 in #74
- CLI: add build templates support, move imports by @yanxi0830 in #77
- fix prompt with name args by @yanxi0830 in #80
- Fix memory URL parsing by @yanxi0830 in #81
- Allow TGI adaptor to have non-standard llama model names by @hardikjshah in #84
- [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers by @ashwinb in #92
- Bedrock Guardrails committing after rebasing the fork by @rsgrewal-aws in #96
- Bedrock Inference Integration by @poegej in #94
- Support for Llama3.2 models and Swift SDK by @ashwinb in #98
- fix safety using inference by @yanxi0830 in #99
- Fixes typo for setup instruction for starting Llama Stack Server section by @abhishekmishragithub in #103
- Make TGI adapter compatible with HF Inference API by @Wauplin in #97
- Fix links & format by @machina-source in #104
- docs: fix typo by @dijonkitchen in #107
- LG safety fix by @kplawiak in #108
- Minor typos, HuggingFace -> Hugging Face by @marklysze in #113
- Reordered pip install and llama model download by @KarthiDreamr in #112
- Update getting_started.ipynb by @delvingdeep in #117
- fix: 404 link to agentic system repository by @moldhouse in #118
- Fix broken links in RFC-0001-llama-stack.md by @bhimrazy in #134
- Validate `name` in `llama stack build` by @russellb in #128
- inference: Fix download command in error msg by @russellb in #133
- configure: Fix an error msg typo by @russellb in #131
- docs: Note how to use podman by @russellb in #130
- add env for LLAMA_STACK_CONFIG_DIR by @yanxi0830 in #137
- [bugfix] fix duplicate api endpoints by @yanxi0830 in #139
- Use inference APIs for executing Llama Guard by @ashwinb in #121
- fixing safety inference and safety adapter for new API spec. Pinned t… by @yogishbaliga in #105
- [CLI] remove dependency on CONDA_PREFIX in CLI by @yanxi0830 in #144
- [bugfix] fix #146 by @yanxi0830 in #147
- Extract provider data properly (attempt 2) by @ashwinb in #148
- `is_multimodal` accepts `core_model_id` not model itself. by @wizardbc in #153
- fix broken bedrock inference provider by @moritalous in #151
- Fix podman+selinux compatibility by @russellb in #132
- docker: Install in editable mode for dev purposes by @russellb in #160
- [CLI] simplify docker run by @yanxi0830 in #159
- Add a RoutableProvider protocol, support for multiple routing keys by @ashwinb in #163
- docker: Check for selinux before using `--security-opt` by @russellb in #167
- Adds markdown-link-check and fixes a broken link by @codefromthecrypt in #165
- [bugfix] conda path lookup by @yanxi0830 in #179
- fix prompt guard by @ashwinb in #177
- inference: Add model option to client by @russellb in #17...