v0.1.4 Release Notes
Here are the key changes coming as part of this release:
Build and Test Agents
- Inference: Added support for non-llama models
- Inference: Added option to list all downloaded models and remove models
- Agent: Introduced a new API, agents.resume_turn, to include client-side tool execution in the same turn
- Agent: AgentConfig introduces a new variable, “tool_config”, that allows for better tool configuration and system prompt overrides
- Agent: Added logging for agent step start and completion times
- Agent: Added logging support for tool execution metadata
- Embedding: Updated /inference/embeddings to support asymmetric models, truncation and variable sized outputs
- Embedding: Updated embedding models for Ollama, Together, and Fireworks with available defaults
- VectorIO: Improved performance of sqlite-vec using chunked writes
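To illustrate the chunked-write idea behind the sqlite-vec change, here is a minimal sketch using only Python's standard `sqlite3` module. It is not the implementation from the PR: the `chunks` table, its columns, the `write_in_chunks` helper, and the batch size of 500 are all hypothetical, and a plain SQLite table stands in for a sqlite-vec virtual table. The point is only that rows are inserted in fixed-size batches, one transaction per batch, rather than one row or one huge statement at a time.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (chunk id, serialized embedding bytes).
Row = Tuple[str, bytes]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_in_chunks(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 500) -> int:
    """Insert rows in batches and return the number of batches written."""
    batches = 0
    for batch in batched(rows, batch_size):
        # One transaction per batch: commits on success, rolls back on error.
        with conn:
            conn.executemany("INSERT INTO chunks (id, embedding) VALUES (?, ?)", batch)
        batches += 1
    return batches


# Usage with an in-memory database and placeholder embeddings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, embedding BLOB)")
rows = [(f"doc-{i}", bytes(4)) for i in range(1200)]
n_batches = write_in_chunks(conn, rows, batch_size=500)
n_rows = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

Batching bounds the size of each transaction and statement while still amortizing commit overhead across many rows; the right batch size depends on row size and the storage backend.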
Agent Evals and Model Customization
- Deprecated the /eval-tasks API. Use /eval/benchmark instead
- Added CPU training support for TorchTune
Deploy and Monitoring of Agents
- Consistent view of client and server tool calls in telemetry
Better Engineering
- Made tests more data-driven for consistent evaluation
- Fixed documentation links and improved API reference generation
- Various small fixes for build scripts and system reliability
What's Changed
- build: resync uv and deps on 0.1.3 by @leseb in #1108
- style: fix the capitalization issue by @reidliu41 in #1117
- feat: log start, complete time to Agent steps by @ehhuang in #1116
- fix: Ensure a tool call can be converted before adding to buffer by @terrytangyuan in #1119
- docs: Fix incorrect link and command for generating API reference by @terrytangyuan in #1124
- chore: remove --no-list-templates option by @reidliu41 in #1121
- style: update verify-download help text by @reidliu41 in #1134
- style: update download help text by @reidliu41 in #1135
- fix: modify the model id title for model list by @reidliu41 in #1095
- fix: direct client pydantic type casting by @yanxi0830 in #1145
- style: remove prints in codebase by @yanxi0830 in #1146
- feat: support tool_choice = {required, none, } by @ehhuang in #1059
- test: Enable test_text_chat_completion_with_tool_choice_required for remote::vllm by @terrytangyuan in #1148
- fix(rag-example): add provider_id to avoid llama_stack_client 400 error by @fulvius31 in #1114
- fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks by @bbrowning in #1123
- chore: remove llama_models.llama3.api imports from providers by @ashwinb in #1107
- docs: fix Python llama_stack_client SDK links by @leseb in #1150
- feat: Chunk sqlite-vec writes by @franciscojavierarceo in #1094
- fix: miscellaneous job management improvements in torchtune by @booxter in #1136
- feat: add aggregation_functions to llm_as_judge_405b_simpleqa by @SLR722 in #1164
- feat: inference passthrough provider by @SLR722 in #1166
- docs: Remove unused python-openapi and json-strong-typing in openapi_generator by @terrytangyuan in #1167
- docs: improve API contribution guidelines by @leseb in #1137
- feat: add a option to list the downloaded models by @reidliu41 in #1127
- fix: Fixing some small issues with the build scripts by @franciscojavierarceo in #1132
- fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment by @yanxi0830 in #1163
- build: add missing dev dependencies for unit tests by @leseb in #1004
- fix: More robust handling of the arguments in tool call response in remote::vllm by @terrytangyuan in #1169
- Added support for mongoDB KV store by @shrinitg in #543
- script for running client sdk tests by @sixianyi0721 in #895
- test: skip model registration for unsupported providers by @leseb in #1030
- feat: Enable CPU training for torchtune by @booxter in #1140
- fix: add logging import by @raspawar in #1174
- docs: Add note about distro_codegen.py and provider dependencies by @bbrowning in #1175
- chore: slight renaming of model alias stuff by @ashwinb in #1181
- feat: adding endpoints for files and uploads by @vladimirivic in #1070
- docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script by @kevincogan in #1129
- chore!: deprecate eval/tasks by @yanxi0830 in #1186
- fix: some telemetry APIs don't currently work by @ehhuang in #1188
- feat: D69478008 [llama-stack] turning tests into data-driven by @LESSuseLESS in #1180
- feat: register embedding models for ollama, together, fireworks by @ashwinb in #1190
- feat(providers): add NVIDIA Inference embedding provider and tests by @mattf in #935
- docs: Add missing uv command for docs generation in contributing guide by @terrytangyuan in #1197
- docs: Simplify installation guide with uv by @terrytangyuan in #1196
- fix: BuiltinTool JSON serialization in remote vLLM provider by @bbrowning in #1183
- ci: improve GitHub Actions workflow for website builds by @leseb in #1151
- fix: pass tool_prompt_format to chat_formatter by @ehhuang in #1198
- fix(api): update embeddings signature so inputs and outputs list align by @ashwinb in #1161
- feat(api): Add options for supporting various embedding models by @ashwinb in #1192
- fix: update URL import, URL -> ImageContentItemImageURL by @mattf in #1204
- feat: model remove cmd by @reidliu41 in #1128
- chore: remove configure subcommand by @reidliu41 in #1202
- fix: remove list of list tests, no longer relevant after #1161 by @mattf in #1205
- test(client-sdk): Update embedding test types to use latest imports by @raspawar in #1203
- fix: convert back to model descriptor for model in list --downloaded by @reidliu41 in #1201
- docs: Add missing uv command and clarify website rebuild by @terrytangyuan in #1199
- fix: Updating images so that they are able to run without root access by @jland-redhat in #1208
- fix: pull ollama embedding model if necessary by @ashwinb in #1209
- chore: move embedding deps to RAG tool where they are needed by @ashwinb in #1210
- feat(1/n): api: unify agents for handling server & client tools by @yanxi0830 in #1178
- feat: tool outputs metadata by @ehhuang in #1155
- ci: add mypy for static type checking by @leseb in #1101
- feat(providers): support non-llama models for inference providers by @ashwinb in #1200
- test: fix test_rag_agent test by @ehhuang in #1215
- feat: add substring search for model list by @reidliu41 in #1099
- test: do not overwrite agent_config by @ehhuang in #1216
- docs: Adding Provider sections to docs by @franciscojavierarceo in #1195
- fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier by @ashwinb in #1225
- feat: add --run to llama stack build by @cdoern in #1156
- docs: Add vLLM to the list of inference providers in concepts and providers pages by @terrytangyuan in #1227
- docs: small fixes by @reidliu41 in #1224
- fix: avoid failure when no special pip deps and better exit by @leseb in #1228
- fix: set default tool_prompt_format in inference api by @ehhuang in #1214
- test: fix test_tool_choice by @ehhuang in #1234
New Contributors
- @fulvius31 made their first contribution in #1114
- @shrinitg made their first contribution in #543
- @raspawar made their first contribution in #1174
- @kevincogan made their first contribution in #1129
- @LESSuseLESS made their first contribution in #1180
- @jland-redhat made their first contribution in #1208
Full Changelog: v0.1.3...v0.1.4