Inference task type endpoints #3545

jonathan-buttner · 2025-01-16T20:54:25Z

This PR makes breaking changes to the client for the inference API. Prior to this PR we had a single endpoint for most task types supported in the inference API: `_inference/<optional_task_type>/<inference_id>`. After discussion with @swallez we decided to make the task type required in the URL, so that each task type can have its own request and response definitions.
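As a rough sketch of what a required task type means on the client side: the helper name `inferencePath` and the inference ID below are made up for illustration, and the task type list is only the subset named in the validation results for this PR, not a definitive enumeration.

```typescript
// Task types taken from the API names in this PR's validation table.
// Treat the list as illustrative rather than complete.
type TaskType =
  | "completion"
  | "rerank"
  | "sparse_embedding"
  | "text_embedding";

// Hypothetical helper (not part of the PR): the task type is now a
// required path segment instead of an optional one.
function inferencePath(taskType: TaskType, inferenceId: string): string {
  return `/_inference/${taskType}/${encodeURIComponent(inferenceId)}`;
}

// "my-rerank-endpoint" is a made-up inference ID.
console.log(inferencePath("rerank", "my-rerank-endpoint"));
```

Because the task type is part of the function signature, a caller can no longer omit it, which is what lets each route carry its own request and response types.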

This PR does not include the separate work of defining well-defined task_settings for each route. Correct me if I'm wrong, but I don't believe that would be a breaking change? If it is not, I think we can defer that work until later.

jonathan-buttner · 2025-01-16T20:55:31Z

specification/_json_spec/inference.stream_completion.json

@@ -1,5 +1,5 @@
 {
-  "inference.stream_inference": {
+  "inference.stream_completion": {


In the future we might have a streaming endpoint for text embeddings, for example, so this name is specific to completion.

jonathan-buttner · 2025-01-16T20:58:31Z

specification/inference/_types/Results.ts

+ */
+export class SparseEmbeddingInferenceResult {
+  // TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
+  sparse_embedding: Array<SparseEmbeddingResult>


I could see us having a variant here for a different type of response (like byte encoding for text embedding). That would be returned using the same URL so it wouldn't be a new response. Should we make this a variant and make sparse_embedding optional?

I suppose changing something from required to optional in the future would be a breaking change, right?
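To make the concern concrete, here is a minimal sketch assuming clients are statically typed and compiled with strict null checks. Only the `sparse_embedding` field name comes from the diff. The interface names, the shape of `SparseEmbeddingResult`, and the two helper functions are assumptions for illustration.

```typescript
// Assumed shape of one sparse embedding entry (token -> weight).
interface SparseEmbeddingResult {
  embedding: Record<string, number>;
}

// The field as it is declared in this PR: required.
interface RequiredShape {
  sparse_embedding: SparseEmbeddingResult[];
}

// A possible future declaration: one optional variant among several.
interface OptionalShape {
  sparse_embedding?: SparseEmbeddingResult[];
}

// Client code written against the required field needs no guard.
function countRequired(result: RequiredShape): number {
  return result.sparse_embedding.length;
}

// Against the optional field, the unguarded access above would no
// longer type-check, so existing callers would have to change.
function countOptional(result: OptionalShape): number {
  return result.sparse_embedding?.length ?? 0;
}

console.log(countRequired({ sparse_embedding: [{ embedding: { hello: 0.5 } }] }));
console.log(countOptional({}));
```

If that reasoning holds, declaring the field optional up front would avoid forcing this change on callers later.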

jonathan-buttner · 2025-01-16T20:59:44Z

specification/inference/_types/Results.ts

+ * TextEmbeddingInferenceResult is an aggregation of mutually exclusive text_embedding variants
+ * @variants container
+ */
+export class TextEmbeddingInferenceResult {


Same thing here: one URL can return multiple response formats, so I'm keeping this as a variant container, as it was.
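For readers unfamiliar with variant containers, the sketch below shows how a client might consume one: every variant is optional, and exactly one is expected to be present. The variant names `text_embedding` and `text_embedding_bytes`, the entry shape, and the helper `variantOf` are assumptions for illustration. The diff only states that the variants are mutually exclusive.

```typescript
// Assumed entry shape shared by both hypothetical variants.
interface EmbeddingEntry {
  embedding: number[];
}

// Hypothetical container: each variant is optional, one is expected.
interface TextEmbeddingContainer {
  text_embedding?: EmbeddingEntry[];
  text_embedding_bytes?: EmbeddingEntry[];
}

type VariantName = keyof TextEmbeddingContainer;

// Returns the name of the single variant that is set, or throws if
// zero or more than one variant is present.
function variantOf(result: TextEmbeddingContainer): VariantName {
  const present = (Object.keys(result) as VariantName[]).filter(
    (key) => result[key] !== undefined
  );
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one variant, got ${present.length}`);
  }
  return only;
}

console.log(variantOf({ text_embedding_bytes: [{ embedding: [1, 2, 3] }] }));
```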

jonathan-buttner · 2025-01-16T21:03:46Z

specification/inference/_types/Results.ts

+/**
+ * Defines the completion result.
+ */
+export class CompletionInferenceResult {


I'm open to other ideas for naming the classes. `*Result` was already taken by the classes for the nested fields, which is why I went with `*InferenceResult`.

jonathan-buttner · 2025-01-16T21:05:08Z

specification/inference/rerank/RerankRequest.ts

+    /**
+     * Query input.
+     */
+    query: string


query is required for the rerank task type.
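A small sketch of what a required `query` could look like for a caller, assuming a typed client. Only `query` is confirmed by the diff above. The `input` field, the `RerankBody` name, and the `buildRerankBody` helper are hypothetical.

```typescript
// Only `query` comes from the diff; `input` is an assumed field that
// holds the documents to rerank.
interface RerankBody {
  query: string;
  input: string[];
}

// Hypothetical builder: rejects an empty query before any request is sent.
function buildRerankBody(query: string, input: string[]): RerankBody {
  if (query.trim() === "") {
    throw new Error("query is required for the rerank task type");
  }
  return { query, input };
}

console.log(JSON.stringify(buildRerankBody("star wars", ["luke", "leia"])));
```

Having a dedicated request per task type is what allows `query` to be required here without affecting the other routes.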

jonathan-buttner · 2025-01-16T21:05:57Z

specification/inference/stream_completion/StreamInferenceRequest.ts

+    /**
+     * Optional task settings
+     */
+    task_settings?: TaskSettings


Adding this because I think it was missing before.

github-actions · 2025-01-16T21:16:13Z

Below are the validation results for the APIs you have changed.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| `inference.chat_completion_unified` | ⚪ | Missing test | Missing test |
| `inference.completion` | ⚪ | Missing test | Missing test |
| `inference.delete` | ⚪ | Missing test | Missing test |
| `inference.get` | 🟢 | 1/1 | 1/1 |
| `inference.put` | ⚪ | Missing test | Missing test |
| `inference.rerank` | ⚪ | Missing test | Missing test |
| `inference.sparse_embedding` | ⚪ | Missing test | Missing test |
| `inference.stream_completion` | ⚪ | Missing test | Missing test |
| `inference.text_embedding` | ⚪ | Missing test | Missing test |
| `inference.update` | ⚪ | Missing test | Missing test |

You can validate these APIs yourself by using the `make validate` target.

Refactoring inference endpoints

687063f

jonathan-buttner added the `specification` and `backport 8.x` labels Jan 16, 2025

jonathan-buttner requested a review from a team as a code owner January 16, 2025 20:54

jonathan-buttner added 2 commits January 16, 2025 16:06

Fixing stream completion url and removing the old url and class

0442e31

generating spec

05864d4

jonathan-buttner requested review from prwhelan, davidkyle and dan-rubinstein January 16, 2025 21:17
