Inference task type endpoints #3545

Open
wants to merge 3 commits into
base: main

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Jan 16, 2025

This PR makes breaking changes to the client for the inference API. Prior to this PR we had a single endpoint for most task types supported by the inference API: _inference/<optional_task_type>/<inference id>. After discussion with @swallez, we decided to make the task type required in the URL, so that each task type can have its own request and response definitions.
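As a rough illustration of the URL change (a sketch only: `TaskType`, `buildLegacyInferencePath`, and `buildInferencePath` are hypothetical names, not part of the client or of this PR), making the task type a required path segment could look like this:

```typescript
// Hypothetical sketch: the names below are illustrative, not the real client API.
// The task types listed are the ones whose endpoints appear in this PR.
type TaskType = 'completion' | 'rerank' | 'sparse_embedding' | 'text_embedding'

// Before: the task type segment was optional.
function buildLegacyInferencePath(inferenceId: string, taskType?: TaskType): string {
  const segments: string[] = taskType === undefined ? [inferenceId] : [taskType, inferenceId]
  return '/_inference/' + segments.map((s) => encodeURIComponent(s)).join('/')
}

// After: the task type is always part of the path, so each route can
// carry its own request and response definitions.
function buildInferencePath(taskType: TaskType, inferenceId: string): string {
  return `/_inference/${encodeURIComponent(taskType)}/${encodeURIComponent(inferenceId)}`
}
```

With the second signature, omitting the task type becomes a compile-time error in a typed client rather than a request whose response shape is only known at runtime.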

This PR does not include the separate work item of defining well-defined task_settings for each route. Correct me if I'm wrong, but I don't believe that would be a breaking change? If it is not, I think we can defer that work until later.

@@ -1,5 +1,5 @@
 {
-  "inference.stream_inference": {
+  "inference.stream_completion": {
Contributor Author

In the future we might add streaming endpoints for other task types, for example text embeddings.

*/
export class SparseEmbeddingInferenceResult {
// TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
sparse_embedding: Array<SparseEmbeddingResult>
Contributor Author

I could see us having a variant here for a different type of response (like byte encoding for text embedding). That would be returned using the same URL so it wouldn't be a new response. Should we make this a variant and make sparse_embedding optional?

I suppose changing something from required to optional in the future would be a breaking change, right?
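To make the question concrete, here is a minimal sketch of what a variant-style container could mean for a typed client. It rests on assumptions: `SparseEmbeddingResult` is simplified, and `SparseEmbeddingVariants` and its `byte_embedding` field are invented placeholders for "some future encoding", not real response fields.

```typescript
// Simplified stand-in for the spec's SparseEmbeddingResult (assumption).
interface SparseEmbeddingResult {
  embedding: Record<string, number>
}

// Current shape: the field is required, so callers can read it unconditionally.
interface SparseEmbeddingInferenceResult {
  sparse_embedding: SparseEmbeddingResult[]
}

// Hypothetical future shape: a container of mutually exclusive variants.
// `byte_embedding` is an invented placeholder, not a real field.
interface SparseEmbeddingVariants {
  sparse_embedding?: SparseEmbeddingResult[]
  byte_embedding?: number[][]
}

// With the required field, direct access is safe.
function countTokens(result: SparseEmbeddingInferenceResult): number {
  return result.sparse_embedding.reduce((n, r) => n + Object.keys(r.embedding).length, 0)
}

// With the optional field, the same direct access no longer type-checks
// under strict null checks; the missing case has to be handled explicitly.
function countTokensFromVariants(result: SparseEmbeddingVariants): number {
  if (result.sparse_embedding === undefined) return 0
  return result.sparse_embedding.reduce((n, r) => n + Object.keys(r.embedding).length, 0)
}
```

In a statically typed client, code written like `countTokens` would stop compiling if the field later became optional, which is one way the change would surface as breaking.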

* TextEmbeddingInferenceResult is an aggregation of mutually exclusive text_embedding variants
* @variants container
*/
export class TextEmbeddingInferenceResult {
Contributor Author

Same thing here: one URL, multiple response formats, so I'm keeping this as it was.

/**
* Defines the completion result.
*/
export class CompletionInferenceResult {
Contributor Author

I'm open to other ideas for naming the classes. The *Result names were already taken by the classes for the nested fields, which is why I went with *InferenceResult.

/**
* Query input.
*/
query: string
Contributor Author

query is required for the rerank task type.
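This is the kind of difference that separate per-task request definitions can express. A minimal sketch, assuming simplified request bodies: the interface names, the `input` fields, and the `isRerankRequestBody` helper are illustrative and not copied from the spec; only `query` and `task_settings` appear in this diff.

```typescript
// Illustrative shapes only. TaskSettings is left opaque here.
type TaskSettings = Record<string, unknown>

interface RerankRequestBody {
  query: string // required for the rerank task type
  input: string[] // assumed field: the documents to rank
  task_settings?: TaskSettings
}

interface TextEmbeddingRequestBody {
  input: string | string[] // assumed field: no query for this task type
  task_settings?: TaskSettings
}

// Runtime check mirroring the type: a body without a string `query`
// (or without a string array `input`) is not a rerank request body.
function isRerankRequestBody(body: unknown): body is RerankRequestBody {
  if (typeof body !== 'object' || body === null) return false
  const candidate = body as Record<string, unknown>
  const input = candidate.input
  return (
    typeof candidate.query === 'string' &&
    Array.isArray(input) &&
    input.every((item: unknown) => typeof item === 'string')
  )
}
```

With a single shared request type, `query` would have to be optional for every task type; splitting the definitions lets it be required only where it applies.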

/**
* Optional task settings
*/
task_settings?: TaskSettings
Contributor Author

Adding this because I think it was missing before.

Contributor

Below are the validation results for the APIs you have changed.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| inference.chat_completion_unified | | Missing test | Missing test |
| inference.completion | | Missing test | Missing test |
| inference.delete | | Missing test | Missing test |
| inference.get | 🟢 | 1/1 | 1/1 |
| inference.put | | Missing test | Missing test |
| inference.rerank | | Missing test | Missing test |
| inference.sparse_embedding | | Missing test | Missing test |
| inference.stream_completion | | Missing test | Missing test |
| inference.text_embedding | | Missing test | Missing test |
| inference.update | | Missing test | Missing test |

You can validate these APIs yourself by using the make validate target.
