Inference task type endpoints #3545
@@ -1,5 +1,5 @@
 {
-  "inference.stream_inference": {
+  "inference.stream_completion": {
In the future we might have a streaming endpoint for text embeddings for example.
 */
export class SparseEmbeddingInferenceResult {
  // TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
  sparse_embedding: Array<SparseEmbeddingResult>
I could see us having a variant here for a different type of response (like byte encoding for text embedding). That would be returned from the same URL, so it wouldn't be a new response class. Should we make this a variant and make `sparse_embedding` optional?
I suppose changing something from required to optional in the future would be a breaking change, right?
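To make the required-versus-optional question concrete, here is a minimal sketch of how a typed consumer is affected. The interfaces and function names are hypothetical and use a plain `number[]` instead of the specification's `SparseEmbeddingResult`; the point is only that code written against a required field stops type-checking once the field becomes optional, which is why it would likely count as a breaking change for typed clients.

```typescript
// Hypothetical shapes for illustration only; not the specification's types.
interface RequiredShape {
  sparse_embedding: number[]
}

interface OptionalShape {
  sparse_embedding?: number[]
}

// Written against the required shape: no undefined check is needed.
function countRequired(r: RequiredShape): number {
  return r.sparse_embedding.length
}

// Against the optional shape, the body above is rejected under
// strictNullChecks, so the caller has to handle the missing field.
function countOptional(r: OptionalShape): number {
  return r.sparse_embedding?.length ?? 0
}
```

Going the other way (optional today, required later) would not force this kind of change on existing callers, which is an argument for deciding before release.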
 * TextEmbeddingInferenceResult is an aggregation of mutually exclusive text_embedding variants
 * @variants container
 */
export class TextEmbeddingInferenceResult {
Same thing here: one URL can return multiple response formats, so I'm keeping this as it was.
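As a rough illustration of what "mutually exclusive variants" means for a consumer, the sketch below models a container in which exactly one key is present. The field name `text_embedding_bytes` and the element shapes are assumptions made for the example, not taken from this PR.

```typescript
// Hypothetical element shapes; the real ones live in the specification.
interface FloatEmbedding {
  embedding: number[]
}

interface ByteEmbedding {
  embedding: number[]
}

// Exactly one of the two keys may be set. Marking the other key as
// `?: never` makes the compiler reject objects that set both.
type TextEmbeddingContainer =
  | { text_embedding: FloatEmbedding[]; text_embedding_bytes?: never }
  | { text_embedding_bytes: ByteEmbedding[]; text_embedding?: never }

// Counts the embeddings regardless of which variant was returned.
function embeddingCount(result: TextEmbeddingContainer): number {
  const values = result.text_embedding ?? result.text_embedding_bytes ?? []
  return values.length
}
```

A caller handles both formats from the same URL by checking which key is present, so adding a new encoding later adds a variant instead of a new response class.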
/**
 * Defines the completion result.
 */
export class CompletionInferenceResult {
I'm open to other ideas for naming the classes. `*Result` was already taken everywhere for the nested field, which is why I went with `*InferenceResult`.
/**
 * Query input.
 */
query: string
`query` is required for the rerank task type.
/**
 * Optional task settings
 */
task_settings?: TaskSettings
Adding this because I think it was missing before.
Following you can find the validation results for the APIs you have changed.
You can validate these APIs yourself by using the |
This PR makes breaking changes to the client for the inference API. Prior to this PR we had a single endpoint for most task types supported in the inference API: `_inference/<optional_task_type>/<inference id>`. After discussion with @swallez we decided to make the task type required in the URL. This way we can have separate requests and responses for each task type.

This PR does not include a separate item of work: defining well-typed `task_settings` for each route. Correct me if I'm wrong, but I don't believe that would be a breaking change? If it is not a breaking change, I think we can defer that work until later.
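The URL change described above can be sketched as follows. Both helper functions and the `TaskType` union are hypothetical and exist only to show the difference in shape; the task type names are the ones mentioned in this PR, and their exact spelling in the URL is an assumption.

```typescript
// Task types named in this PR; the URL spelling is assumed for the sketch.
type TaskType = 'sparse_embedding' | 'text_embedding' | 'completion' | 'rerank'

// Before: the task type segment is optional, so a single request and
// response definition has to cover every task type.
function legacyInferencePath(inferenceId: string, taskType?: TaskType): string {
  const id = encodeURIComponent(inferenceId)
  return taskType === undefined ? `/_inference/${id}` : `/_inference/${taskType}/${id}`
}

// After: the task type is required, so each task type maps to its own
// route and can have its own request and response types.
function inferencePath(taskType: TaskType, inferenceId: string): string {
  return `/_inference/${taskType}/${encodeURIComponent(inferenceId)}`
}
```

For example, `inferencePath('rerank', 'my-model')` yields `/_inference/rerank/my-model`, where `my-model` is a made-up inference id.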