Initial support for inference extension deployer #10676
Open
+1,064
−68
Description
Adds initial support for an InferencePool controller and deployer. This controller and deployer are independent of a gateway deployer. For now, both share the `GatewayConfig` type, since refactoring this type would further increase the size of this PR.
API changes
N/A
Code changes
Controller, deployer, and Helm packages.
CI changes
N/A
Docs changes
Godocs added throughout. User docs will be added in a future PR.
Context
Supports #10411
Interesting decisions
This PR implements a separate deployer because an InferencePool (and its supporting infra resources) does not depend on a Gateway. Instead, a deployed Gateway uses config from an InferencePool to learn how to connect to it, what failure mode to use, etc.
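To make the dependency direction concrete, here is a minimal sketch, not the code in this PR. All names (`InferencePool` fields, `PoolDeployer`, `BackendConfigFor`, the `-endpoint-picker` suffix, and the `FailOpen` value) are hypothetical simplifications chosen for illustration. The point is only that the pool deployer takes no Gateway as input, while the Gateway side derives its connection settings from the pool.

```go
package main

import "fmt"

// InferencePool is a hypothetical, simplified stand-in for the real resource.
// Field names and values here are assumptions made for this sketch.
type InferencePool struct {
	Name        string
	Namespace   string
	TargetPort  int
	FailureMode string // assumed values: "FailOpen" or "FailClose"
}

// PoolDeployer renders the supporting infra for a pool.
// Note that nothing in its API refers to a Gateway.
type PoolDeployer struct{}

// RenderObjects lists the objects this sketch would create for the pool.
func (PoolDeployer) RenderObjects(p InferencePool) []string {
	base := p.Namespace + "/" + p.Name + "-endpoint-picker" // hypothetical naming
	return []string{
		"ServiceAccount/" + base,
		"Deployment/" + base,
		"Service/" + base,
	}
}

// BackendConfig is what a deployed Gateway would derive from a pool.
type BackendConfig struct {
	Address  string
	FailOpen bool
}

// BackendConfigFor reads connection details and failure mode from the pool.
func BackendConfigFor(p InferencePool) BackendConfig {
	return BackendConfig{
		Address: fmt.Sprintf("%s-endpoint-picker.%s.svc:%d",
			p.Name, p.Namespace, p.TargetPort),
		FailOpen: p.FailureMode == "FailOpen",
	}
}

func main() {
	pool := InferencePool{
		Name: "llm", Namespace: "default",
		TargetPort: 9002, FailureMode: "FailOpen",
	}
	// The pool's infra can be rendered without any Gateway existing.
	for _, obj := range (PoolDeployer{}).RenderObjects(pool) {
		fmt.Println(obj)
	}
	// A Gateway, if present, learns how to reach the pool from the pool itself.
	fmt.Printf("%+v\n", BackendConfigFor(pool))
}
```

With this split, deleting or redeploying a Gateway never touches the pool's resources, which is the motivation for keeping the two deployers separate.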
Testing steps
Unit and integration tests were added. E2E tests are still required but are not included here due to the size of the PR.
Notes for reviewers
Refer to the upstream docs for additional context.
Checklist: