Initial Inference Extension Plugin #10684

Draft: wants to merge 3 commits into main

danehans
Contributor

@danehans danehans commented Feb 22, 2025

Description

Adds initial support for an inference extension endpoint picker plugin. The plugin:

  1. Contributes an Upstream based on an InferencePool.
  2. Creates the endpoints collection based on namespace-local Pod IPs from the InferencePool selector.
  3. Creates an Envoy ORIGINAL_DST cluster from the InferencePool-generated Upstream.
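
As a rough illustration of step 2 only, the following is a minimal, standard-library sketch of selecting namespace-local Pod IPs for a pool. The `Pod` and `Pool` types, the `TargetPort` field, and `EndpointsFor` are hypothetical stand-ins invented for this example; they are not the types or functions used in this PR or in the krtcollections pkg.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
)

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
	IP        string
}

// Pool is a hypothetical stand-in for an InferencePool: a namespace,
// a label selector, and the port the selected pods are assumed to serve on.
type Pool struct {
	Namespace  string
	Selector   map[string]string
	TargetPort int
}

// matches reports whether every selector key/value pair is present in labels.
// Note: in this sketch an empty selector matches every pod.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// EndpointsFor returns the sorted "ip:port" addresses of pods that are in the
// pool's namespace, match its selector, and already have an IP assigned.
func EndpointsFor(pool Pool, pods []Pod) []string {
	var out []string
	for _, p := range pods {
		if p.Namespace != pool.Namespace || p.IP == "" {
			continue
		}
		if !matches(pool.Selector, p.Labels) {
			continue
		}
		out = append(out, net.JoinHostPort(p.IP, strconv.Itoa(pool.TargetPort)))
	}
	sort.Strings(out) // stable ordering keeps downstream output deterministic
	return out
}

func main() {
	pool := Pool{Namespace: "ns1", Selector: map[string]string{"app": "model-server"}, TargetPort: 8000}
	pods := []Pod{
		{Namespace: "ns1", Name: "a", Labels: map[string]string{"app": "model-server"}, IP: "10.0.0.2"},
		{Namespace: "ns2", Name: "b", Labels: map[string]string{"app": "model-server"}, IP: "10.0.0.3"},
	}
	fmt.Println(EndpointsFor(pool, pods))
}
```

Pods in other namespaces are skipped even when their labels match, which is the "namespace-local" restriction described in step 2.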

API changes

N/A

Code changes

  • Adds inferenceextension/endpointpicker pkg
  • Updates krtcollections pkg to transform InferencePool endpoints into Upstream endpoints.
  • Bumps Go deps for the inference extension pkg.
  • Re-runs code generators because of the Go dep bump.

CI changes

N/A

Docs changes

Godocs added throughout. User docs will be added in a future PR.

Context

Supports #10411

Interesting decisions

To keep the PR small, this is the first of multiple PRs to implement the endpoint picker plugin. This PR _does not_ create the ext-proc cluster, nor does it add the ext-proc filter to the listeners' filter chain.

Testing steps

Unit tests were added. E2E tests are still required but are not included here because of the size of the PR.

Notes for reviewers

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@danehans
Contributor Author

Note: To pass CI, this PR includes a few of the same commits as #10676 (go mod bump, code generation, etc.).

@danehans danehans changed the title Issue 10411 epp plugin Initial Inference Extension Plugin Feb 22, 2025
@danehans danehans force-pushed the issue_10411_epp_plugin branch from bd4d69b to b981485 Compare February 22, 2025 21:08
}, krtOpts.ToOptions("EndpointPickerUpstreams")...)

// Create the endpoints collection
inputs := krtcollections.NewInfPoolEndpointsInputs(krtOpts, infPoolUpstream, pods)
Contributor

i think you can move this collection to this package; the original was there because of how it worked in ggv2

out.CircuitBreakers = &envoy_config_cluster_v3.CircuitBreakers{
Thresholds: []*envoy_config_cluster_v3.CircuitBreakers_Thresholds{
{
MaxConnections: wrapperspb.UInt32(40000),
Contributor

where is 40000 coming from?

Contributor Author


UseHttpHeader: true,
HttpHeaderName: "x-gateway-destination-endpoint",
}
anyLbConfig, err := anypb.New(lbConfig)
Contributor

use our utils.MessageToAny for data-plane anypb serialization, as it sets deterministic to true.
we should make sure other places in the code do that too

Contributor Author

I created #10693 to make this change in other areas of the code and will update the endpointpicker plugin as well.

poolCfg.OurPool = func(pool *infextv1a1.InferencePool) bool {
// List HTTPRoutes in the same namespace.
var routes apiv1.HTTPRouteList
if err := poolCfg.Mgr.GetClient().List(ctx, &routes, client.InNamespace(pool.Namespace)); err != nil {
Contributor

it seems that our ownership of an inference pool is dynamic - i.e. we own it if an httproute we own routes to it?
is this temporary until we pass in a controller field to the pool spec?

if this is the way we want to solve this going forward, let's make this function O(1) by using indexing (as i imagine it would be called frequently)
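
To make the indexing suggestion concrete, here is a minimal, standard-library sketch of a reverse index from pools to the routes that reference them, so the ownership check becomes a map lookup instead of a List call. `Key`, `PoolIndex`, and its methods are hypothetical names invented for this example, and the sketch assumes some event handler keeps the index up to date as routes change; in controller-runtime the same idea is usually expressed with a field index on the cache rather than hand-rolled maps.

```go
package main

import (
	"fmt"
	"sync"
)

// Key identifies a namespaced object (hypothetical stand-in for a
// namespace/name pair).
type Key struct {
	Namespace string
	Name      string
}

// PoolIndex tracks which routes reference which pools.
type PoolIndex struct {
	mu      sync.RWMutex
	byPool  map[Key]map[Key]struct{} // pool -> set of routes referencing it
	byRoute map[Key][]Key            // route -> pools it references
}

// NewPoolIndex returns an empty index.
func NewPoolIndex() *PoolIndex {
	return &PoolIndex{
		byPool:  map[Key]map[Key]struct{}{},
		byRoute: map[Key][]Key{},
	}
}

// SetRoute records the pools a route currently references, replacing any
// earlier entry for that route.
func (i *PoolIndex) SetRoute(route Key, pools []Key) {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.removeLocked(route)
	if len(pools) == 0 {
		return
	}
	i.byRoute[route] = append([]Key(nil), pools...)
	for _, p := range pools {
		set := i.byPool[p]
		if set == nil {
			set = map[Key]struct{}{}
			i.byPool[p] = set
		}
		set[route] = struct{}{}
	}
}

// DeleteRoute removes a route and all of its pool references.
func (i *PoolIndex) DeleteRoute(route Key) {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.removeLocked(route)
}

// removeLocked drops a route's references; the caller must hold i.mu.
func (i *PoolIndex) removeLocked(route Key) {
	for _, p := range i.byRoute[route] {
		if set := i.byPool[p]; set != nil {
			delete(set, route)
			if len(set) == 0 {
				delete(i.byPool, p)
			}
		}
	}
	delete(i.byRoute, route)
}

// Owned reports whether at least one tracked route references the pool.
// It is a single map lookup and performs no I/O.
func (i *PoolIndex) Owned(pool Key) bool {
	i.mu.RLock()
	defer i.mu.RUnlock()
	return len(i.byPool[pool]) > 0
}

func main() {
	idx := NewPoolIndex()
	pool := Key{Namespace: "ns1", Name: "pool-a"}
	idx.SetRoute(Key{Namespace: "ns1", Name: "route-1"}, []Key{pool})
	fmt.Println(idx.Owned(pool))
	idx.DeleteRoute(Key{Namespace: "ns1", Name: "route-1"})
	fmt.Println(idx.Owned(pool))
}
```

Keeping the forward map (`byRoute`) alongside the reverse map lets a route update or delete clean up its old references without scanning every pool.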

For(&infextv1a1.InferencePool{}, builder.WithPredicates(
predicate.NewPredicateFuncs(func(object client.Object) bool {
if pool, ok := object.(*infextv1a1.InferencePool); ok {
return c.poolCfg.OurPool(pool)
Contributor

OurPool uses the client - i'm not sure you should do that (or any other I/O) from a predicate function. i believe there is no guarantee that the client semantics are kept when predicates are called.

Namespace: pool.Namespace,
Name: pool.Name,
},
Obj: pool,
Contributor

i think you need to include ObjIR here

@danehans
Contributor Author

danehans commented Feb 25, 2025

@yuval-k 4fd238f (WIP) adds support for creating the ext_proc cluster and route filters, PTAL. This commit does not include your review feedback. I'll work on that today.

@danehans danehans marked this pull request as draft February 25, 2025 15:43
@danehans danehans requested review from lgadban and yuval-k February 25, 2025 15:43
@danehans danehans force-pushed the issue_10411_epp_plugin branch from 4fd238f to a8af675 Compare February 25, 2025 16:20
@danehans danehans force-pushed the issue_10411_epp_plugin branch from a8af675 to 3539224 Compare February 25, 2025 16:53