Prioritized Alternatives in Device Requests #128586
base: master
Conversation
super cursory quick look...
Force-pushed 31aaf8f to 19ab559 (compare)
/assign
Force-pushed 19ab559 to e49122d (compare)
Force-pushed e49122d to fa17f45 (compare)
Force-pushed 573c9d8 to 07dc92d (compare)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: mortent. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed 022c5f9 to 1dd7e83 (compare)
/test pull-kubernetes-apidiff-client-go
@pohly @johnbelamaric This should be ready for review. This PR adds the v1beta2 API.
Not a full review. The idea was that others cover DRA API changes and I do the same for non-DRA APIs.
pkg/apis/resource/install/install.go
Outdated
utilruntime.Must(scheme.SetVersionPriority(v1beta1.SchemeGroupVersion, v1alpha3.SchemeGroupVersion))
utilruntime.Must(v1beta2.AddToScheme(scheme))
// We should change the serialization version to v1beta2
// for 1.34.
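If the TODO above is taken literally, the updated call might look like the following. This is a sketch only, assuming the install-time version priority is what drives the preferred serialization version, with the older versions kept registered so stored objects can still be read:

```go
// Hypothetical 1.34 change: list v1beta2 first so it becomes the
// preferred version; v1beta1 and v1alpha3 stay registered so stored
// objects in those versions can still round-trip.
utilruntime.Must(scheme.SetVersionPriority(
	v1beta2.SchemeGroupVersion,
	v1beta1.SchemeGroupVersion,
	v1alpha3.SchemeGroupVersion,
))
```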
We can do that, although it won't help much with removal of v1beta1: support for that has to be kept until it's certain that there are no stored objects with that version, which is typically impossible to determine.
Can you create an issue for this?
Created #129889 and also linked to it in the TODO.
I remember this from a previous API I was working on, and I think the idea was that deferring the serialization version change by one release makes rollbacks less risky. But it might be that I don't remember the full reason.
You are right. https://kubernetes.io/docs/reference/using-api/deprecation-policy/ describes our case (v1beta1 -> v1beta2). We are at "X+3": v1beta2 and v1beta1 available, v1beta1 is deprecated, v1beta1 as storage version.
@@ -25,6 +25,7 @@ import (
	"k8s.io/kubernetes/pkg/apis/resource"
	"k8s.io/kubernetes/pkg/apis/resource/v1alpha3"
	"k8s.io/kubernetes/pkg/apis/resource/v1beta1"
	"k8s.io/kubernetes/pkg/apis/resource/v1beta2"
This PR adds the v1beta2 API, but doesn't update the scheduler/allocator or the e2e tests to use the new API.
I agree with that decision, primarily because it avoids one potential pitfall: users who continue to enable only the v1beta1 API in their cluster would end up with an unusable cluster if the control plane components required v1beta2.
Let's start planning the v1beta1 removal. Can you:
- add a remark to the release note: "DRA: the v1beta1 API is deprecated. It will be removed in 1.36. Use v1beta2 instead."
- create an issue as a reminder, which includes rewriting the control plane components
- add the /kind deprecation label
(not doing it now myself in case we want to discuss further)
I'm also wondering if we should update the e2e tests. It seems like maybe we should have tests both for v1beta1 and v1beta2, but that ends up being a lot of tests.
Let's do a few e2e tests which use v1beta2:
- one with exactly
- one with firstAvailable

Remember to add v1beta2 to test/e2e/dra/kind.yaml, otherwise those tests will fail in pull-kubernetes-kind-dra.
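For illustration, a ResourceClaim using the new firstAvailable alternative might look like the sketch below. It follows the shape described in KEP-4816; the device class names are made up, and field names may differ from the final v1beta2 API:

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: example-claim
spec:
  devices:
    requests:
    - name: gpu
      # Subrequests are tried in order; the first one that can be
      # satisfied is allocated.
      firstAvailable:
      - name: large
        deviceClassName: big-gpu    # made-up device class
      - name: small
        deviceClassName: small-gpu  # made-up device class
```

A request without alternatives would instead use exactly with a single device class.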
Created issue #129891 and added a release note and the deprecation label to the PR.
Also added e2e tests for the v1beta2 API.
test/e2e/feature/feature.go
Outdated
// owning-sig: sig-node
// kep: https://kep.k8s.io/4816
// test-infra jobs:
// - "dra-alpha" in https://testgrid.k8s.io/sig-node-dynamic-resource-allocation
Suggested change:
- // - "dra-alpha" in https://testgrid.k8s.io/sig-node-dynamic-resource-allocation
+ // - "ci-kind-dra-all" in https://testgrid.k8s.io/sig-node-dynamic-resource-allocation
I'm not seeing any such tests in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/128586/pull-kubernetes-kind-dra-all/1884325562575491072 for this PR.
Ah, because of:
+ hack/ginkgo-e2e.sh '-ginkgo.label-filter=Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation, DRAAdminAccess} && !Flaky && !Slow'
I'll update the job.
Updated the PR to reference the correct job.
@pohly kubernetes/test-infra#34231 runs the tests for the correct features, but it still doesn't enable the correct features in kind. From the test run:
+ echo 'Enabling DRA feature(s): DRAAdminAccess DRAPrioritizedList DRAResourceClaimDeviceStatus.'
Enabling DRA feature(s): DRAAdminAccess DRAPrioritizedList DRAResourceClaimDeviceStatus.
+ kind create cluster --retain --config /dev/fd/63 --image dra/node:latest
++ cat test/e2e/dra/kind.yaml
++ for feature in ${features}
++ echo ' DRAAdminAccess: true'
Creating cluster "kind" ...
I have created kubernetes/test-infra#34242, which I think should fix this. It is very similar to kubernetes/test-infra#34231.
@@ -132,7 +132,7 @@ func CreateResourceClaimController(ctx context.Context, tb ktesting.TB, clientSe
	podInformer := informerFactory.Core().V1().Pods()
	claimInformer := informerFactory.Resource().V1beta1().ResourceClaims()
	claimTemplateInformer := informerFactory.Resource().V1beta1().ResourceClaimTemplates()
-	claimController, err := resourceclaim.NewController(klog.FromContext(ctx), true /* admin access */, clientSet, podInformer, claimInformer, claimTemplateInformer)
+	claimController, err := resourceclaim.NewController(klog.FromContext(ctx), true /* admin access */, true /* prioritized list */, clientSet, podInformer, claimInformer, claimTemplateInformer)
This set of booleans is becoming a bit silly - sorry for even starting with it!
Should we add 1<<iota enums with a custom int type, such that this code can become resourceclaim.NewController(klog.FromContext(ctx), resourceclaim.DRAAdminAccessFeatureEnabled|resourceclaim.DRAPrioritizedListEnabled, ...)?
Not required, it's quite a bit of boilerplate code.
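For reference, a minimal self-contained sketch of that 1<<iota approach; the type and constant names (Features, AdminAccessEnabled, PrioritizedListEnabled) are made up for illustration and are not the actual resourceclaim package API:

```go
package main

import "fmt"

// Features is a bitmask of enabled DRA feature gates.
// The constant names below are hypothetical.
type Features int

const (
	AdminAccessEnabled Features = 1 << iota
	PrioritizedListEnabled
)

// Has reports whether all bits in f are set in fs.
func (fs Features) Has(f Features) bool { return fs&f == f }

func main() {
	enabled := AdminAccessEnabled | PrioritizedListEnabled
	fmt.Println(enabled.Has(AdminAccessEnabled))     // true
	fmt.Println(enabled.Has(PrioritizedListEnabled)) // true
	fmt.Println(Features(0).Has(AdminAccessEnabled)) // false
}
```

The call site would then pass a single Features value instead of a growing list of positional booleans, which also makes call sites self-documenting.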
We could do that. I already planned to look at this for the partitionable devices feature, since adding the third boolean here really made it awkward. But I'll take a look at it here.
Force-pushed 1dd7e83 to 77f49bc (compare)
Force-pushed 3692eee to b351572 (compare)
Force-pushed b351572 to c3e73ef (compare)
/test pull-kubernetes-kind-dra-all-canary
/test pull-kubernetes-kind-dra-all-canary

It got killed and I don't know why. It should now run tests for DRAPrioritizedList, see kubernetes/test-infra#34231.
@mortent: The following tests failed, say /retest to rerun all failed tests or /test all to rerun all tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind feature
/kind api-change
/kind deprecation
What this PR does / why we need it:
This implements https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/4816-dra-prioritized-list
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: