kind, check-cluster-up: Enable Kubevirt CPUManager FG when SR-IOV provider is tested #1348
/test check-up-kind-sriov
The SR-IOV lane is green and ran the previously failing tests: https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirtci/1348/check-up-kind-sriov/1880962417102426112
… constantly failing (kubevirt#3878)" This reverts commit 86cf6b7. The PR kubevirt/kubevirtci#1348 fixes the issue and stabilizes the lane. Signed-off-by: Or Mergi <[email protected]>
Thank you for the PR @ormergi.
Could you please say a few words about why the PR is needed?
@@ -69,6 +69,11 @@ export CRI_BIN=${CRI_BIN:-$(detect_cri)}
fi
${kubectl} wait -n kubevirt kv kubevirt --for condition=Available --timeout 15m

if [[ "$KUBEVIRT_PROVIDER" =~ "sriov" ]]; then
    # Some SR-IOV tests require Kubevirt CPUManager feature
    ${kubectl} patch kubevirts -n kubevirt kubevirt --type=json -p='[{"op": "replace", "path": "/spec/configuration/developerConfiguration/featureGates","value": ["CPUManager"]}]'
Please consider adding a feature gate to the end of the existing list, instead of replacing the whole list, as it will enable future expansion.
In case additional FGs need to be enabled, I think there should still be a single patch call with all the necessary FGs.
The FG names can be aggregated and then passed to the patch call.
I didn't export the FG name to a variable because it's the only one at the moment.
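One way to keep a single patch call while leaving room for more feature gates is to aggregate the names first and build the payload from them. The sketch below is only an illustration: `build_fg_patch` is a hypothetical helper that is not part of kubevirtci, `ExampleGate` is a made-up name, and the commented usage assumes `${kubectl}` is set as in the script under review.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in kubevirtci): print a JSON Patch that replaces
# the whole featureGates list with the names passed as arguments.
build_fg_patch() {
    local items="" fg
    for fg in "$@"; do
        # Prepend ", " before every item except the first one.
        items+="${items:+, }\"${fg}\""
    done
    printf '[{"op": "replace", "path": "/spec/configuration/developerConfiguration/featureGates", "value": [%s]}]' "${items}"
}

# Usage (not executed here); "ExampleGate" is a made-up second gate:
# ${kubectl} patch kubevirts -n kubevirt kubevirt --type=json \
#     -p="$(build_fg_patch CPUManager ExampleGate)"
```

Because the payload still uses `replace`, any gates already present in the CR are overwritten, which matches the behavior of the one-liner in the diff.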
IMO this should not come as a hard dependency of the SR-IOV provider.
Do you see a problem with directly marking the need to have CPUManager
as input from the caller?
Also, the patching is odd to me too.
- Why do you use replace and not a simple add?
- Why do you think it is better to assume there is only one FG? It will just make it harder for the next contributor to add other FGs in general.
I don't see a problem with setting the CPUManager feature to always on; in fact, in kubevirt/kubevirt it's always on.
We can go with that. Let me know what you think.
EDIT: In the kubevirt/kubevirt tests, the CPUManager FG is enabled unless the env architecture is s390x (it used to be always on); it might be necessary to follow the same logic here.
I also updated the PR title & description to express that it affects the SR-IOV provider only.
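If the s390x exception were mirrored here, the check could look roughly like the sketch below. `feature_gates_for_arch` is a hypothetical helper, and using `uname -m` as the default architecture source is an assumption; kubevirtci may determine the architecture differently.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the feature gates to enable for an architecture.
# Follows the rule described above: CPUManager everywhere except s390x.
feature_gates_for_arch() {
    # Assumption: fall back to the local machine architecture when none is given.
    local arch="${1:-$(uname -m)}"
    if [[ "${arch}" != "s390x" ]]; then
        echo "CPUManager"
    fi
}
```

An empty result would mean there is nothing to patch for that architecture.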
Regarding the patch, I used "replace" because I couldn't get "add" to add the FG in a one-liner.
When the Kubevirt CR developerConfiguration.featureGates is not initialized, it requires one patch to initialize it and another to add the FG.
Using "replace" the way I did keeps it a one-liner.
Passing the provider name is the simplest way I could find to solve it and get the lane gating as soon as possible.
We can introduce an additional env var to hold an FG list in a follow-up PR if it is needed (I tried to keep things simple).
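For reference, the two-step behavior follows from JSON Patch (RFC 6902) semantics: an `add` whose path ends in `/-` appends to an array, but only if that array already exists, while an `add` on the member path itself creates or overwrites the list (its parent object must exist). The sketch below just picks between the two payloads. `fg_add_patch` and its `yes`/`no` argument are hypothetical, and how the caller detects whether the list exists is deliberately left out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a JSON Patch that adds one feature gate.
#   $1: "yes" if featureGates already exists in the CR, anything else otherwise
#   $2: feature gate name
fg_add_patch() {
    local list_exists="$1" fg="$2"
    local path="/spec/configuration/developerConfiguration/featureGates"
    if [[ "${list_exists}" == "yes" ]]; then
        # Append to the existing array; other gates are preserved.
        printf '[{"op": "add", "path": "%s/-", "value": "%s"}]' "${path}" "${fg}"
    else
        # Create the list with a single entry (parent object must exist).
        printf '[{"op": "add", "path": "%s", "value": ["%s"]}]' "${path}" "${fg}"
    fi
}
```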
@ormergi I am not convinced that this is the right place to patch kubevirt. IMHO it should go into https://github.com/kubevirt/kubevirt/tree/main/hack/cluster-deploy.sh instead. WDYT?
EDIT: My reasoning behind that is that kubevirt is not a component of kubevirtci.
@brianmcarey FYI ^^
@dhiller Changing deploy-cluster.sh affects the cluster-sync flow; I am not sure we should do that.
Please note that "check-up-kind-sriov" calls "kind/check-cluster-up.sh" directly, and "kind/check-cluster-up.sh" deploys kubevirt (from the nightly release YAMLs).
@dhiller WDYT?
Done
/test check-up-kind-sriov
Thank you.
The "check-up-kind-sriov" lane was turned optional and non-gating because it is constantly failing, due to the following test failures:
- SRIOV VMI connected to single SRIOV network should have cloud-init meta_data with tagged interface and aligned cpus to sriov interface numa node for VMIs with dedicatedCPUs
- SRIOV VMI connected to single SRIOV network [test_id:3959]should create a virtual machine with sriov interface and dedicatedCPUs
These tests started failing the lane after the removal of programmatic skips in the kubevirt/kubevirt tests (kubevirt/kubevirt#13144). Previously they were skipped silently (bad); now, with the programmatic skips removed, they fail loudly.
The root cause of the failures (and of the previous skips) is that the tests depend on Kubevirt's CPUManager feature, which is not enabled at all.
Enable the Kubevirt CPUManager FG when the SR-IOV provider is tested.
Signed-off-by: Or Mergi <[email protected]>
Done
What this PR does / why we need it:
The "check-up-kind-sriov" lane was turned optional and non-gating because it is constantly failing, due to the following test failures:
- SRIOV VMI connected to single SRIOV network should have cloud-init meta_data with tagged interface and aligned cpus to sriov interface numa node for VMIs with dedicatedCPUs
- SRIOV VMI connected to single SRIOV network [test_id:3959]should create a virtual machine with sriov interface and dedicatedCPUs
These tests started failing the lane after the removal of programmatic skips in the kubevirt/kubevirt tests (kubevirt/kubevirt#13144).
Previously they were skipped silently (bad); now, with the programmatic skips removed, they fail loudly.
The root cause of the failures (and of the previous skips) is that the tests depend on Kubevirt's CPUManager feature, which is not enabled at all; see the notes section below for more details *.
This PR fixes the lane by enabling Kubevirt's CPUManager feature when the SR-IOV provider is tested.
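The scheduling symptom behind these failures (pods that require a `cpumanager=true` node while no node carries that label) can be checked on a running cluster. The sketch below is a hedged illustration: `count_cpumanager_nodes` is a hypothetical helper, and it assumes `${kubectl}` points at a working kubectl, as in the kubevirtci scripts.

```shell
#!/usr/bin/env bash

# Assumption: fall back to plain kubectl when the variable is not already set.
kubectl="${kubectl:-kubectl}"

# Hypothetical helper: count the nodes that carry the cpumanager=true label.
count_cpumanager_nodes() {
    ${kubectl} get nodes -l cpumanager=true -o name | wc -l | tr -d '[:space:]'
}

# Example check (not executed here):
# if [[ "$(count_cpumanager_nodes)" -eq 0 ]]; then
#     echo "no node is labeled cpumanager=true" >&2
# fi
```

A count of zero would be consistent with the tested VMs never becoming ready.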
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
The failing tests create VMs with the dedicated-CPUs option. Kubevirt adds a cpumanager=true node selector to the virt-launcher pod of such a VM.
The end result is that the tested VMs fail to become ready on time because scheduling is impossible: the VMs have a cpumanager=true node selector, but no node has the cpumanager=true label.
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: