Run manager with 2 replicas #180
base: main
Conversation
Hi @k-keiichi-rh. Thanks for your PR. I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test 4.14-openshift-e2e
Force-pushed from c7df3fe to f4676fc (compare)
Signed-off-by: Keiichi Kii <[email protected]>
Force-pushed from f4676fc to 136e5f9 (compare)
Just increasing the replicas to 2 is not enough to resolve ECOPROJECT-1855. Anti-affinity is not configured on the deployment, so the 2 controllers may end up running on the same node.
/test 4.14-openshift-e2e
That's an interesting point; we should probably consider it for our other operators which use the same approach. However, from what I understand, the change (i.e. anti-affinity on the control-plane label) only makes sure the manager pod isn't running on a control-plane node, so I'm not sure how it would solve the problem.
The following configuration ensures that no two pods with the label "control-plane: controller-manager" can't be scheduled on the same node(host).
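For reference, a rough sketch of the anti-affinity block in question, pieced together from the diff fragments shown later in this review; the operator/values and topologyKey: kubernetes.io/hostname are assumptions, not necessarily the exact manifest:

```yaml
# Sketch of the podAntiAffinity added to the manager Deployment in this PR.
# The In/values pairing and topologyKey are assumptions based on the diff
# fragments and the discussion in this thread.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: control-plane
                operator: In
                values:
                - controller-manager
            topologyKey: kubernetes.io/hostname
```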
For example, suppose we try to deploy two manager pods, but only one worker node is available at the moment.
Before applying this patch, both manager pods could be deployed on worker-0. Now that the above anti-affinity is configured on the deployment, the 2nd manager pod can't be deployed on worker-0 and goes into the "Pending" status.
The 1st manager pod, self-node-remediation-controller-manager-88646cd74-cs5m8, is already deployed on worker-0.
We can confirm from the events that the anti-affinity is working as expected.
After one more worker node (worker-1) becomes available, the 2nd manager pod is scheduled on worker-1 shortly after.
Thanks for the explanation, and really good job catching this use case! /lgtm
Just a small nit: I think you meant "can" instead of "can't".
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by: k-keiichi-rh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
In general, good catch with the affinity. I have some questions/concerns though; see the comments inline.
@@ -337,6 +337,16 @@ spec:
            control-plane: controller-manager
            self-node-remediation-operator: ""
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
Would this fail operator installation in case we only have one node available during installation?
Also, what happens on updates, when there aren't enough nodes to start the pods of the new version?
(The latter can be solved with matchLabelKeys and pod-template-hash, but that's alpha in k8s 1.29 only. topologySpreadConstraints, also with matchLabelKeys and pod-template-hash, would also work AFAIU, but there matchLabelKeys is also too new: beta since k8s 1.27 / OCP 4.14.)
Maybe preferredDuringSchedulingIgnoredDuringExecution is a better choice? It would potentially deploy both pods on the same node, but that's better than nothing, isn't it? However, I'm not sure if that would still make a difference to the default scheduling behaviour, which already respects replicasets... 🤔
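For illustration, a "soft" variant along those lines might look roughly like this; the weight and topologyKey are illustrative, and the label selector mirrors the one used in this PR:

```yaml
# Hypothetical "preferred" variant: the scheduler tries to spread the two
# manager pods across nodes, but still schedules both on one node if no
# other node fits.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: control-plane
            operator: In
            values:
            - controller-manager
        topologyKey: kubernetes.io/hostname
```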
looks like the update concern can also be solved with this:
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate
it will force k8s to first delete one of the pods before starting a new one
Would this fail operator installation in case we only have one node available during installation?
I will test this case. In my opinion, the operator installation should fail.
There are the following cases:
- We have only one node available, like SNO (Single Node OpenShift)
  => No reason to deploy SNR/FAR/NHC
- We plan to deploy a cluster with multiple nodes, but only one node is available during installation due to node failures
  => We would not be happy if the deployment succeeded without noticing the node failures. It should stop the operator installation.
looks like the update concern can also be solved with this:
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate
it will force k8s to first delete one of the pods before starting a new one
I will fix it with the above. Also, the Workload Availability project has the Node Maintenance Operator for maintenance, so we also need to define a PodDisruptionBudget like https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/thanos-querier/pod-disruption-budget.yaml.
It will prevent all of the managers from being stopped accidentally.
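For illustration, such a PodDisruptionBudget might look roughly like this; the name and the selector label are assumptions (using the SNR-specific label suggested further down in this review), not a finalized manifest:

```yaml
# Illustrative sketch only, not part of this PR: keep at least one manager
# pod available during voluntary disruptions such as node maintenance.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: self-node-remediation-controller-manager
spec:
  minAvailable: 1
  selector:
    matchLabels:
      self-node-remediation-operator: ""
```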
We have only one node available, like SNO (Single Node OpenShift)
=> No reason to deploy SNR/FAR/NHC
agree
We plan to deploy a cluster with multiple nodes, but only one node is available during installation due to node failures
=> We would not be happy if the deployment succeeded without noticing the node failures. It should stop the operator installation.
Hm... yes, actually it does not make sense to succeed; it wouldn't help fix the cluster, because the agents would not be deployed on the failed nodes.
we also need to define PodDisruptionBudget
Let's discuss this in an issue please. As with this affinity, we need to be aware of side effects. Let's not make our operators more important than they are, compared to other deployments.
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: control-plane
control-plane: controller-manager is a default label which is added by operator-sdk when scaffolding a new operator, so there is a very big risk that other operators' pods also have this label, which can lead to undesired behavior. Please use the self-node-remediation-operator: "" label, which is unique to SNR and also doesn't conflict with the daemonset pods.
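For illustration, the anti-affinity selector with the suggested label might then look roughly like this; operator: Exists is an assumption, chosen because the label carries an empty value:

```yaml
# Sketch of the suggested selector for the anti-affinity term.
labelSelector:
  matchExpressions:
  - key: self-node-remediation-operator
    operator: Exists
```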
I will use the self-node-remediation-operator: "" label. Thank you for the suggestion.
/test ?
@mshitrit: The following commands are available to trigger required jobs:
In response to this: /test ?
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test 4.14-openshift-e2e
@k-keiichi-rh: The following test failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This PR runs the manager with 2 replicas for high availability.
We are testing whether this resolves our issue in https://issues.redhat.com/browse/ECOPROJECT-1855.
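At its core, the change amounts to bumping the replica count on the manager Deployment, roughly as sketched below (field placement shown for orientation only):

```yaml
# Sketch of the replica bump; the rest of the Deployment spec is unchanged.
spec:
  replicas: 2
```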