-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
052 server side apply #9306
base: main
Are you sure you want to change the base?
052 server side apply #9306
Conversation
233eb6b
to
1aea465
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the PodOperator the right place to try things out? Pods are in general never patched but deleted and created. So the SSA gets litttle use there and I wonder whether any tests actually touch it there.
...or/src/main/java/io/strimzi/operator/cluster/operator/resource/ResourceOperatorSupplier.java
Outdated
Show resolved
Hide resolved
52fef64
to
b40b42e
Compare
Good point, I chose it as a simple example only really. In the proposal the issue was seen on StatefulSet, but I think StrimziPodSet is the replacement for this? So I'll look into doing it on the StrimziPodSetOperator instead |
b40b42e
to
51e3e24
Compare
I think StrimziPodSets would work. Or Service Accounts for example might be a good example.
So how does it differentiate from the other mock server? |
@@ -262,6 +297,23 @@ protected T patchOrReplace(String namespace, String name, T desired) { | |||
return operation().inNamespace(namespace).withName(name).patch(PatchContext.of(PatchType.JSON), desired); | |||
} | |||
|
|||
protected T patchOnly(Reconciliation reconciliation, String namespace, String name, T desired) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the logic behind naming it patchOnly
? I'm not sure I understand what would be the alternative to it.
protected T patchOnly(Reconciliation reconciliation, String namespace, String name, T desired) { | ||
try { | ||
return operation().inNamespace(namespace).withName(name).patch(serverSideApplyPatchContext(false), desired); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to watch here for some more specific exception or error code?
try { | ||
return operation().inNamespace(namespace).withName(name).patch(serverSideApplyPatchContext(false), desired); | ||
} catch (Exception e) { | ||
LOGGER.debugCr(reconciliation, "{} {} in namespace {} failed to apply, using force", resourceKind, name, namespace, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is unexpected, should it be a warning message?
I left some more comments. I guess the next steps would be:
|
And I guess share what I tried ... it does not seem to work on my ARM based MacOS. It seems to fail with this when trying to run the
And this from command line:
|
@scholzj We did try it, and seems it doesn't work on Power(ppc64le).
|
I get those connection warnings too, perhaps there is a way to turn them off in the logging, but they are not necessarily indicating a failure - @Vaibhav-Nazare do you have further logs after the ones you showed? It should eventually show the usual test output. Mine does pass and doesn't take very long ( This issue confirming that the warnings in the logs are just readiness checks: java-operator-sdk/jenvtest#97 Also an issue with someone experiencing the same timeout: java-operator-sdk/jenvtest#105 And another suggesting maybe an existing kubeconfig file could be the problem: java-operator-sdk/jenvtest#116 My logs for reference:
|
bc62be9
to
10174a8
Compare
I've updated the regression tests to use server side apply and added the flag to all the operators, so opening the PR properly for review. Please let me know if there are any remaining changes you think are needed. |
I will have another look and try to run the system tests, But we still need to make the unit tests work for everyone and on all platforms we support. We cannot rely on system tests only. |
In the first case of Power(ppc64le), we need to see the full log output, but I won't be able to help to reproduce/diagnose it. In the case of ARM based MacOS, this is the same platform as mine so it could be a different issue - did my previous comment shed any light on the problem? I linked some issues from the project that could be related i.e. maybe an existing kube config file is causing the problem? |
/azp run feature-gates-regression |
Azure Pipelines successfully started running 1 pipeline(s). |
5a737fe
to
c775fa2
Compare
/azp run regression |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run feature-gates-regression |
Azure Pipelines successfully started running 1 pipeline(s). |
Hey @scholzj! I will be taking over work on this PR with help from @jamesfarlow (if needed). Thanks for triggering regression tests! I see a few of them failed and Azure killed two jobs as well. Is that normal? Looking at the previous runs I don't think so but still, this run took a lot more time than other runs. I also ran all failed ones (namely |
There is a 6 hours limit. So if there are too many failures, the pipeline will be killed. If you check the logs, you should see at least some of the tests that failed in them. Also, in the pipelines that failed in time, you should see what failed. When a lot of stuff fails like this, it is often related to similar issues.
Are you sure you run then with the server-side-apply enabled? I did not had much time to work on this lately. I was looking at how to deal with the mocked tests to work on all platforms and to work in a reasonable time. One of the alternatives might be to use https://github.com/dajudge/kindcontainer instead of the Jenvtest. But the main issue is that all of them are much much slower than the current solution. And what really needs to be done for this PR is not just have single test for server apply. But have all the MockKube tests use something that supports server side apply. Unfortunately, I'm busy with Kraft support, so did not had much time to move forward with it :-(. |
54403af
to
f8f61de
Compare
Thanks for the changes.
I did not realized that. I think what you did would work in general. But I'm not sure it is optimal. I wonder if it would be better to have a single global feature gates configuration in operator-common shared between all classes. That way you can:
I realize that I suggested to copy them. But seeing the result, I wonder if it would be better to do one of this:
Can you elaborate on this please? Why do you think it is not needed?
This would be worth investigating so that we know what to expect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess I suggested to just copy it. I wonder if using some parametrization or abstract class with two implementations each using just a different SSA setting would make it easier 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably have some unit tests? Also, how does this handle interdependencies between feature gates which are a very common thing?
/** | ||
* Configuration string with feature gates settings | ||
*/ | ||
public static final ConfigParameter<FeatureGates> FEATURE_GATES = new ConfigParameter<>("STRIMZI_FEATURE_GATES", parseFeatureGates(), "", CONFIG_VALUES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. But I guess you also need to pass it to the user operator based on the setting in the cluster operator.
Hey, unfortunately Adam is not gonna be able to work further on this. I'm gonna step into his place. Wanted to inform you as we're still interested to push this forward and not just let this feature rot. |
@krystian-kulgawczuk-natwest Ok, thanks for letting us know. Please check the last comments I opened and let me know if there is something you need me to help with. I think this is very close to being complete, so it would be great to get it done. |
FYI: I opened a new proposal strimzi/proposals#118 to best deal with how the feature gates are configured in the User and Topic Operators. Hopefully that way we can split it off from this PR and simplify things a bit. |
Hey @scholzj, read through the proposal and it seems like a good preemptive move to not cause a lot of mess with feature gates. I just got back from vacation, so I will be able to give more attention here 😅 |
I'm fine to implement the proposal once it is approved. But we will then need to rebase this PR on top of it once it is merged. Let's see how many and hwat kind of comments will the proposal get. If things go well, it might be done next week. |
Ok cool. I'll try to close everything else that is not related to feature gates. |
I looked through those tests and tried to elaborate what Adam had in mind. I think what he meant was for the tests you mentioned, controllers/operator/etc that would actually take as an argument this is an example of such a test. In order to properly test the Please correct me wherever I'm wrong and I'm up for any suggestions how do you think this should be done. |
@krystian-kulgawczuk-natwest The
|
I get the point of writing tests for both of these cases, but I'm not exactly sure how do we want to achieve that for |
Sorry, I don't follow what exactly you mean with that. |
Okay, let me try to explain this as detailed as possible 😅 So for
*1) kube api supplier edit1: |
In some cases you might need to mock these things yes. But in others, you likely mock the whole |
Triaged on 27.6.2024: @krystian-kulgawczuk-natwest are you still interested in working on this? There are some conflicts and comments that should be resolved before proceeding. Thanks |
Adding a couple of mockito tests for SSA via abstract classes Refactor integration tests that don't use mockito and are written for SSA to use the same abstract class pattern. Signed-off-by: Krystian Kulgawczuk <[email protected]>
Yeah, would be nice if somebody could review if this is the direction we want to take with the tests as requested. I'll get back to this as soon as I can and resolve comments and conflicts. |
I guess you can use something like this for the tests and make it much easier to run it with / without SSA? Lines 85 to 86 in 2c41e2a
Also, I think you need to rebase this, these is not much value in the reviews when it is not clear if it works and how will it look like after the rebase. |
@krystian-kulgawczuk-natwest I was just wondering if you are still interested to work on this and move forward, there are several conflicts to fix and comments to address. |
Type of change
Description
This PR is an initial attempt at adding Server Side Apply (SSA) to the reconcile process, as per proposal 052 - https://github.com/strimzi/proposals/blob/main/052-k8s-server-side-apply.md
It is not finished work but I am opening the PR to get early feedback - some details below and also prompts/questions on areas that I am unsure about:
PodOperatorServerSideApplyApiServerTest.java
test.PodOperatorServerSideApplyTest.java
which overrides any tests which were failing in the originalPodOperatorTest.java
so that it is obvious where the behaviour is changed and how we have to change the mocks/verifys to match. How would you see this approach scaling if we enable SSA on all the operators? Should we create new abstract classes to represent the SSA behaviour and create new tests extending these for all the operators when SSA is enabled?FeatureGatesST
?ResourceOperatorSupplier.java
but I can't actually get this part of the project to build on my IDE so may not be correct. I also can't build on the terminal when I rebased to the latest upstream changes due to errors unrelated to my changesChecklist
Please go through this checklist and make sure all applicable tasks have been done