-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(webhook-receivers.md): responsiveness #612
Conversation
lloydchang
commented
Oct 31, 2021
•
edited
Loading
edited
- responsiveness
- fix broken links
@resouer @wonderflow @christianh814 Please review this #612 when you time permits. This relates to our conversation in #wg-gitops at https://cloud-native.slack.com/archives/C01G9DEE88M/p1635706634045400?thread_ts=1635251844.014900 Thank you! 🙂 |
This is an incorrect change, in my opinion. Webhook receivers do not alter Flux's behavior so that it is "push-based" instead of "pull-based" and therefore they are not "counter to GitOps principles" in any way – a webhook receiver only receives events, and it only triggers the same pull-based reconciliation that is considered as central to GitOps. It would be different if the webhook pushed a specific commit hash, or any out-of-band instructions or information at all that were not fully encapsulated in a One is absolutely still "doing proper GitOps" with any receiving webhooks enabled, as this is done in order to shorten the control loop. The only procedural difference with receivers enabled is that reconcilers no longer need to poll in order to achieve a fast inner-loop and to ensure that changes from the repo are applied within a reasonable amount of time. With Receivers configured, the "sync" event can no longer depend on |
This is something that was being discussed by the chairs @scottrigby @todaywasawesome @murillodigital (weak GitOps vs strong GitOps) IMHO, doing a webhook to shorten the wait time between reconciliation is fine, and can exist within the principles of GitOps. As long as the reconcile isn't turned off, having web hooks trigger the reconciler earlier is fine. |
Hi @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow, TL;DR:
To avoid groupthink in consensus decision-making, I am elaborating on why I created this pull request: @kingdonb, @yebyen, @christianh814, I re-read your feedback, ...1412 and ...3011, several times. I empathize and understand that it may be accidental, or maybe even easier, to conflate GitOps Principles 3, 4, 5, holistically as one set. Let's not conflate them in this pull request. Thank you! 🙂 I would like to double-check whether you are conflating, or not conflating, GitOps Principles 3 and 4. GitOps Principle 3 says "automatically" and not "automated". Since it is a principle, I interpret "Pulled Automatically" to mean fully automatic as a target operating model.
In fully automatic, there are no intentional events at all — There wouldn't be webhooks driving push-based events intentionally from server-to-server.
Webhooks compromise security posture by introducing new risks that need to be assessed. For example:
In the context of Flux CD version 2, rate limiting and security risks were discussed at fluxcd/notification-controller#180 and fluxcd/notification-controller#193 While event-driven webhooks are intentional and automated, they aren't automatic in the context of GitOps.
Pushed-based pipelines lack the degree of security designed in GitOps. The intent in webhooks makes push-based pipelines automated with different outcomes between happy path and threat model, thus not GitOps. The intent in webhooks contrasts with GitOps Principle 3. Pulled Automatically.
To play devil's advocate, one can split hairs and say it's semi-automated or semi-automatic within the spectrum:
This pull request is about GitOps Principle 3: Pulled Automatically. Whereas GitOps Principle 4 is about reconciliation.
@christianh814 and @yebyen — As for weak GitOps or strong GitOps, I introduced that terminology to @todaywasawesome and @scottrigby, with documented examples at open-gitops/documents#24 (comment). In that context, it is about proposed GitOps Principle 5.
Separately from this pull request, I feel that autonomic is more applicable to GitOps Principle 5. Operated in a closed loop, also known as strong GitOps. @christianh814 wrote:
The reconcile can be turned off, e.g. ❔ Curiously, how does this discussion about Flux compare and contrast with Carvel, kpt, and Argo CD: Webhook notification service? @christianh814 wrote in #wg-gitops:
The push-based pipelines range from intentional event-driven webhooks that need rate-limiting to mitigate security risks to CIOps & CI Ops push-based pipelines.
@scottrigby, @todaywasawesome and I had discussed push-based Continuous Integration Operations (CIOps & CI Ops) at open-gitops/documents#8 The following automobile analogy, cruise control, is an homage to CruiseControl open-source software from 2001 (20 years ago). CruiseControl was one of the first of its kind of open-source software which paved a path for continuous integration. Automated: Semi-automatic: Fully-automatic: Please let me know if you feel like we are splitting hairs between: Thank you @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow for your time and feedback 🙂 |
The reconcile can be turned off, e.g. flux suspend ...
This does not just disable the interval reconciling, it disables the
resource from being reconciled even by webhook. This is incorrect.
You would not be able to disable reconciling at an interval by suspending
the GitRepository resource in Flux, and then expect it to receive webhook
events yet and still reconcile for them.
Flux Kustomizations are also automatically reconciled by means of an
internal subscription, which reconciles the Kustomization upon completion
of the GitRepository's reconcile when there are changes. This is basically
equivalent to a webhook, though fully internal to the system; I still don't
think it compromises the principle #3 at all. Reconciles are not
independent events. A Flux Kustomization reconciles from a source into the
Kubernetes API. A GitRepository source reconciles from a git remote into
the cluster, to a local cache of commits. In a closed loop, the
Kustomization will only ever reconcile changes when the source has changed.
But Flux still reconciles at the interval, whether there are changes to
apply/any drift to correct or not.
Receivers can also be rate limited, and webhooks may not be guaranteed
delivery. So there is an incentive to leave the sync interval small, even
with a webhook receiver in place. Non-delivery of a webhook event should
never compromise the functioning operation of a GitOps system, though it
can compromise the fastness and may introduce undesirable delays.
It should be noted also, that sending webhook events at too high a
frequency should only cause GitRepository to attempt to sync more
frequently. Without any change to sync, this will be a relatively
inexpensive no-op, and it will not cause the downstream Kustomizations to
be reconciled again by the subscription that I described earlier, unless
the versioned commit SHA head was updated between attempts.
These practical performance and DoS concerns are further mitigated by the
capability of webhooks to verify signatures. An exposed webhook properly
configured will not cause sync of anything if the request is not signed
with a correct webhook token secret in Flux.
… |
@kingdonb wrote:
❔ Would you have a concrete example? Thank you. ❔ Are you referring to HMAC signature at fluxcd/notification-controller#127 from January? The above questions are to avoid over-promising and under-delivering. It might be an unintentional security oversight and over-promising to claim that webhooks signatures are validated without further clarifying that payloads aren't validated. After reading a report by @aliusmiles in October at fluxcd/notification-controller#256
My understanding is that payload validation isn't happening at all, which makes sense to me because of the complexity to validate payloads. In April, when I reviewed fluxcd/notification-controller#180 (comment), I wrote:
Due to the complexity to validate payloads, fluxcd/notification-controller#180 was closed as Won't Do in May. It is okay to not validate payloads because of the complexity. The corollary is to be clearer in the documentation In particular, the sentence
is misleading. In February, perhaps @dholbach unintentionally added that sentence without fully realizing that the payload isn't validated at all. Thank you, everyone 🙂 |
The reason that it's safe to not validate payloads, is because the payload is actually not used for anything, across all of the Webhook receiver types. (Not only because it would be too difficult) The Webhook only has one message, which makes the API surface very small and easy to verify its safety. The content does not affect the function. It's like a method that takes zero parameters, the content is discarded in toto. The content is never processed for anything, with the possible exception of selecting which event (push) can trigger the webhook to fire (part of the security posture also.) The example is every Receiver in the pack, with the exception of only So even for BitBucket Cloud (which intentionally doesn't permit signing requests, to leave it as an advantage of the paid BitBucket Server?) the webhook can be protected as long as the URL itself is kept secret. Other Receiver types are based on the hmac signing pattern, I haven't read them all but my naive understanding is they all work this way. |
I wasn't aware that payloads are not validated at all, thanks for adding this @lloydchang |
The way we designed the webhook integrations doesn't make Flux less GitOpsy. The webhooks don't tell Flux what to pull or from where, Flux doesn't look at the webhook payload. The sole scope of the Flux |
It does say
so whether or not you agree that it's doctrinally OK to have webhooks, the text is inaccurate in the first place (not to mention: "GitOps pipeline" is not defined anywhere, is it?). I would just remove that whole sentence. |
I agree with @squaremo except that:
there was clearly meant to be a "compare" / "contrast" flow to this paragraph. Removing this whole sentence will break that sentence structure and compromise the flow, (leaving a weird intro paragraph that has only ebb left, and no flow.) Something like this maybe?
|
push-based means the control belongs to the pushing process; e.g., a CI system that tells the cluster to sync, then waits to see if it succeeds. That isn't what is happening here -- the control belongs to the pulling process. On that basis, it's better to say that using webhooks makes it more responsive, and leave it at that. |
I think we still want to use the words push-based here, if nothing else then for SEO reasons and to draw the attention of people that still think they want something push-based. This version does not apply the descriptor "GitOps" to the category I am fine with anything that reads like this. Can you amend the language of your PR @lloydchang or we can close it and resolve this with a separate PR 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm late to the party! Thanks, @lloydchang for raising the question, I think this has been a very interesting discussion. And thanks everyone for participating! I'm guessing there will be more conversations like this for different functionality, configurations, and documentation of Flux and other tools in the GitOps ecosystem.
@lloydchang To move this forward, would you please update this PR wording according to maintainer suggestions, or else close this PR if you don't have time to do that? I believe a few options are given above.
@squaremo I agree with you this is different from traditional CI and/or CD push-based systems for reasons you mentioned. However, I believe @stefanprodan added that wording to make clear the need has been met for end-users who don't want to wait until the next scheduled interval whenever they purposefully make a source change. They are often looking for "push-based" support. We offer this, just architected in a GitOps way.
We understand that while the webhook receiver accepts a "push" message, that's only to let it know it's time to do another pull immediately. We could call this a "poke" informally since it's done differently than the less secure traditional CI & CD "push-only" models 😛 It's just architected to conform to GitOps principles. But I think "push" is accurate here.
End users can read more about this in the docs, but often users are looking for a "push-based" CD because they want this exact behavior. This poke method is literally still a "push" event. Once they learn it's equally fast, more secure, and adheres to GitOps principles too, that's added value for them, not false advertising IMO. Defer to you and the other Flux oversight committee members to decide on whether or not to keep push-based wording in some way. My 2¢ is that there's still value in using the term here for those end-users looking for this, and in terms of accuracy if anything it's chance to educate folks on GitOps while making clear Flux satisfies their legitimate need.
Signed-off-by: lloydchang <[email protected]>
amended |
I think I can approve this, @scottrigby WDYT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I re-ran the tests but I am not able to see why lychee link checker test is failing right now. |
fix https://github.com/fluxcd/website/runs/4156089344 fix 2 errors reported by lychee Link Checker: Errors in content/en/docs/guides/webhook-receivers.md ✗ file:///github/workspace/content/en/docs/installation (Cannot find file file:///github/workspace/content/en/docs/installation) ✗ file:///github/workspace/content/en/docs/get-started/index.md (Cannot find file file:///github/workspace/content/en/docs/get-started/index.md) Signed-off-by: lloydchang <[email protected]>
Looks like that did it, thank you @lloydchang Please approve if you agree @scottrigby then this can be merged 👍 |
@kingdonb please re-review. To fix (pre-existing) broken links to pass lychee Link Checker, I added another Git commit. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @lloydchang
The raised issues were addressed