docs(webhook-receivers.md): responsiveness #612

lloydchang · 2021-10-31T19:36:21Z

responsiveness
fix broken links

lloydchang · 2021-10-31T19:42:23Z

@resouer @wonderflow @christianh814 Please review this #612 when you time permits.

This relates to our conversation in #wg-gitops at https://cloud-native.slack.com/archives/C01G9DEE88M/p1635706634045400?thread_ts=1635251844.014900

Thank you! 🙂

kingdonb · 2021-10-31T19:51:44Z

This is an incorrect change, in my opinion.

Webhook receivers do not alter Flux's behavior so that it is "push-based" instead of "pull-based" and therefore they are not "counter to GitOps principles" in any way – a webhook receiver only receives events, and it only triggers the same pull-based reconciliation that is considered as central to GitOps. It would be different if the webhook pushed a specific commit hash, or any out-of-band instructions or information at all that were not fully encapsulated in a GitRepository.

One is absolutely still "doing proper GitOps" with any receiving webhooks enabled, as this is done in order to shorten the control loop. The only procedural difference with receivers enabled is that reconcilers no longer need to poll in order to achieve a fast inner-loop and to ensure that changes from the repo are applied within a reasonable amount of time.

With Receivers configured, the "sync" event can no longer depend on spec.interval; instead the request to sync reaches Flux through the webhook. It is the exact same control loop as before, but with receivers we have incorporated a better timing source that obviates the need to set a short interval and poll frequently.

christianh814 · 2021-10-31T20:02:49Z

This is something that was being discussed by the chairs @scottrigby @todaywasawesome @murillodigital (weak GitOps vs strong GitOps)

IMHO, doing a webhook to shorten the wait time between reconciliation is fine, and can exist within the principles of GitOps.

As long as the reconcile isn't turned off, having web hooks trigger the reconciler earlier is fine.

lloydchang · 2021-10-31T20:04:01Z

Relates to:
• gitops-working-group/gitops-working-group#166
• open-gitops/documents#34
• https://cloud-native.slack.com/archives/C01G9DEE88M/p1635706634045400?thread_ts=1635251844.014900

lloydchang · 2021-11-01T01:14:54Z

Hi @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow,

TL;DR:

I appreciated your time and feedback 🙂
GitOps, without any pushes at all, is designed to be secure.
Webhooks, by introducing pushes, compromise our GitOps security posture.
Push-based pipelines and event-driven webhooks in event-driven architecture that trigger an automated pull are different from GitOps Principle 3. Pulled Automatically.
While the two English words are spelled similarly, there is a significant degree of intentional differences between automated and automatic.

To avoid groupthink in consensus decision-making, I am elaborating on why I created this pull request:

@kingdonb, @yebyen, @christianh814, I re-read your feedback, ...1412 and ...3011, several times.

I empathize and understand that it may be accidental, or maybe even easier, to conflate GitOps Principles 3, 4, 5, holistically as one set. Let's not conflate them in this pull request. Thank you! 🙂

I would like to double-check whether you are conflating, or not conflating, GitOps Principles 3 and 4.

GitOps Principle 3 says "automatically" and not "automated". Since it is a principle, I interpret "Pulled Automatically" to mean fully automatic as a target operating model.

Pulled Automatically
Software agents automatically pull the desired state declarations from the source.

In fully automatic, there are no intentional events at all — There wouldn't be webhooks driving push-based events intentionally from server-to-server.

GitOps, without any pushes at all, is designed to be secure.
Webhooks, by introducing pushes, compromise our GitOps security posture.

Webhooks compromise security posture by introducing new risks that need to be assessed. For example:

When webhooks push intentionally at high rates, this situation introduces security risks of incoming events needing to be rate limited at the ingress (i.e. ingress-nginx, or load balancer (LB), or web application firewall (WAF), etc.) to prevent denial of service (DOS) attacks.

In the context of Flux CD version 2, rate limiting and security risks were discussed at fluxcd/notification-controller#180 and fluxcd/notification-controller#193

While event-driven webhooks are intentional and automated, they aren't automatic in the context of GitOps.

In a happy path, webhooks at intentional low rates trigger Kubernetes controllers to poll earlier.
In a threat model, webhooks at intentional high rates could overwhelm Kubernetes controllers with webhook receivers.

Pushed-based pipelines lack the degree of security designed in GitOps.

The intent in webhooks makes push-based pipelines automated with different outcomes between happy path and threat model, thus not GitOps.

The intent in webhooks contrasts with GitOps Principle 3. Pulled Automatically.

While the two English words are spelled similarly, there is a significant degree of intentional differences between automated and automatic.

automated
/ˈôdəˌmādəd/

operated by largely automatic equipment.

au·to·mat·ic
/ˌôdəˈmadik/

(of a device or process) working by itself with little or no direct human control.

done or occurring spontaneously, without conscious thought or intention.

A person who manually triggered a webhook, then the polling earlier, then subsequent pulling, triggered an automated process — This is different from GitOps Principle 3. Pulled Automatically.
This makes pushed-based pipelines an automated process, but not a fully automatic process.

To play devil's advocate, one can split hairs and say it's semi-automated or semi-automatic within the spectrum:

manually
semi-automated
automated
semi-automatic
automatic
fully automatic
autonomic

This pull request is about GitOps Principle 3: Pulled Automatically.

Whereas GitOps Principle 4 is about reconciliation.

Continuously Reconciled
Software agents continuously observe actual system state and attempt to apply the desired state.

@christianh814 and @yebyen — As for weak GitOps or strong GitOps, I introduced that terminology to @todaywasawesome and @scottrigby, with documented examples at open-gitops/documents#24 (comment). In that context, it is about proposed GitOps Principle 5.

Operated in a closed loop
Software agents observe desired state and meta-state to take actions based on policy when the desired state cannot be reached. This may include things like notifications, rollbacks, etc.

Separately from this pull request, I feel that autonomic is more applicable to GitOps Principle 5. Operated in a closed loop, also known as strong GitOps.

au·to·nom·ic
/ˌôdəˈnämik/

involuntary or unconscious; relating to the autonomic nervous system.

@christianh814 wrote:

As long as the reconcile isn't turned off, having web hooks trigger the reconciler earlier is fine.

The reconcile can be turned off, e.g. flux suspend ...

❔ Curiously, how does this discussion about Flux compare and contrast with Carvel, kpt, and Argo CD: Webhook notification service?

@christianh814 wrote in #wg-gitops:

I haven't seen carvel but kpt seems event driven so it might not fit. I would have to do some testing around it.

Example: I can turn off reconciliation on Argo CD and use web hooks to do the sync.
^ this isn't GitOps (although a valid workflow).

The push-based pipelines range from intentional event-driven webhooks that need rate-limiting to mitigate security risks to CIOps & CI Ops push-based pipelines.

❔ How secure are the typical CI & CD pipelines?
Let’s see how we can improve the typical CI & CD pipelines with GitOps.

@scottrigby, @todaywasawesome and I had discussed push-based Continuous Integration Operations (CIOps & CI Ops) at open-gitops/documents#8

The following automobile analogy, cruise control, is an homage to CruiseControl open-source software from 2001 (20 years ago). CruiseControl was one of the first of its kind of open-source software which paved a path for continuous integration.

Automated:
🚗 🦶 🔘 👈

Cruise control operations: The driver must bring the vehicle up to speed manually and use a button to set the cruise control to the current speed.

Semi-automatic:
🚗 🚙

Adaptive cruise control (ACC) is an available cruise control advanced driver-assistance system for road vehicles that automatically adjusts the vehicle speed to maintain a safe distance from vehicles ahead.

Fully-automatic:
🏎️ 🏍️ 💨 🛻 🛺 🚘 💥 🚌 🚷 🚶 🚷 🚌 ⚠️ 🚗 🚙

Cooperative Adaptive Cruise Control (CACC) utilizes vehicle-to-vehicle communications so that the vehicle has information not just on the vehicle immediately in front (through sensors), but also on a leading vehicle or vehicles further in front, through vehicle-to-vehicle communications of key parameters such as position, velocity, acceleration.

Please let me know if you feel like we are splitting hairs between:
• automated
• semi-automatic
• fully automatic
🙂

Thank you @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow for your time and feedback 🙂

kingdonb · 2021-11-01T01:53:58Z

The reconcile can be turned off, e.g. flux suspend ...

This does not just disable the interval reconciling, it disables the resource from being reconciled even by webhook. This is incorrect. You would not be able to disable reconciling at an interval by suspending the GitRepository resource in Flux, and then expect it to receive webhook events yet and still reconcile for them. Flux Kustomizations are also automatically reconciled by means of an internal subscription, which reconciles the Kustomization upon completion of the GitRepository's reconcile when there are changes. This is basically equivalent to a webhook, though fully internal to the system; I still don't think it compromises the principle #3 at all. Reconciles are not independent events. A Flux Kustomization reconciles from a source into the Kubernetes API. A GitRepository source reconciles from a git remote into the cluster, to a local cache of commits. In a closed loop, the Kustomization will only ever reconcile changes when the source has changed. But Flux still reconciles at the interval, whether there are changes to apply/any drift to correct or not. Receivers can also be rate limited, and webhooks may not be guaranteed delivery. So there is an incentive to leave the sync interval small, even with a webhook receiver in place. Non-delivery of a webhook event should never compromise the functioning operation of a GitOps system, though it can compromise the fastness and may introduce undesirable delays. It should be noted also, that sending webhook events at too high a frequency should only cause GitRepository to attempt to sync more frequently. Without any change to sync, this will be a relatively inexpensive no-op, and it will not cause the downstream Kustomizations to be reconciled again by the subscription that I described earlier, unless the versioned commit SHA head was updated between attempts. These practical performance and DoS concerns are further mitigated by the capability of webhooks to verify signatures. An exposed webhook properly configured will not cause sync of anything if the request is not signed with a correct webhook token secret in Flux.

…

lloydchang · 2021-11-01T06:31:47Z

@kingdonb wrote:

capability of webhooks to verify signatures

❔ Would you have a concrete example? Thank you.

❔ Are you referring to HMAC signature at fluxcd/notification-controller#127 from January?

The above questions are to avoid over-promising and under-delivering.

It might be an unintentional security oversight and over-promising to claim that webhooks signatures are validated without further clarifying that payloads aren't validated.

After reading a report by @aliusmiles in October at fluxcd/notification-controller#256

generic/hmac receivers and found out that payload actually doesn't matter at all for them.

it's worth updating docs on HMAC webhook to add clear statement that payload content is not validated, only signature is verified?

singing [sic] empty string is not a good idea and any random text must be used

My understanding is that payload validation isn't happening at all, which makes sense to me because of the complexity to validate payloads.

In April, when I reviewed fluxcd/notification-controller#180 (comment), I wrote:

My understanding is the applications listed at https://toolkit.fluxcd.io/components/notification/controller/ have different webhook formats. You wouldn't find a standard payload amongst them.

Due to the complexity to validate payloads, fluxcd/notification-controller#180 was closed as Won't Do in May.

It is okay to not validate payloads because of the complexity. The corollary is to be clearer in the documentation
at https://fluxcd.io/docs/components/notification/receiver
and https://github.com/fluxcd/website/blob/main/content/en/docs/guides/webhook-receivers.md
that the payload isn't validated at all.

In particular, the sentence

Notification controller validates the authenticity of the payload using HMAC

is misleading.

In February, perhaps @dholbach unintentionally added that sentence without fully realizing that the payload isn't validated at all.

Thank you, everyone 🙂

yebyen · 2021-11-01T11:41:13Z

payload actually doesn't matter at all for them

The reason that it's safe to not validate payloads, is because the payload is actually not used for anything, across all of the Webhook receiver types. (Not only because it would be too difficult)

The Webhook only has one message, which makes the API surface very small and easy to verify its safety. The content does not affect the function. It's like a method that takes zero parameters, the content is discarded in toto. The content is never processed for anything, with the possible exception of selecting which event (push) can trigger the webhook to fire (part of the security posture also.)

The example is every Receiver in the pack, with the exception of only generic type, to account for services like BitBucket Cloud which do not offer webhook secret or signing. The URL itself is randomly generated based on the content of the Receiver and the secret token though.

So even for BitBucket Cloud (which intentionally doesn't permit signing requests, to leave it as an advantage of the paid BitBucket Server?) the webhook can be protected as long as the URL itself is kept secret. Other Receiver types are based on the hmac signing pattern, I haven't read them all but my naive understanding is they all work this way.

yebyen · 2021-11-01T11:46:11Z

I wasn't aware that payloads are not validated at all, thanks for adding this @lloydchang
(It might be important for cryptographic purposes, even if the payload is not used for anything; having a payload that varies from one request to the next, could be important for ensuring that the cryptographic integrity of the message is not compromised, and that replay attacks are impossible. I don't know enough about the crypto to say one way or another.)

stefanprodan · 2021-11-01T11:53:25Z

The way we designed the webhook integrations doesn't make Flux less GitOpsy. The webhooks don't tell Flux what to pull or from where, Flux doesn't look at the webhook payload. The sole scope of the Flux Receiver API is to request a reconciliation outside of the specified interval, similar to flux reconcile command.

squaremo · 2021-11-01T12:12:26Z

It does say

push-based [GitOps] pipelines

so whether or not you agree that it's doctrinally OK to have webhooks, the text is inaccurate in the first place (not to mention: "GitOps pipeline" is not defined anywhere, is it?). I would just remove that whole sentence.

kingdonb · 2021-11-01T13:54:32Z

I agree with @squaremo except that:

Flux is by design pull-based. In order to notify the Flux controllers about changes in Git or Helm repositories,
you can setup webhooks and trigger a cluster reconciliation every time a source changes. Using webhook receivers, you can build push-based GitOps pipelines that react to external events.

there was clearly meant to be a "compare" / "contrast" flow to this paragraph. Removing this whole sentence will break that sentence structure and compromise the flow, (leaving a weird intro paragraph that has only ebb left, and no flow.)

Something like this maybe?

every time a source changes. Using webhook receivers, you can build responsive and reactive workflows that appear to behave like push-based pipelines, with a fast inner dev loop experience, and without need for aggressive polling of Git sources.

squaremo · 2021-11-01T14:09:10Z

that appear to behave like push-based pipelines

push-based means the control belongs to the pushing process; e.g., a CI system that tells the cluster to sync, then waits to see if it succeeds. That isn't what is happening here -- the control belongs to the pulling process. On that basis, it's better to say that using webhooks makes it more responsive, and leave it at that.

kingdonb · 2021-11-08T15:35:53Z

every time a source changes. Using webhook receivers, you can build fast and reactive workflows that happen as quickly as possible, and compete with push-based pipelines in terms of responsiveness to changes, with a fast inner dev loop experience, and without need for aggressive polling of Git sources.

I think we still want to use the words push-based here, if nothing else then for SEO reasons and to draw the attention of people that still think they want something push-based. This version does not apply the descriptor "GitOps" to the category push-based anymore, so I think it satisfies the original concern of the report/PR. It is probably still 50% more words than we want in this paragraph, but captures all of the meaning I think we wanted to convey. Would be happy if there was a shorter way to say as much, (writing terse prose is not my personal strongest point.)

I am fine with anything that reads like this. Can you amend the language of your PR @lloydchang or we can close it and resolve this with a separate PR 👍

scottrigby

I'm late to the party! Thanks, @lloydchang for raising the question, I think this has been a very interesting discussion. And thanks everyone for participating! I'm guessing there will be more conversations like this for different functionality, configurations, and documentation of Flux and other tools in the GitOps ecosystem.

@lloydchang To move this forward, would you please update this PR wording according to maintainer suggestions, or else close this PR if you don't have time to do that? I believe a few options are given above.

@squaremo I agree with you this is different from traditional CI and/or CD push-based systems for reasons you mentioned. However, I believe @stefanprodan added that wording to make clear the need has been met for end-users who don't want to wait until the next scheduled interval whenever they purposefully make a source change. They are often looking for "push-based" support. We offer this, just architected in a GitOps way.

We understand that while the webhook receiver accepts a "push" message, that's only to let it know it's time to do another pull immediately. We could call this a "poke" informally since it's done differently than the less secure traditional CI & CD "push-only" models 😛 It's just architected to conform to GitOps principles. But I think "push" is accurate here.

End users can read more about this in the docs, but often users are looking for a "push-based" CD because they want this exact behavior. This poke method is literally still a "push" event. Once they learn it's equally fast, more secure, and adheres to GitOps principles too, that's added value for them, not false advertising IMO. Defer to you and the other Flux oversight committee members to decide on whether or not to keep push-based wording in some way. My 2¢ is that there's still value in using the term here for those end-users looking for this, and in terms of accuracy if anything it's chance to educate folks on GitOps while making clear Flux satisfies their legitimate need.

Signed-off-by: lloydchang <[email protected]>

lloydchang · 2021-11-09T05:56:04Z

amended

kingdonb · 2021-11-09T14:35:14Z

I think I can approve this, @scottrigby WDYT

kingdonb

LGTM

kingdonb · 2021-11-09T18:24:24Z

I re-ran the tests but I am not able to see why lychee link checker test is failing right now.

fix https://github.com/fluxcd/website/runs/4156089344 fix 2 errors reported by lychee Link Checker: Errors in content/en/docs/guides/webhook-receivers.md ✗ file:///github/workspace/content/en/docs/installation (Cannot find file file:///github/workspace/content/en/docs/installation) ✗ file:///github/workspace/content/en/docs/get-started/index.md (Cannot find file file:///github/workspace/content/en/docs/get-started/index.md) Signed-off-by: lloydchang <[email protected]>

kingdonb · 2021-11-09T23:09:26Z

Looks like that did it, thank you @lloydchang

Please approve if you agree @scottrigby then this can be merged 👍

lloydchang · 2021-11-09T23:09:40Z

@kingdonb please re-review. To fix (pre-existing) broken links to pass lychee Link Checker, I added another Git commit. Thank you.

stefanprodan

LGTM

Thanks @lloydchang

The raised issues were addressed

lloydchang mentioned this pull request Oct 31, 2021

docs(LANDSCAPE.md): add open-gitops/documents#34

Closed

lloydchang mentioned this pull request Oct 31, 2021

Add Argo into the recommendation. gitops-working-group/gitops-working-group#166

Closed

lloydchang mentioned this pull request Nov 1, 2021

Update docs on Generic/HMAC receiver payload fluxcd/notification-controller#256

Open

scottrigby added the question Further information is requested label Nov 8, 2021

scottrigby previously requested changes Nov 8, 2021

View reviewed changes

docs(webhook-receivers.md): responsiveness

e857c3f

Signed-off-by: lloydchang <[email protected]>

lloydchang force-pushed the patch-1 branch from d50ab86 to e857c3f Compare November 9, 2021 05:53

lloydchang changed the title ~~docs(website/content/en/docs/guides/webhook-receivers.md): remove GitOps from push-based pipelines~~ docs(webhook-receivers.md): responsiveness Nov 9, 2021

lloydchang requested a review from scottrigby November 9, 2021 05:54

kingdonb self-requested a review November 9, 2021 14:35

kingdonb approved these changes Nov 9, 2021

View reviewed changes

lloydchang requested a review from kingdonb November 9, 2021 23:10

stefanprodan approved these changes Nov 10, 2021

View reviewed changes

stefanprodan merged commit 2d80ca4 into fluxcd:main Nov 10, 2021

lloydchang mentioned this pull request Nov 10, 2021

Security focused group? open-gitops/project#32

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(webhook-receivers.md): responsiveness #612

docs(webhook-receivers.md): responsiveness #612

lloydchang commented Oct 31, 2021 •

edited

Loading

lloydchang commented Oct 31, 2021

kingdonb commented Oct 31, 2021 •

edited

Loading

christianh814 commented Oct 31, 2021

lloydchang commented Oct 31, 2021

lloydchang commented Nov 1, 2021

kingdonb commented Nov 1, 2021 via email

lloydchang commented Nov 1, 2021 •

edited

Loading

yebyen commented Nov 1, 2021

yebyen commented Nov 1, 2021 •

edited by kingdonb

Loading

stefanprodan commented Nov 1, 2021 •

edited

Loading

squaremo commented Nov 1, 2021

kingdonb commented Nov 1, 2021 •

edited

Loading

squaremo commented Nov 1, 2021

kingdonb commented Nov 8, 2021 •

edited

Loading

scottrigby left a comment •

edited

Loading

lloydchang commented Nov 9, 2021

kingdonb commented Nov 9, 2021

kingdonb left a comment

kingdonb commented Nov 9, 2021

kingdonb commented Nov 9, 2021

lloydchang commented Nov 9, 2021

stefanprodan left a comment

docs(webhook-receivers.md): responsiveness #612

docs(webhook-receivers.md): responsiveness #612

Conversation

lloydchang commented Oct 31, 2021 • edited Loading

lloydchang commented Oct 31, 2021

kingdonb commented Oct 31, 2021 • edited Loading

christianh814 commented Oct 31, 2021

lloydchang commented Oct 31, 2021

lloydchang commented Nov 1, 2021

kingdonb commented Nov 1, 2021 via email

lloydchang commented Nov 1, 2021 • edited Loading

yebyen commented Nov 1, 2021

yebyen commented Nov 1, 2021 • edited by kingdonb Loading

stefanprodan commented Nov 1, 2021 • edited Loading

squaremo commented Nov 1, 2021

kingdonb commented Nov 1, 2021 • edited Loading

squaremo commented Nov 1, 2021

kingdonb commented Nov 8, 2021 • edited Loading

scottrigby left a comment • edited Loading

Choose a reason for hiding this comment

lloydchang commented Nov 9, 2021

kingdonb commented Nov 9, 2021

kingdonb left a comment

Choose a reason for hiding this comment

kingdonb commented Nov 9, 2021

kingdonb commented Nov 9, 2021

lloydchang commented Nov 9, 2021

stefanprodan left a comment

Choose a reason for hiding this comment

lloydchang commented Oct 31, 2021 •

edited

Loading

kingdonb commented Oct 31, 2021 •

edited

Loading

lloydchang commented Nov 1, 2021 •

edited

Loading

yebyen commented Nov 1, 2021 •

edited by kingdonb

Loading

stefanprodan commented Nov 1, 2021 •

edited

Loading

kingdonb commented Nov 1, 2021 •

edited

Loading

kingdonb commented Nov 8, 2021 •

edited

Loading

scottrigby left a comment •

edited

Loading