Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial work to opensource guardian #9849

Merged
merged 20 commits into from
Feb 28, 2025

Conversation

Brian-McM
Copy link
Contributor

Description

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@marvin-tigera marvin-tigera added this to the Calico v3.30.0 milestone Feb 14, 2025
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Feb 14, 2025
@Brian-McM Brian-McM force-pushed the bm-add-guardian-to-oss branch 7 times, most recently from 8581bb4 to 09d1626 Compare February 18, 2025 19:19
ClientCertPath string `json:"clientCertPath"`
ClientKeyPath string `json:"clientKeyPath"`

Unauthenticated bool `json:"unauthenticated,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so... this seems to only apply to the other side of the tunnel upon closer inspection (and probably not even for the tunnel itself).

)

const (
tunnelNetwork = "voltron-tunnel"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is any benefits to configure this at start up instead of hard-coding it. In case we want to change it, it would be better to not hard-coded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'm just going to keep it hardcoded (since this is how it's currently done).

@Brian-McM Brian-McM force-pushed the bm-add-guardian-to-oss branch 2 times, most recently from 72d8a25 to 8045f6c Compare February 24, 2025 21:49
} else {
backlog = append(backlog, cmd)
}
case cmd := <-executor.backlogChan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have more than 100 tasks in the backlog channel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at the same time. Since this loop never blocks it's unlikely we'd every reach that level of requests, and if we do it would be cleared out.


func (executor *commandExecutor[C, R]) Send(params C) <-chan Result[R] {
cmd, resultChan := NewCommand[C, R](params)
executor.cmdChan <- cmd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we send 101 requests ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at the same time, but the other side of the channel shouldn't block so there wouldn't be any issue.

)

type CommandExecutor[C any, R any] interface {
Send(C) <-chan Result[R]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that in most cases, we call this as Send(nill). Do we really need parameter here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating this, I was thinking this would be more general so you can send parameters in other cases. Seemed like it might be overkill to have a different interface just to not send nil. OTOH though it might not be difficult and would make it cleaner (could use the same implementation under the hood?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd thought about this some more, and I'd like to just stick with having the parameter here. I could make another interface that doesn't take in a parameter, but it seems like that would be unnecessary given that you just pass nil as the parameter as opposed to having different interfaces to work with.

This is intended to be a general use interface as well, and plan on moving it to somewhere more general. This will at least be used for voltron.

}

func (cfg *Config) Cert() (string, *x509.CertPool, error) {
if strings.ToLower(cfg.VoltronCAType) == "public" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between public and Tigera CA Type. I assumed Tigera was the public one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure tbh, but it's in the current version of guardian so I need to make sure it's in this version.

ConnectionRetryAttempts int `default:"25" split_words:"true"`
ConnectionRetryInterval time.Duration `default:"5s" split_words:"true"`

Listen bool `default:"true"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this variable ? When can this set to false ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure, but it's in the existing guardian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just about to remove this (after our chat yesterday) but I'm realizing that this setting may have some value. From a conceptual POV, it would make sense to not allow guardian to push up to the cluster and have this in some sort of read only state.

Given that this setting is already there (as in production), I'm inclined to leave it.

@Brian-McM Brian-McM marked this pull request as ready for review February 26, 2025 22:15
@Brian-McM Brian-McM requested a review from a team as a code owner February 26, 2025 22:15
@Brian-McM Brian-McM added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact and removed release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Feb 26, 2025
@Brian-McM Brian-McM force-pushed the bm-add-guardian-to-oss branch from 31c441c to 56513a2 Compare February 26, 2025 22:58
This commit adds guardian, a component that can connect to calico cloud. This facilitates open source clusters connecting to calico free tier.
@Brian-McM Brian-McM force-pushed the bm-add-guardian-to-oss branch from e7ed676 to 65c05ba Compare February 27, 2025 21:26
var drainCoordinatorCh <-chan asyncutil.Result[asyncutil.Signaler]
// Use a FunctionCallRateLimiter to 1) ensure we don't rapidly retry recreating the session if it's constantly
// disconnecting and 2) return an error if this is called to many times within a 30-second time window.
drainCoordinator := asyncutil.NewFunctionCallRateLimiter(2*time.Second, 30*time.Second, 5, func(any) (asyncutil.Signaler, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this values be in the config ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for right now, I'm not sure if I want this to be configurable or now. I need to think about this more as followup.


// Close closes the listener. A closed listener cannot be used again
func (l *listener) Close() error {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are not doing anything here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it has to be here since it's implementing the net.Listener interface.

// Clear drains the internal buffer and returns when there's nothing left.
// Not that writing to the error buffer should not be done while clearing, since if writing is happening as quick
// as clearing is then the buffer will never be cleared.
func (b *asyncErrorBuffer) Clear() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add some mutex variables to synchronise write, clear and close ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed, these are writing too and interacting with channels. Can you point out a specific race condition you're seeing?

}

func NewAsyncErrorBuffer() AsyncErrorBuffer {
return &asyncErrorBuffer{errs: make(chan error, 1)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always buffer 1 error ? Do we need to buffer multiple errors ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current use case no we don't. If an error occurs is used to decide whether or not we drain and backlog the executors or kill the tunnel. If multiple errors come in at the same time we only ever use the first error to decide what to do and clear the other errors if we're not exiting.

The reason for this is because we don't want multiple errors from multiple commands triggering a drain and backlog rapidly, we only want one signal to do this and we drain and backlog all executors and assume that's solve the issue for all of them. If it hasn't solve the issue for all of them, then they'll continue to return errors and we'll eventually just shutdown the tunnel.

So whether it doesn't really matter if the buffer is size 1 or 100.

return b.errs
}

func (b *asyncErrorBuffer) Close() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can close be called twice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No and if it is it should panic (that's a developer error if this is called more than once).

return s.ch
}

func (s *signaler) Close() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can close be called twice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No and if it is it should panic (that's a developer error if this is called more than once).

}

// Signaler is an interface used for waiting for and sending simple signals.
type Signaler interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of remaning this to Channel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to keep this as Signaler. This isn't inteaded to be used as a Channel, only send / receive a signal with no information.


// Add all outstanding commands in the backlog or cmd channels to the backlog slice.
executor.backlog = append(executor.backlog, ReadAll(executor.backlogChan)...)
executor.backlog = append(executor.backlog, ReadAll(executor.cmdChan)...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to perform this operation before line 142: https://github.com/projectcalico/calico/pull/9849/files#diff-332e67201f01430847e13e593d0dc866cbb68781014e7dc94babff2de33d1f92R142. Either that or you do not need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be done before, this is actually intentional. I'm closing the channel so nothing more can be sent over it. Closing the channel doesn't "clear" it, all the items will remain on the channel until they're read off.

This basically forces a panic to happen if somebody writes to the channel after it's closed and we're trying to clean up. If they could write to it after reading everything off, then that would result in some commands hanging after close.

@caseydavenport caseydavenport merged commit 987746b into projectcalico:master Feb 28, 2025
3 of 4 checks passed
@Brian-McM Brian-McM deleted the bm-add-guardian-to-oss branch February 28, 2025 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants