Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard-coded service name precludes deployment of multiple operators #27

Closed
grzleadams opened this issue Apr 22, 2024 · 9 comments
Closed

Comments

@grzleadams
Copy link

The serviceaccount name is not reliant on the release name so there can only be one per cluster. When trying to deploy a second operator instance to another namespace, you'll get an error like this:

Error: Unable to continue with install: ServiceAccount "pulp-operator-controller-manager" in namespace "pulp" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "pulp-test": current value is "pulp"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "pulp-test": current value is "pulp"
@grzleadams
Copy link
Author

grzleadams commented Apr 22, 2024

Obviously deploying the operator multiple times is not a typical workflow for operators since the service account should get cluster roles that allow it to deploy in multiple namespaces... this is just a case where I wanted to test a newer version of the operator without modifying the older one.

@grzleadams
Copy link
Author

It looks like it's not possible to deploy a cluster-wide operator (the deployment sets WATCH_NAMESPACE) so it should be fairly straightforward to just append the namespace to the service account name. I'll submit a PR to do so.

@git-hyagi
Copy link
Contributor

Hi @grzleadams,

Sorry for the delayed response!
To give a little bit of context, in the past, pulp-operator was cluster-scoped but after some discussion the team decided to change it to a "namespace-scoped" operator.
We did some tests and thought about "supporting" multiple installations of pulp (in different namespaces) using a single operator deployment and also thought about documenting the possibility of doing so, but we didn't extensively test it to consider a "supported feature".

So, in a high level thought about this issue, I think that we first need to address the multiple installations of pulp (in pulp-operator repo), then we can update the Chart with the necessary changes.
WDYT?

@grzleadams
Copy link
Author

I'm ambivalent; I think there's a lot of value in providing the capability to deploy a cluster-scoped operator, but I also have significant concerns with the security and operational implications of such operators. Most operators/controllers that I've seen (GitHub's ARC, for example) default to cluster scope but can be restricted to a single namespace by the user, which is the approach that makes the most sense to me. The main problem with the multiple-controller deployment method is that all of the controllers in a cluster typically need to be upgraded simultaneously if there are CRD changes, which can be painful (and why I normally opt for a single, cluster-wide controller deployment).

@git-hyagi
Copy link
Contributor

I think there's a lot of value in providing the capability to deploy a cluster-scoped operator, but I also have significant concerns with the security and operational implications of such operators.

We are on the same page . 👍

Most operators/controllers that I've seen (GitHub's ARC, for example) default to cluster scope but can be restricted to a single namespace by the user, which is the approach that makes the most sense to me.

I agree.

The main problem with the multiple-controller deployment method is that all of the controllers in a cluster typically need to be upgraded simultaneously if there are CRD changes, which can be painful (and why I normally opt for a single, cluster-wide controller deployment).

This is something that we also encountered in some internal deployments (and indeed, it was very painful to do the upgrades).
For a single cluster-wide controller, did you have any issue with performance or resource utilization? I'm asking this because, besides the upgrade issue when CRD changes, some additional concerns that we have (we will need to test) with a single cluster-wide controller:

  • we will need to verify the HA features of the controller (multiple replicas, leader election, etc) because, since it will be managing multiple Pulp instances, an outage would be more critical
  • the resource allocation for the controller pods (more instances of CRs will probably impact in the number of reconciliations, which will probably impact in the cpu/memory utilization)
  • multiple instances of Pulp in the same namespace and name clashes (possibly letting the controller in an infinite loop)

@dkliban, do you think that we should bring back this discussion of multiple Pulp installations in the next deployers meeting? Do you agree with the above statements?

@dkliban
Copy link
Member

dkliban commented Apr 23, 2024

@git-hyagi We should consider supporting multiple Pulp deployments with one controller. I agree with the concerns your outline above. Though I think we could deliver this feature initially without HA support. Let's discuss next week.

@grzleadams
Copy link
Author

For a single cluster-wide controller, did you have any issue with performance or resource utilization?

For Pulp specifically, or with operators in general? I wasn't using Pulp before that change to namespace-specific operator scope was made, so unfortunately I can't really comment on that performance. But for other operators, I haven't really had a problem with performance when they're watching the whole cluster. I will say the Pulp operator is somewhat chattier than others so it's unclear to me whether the reconciliation is actually using more cycles than others or if it's just louder about it (for example, Pulp operator does reconciliation when any metadata updates occur, and things like Rancher will update otherwise-useless metadata a lot).

@grzleadams
Copy link
Author

grzleadams commented Apr 24, 2024

And to your other two points re: cluster-scoped controllers:

we will need to verify the HA features of the controller (multiple replicas, leader election, etc) because, since it will be managing multiple Pulp instances, an outage would be more critical

I'm not convinced this is really all that important for the operator itself. If you're in a situation where your Pulp installation requires reconciliation and the operator is dead, you probably have bigger problems. If the operator goes down by itself, though, the Pulp deployments shouldn't be impacted.

multiple instances of Pulp in the same namespace and name clashes (possibly letting the controller in an infinite loop)

You can use the CR metadata as a key to avoid name collisions. The combination of metadata.name and metadata.namespace will always be unique.

@git-hyagi
Copy link
Contributor

#32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants