-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: recreate autoscaler resources if deleted or modified #2409
Conversation
great fix, do you think it would be worth adding a test? Something that just creates a pipeline, deletes the pod, and checks to see that it comes back |
While it's great to have many tests, I feel this behavior is really niche and the value we will get from asserting it is low compared to the time and ongoing maintenance we would have to invest in the test. But I'm open to be convinced otherwise if you feel it's important. |
@@ -61,6 +63,9 @@ func (r *CollectorsGroupReconciler) Reconcile(ctx context.Context, req ctrl.Requ | |||
func (r *CollectorsGroupReconciler) SetupWithManager(mgr ctrl.Manager) error { | |||
return ctrl.NewControllerManagedBy(mgr). | |||
For(&odigosv1.CollectorsGroup{}). | |||
Owns(&appsv1.DaemonSet{}). // in case the ds is deleted or modified for any reason, this will reconcile and recreate it | |||
Owns(&appsv1.Deployment{}). // in case the deployment is deleted or modified for any reason, this will reconcile and recreate it | |||
Owns(&corev1.ConfigMap{}). // in case the configmap is deleted or modified for any reason, this will reconcile and recreate it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the event filter.
This will add a lot of events which we can easily filter by namespace.
Or do we filter in the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure exactly how it's implemented, but i've tested it to work, and I believe the reconcile will be called only if the resource has an owner of this type which we do.
Anyhow, the cache pulls in only the relevant objects, so this will not trigger on every daemonset or deployment in the cluster, just those in cache
If for any reason, the data-collection daemonset, gateway deployment, or one of the config maps is deleted, then it's not re-created at the moment.
This PR makes it so they are.
We had a bug where the collectors group would get deleted, and it also deleted the daemonset. then if the collectors group is re-created immediately, the daemonset would stay deleted and odigos would not work