Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controller panics when reconciling JobSet which is not mutated by webhook #796

Open
ChenYi015 opened this issue Feb 24, 2025 · 4 comments
Open
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ChenYi015
Copy link
Contributor

What happened:

I create a JobSet before installing the jobset controller/webhook, which means the JobSet is not mutated by the webhook, then I install the jobset and the controller panics:

2025-02-24T08:44:21Z    ERROR    Observed a panic    {"controller": "jobset", "controllerGroup": "jobset.x-k8s.io", "controllerKind": "JobSet", "JobSet": {"name":"demo","n
amespace":"default"}, "namespace": "default", "name": "demo", "reconcileID": "8c487ccd-e84e-4494-adbb-ca3584275adf", "panic": "runtime error: invalid memory address or nil
 pointer dereference", "panicGoValue": "\"invalid memory address or nil pointer dereference\"", "stacktrace": "goroutine 313 [running]:\nk8s.io/apimachinery/pkg/util/runti
me.logPanic({0x1d8b168, 0xc0009219e0}, {0x189e500, 0x2ae5af0})\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:107 +0xbc\nsigs.k8s.io/controller-run
time/pkg/internal/controller.(*Controller[...]).Reconcile.func1()\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:105 +0x114\np
anic({0x189e500?, 0x2ae5af0?})\n\t/usr/local/go/src/runtime/panic.go:785 +0x132\nsigs.k8s.io/jobset/pkg/controllers.dnsHostnamesEnabled(...)\n\t/workspace/pkg/controllers/
jobset_controller.go:848\nsigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).createHeadlessSvcIfNecessary(0xc0007bad80, {0x1d8b168, 0xc000921ce0}, 0xc000ba1180)\n\t/wo
rkspace/pkg/controllers/jobset_controller.go:590 +0x66\nsigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).reconcile(0xc0007bad80, {0x1d8b168, 0xc0009219e0}, 0xc000ba1
180, 0xc000ddaf20)\n\t/workspace/pkg/controllers/jobset_controller.go:196 +0x7ed\nsigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).Reconcile(0xc0007bad80, {0x1d8b168
, 0xc0009219e0}, {{{0xc00090bc04?, 0x1ad4998?}, {0xc00090bc00?, 0x100?}}})\n\t/workspace/pkg/controllers/jobset_controller.go:121 +0x12d\nsigs.k8s.io/controller-runtime/pk
g/internal/controller.(*Controller[...]).Reconcile(0xc0009218f0?, {0x1d8b168?, 0xc0009219e0?}, {{{0xc00090bc04?, 0x0?}, {0xc00090bc00?, 0x0?}}})\n\t/go/pkg/mod/sigs.k8s.io
/[email protected]/pkg/internal/controller/controller.go:116 +0xbf\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x1
d98ea0, {0x1d8b1a0, 0xc00038d1d0}, {{{0xc00090bc04, 0x7}, {0xc00090bc00, 0x4}}})\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.g
o:303 +0x398\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x1d98ea0, {0x1d8b1a0, 0xc00038d1d0})\n\t/go/pkg/mod/sigs.k8s.i
o/[email protected]/pkg/internal/controller/controller.go:263 +0x20e\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()\n\
t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224 +0x85\ncreated by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Co
ntroller[...]).Start.func2 in goroutine 268\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:220 +0x490\n"}
runtime.sigpanic
    /usr/local/go/src/runtime/signal_unix.go:900
sigs.k8s.io/jobset/pkg/controllers.dnsHostnamesEnabled
    /workspace/pkg/controllers/jobset_controller.go:848
sigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).createHeadlessSvcIfNecessary
    /workspace/pkg/controllers/jobset_controller.go:590
sigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).reconcile
    /workspace/pkg/controllers/jobset_controller.go:196
sigs.k8s.io/jobset/pkg/controllers.(*JobSetReconciler).Reconcile
    /workspace/pkg/controllers/jobset_controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224

What you expected to happen:

The controller should not panic even when reconciling a JobSet which is not mutated by webhook.

How to reproduce it (as minimally and precisely as possible):

Create a JobSet without defining .spec.network.enableDNSHostnames before installing the jobset controller/webhook.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • JobSet version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:
@ChenYi015 ChenYi015 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 24, 2025
@kannon92
Copy link
Contributor

I am confused..

If you are not installing the controller/webhook, how are you able to create a jobset? Are you creating the CRDs without the controller?

@ChenYi015
Copy link
Contributor Author

In the recent days, I was testing the Helm chart, installed it and then uninstalled it. The CRD still exists in the cluster after I uninstalled the Helm chart, so I was able to create a JobSet without controller/webhook. Then I install the Helm chart again, I found that the controller panics due to nil pointer exception.

I found that there are some fields that will be nil if they are not defaulted by webhook (like js.Spec.Network), but the controller code assumes it to be non-nil, for example:

func dnsHostnamesEnabled(js *jobset.JobSet) bool {
return js.Spec.Network.EnableDNSHostnames != nil && *js.Spec.Network.EnableDNSHostnames
}

@ahg-g
Copy link
Contributor

ahg-g commented Feb 25, 2025

Thanks, would you like to contribute a fix?

@kannon92 we should think about defaulting use CEL for all possible cases, wdyt?

@ChenYi015
Copy link
Contributor Author

Thanks, would you like to contribute a fix?

I will raise a PR to fix it when I have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

3 participants