[BUG] OpenSearch operator panics and crashes when adding an OpenSearchISMPolicy #801

Closed
nilushancosta opened this issue May 6, 2024 · 1 comment · Fixed by #805
Labels
bug Something isn't working

Comments

@nilushancosta
Contributor

nilushancosta commented May 6, 2024

What is the bug?

When an OpenSearchISMPolicy is added while the OpenSearch cluster is still being created, the controller panics and the operator container crashes.

2024-05-06T18:19:54.202Z	INFO	Reconciling OpensearchISMPolicy	{"controller": "opensearchismpolicy", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchISMPolicy", "OpenSearchISMPolicy": {"name":"sample-policy","namespace":"test"}, "namespace": "test", "name": "sample-policy", "reconcileID": "adc1b967-662a-42d0-9c17-95e048ad0ad6", "tenant": {"name":"sample-policy","namespace":"test"}}
2024-05-06T18:19:54.279Z	DEBUG	events	error creating opensearch client	{"type": "Warning", "object": {"kind":"OpenSearchISMPolicy","namespace":"test","name":"sample-policy","uid":"abab26b9-2ca0-4882-a167-4cf37994dcb9","apiVersion":"opensearch.opster.io/v1","resourceVersion":"463314"}, "reason": "OpensearchError"}
2024-05-06T18:19:54.284Z	INFO	Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference	{"controller": "opensearchismpolicy", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchISMPolicy", "OpenSearchISMPolicy": {"name":"sample-policy","namespace":"test"}, "namespace": "test", "name": "sample-policy", "reconcileID": "adc1b967-662a-42d0-9c17-95e048ad0ad6"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x11f2d64]

goroutine 442 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:115 +0x1a4
panic({0x141dec0?, 0x27073d0?})
	/usr/local/go/src/runtime/panic.go:770 +0x124
github.com/Opster/opensearch-k8s-operator/opensearch-operator/opensearch-gateway/services.(*OsClusterClient).GetISMConfig(0x0, {0x18fcd30, 0x4000e77dd0}, {0x4000c5a410?, 0x0?})
	/workspace/opensearch-gateway/services/os_client.go:314 +0x44
github.com/Opster/opensearch-k8s-operator/opensearch-operator/opensearch-gateway/services.PolicyExists({0x18fcd30?, 0x4000e77dd0?}, 0x4001436700?, {0x4000c5a410?, 0x7?})
	/workspace/opensearch-gateway/services/os_ism_service.go:31 +0x4c
github.com/Opster/opensearch-k8s-operator/opensearch-operator/pkg/reconcilers.(*IsmPolicyReconciler).Reconcile(0x40008d5d00)
	/workspace/pkg/reconcilers/ismpolicy.go:159 +0x72c
github.com/Opster/opensearch-k8s-operator/opensearch-operator/controllers.(*OpensearchISMPolicyReconciler).Reconcile(0x400051abe0, {0x18fcd30, 0x4000e77dd0}, {{{0x4001558638, 0x4}, {0x4001558640, 0xd}}})
	/workspace/controllers/opensearchism_controller.go:53 +0x2ec
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x18fcd30?, {0x18fcd30?, 0x4000e77dd0?}, {{{0x4001558638?, 0x1348fc0?}, {0x4001558640?, 0x4000677e08?}}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x400028a640, {0x18fcd68, 0x400051b630}, {0x149b600, 0x40002689e0})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314 +0x294
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x400028a640, {0x18fcd68, 0x400051b630})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265 +0x198
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 129
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x404

The operator pod will crash several times and then continue running.
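For readers unfamiliar with this failure mode: the trace shows `GetISMConfig` being called with a receiver of `0x0`, right after the `error creating opensearch client` event, which suggests the client pointer was nil when it was used. The following is only a minimal sketch of that pattern and of an early-return guard. The `clusterClient` type, `newClusterClient`, `fetchConfig`, and the endpoint URL are hypothetical stand-ins, not the operator's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterClient is a hypothetical stand-in for the operator's client type.
type clusterClient struct {
	endpoint string
}

// getConfig reads a field through the receiver, so it panics when c is nil.
func (c *clusterClient) getConfig() string {
	return "config from " + c.endpoint
}

// newClusterClient is a hypothetical constructor that fails while the
// cluster is not reachable and returns a nil client in that case.
func newClusterClient(reachable bool) (*clusterClient, error) {
	if !reachable {
		return nil, errors.New("cluster not reachable")
	}
	return &clusterClient{endpoint: "https://my-first-cluster:9200"}, nil
}

// fetchConfig returns early on a constructor error instead of going on
// to call a method on the nil client.
func fetchConfig(reachable bool) (string, error) {
	client, err := newClusterClient(reachable)
	if err != nil {
		return "", fmt.Errorf("creating client: %w", err)
	}
	return client.getConfig(), nil
}

// callOnNil reproduces the failure mode and returns the recovered panic value.
func callOnNil() (recovered any) {
	defer func() { recovered = recover() }()
	var client *clusterClient // nil, as if the constructor error had been ignored
	_ = client.getConfig()
	return nil
}

func main() {
	fmt.Println("panic recovered:", callOnNil() != nil)
	_, err := fetchConfig(false)
	fmt.Println("guarded call:", err)
}
```

In this sketch the guarded path surfaces an ordinary error to the caller, which a reconciler could log and retry on, rather than dereferencing a nil pointer.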

How can one reproduce the bug?

  1. Install the operator
helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.6.0 -n test
  2. Create an OpenSearch cluster using kubectl apply. This is the cluster definition I used:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: test
spec:
  general:
    serviceName: my-first-cluster
    version: 2.11.1
  dashboards:
    enable: false
    version: 2.11.1
    replicas: 0
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "1Gi"
            cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"
  3. Apply the following ISM policy using kubectl apply:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchISMPolicy
metadata:
   name: sample-policy
   namespace: test
spec:
   opensearchCluster:
      name: my-first-cluster
   description: Sample policy
   policyId: sample-policy
   defaultState: hot
   states:
      - name: hot
        actions:
           - replicaCount:
                numberOfReplicas: 4
        transitions:
           - stateName: warm
             conditions:
                minIndexAge: "10d"
      - name: warm
        actions:
           - replicaCount:
                numberOfReplicas: 2
        transitions:
           - stateName: delete
             conditions:
                minIndexAge: "30d"
      - name: delete
        actions:
           - delete: {}

At this point, the operator pod exits with an error.

What is the expected behavior?

I expected the ISM policy to be added without any issue.

What is your host/environment?

Kubernetes 1.25
OpenSearch 2.11.1
OpenSearch operator 2.6.0

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

If I do step 2 above, wait for the OpenSearch cluster to finish being created (i.e. the 3 nodes reach a running state and the cluster health is green), and then do step 3 (add the ISM policy), the panic does not happen.
But if I do step 3 immediately after step 2, the operator panics and crashes several times.

However, when using deployment pipelines, we cannot control the delay between the creation of these resources.

@nilushancosta nilushancosta added bug Something isn't working untriaged Issues that have not yet been triaged labels May 6, 2024
@swoehrl-mw
Collaborator

Hi @nilushancosta. Thanks for reporting this. This is clearly a bug and the operator should just wait if the cluster is not yet correctly reachable.
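The "just wait" behaviour described above could look roughly like the following. This is a minimal, standard-library-only sketch under stated assumptions: the `result` type only imitates the idea of a reconcile result that asks to be requeued, and `connect`, `reconcileOnce`, `errNotReady`, and the 10-second delay are hypothetical names and values chosen for illustration, not the operator's code or the fix that was merged.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result is a hypothetical reconcile result: a non-zero requeueAfter
// asks the caller to run the reconcile again after that delay.
type result struct {
	requeueAfter time.Duration
}

// errNotReady is a hypothetical sentinel for "cluster not reachable yet".
var errNotReady = errors.New("cluster not reachable yet")

// connect stands in for client creation. It fails until the attempt
// number reaches readyAt, simulating a cluster that is still starting.
func connect(attempt, readyAt int) error {
	if attempt < readyAt {
		return errNotReady
	}
	return nil
}

// reconcileOnce requeues instead of continuing when the cluster is not
// reachable, so no client is ever used before it exists.
func reconcileOnce(attempt, readyAt int) (result, error) {
	if err := connect(attempt, readyAt); err != nil {
		if errors.Is(err, errNotReady) {
			return result{requeueAfter: 10 * time.Second}, nil
		}
		return result{}, err
	}
	// The policy would be created or updated here.
	return result{}, nil
}

func main() {
	for attempt := 1; attempt <= 3; attempt++ {
		res, err := reconcileOnce(attempt, 3)
		fmt.Printf("attempt %d: requeueAfter=%s err=%v\n", attempt, res.requeueAfter, err)
	}
}
```

The design choice in this sketch is to treat "not reachable yet" as an expected, transient state that schedules another attempt, rather than as a fatal error.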

@swoehrl-mw swoehrl-mw removed the untriaged Issues that have not yet been triaged label May 7, 2024
swoehrl-mw pushed a commit that referenced this issue May 13, 2024
### Description
Add retry for opensearch client creation in ISM policy reconciler to fix
panic
Minor change - Remove extra whitespace in developing.md file

### Issues Resolved
Resolves
#801

### Check List
- [x] Commits are signed per the DCO using --signoff 
- [ ] Unittest added for the new/changed functionality and all unit
tests are successful
- [ ] Customer-visible features documented
- [x] No linter warnings (`make lint`)

If CRDs are changed:
- [ ] CRD YAMLs updated (`make manifests`) and also copied into the helm
chart
- [ ] Changes to CRDs documented

Please refer to the [PR
guidelines](https://github.com/opensearch-project/opensearch-k8s-operator/blob/main/docs/developing.md#submitting-a-pr)
before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Nilushan Costa <[email protected]>
swoehrl-mw pushed a commit to swoehrl-mw/opensearch-k8s-operator that referenced this issue May 16, 2024
…ensearch-project#805)

(cherry picked from commit ea46394)
swoehrl-mw pushed a commit that referenced this issue Jun 18, 2024
(cherry picked from commit ea46394)
swoehrl-mw pushed a commit to MaibornWolff/opensearch-operator that referenced this issue Jul 2, 2024
…ensearch-project#805)

(cherry picked from commit ea46394)