
[bitnami/mariadb-galera] helm upgrade with "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate #15525

Open
ledroide opened this issue Mar 16, 2023 · 3 comments
Labels
mariadb-galera on-hold Issues or Pull Requests with this label will never be considered stale tech-issues The user has a technical issue about an application

Comments

@ledroide

ledroide commented Mar 16, 2023

Name and Version

bitnami/mariadb-galera 7.5.3

What steps will reproduce the bug?

Configuration:

tls:
  enabled: true
  autoGenerated: true

How to reproduce:

  • helm upgrade --install --values myvalues.yaml database --namespace=galera bitnami/mariadb-galera
  • use the cluster: everything is synchronized and the wsrep member status is okay
  • run the same command again: helm upgrade --install --values myvalues.yaml database --namespace=galera bitnami/mariadb-galera
  • the certificates change, new pods and older ones can no longer communicate or sync, so they crash.

A diff taken before upgrading shows that the TLS certificates are regenerated even though they already exist:

--- /tmp/LIVE-2635943621/v1.Secret.galera.database-crt  2023-03-16 14:24:20.740023113 +0100
+++ /tmp/MERGED-1283566051/v1.Secret.galera.database-crt        2023-03-16 14:24:20.740023113 +0100
@@ -1,8 +1,8 @@
 apiVersion: v1
 kind: Secret
 data:
-  ca.crt: '*** (before)'
-  tls.crt: '*** (before)'
-  tls.key: '*** (before)'
+  ca.crt: '*** (after)'
+  tls.crt: '*** (after)'
+  tls.key: '*** (after)'

Problem: starting members are unable to communicate with the other members, so the new pods go into Error and then CrashLoopBackOff.

Here is what I see in the logs of any starting pod while the StatefulSet is restarting the cluster pods:

2023-03-16 09:47:56
mariadb 09:47:56.83 DEBUG ==> Setting wsrep_provider_options to ''socket.ssl_cert=/bitnami/mariadb/certs/tls.crt;socket.ssl_key=/bitnami/mariadb/certs/tls.key;socket.ssl_ca=/bitnami/mariadb/certs/ca.crt'' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:47:56
mariadb 09:47:56.82 DEBUG ==> Setting ssl_key to '/bitnami/mariadb/certs/tls.key' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:47:56
mariadb 09:47:56.81 DEBUG ==> Setting ssl_cert to '/bitnami/mariadb/certs/tls.crt' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:42:51
2023-03-16  9:42:51 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:51
2023-03-16  9:42:51 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:50
2023-03-16  9:42:50 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:50
2023-03-16  9:42:50 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca

That means I need two different values files, one with autoGenerated=true and one with autoGenerated=false, depending on whether the cluster already exists. So this configuration is not immutable, not even idempotent.

The only workaround I have found is to scale the members down to 0 and then scale back up, which unfortunately causes downtime.

Is there an option I missed that would manage this case - and not replace existing certificates, but only generate them if they do not exist?

What architecture are you using?

amd64
Kubernetes 1.26.2
helm 3.11.1

Issues seen before

Maybe related to issues #7071 or #8424

@ledroide ledroide added the tech-issues The user has a technical issue about an application label Mar 16, 2023
@github-actions github-actions bot added the triage Triage is needed label Mar 16, 2023
@ledroide ledroide changed the title helm upgrade wuth "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate helm upgrade with "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate Mar 16, 2023
@github-actions github-actions bot added in-progress and removed triage Triage is needed labels Mar 16, 2023
@aoterolorenzo aoterolorenzo changed the title helm upgrade with "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate [bitnami/mariadb-galera] helm upgrade with "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate Mar 22, 2023
@aoterolorenzo
Contributor

aoterolorenzo commented Mar 22, 2023

Hey @ledroide,

How about using

tls:
  enabled: true
  autoGenerated: true

at the installation, and:

tls:
  enabled: true
  autoGenerated: false

for the upgrades?

It is true that regenerating the existing certs on every upgrade doesn't seem ideal, but I'm not sure whether this is a bug or a design issue.

@ledroide
Author

@aoterolorenzo: That's why I wrote:

(this values.yaml configuration) is not immutable, not even idempotent

If we apply the same values.yaml twice, the first run works and the second run crashes the whole cluster.

From an ops or CI point of view, it's clearly a bug affecting a common use case. There should be a check before replacing existing certificates, either automatic or controlled by a boolean value, let's say tls.replaceTlsCertsIfExist: false
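
A rough sketch of what such a check could look like in a Secret template, assuming Helm's `lookup` function and the Sprig helpers `genCA` and `genSignedCert`. This is not the chart's actual template: the `<release>-crt` naming is inferred from the `database-crt` Secret in the diff above, the CA name and 365-day validity are placeholders, and `tls.replaceTlsCertsIfExist` is the hypothetical value proposed here.

```yaml
{{- /* Hypothetical sketch, not the chart's real template. */ -}}
{{- $secretName := printf "%s-crt" .Release.Name }}
{{- /* lookup returns an empty map when the Secret does not exist yet. */ -}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $secretName }}
{{- /* Reuse only if a Secret was found and replacement was not requested. */ -}}
{{- $reuse := and $existing (not .Values.tls.replaceTlsCertsIfExist) }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $secretName }}
  namespace: {{ .Release.Namespace }}
data:
{{- if $reuse }}
  # Secret already exists: copy its (already base64-encoded) data unchanged,
  # so running nodes keep trusting the same CA.
  ca.crt: {{ index $existing.data "ca.crt" }}
  tls.crt: {{ index $existing.data "tls.crt" }}
  tls.key: {{ index $existing.data "tls.key" }}
{{- else }}
  {{- /* First install, or replacement explicitly requested: generate a new CA and cert. */ -}}
  {{- $ca := genCA "mariadb-galera-ca" 365 }}
  {{- $cert := genSignedCert $secretName nil (list $secretName) 365 $ca }}
  ca.crt: {{ $ca.Cert | b64enc }}
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
{{- end }}
```

With `tls.replaceTlsCertsIfExist` unset or false, applying the same values.yaml twice would leave the Secret data untouched. One caveat: `lookup` returns an empty result under `helm template` and client-side `--dry-run`, so a diff computed that way may still show regenerated certificates even though a real upgrade would keep them.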

@aoterolorenzo
Contributor

Yep, completely agree! Let me create an internal task for the team to take a deeper look and address the issue. We will get back to you here as soon as our workload allows us to work on it (no ETA can be provided, I'm afraid).

Thanks for reporting!

@aoterolorenzo aoterolorenzo added the on-hold Issues or Pull Requests with this label will never be considered stale label Mar 29, 2023

3 participants