nexus: use all CockroachDB hosts from DNS to create DB connection URL. #3783
First pass at #3763 for crdb.
Even though we did query internal DNS, we were previously using only a single host as part of connecting to crdb from Nexus. And since the internal DNS server always returns records in the same order, that meant every Nexus instance was always using the same CockroachDB instance even now that we've been provisioning multiple. This also meant if that CRDB instance went down we'd be hosed (as seen in #3763).
To help with that, this PR changes Nexus to use all the CRDB hosts reported via internal DNS when creating the connection URL. There are some comments in the code, but this is still not quite as robust as it could be. Short of something cueball-like, though, it's still an improvement.
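As a rough illustration of the idea (not the code in this PR), a multi-host connection URL can be assembled by joining every resolved address into the host portion of a libpq-style URI. The helper name `build_crdb_url`, the user, the database name, the `sslmode` parameter, and the port used in the usage note are all assumptions made for this sketch.

```rust
use std::net::SocketAddr;

/// Hypothetical helper: joins every resolved CockroachDB address into one
/// libpq-style multi-host URL, so the client can try the next host when
/// one is unreachable. The user, database, and `sslmode` value are
/// placeholders for this sketch, not values taken from the PR.
fn build_crdb_url(addrs: &[SocketAddr], user: &str, db: &str) -> Option<String> {
    // With no resolved hosts there is nothing to connect to.
    if addrs.is_empty() {
        return None;
    }

    // `SocketAddr`'s Display impl already brackets IPv6 addresses,
    // e.g. `[fd00::1]:1234`, which is the form a URL expects.
    let hosts = addrs
        .iter()
        .map(|addr| addr.to_string())
        .collect::<Vec<_>>()
        .join(",");

    Some(format!("postgresql://{user}@{hosts}/{db}?sslmode=disable"))
}
```

Called with the two addresses from the test below on an assumed port, `[fd00:1122:3344:101::5]:32221` and `[fd00:1122:3344:101::3]:32221`, the sketch would produce a URL whose host list contains both entries in the order DNS returned them. Because that order is always the same, every client would still try the same host first; the gain is only in failover.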
To test, I disabled the CRDB instance that Nexus initially connected to. Nexus recovered by connecting to the next CRDB instance and continued serving requests. From the log we can see a successful query, connection errors once I disabled fd00:1122:3344:101::5, and then a successful query with the connection reestablished to the next CRDB instance (fd00:1122:3344:101::3):