nexus: use all CockroachDB hosts from DNS to create DB connection URL. (#3783)

First pass at #3763 for crdb.

Even though we did query internal DNS, we were previously using only a
single host as part of connecting to crdb from Nexus. And since the
internal DNS server always returns records in the same order, that meant
every Nexus instance was always using the same CockroachDB instance even
now that we've been provisioning multiple. This also meant if that CRDB
instance went down we'd be hosed (as seen in #3763).

To help with that, this PR changes Nexus to use all the CRDB hosts
reported via Internal DNS when creating the connection URL. There are
some comments in the code, but this is still not quite as robust as it
could be. Short of something cueball-like, though, it's still an improvement.
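
As a rough illustration of the URL construction described above, here is a minimal standalone sketch using only the standard library. The function names `multi_host_db_url` and `example_url` are hypothetical and do not appear in the Nexus source, and the empty-list check is an assumption added for the example. The user, database name, and `sslmode` value are taken from the URL shown in this commit.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Joins every address into one comma-separated host list and embeds it in a
/// PostgreSQL-style URL. Returns `None` for an empty list, since a URL with
/// no hosts would not be useful (an assumption of this sketch).
fn multi_host_db_url(addrs: &[SocketAddrV6]) -> Option<String> {
    if addrs.is_empty() {
        return None;
    }
    // `SocketAddrV6` formats as `[addr]:port`, which is the bracketed form
    // needed for IPv6 hosts inside a URL.
    let hosts = addrs
        .iter()
        .map(ToString::to_string)
        .collect::<Vec<_>>()
        .join(",");
    Some(format!("postgresql://root@{hosts}/omicron?sslmode=disable"))
}

/// Usage: builds a URL from the first two addresses seen in the test log.
fn example_url() -> Option<String> {
    let addrs = [
        SocketAddrV6::new(Ipv6Addr::new(0xfd00, 0x1122, 0x3344, 0x101, 0, 0, 0, 5), 32221, 0, 0),
        SocketAddrV6::new(Ipv6Addr::new(0xfd00, 0x1122, 0x3344, 0x101, 0, 0, 0, 3), 32221, 0, 0),
    ];
    multi_host_db_url(&addrs)
}
```

The hosts keep whatever order DNS returned them in, so this alone does not spread load across instances. It only gives the client more hosts to fall back to.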

To test, I disabled the crdb instance Nexus initially connected to, and
Nexus was able to recover by connecting to the next crdb instance and
continue serving requests. From the log we can see a successful query,
connection errors once I disabled `fd00:1122:3344:101::5`, and then a
successful query with the connection reestablished to the next crdb
instance (`fd00:1122:3344:101::3`):
```
23:43:24.729Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result
    action = Query
    actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. })
    resource = Database
    result = Ok(())
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:30.803Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): roles
    roles = RoleSet { roles: {} }
23:43:30.804Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result
    action = Query
    actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. })
    resource = Database
    result = Ok(())
```
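
The recovery in the log is consistent with a client that walks the host list in order and uses the first host that accepts a connection. The sketch below illustrates that idea only and is not the driver's actual implementation. `connect_first_available` and the `connect` callback are hypothetical names introduced for this example.

```rust
/// Tries each host in order and returns the first successful connection.
/// `connect` is a hypothetical stand-in for whatever the real driver does to
/// open a connection to a single host. If every host fails, all errors are
/// returned in the order the hosts were tried.
fn connect_first_available<C, E>(
    hosts: &[&str],
    mut connect: impl FnMut(&str) -> Result<C, E>,
) -> Result<C, Vec<E>> {
    let mut errors = Vec::new();
    for &host in hosts {
        match connect(host) {
            // Stop at the first host that accepts the connection.
            Ok(conn) => return Ok(conn),
            // Remember the failure and fall through to the next host.
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

With a fixed host order, every client still prefers the first host while it is up, which matches the caveat in the commit message that this is an improvement rather than a complete fix.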
luqmana authored Jul 31, 2023
1 parent e53de82 commit a39a1a9
Showing 1 changed file with 16 additions and 6 deletions.
22 changes: 16 additions & 6 deletions nexus/src/context.rs
@@ -168,16 +168,21 @@ impl ServerContext {
 nexus_config::Database::FromUrl { url } => url.clone(),
 nexus_config::Database::FromDns => {
     info!(log, "Accessing DB url from DNS");
-    let address = loop {
+    // It's been requested but unfortunately not supported to directly
+    // connect using SRV based lookup.
+    // TODO-robustness: the set of cockroachdb hosts we'll use will be
+    // fixed to whatever we got back from DNS at Nexus start. This means
+    // a new cockroachdb instance won't picked up until Nexus restarts.
+    let addrs = loop {
         match resolver
-            .lookup_socket_v6(ServiceName::Cockroach)
+            .lookup_all_socket_v6(ServiceName::Cockroach)
             .await
         {
-            Ok(address) => break address,
+            Ok(addrs) => break addrs,
             Err(e) => {
                 warn!(
                     log,
-                    "Failed to lookup cockroach address: {e}"
+                    "Failed to lookup cockroach addresses: {e}"
                 );
                 tokio::time::sleep(std::time::Duration::from_secs(
                     1,
@@ -186,9 +191,14 @@ impl ServerContext {
             }
         }
     };
-    info!(log, "DB address: {}", address);
+    let addrs_str = addrs
+        .iter()
+        .map(ToString::to_string)
+        .collect::<Vec<_>>()
+        .join(",");
+    info!(log, "DB addresses: {}", addrs_str);
     PostgresConfigWithUrl::from_str(&format!(
-        "postgresql://root@{address}/omicron?sslmode=disable",
+        "postgresql://root@{addrs_str}/omicron?sslmode=disable",
     ))
     .map_err(|e| format!("Cannot parse Postgres URL: {}", e))?
 }
