[sled agent] Zone initialization causes maghemite advertisement, but nothing stops this on zone teardown #7377

Open
smklein opened this issue Jan 21, 2025 · 5 comments
Labels
bug Something that isn't working. networking Related to the networking. Sled Agent Related to the Per-Sled Configuration and Management

Comments

@smklein
Collaborator

smklein commented Jan 21, 2025

This issue is spun off of #7373

When Sled Agent initializes an Internal DNS zone, it calls the following:

```rust
// If this address is in a new ipv6 prefix, notify
// maghemite so it can advertise it to other sleds.
self.advertise_prefix_of_address(*gz_address).await;
```

This is explained in a block comment here:

```rust
// Internal DNS zones require a special route through
// the global zone, since they are not on the same part
// of the underlay as most other services on this sled
// (the sled's subnet).
```

This advertisement through maghemite allows other sleds to find a route to the DNS zone, through this sled's GZ, but it's missing an important piece: nothing ever instructs the sled to stop advertising this prefix.

This is problematic during zone expungement. Namely, even if we instruct a sled to stop running internal DNS, the prefix is still advertised.

This explains some of the behavior @jgallagher observed here: #7373 (comment)
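To make the failure mode concrete, here is a toy model of the lifecycle described above: starting the zone advertises the prefix containing the GZ address, and expunging the zone removes the zone but leaves the advertisement behind. Everything here is hypothetical (`MockMaghemite`, `SledAgent`, `start_internal_dns_zone`, `expunge_zone`, `prefix_of`), the /64 prefix length is an assumption for illustration, and the address is just an example; this is not Omicron's actual code.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Hypothetical stand-in for the set of prefixes maghemite advertises.
#[derive(Default)]
struct MockMaghemite {
    advertised: BTreeSet<(Ipv6Addr, u8)>,
}

/// Assumed prefix length, for illustration only.
const PREFIX_LEN: u8 = 64;

/// Keep the top `len` bits of `addr` and zero the rest.
fn prefix_of(addr: Ipv6Addr, len: u8) -> Ipv6Addr {
    let bits = u128::from(addr);
    let mask = if len == 0 { 0 } else { u128::MAX << (128 - u32::from(len)) };
    Ipv6Addr::from(bits & mask)
}

/// Hypothetical, heavily simplified sled agent.
struct SledAgent {
    maghemite: MockMaghemite,
    zones: BTreeSet<String>,
}

impl SledAgent {
    fn start_internal_dns_zone(&mut self, name: &str, gz_address: Ipv6Addr) {
        self.zones.insert(name.to_string());
        // Mirrors the issue: advertise the prefix containing the GZ address.
        let prefix = prefix_of(gz_address, PREFIX_LEN);
        self.maghemite.advertised.insert((prefix, PREFIX_LEN));
    }

    fn expunge_zone(&mut self, name: &str) {
        // The bug: the zone goes away, but nothing withdraws its prefix.
        self.zones.remove(name);
    }
}

fn main() {
    let mut sa = SledAgent {
        maghemite: MockMaghemite::default(),
        zones: BTreeSet::new(),
    };
    let gz: Ipv6Addr = "fd00:1122:3344:1::2".parse().unwrap();
    sa.start_internal_dns_zone("internal_dns", gz);
    sa.expunge_zone("internal_dns");

    assert!(sa.zones.is_empty());
    // The stale advertisement outlives the zone.
    assert_eq!(sa.maghemite.advertised.len(), 1);
}
```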

@smklein smklein added bug Something that isn't working. Sled Agent Related to the Per-Sled Configuration and Management labels Jan 21, 2025
@smklein
Collaborator Author

smklein commented Jan 21, 2025

Maghemite exposes an API to advertise prefixes:

https://github.com/oxidecomputer/maghemite/blob/e336b69a6b9b88a2ef067edb03d280a8fcfde3dc/ddm/src/admin.rs#L188-L192

and also one to withdraw them:

https://github.com/oxidecomputer/maghemite/blob/e336b69a6b9b88a2ef067edb03d280a8fcfde3dc/ddm/src/admin.rs#L260-L294

However, the Omicron wrapper around ddm-admin-client only invokes advertise_prefixes, with the single address passed to advertise_prefix (the GZ address of the DNS zone, mentioned in the block comment above).

This wrapper suffers from the issues mentioned in #7378. If we want Sled Agent to be in charge of reconciling prefixes (withdrawing unwanted ones and adding ones that should be advertised), we probably need to be more careful here.
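At its core, reconciling prefixes is a set difference between what Sled Agent wants advertised and what Maghemite currently advertises. A minimal sketch, assuming prefixes are represented as `(address, length)` pairs; `Prefix`, `PrefixPlan`, and `plan` are hypothetical names, not Omicron or maghemite APIs.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Hypothetical prefix representation: (network address, prefix length).
type Prefix = (Ipv6Addr, u8);

/// The changes needed to move maghemite from `current` to `desired`.
#[derive(Debug, PartialEq, Eq)]
struct PrefixPlan {
    to_advertise: BTreeSet<Prefix>,
    to_withdraw: BTreeSet<Prefix>,
}

fn plan(desired: &BTreeSet<Prefix>, current: &BTreeSet<Prefix>) -> PrefixPlan {
    PrefixPlan {
        // Wanted, but not advertised yet.
        to_advertise: desired.difference(current).copied().collect(),
        // Advertised, but no longer wanted (e.g., an expunged DNS zone).
        to_withdraw: current.difference(desired).copied().collect(),
    }
}

fn main() {
    let p = |s: &str| -> Prefix { (s.parse().unwrap(), 64) };
    let current = BTreeSet::from([p("fd00:1122:3344:1::"), p("fd00:1122:3344:2::")]);
    let desired = BTreeSet::from([p("fd00:1122:3344:1::")]);

    let changes = plan(&desired, &current);
    assert!(changes.to_advertise.is_empty());
    assert_eq!(changes.to_withdraw, BTreeSet::from([p("fd00:1122:3344:2::")]));
}
```

Because `plan` is a pure function of the two sets, it gives the same answer no matter how the stale prefix got there, including after a crash that skipped normal zone teardown.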

@smklein smklein added the networking Related to the networking. label Jan 21, 2025
@davepacheco
Collaborator

On a call today we discussed a few options for dealing with this.

  1. The obvious solution is to delete the advertisement when we tear down the zone. This would work in the happy path. But the challenge here is that we might not always go through the normal zone teardown process. Suppose sled agent crashes after having written a new OmicronZonesConfig ledger but before having torn down the zone. On restart, sled agent tears down everything and starts up the configured zones. Would it know in this case to delete the advertisement, for a zone that it doesn't have in its configuration any more?
  2. Centralize the place in Sled Agent where at least the internal DNS advertisements are managed and turn it into a reconciler loop. So the code path Sean mentioned above would just ping some other tokio task that would wake up, see all the internal DNS prefixes that are supposed to be advertised, compare that with what's in Maghemite, and add/remove whatever it needs to. Note there should be no concurrency issue with this read-modify-write because there's only one sled agent on the system. This was the preferred approach, I think. We left it undecided whether the task also owns the advertisements for bootstrap and underlay prefixes. It could, and that might be simpler to reason about. But it would also be a more invasive change and it might couple these code paths unnecessarily. (Are there contexts where we might try to adjust one and not want to try to resolve the other?)
  3. Do (1) plus on startup have Sled Agent delete all the internal DNS prefixes it doesn't know about. (This is kind of a lazy version of 2.)
  4. In Maghemite, put a timeout on the advertisement and force sled agent to continually re-assert it.
  5. Tie the advertisement to the lifetime of something else (like the zone) that maghemite could watch. I threw this out but I think it's probably really messy.
  6. Have Reconfigurator fully own this, completely separately from sled agent. This is appealing in that it's essentially just another thing that has to happen to deploy/remove zones. On the other hand, the scope of the problem is limited to a single system: sled agent talking to its own Maghemite. It's not necessary to get Reconfigurator involved here, it would be a pretty big architectural change, and we'd need another solution for rack cold start since these advertisements are needed for that but Reconfigurator isn't running yet. Also, if the sled restarts, Reconfigurator would have to know to update the sled's own Maghemite.
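Option 2 might look roughly like the following. This is a sketch under several assumptions: `MockDdm` stands in for Maghemite's admin API (its method names loosely echo the advertise/withdraw endpoints linked above, but the signatures are invented), and a std thread woken over an `mpsc` channel stands in for the tokio task. `reconcile` and `spawn_reconciler` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Prefix = (Ipv6Addr, u8);

/// Hypothetical stand-in for Maghemite's admin API.
#[derive(Default)]
struct MockDdm {
    advertised: BTreeSet<Prefix>,
}

impl MockDdm {
    fn get_prefixes(&self) -> BTreeSet<Prefix> {
        self.advertised.clone()
    }
    fn advertise_prefixes(&mut self, p: &BTreeSet<Prefix>) {
        self.advertised.extend(p.iter().copied());
    }
    fn withdraw_prefixes(&mut self, p: &BTreeSet<Prefix>) {
        for x in p {
            self.advertised.remove(x);
        }
    }
}

/// One pass: read desired and current state, then fix the difference.
fn reconcile(desired: &Mutex<BTreeSet<Prefix>>, ddm: &Mutex<MockDdm>) {
    let desired = desired.lock().unwrap().clone();
    let mut ddm = ddm.lock().unwrap();
    let current = ddm.get_prefixes();
    let add: BTreeSet<Prefix> = desired.difference(&current).copied().collect();
    let remove: BTreeSet<Prefix> = current.difference(&desired).copied().collect();
    ddm.advertise_prefixes(&add);
    ddm.withdraw_prefixes(&remove);
}

/// Spawn the reconciler; each `()` sent on the channel is a wake-up ping.
fn spawn_reconciler(
    desired: Arc<Mutex<BTreeSet<Prefix>>>,
    ddm: Arc<Mutex<MockDdm>>,
) -> (mpsc::Sender<()>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        // Exits once every Sender is dropped and the queue is empty.
        while rx.recv().is_ok() {
            // Coalesce pings that queued up while we were busy.
            while rx.try_recv().is_ok() {}
            reconcile(&desired, &ddm);
        }
    });
    (tx, handle)
}

fn main() {
    let p: Prefix = ("fd00:1122:3344:1::".parse().unwrap(), 64);
    let desired = Arc::new(Mutex::new(BTreeSet::from([p])));
    let ddm = Arc::new(Mutex::new(MockDdm::default()));
    let (tx, handle) = spawn_reconciler(Arc::clone(&desired), Arc::clone(&ddm));

    tx.send(()).unwrap(); // zone started: ping the reconciler
    desired.lock().unwrap().clear(); // zone expunged: prefix no longer wanted
    tx.send(()).unwrap(); // ping again
    drop(tx); // let the loop exit
    handle.join().unwrap();

    assert!(ddm.lock().unwrap().advertised.is_empty());
}
```

Callers only update the desired set and send a ping, so a missed or duplicated ping is harmless: the next pass converges on the desired state. The read-modify-write inside `reconcile` is only safe because the reconciler is the sole writer, which is the single-sled-agent argument made in option 2.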

@davepacheco
Collaborator

In terms of prioritization:

  1. Affected use cases: This is a problem any time we expunge an internal DNS zone (not the whole sled) and then put its replacement onto another sled. There are three contexts I can think of this being a problem:
    • if we need to expunge a disk containing an internal DNS zone
    • if we're doing an online upgrade
    • in general, expunging a zone should work for testing purposes
  2. Impact/severity: The impact is that either one internal DNS IP is not working or (worse) it can be serving incorrect results. This could be almost arbitrarily bad, although in practice it'd likely just result in some process being stuck (like in Blueprint execution stuck on sled-agent failed requests #7373) and most of the time that would involve another bug (e.g., not behaving correctly when one DNS server is offline).
  3. Workaround: If we got a support call for this, once we'd root-caused the problem (which might not be easy if someone doesn't know to look for this), the problem should be resolvable by manually deleting the advertisement.

This does not affect sled expungement.

@smklein
Collaborator Author

smklein commented Jan 21, 2025

See also: oxidecomputer/maghemite#432, for possible improvements to the maghemite API to make the sled agent reconciler loop easier to write.

@davepacheco
Collaborator

Ah, right. A related idea we discussed was having Maghemite accept a complete list of prefixes, possibly scoped to a tag as maghemite#432 suggests, to avoid the read-modify-write. This would be nice, but it doesn't change much in terms of correctness compared with putting the reconciler in sled agent: there's only one consumer, so the read-modify-write works fine.
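The tag-scoped idea can be sketched as a declarative "replace the whole set for this tag" call, so the client never reads current state before writing. `TaggedPrefixes`, `set_prefixes`, and the tag strings are hypothetical; this is not Maghemite's API and not necessarily what maghemite#432 proposes.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::Ipv6Addr;

type Prefix = (Ipv6Addr, u8);

/// Hypothetical Maghemite-side store: prefixes grouped by an owner tag.
#[derive(Default)]
struct TaggedPrefixes {
    by_tag: BTreeMap<String, BTreeSet<Prefix>>,
}

impl TaggedPrefixes {
    /// Replace everything under `tag` with `prefixes` in one call.
    fn set_prefixes(&mut self, tag: &str, prefixes: BTreeSet<Prefix>) {
        if prefixes.is_empty() {
            self.by_tag.remove(tag);
        } else {
            self.by_tag.insert(tag.to_string(), prefixes);
        }
    }

    /// Union of all prefixes across tags: what would be advertised.
    fn advertised(&self) -> BTreeSet<Prefix> {
        self.by_tag.values().flatten().copied().collect()
    }
}

fn main() {
    let dns: Prefix = ("fd00:1122:3344:1::".parse().unwrap(), 64);
    let sled: Prefix = ("fd00:1122:3344:101::".parse().unwrap(), 64);
    let mut ddm = TaggedPrefixes::default();

    ddm.set_prefixes("sled-underlay", BTreeSet::from([sled]));
    ddm.set_prefixes("internal-dns", BTreeSet::from([dns]));
    // DNS zone expunged: sled agent just sends the new (empty) list.
    ddm.set_prefixes("internal-dns", BTreeSet::new());

    assert_eq!(ddm.advertised(), BTreeSet::from([sled]));
}
```

Scoping by tag also lets the DNS path replace its own prefixes without touching the underlay or bootstrap ones, which speaks to the coupling concern raised in option 2.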

Projects
None yet
Development

No branches or pull requests

2 participants