Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reconfigurator] introduce a simulator, make CLI use it #7022

Conversation

sunshowers
Copy link
Contributor

@sunshowers sunshowers commented Nov 9, 2024

For randomized testing, it's useful to have a higher-level notion of a
succession of system states, which can be inspected, rewound, and other
branches explored as desired. Introduce this via a new Simulator struct.

The individual states are roughly identical to the existing
ReconfiguratorSim in the CLI, and are a pretty standard tree structure. For
now I've chosen not to use a Merkle tree, just a UUID-based tree, but if
there's a good use case for it we may choose to make it a Merkle tree in the
future.

I've added a few basic features like storing the log of changes (though it's
not complete... I think I'll want to make SystemDescription follow the same
structure with read-only and mutable versions.)

As a proof of concept, this PR also rebuilds the current reconfigurator-cli
script on top of the simulator. I've not made any functional changes to the
CLI, but hope to add features like looking at the tree of states and undoing
them, and with non-deterministic testing to be able to load up the tree of
states.

The simulator uses a bunch of structural sharing via Arc. I was wondering
whether I could use the im crate, but it doesn't seem to have an IndexMap
equivalent. The Arcs should make cloning cheap enough that I don't think it's
an issue in any case. (There's a comment in the PR about why I haven't used an
object-store-like style instead -- it's just easier to use Arc for now.)

There's a bunch of work to be done before this can be landed, but I wanted to
put it up for folks to have an early look.

I've left a few TODOs in for future work. I don't think any of them block this
PR.

Created using spr 1.3.6-beta.1
Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just a few nits / questions.

let mut state = sim.current_state().to_mut();
let sled_id = add.sled_id.unwrap_or_else(|| state.rng_mut().next_sled_id());
let new_sled = SledBuilder::new().id(sled_id);
state.system_mut().description_mut().sled(new_sled)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nit - should this be add_sled()? I'd assume fn sled() would return a sled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the old name which I haven't changed in this PR -- I agree with you, we need to make a pass to make all the names indicate setters.

nexus/reconfigurator/simulation/src/sim.rs Outdated Show resolved Hide resolved
/// Get the state for the given UUID.
pub fn get_state(&self, id: ReconfiguratorSimUuid) -> Option<&SimState> {
if id == Self::ROOT_ID {
return Some(&self.root_state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, but out of curiosity - why not insert the root state into self.states?

Copy link
Contributor Author

@sunshowers sunshowers Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed cleaner to me that way -- it could certainly live in self.states instead of being its own member but I guess I just prefer this personally.

self.internal_dns
.get(&generation)
.map(|d| &**d)
.ok_or_else(|| KeyError::internal_dns(generation))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a preference, but get_{collection,blueprint} use a match and get_{internal,external}_dns use .map().ok_or_else() to do the same thing, I think? Probably would be less surprising if they all used the same style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, changed everything to match.

let generation = params.generation;
match self.system.internal_dns.entry(generation) {
std::collections::btree_map::Entry::Vacant(entry) => {
entry.insert(params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the internal/external DNS operations get entries in self.log?

Copy link
Contributor Author

@sunshowers sunshowers Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so there's a distinction between internal and external callers here -- external callers add to logs while internal callers don't. But I realized the code wasn't clear about this at all. I've refactored it into having two layers in two separate structs: the external public API layer which does logging, and the internal layer which does not.

}
});

// XXX: Should this error ever happen? The only case where it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe move this comment down into the match? On first read I wasn't sure what error it was talking about, since there's no ? and the method is documented as infallible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I've done that and combined it with the existing comment below.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @sunshowers! I Would love to walk through it a bit live and chat about goals and design.

///
/// # Implementation notes
///
/// We currently index by UUIDs, but we could index by the hash of the contents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is that important right now. One use case I can think of for merkle trees is if we wanted to take two runs and compare them to find a point of divergence. However, that would require using the same seed anyway, which wouldn't make much sense. And for a single run, when you "rewind" and explore other states you can just record that point to go back to via uuid.

nexus/reconfigurator/simulation/src/sim.rs Show resolved Hide resolved
nexus/reconfigurator/simulation/src/config.rs Show resolved Hide resolved
nexus/reconfigurator/simulation/src/lib.rs Show resolved Hide resolved
// TODO: Should this be its own type to avoid confusion with other
// Generation instances? Generation numbers start from 1, but in our case 0
// provides a better user experience.
generation: Generation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably change this to Version to avoid confusion.

nexus/reconfigurator/simulation/src/system.rs Outdated Show resolved Hide resolved
rng,
log,
};
sim.add_state(Arc::new(state));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of comments around why things look this way, but I'm still not sold that it's the right way to go. For example, there's a requirement that sim.next_sim_uuid() is called before sim.add_state(), but this cannot be checked at compile time. I think rather than passing in a Simulator to this method, the Simulator should have a method to return a mutable reference to the "next" SimStateBuilder, or even better, take a closure that can generate the next SimState when passed an &SimStateBuilder. Then when that closure returns the committed state can be added by the simulator.

In short, I'd like this all to be "driven" via the Simulator without having to pass in the simulator.

It's totally possible I'm missing something here, but I think there may be room to simplify the API a bit and would like to discuss more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, I would prefer to keep it this way because it means that if the caller decides to throw away an intermediate state, they can easily do so. This is the general model I've seen other tools follow, and while it's slightly looser at compile time I think the flexibility is worth it.

There are some dependencies that can't easily be checked at compile-time, but only one that's externally visible: the simulator passed in must be the one the state originated from. I don't think this is a major burden.

For example, there's a requirement that sim.next_sim_uuid() is called before sim.add_state(), but this cannot be checked at compile time.

This is true, but it's not externally visible -- this is managed purely within the reconfigurator-sim code.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted offline with @sunshowers. All my comments are minor. More than happy to see this merged and build upon it!

Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@sunshowers sunshowers enabled auto-merge (squash) November 14, 2024 04:00
@sunshowers sunshowers mentioned this pull request Nov 14, 2024
@sunshowers sunshowers merged commit 38cb554 into main Nov 14, 2024
16 of 18 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/reconfigurator-introduce-a-simulator-make-cli-use-it branch November 14, 2024 06:36
@sunshowers
Copy link
Contributor Author

Accidentally broke main -- see #7068 and #7069.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants