[reconfigurator] introduce a simulator, make CLI use it #7022

sunshowers · 2024-11-09T01:18:58Z

For randomized testing, it's useful to have a higher-level notion of a
succession of system states, which can be inspected, rewound, and other
branches explored as desired. Introduce this via a new Simulator struct.

The individual states are roughly identical to the existing
ReconfiguratorSim in the CLI, and are a pretty standard tree structure. For
now I've chosen not to use a Merkle tree, just a UUID-based tree, but if
there's a good use case for it we may choose to make it a Merkle tree in the
future.

I've added a few basic features like storing the log of changes (though it's
not complete... I think I'll want to make SystemDescription follow the same
structure with read-only and mutable versions.)

As a proof of concept, this PR also rebuilds the current reconfigurator-cli
script on top of the simulator. I've not made any functional changes to the
CLI, but hope to add features like looking at the tree of states and undoing
them, and with non-deterministic testing to be able to load up the tree of
states.

The simulator uses a bunch of structural sharing via Arc. I was wondering
whether I could use the im crate, but it doesn't seem to have an IndexMap
equivalent. The Arcs should make cloning cheap enough that I don't think it's
an issue in any case. (There's a comment in the PR about why I haven't used an
object-store-like style instead -- it's just easier to use Arc for now.)

There's a bunch of work to be done before this can be landed, but I wanted to
put it up for folks to have an early look.

I've left a few TODOs in for future work. I don't think any of them block this
PR.

Created using spr 1.3.6-beta.1

jgallagher

Looks good to me! Just a few nits / questions.

jgallagher · 2024-11-11T21:39:12Z

dev-tools/reconfigurator-cli/src/main.rs

+    let mut state = sim.current_state().to_mut();
+    let sled_id = add.sled_id.unwrap_or_else(|| state.rng_mut().next_sled_id());
+    let new_sled = SledBuilder::new().id(sled_id);
+    state.system_mut().description_mut().sled(new_sled)?;


Naming nit - should this be add_sled()? I'd assume fn sled() would return a sled.

This is the old name which I haven't changed in this PR -- I agree with you, we need to make a pass to make all the names indicate setters.

nexus/reconfigurator/simulation/src/sim.rs

jgallagher · 2024-11-12T16:01:52Z

nexus/reconfigurator/simulation/src/sim.rs

+    /// Get the state for the given UUID.
+    pub fn get_state(&self, id: ReconfiguratorSimUuid) -> Option<&SimState> {
+        if id == Self::ROOT_ID {
+            return Some(&self.root_state);


This seems fine, but out of curiosity - why not insert the root state into self.states?

Seemed cleaner to me that way -- it could certainly live in self.states instead of being its own member but I guess I just prefer this personally.

jgallagher · 2024-11-12T16:12:49Z

nexus/reconfigurator/simulation/src/system.rs

+        self.internal_dns
+            .get(&generation)
+            .map(|d| &**d)
+            .ok_or_else(|| KeyError::internal_dns(generation))


I don't really have a preference, but get_{collection,blueprint} use a match and get_{internal,external}_dns use .map().ok_or_else() to do the same thing, I think? Probably would be less surprising if they all used the same style.

Agreed, changed everything to match.

jgallagher · 2024-11-12T16:16:08Z

nexus/reconfigurator/simulation/src/system.rs

+        let generation = params.generation;
+        match self.system.internal_dns.entry(generation) {
+            std::collections::btree_map::Entry::Vacant(entry) => {
+                entry.insert(params);


Should the internal/external DNS operations get entries in self.log?

Ah so there's a distinction between internal and external callers here -- external callers add to logs while internal callers don't. But I realized the code wasn't clear about this at all. I've refactored it into having two layers in two separate structs: the external public API layer which does logging, and the internal layer which does not.

jgallagher · 2024-11-12T16:18:59Z

nexus/reconfigurator/simulation/src/state.rs

+                    }
+                });
+
+            // XXX: Should this error ever happen? The only case where it


I'd maybe move this comment down into the match? On first read I wasn't sure what error it was talking about, since there's no ? and the method is documented as infallible.

Good idea! I've done that and combined it with the existing comment below.

andrewjstone

This is awesome @sunshowers! I Would love to walk through it a bit live and chat about goals and design.

andrewjstone · 2024-11-12T00:24:50Z

nexus/reconfigurator/simulation/src/sim.rs

+///
+/// # Implementation notes
+///
+/// We currently index by UUIDs, but we could index by the hash of the contents


I don't think this is that important right now. One use case I can think of for merkle trees is if we wanted to take two runs and compare them to find a point of divergence. However, that would require using the same seed anyway, which wouldn't make much sense. And for a single run, when you "rewind" and explore other states you can just record that point to go back to via uuid.

nexus/reconfigurator/simulation/src/sim.rs

nexus/reconfigurator/simulation/src/config.rs

nexus/reconfigurator/simulation/src/lib.rs

andrewjstone · 2024-11-12T16:06:03Z

nexus/reconfigurator/simulation/src/state.rs

+    // TODO: Should this be its own type to avoid confusion with other
+    // Generation instances? Generation numbers start from 1, but in our case 0
+    // provides a better user experience.
+    generation: Generation,


I'd probably change this to Version to avoid confusion.

nexus/reconfigurator/simulation/src/system.rs

andrewjstone · 2024-11-12T16:21:57Z

nexus/reconfigurator/simulation/src/state.rs

+            rng,
+            log,
+        };
+        sim.add_state(Arc::new(state));


There are a bunch of comments around why things look this way, but I'm still not sold that it's the right way to go. For example, there's a requirement that sim.next_sim_uuid() is called before sim.add_state(), but this cannot be checked at compile time. I think rather than passing in a Simulator to this method, the Simulator should have a method to return a mutable reference to the "next" SimStateBuilder, or even better, take a closure that can generate the next SimState when passed an &SimStateBuilder. Then when that closure returns the committed state can be added by the simulator.

In short, I'd like this all to be "driven" via the Simulator without having to pass in the simulator.

It's totally possible I'm missing something here, but I think there may be room to simplify the API a bit and would like to discuss more.

As we discussed, I would prefer to keep it this way because it means that if the caller decides to throw away an intermediate state, they can easily do so. This is the general model I've seen other tools follow, and while it's slightly looser at compile time I think the flexibility is worth it.

There are some dependencies that can't easily be checked at compile-time, but only one that's externally visible: the simulator passed in must be the one the state originated from. I don't think this is a major burden.

For example, there's a requirement that sim.next_sim_uuid() is called before sim.add_state(), but this cannot be checked at compile time.

This is true, but it's not externally visible -- this is managed purely within the reconfigurator-sim code.

andrewjstone

Chatted offline with @sunshowers. All my comments are minor. More than happy to see this merged and build upon it!

Created using spr 1.3.6-beta.1

sunshowers · 2024-11-14T06:59:36Z

Accidentally broke main -- see #7068 and #7069.

[spr] initial version

fc57f20

Created using spr 1.3.6-beta.1

sunshowers mentioned this pull request Nov 9, 2024

[WIP] [reconfigurator] introduce a simulator, make CLI use it #6856

Closed

sunshowers requested review from andrewjstone, smklein and jgallagher November 11, 2024 19:20

jgallagher approved these changes Nov 12, 2024

View reviewed changes

andrewjstone reviewed Nov 12, 2024

View reviewed changes

andrewjstone approved these changes Nov 12, 2024

View reviewed changes

sunshowers added 3 commits November 14, 2024 03:47

rebase on main

e6f33f8

Created using spr 1.3.6-beta.1

cleaner output

84ff94c

Created using spr 1.3.6-beta.1

trailing whitespace

2dc6988

Created using spr 1.3.6-beta.1

sunshowers enabled auto-merge (squash) November 14, 2024 04:00

sunshowers mentioned this pull request Nov 14, 2024

fix Cargo.lock merge #7064

Merged

sunshowers merged commit 38cb554 into main Nov 14, 2024
16 of 18 checks passed

sunshowers deleted the sunshowers/spr/reconfigurator-introduce-a-simulator-make-cli-use-it branch November 14, 2024 06:36

This was referenced Nov 14, 2024

ensure clippy (helios) and docs jobs are land-blocking #7069

Open

fix doc breakage in CI #7068

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[reconfigurator] introduce a simulator, make CLI use it #7022

[reconfigurator] introduce a simulator, make CLI use it #7022

sunshowers commented Nov 9, 2024 •

edited

Loading

jgallagher left a comment

jgallagher Nov 11, 2024

sunshowers Nov 14, 2024

jgallagher Nov 12, 2024

sunshowers Nov 14, 2024 •

edited

Loading

jgallagher Nov 12, 2024

sunshowers Nov 14, 2024

jgallagher Nov 12, 2024

sunshowers Nov 14, 2024 •

edited

Loading

jgallagher Nov 12, 2024

sunshowers Nov 14, 2024

andrewjstone left a comment

andrewjstone Nov 12, 2024

andrewjstone Nov 12, 2024

andrewjstone Nov 12, 2024

sunshowers Nov 14, 2024

andrewjstone left a comment

sunshowers commented Nov 14, 2024

[reconfigurator] introduce a simulator, make CLI use it #7022

[reconfigurator] introduce a simulator, make CLI use it #7022

Conversation

sunshowers commented Nov 9, 2024 • edited Loading

jgallagher left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sunshowers Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sunshowers Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewjstone left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewjstone left a comment

Choose a reason for hiding this comment

sunshowers commented Nov 14, 2024

sunshowers commented Nov 9, 2024 •

edited

Loading

sunshowers Nov 14, 2024 •

edited

Loading

sunshowers Nov 14, 2024 •

edited

Loading