
[sled-agent] PUT zones doesn't create datasets #7006

Open
wants to merge 29 commits into main

Conversation

@smklein (Collaborator) commented Nov 7, 2024

Calls to PUT zones now check that datasets exist -- as they should already have been initialized by prior calls -- rather than forcefully creating them.

This PR checks that datasets exist in the "configuration", rather than querying inventory.

Fixes #6991
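As a sketch of the up-front check (names here are illustrative; the real sled-agent types and error enum differ), a zone request can be rejected immediately when any dataset it needs is absent from the ledgered configuration:

```rust
use std::collections::BTreeSet;

// Illustrative error type; the real sled-agent error enum differs.
#[derive(Debug, PartialEq)]
enum Error {
    MissingDatasets { datasets: BTreeSet<String> },
}

// Reject a zone request up front if any dataset it needs is not
// already present in the dataset configuration.
fn check_datasets_exist(
    requested: &BTreeSet<String>,
    configured: &BTreeSet<String>,
) -> Result<(), Error> {
    let missing: BTreeSet<String> =
        requested.difference(configured).cloned().collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(Error::MissingDatasets { datasets: missing })
    }
}

fn main() {
    let configured: BTreeSet<String> =
        ["oxp_a/crypt/zone".to_string()].into_iter().collect();

    // A zone whose dataset is configured passes the check...
    let ok: BTreeSet<String> =
        ["oxp_a/crypt/zone".to_string()].into_iter().collect();
    assert!(check_datasets_exist(&ok, &configured).is_ok());

    // ...while a missing dataset rejects the whole request.
    let bad: BTreeSet<String> =
        ["oxp_b/crypt/zone".to_string()].into_iter().collect();
    assert!(check_datasets_exist(&bad, &configured).is_err());
}
```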

.storage
.upsert_filesystem(dataset_id, dataset_name)
.await?;
}
Contributor

#6991 notes

(Or, at least, replacing it with a check that these datasets exist)

I think we should definitely do that; this should immediately reject the request if it knows up front that not all the zones can be started due to missing datasets.

Collaborator Author

Yeah, I'm on board with that, mostly just wanted to get a quick PR up to see if this would pass RSS and bring the system up, as it should.

Happy to make that change before getting this out of draft.

Collaborator Author

Done in 2b73938

@smklein smklein marked this pull request as ready for review November 8, 2024 22:04
Base automatically changed from rss-datasets to main November 12, 2024 18:23
@@ -5039,6 +5101,18 @@ mod illumos_tests {
.await
.expect("Failed to ensure disks");
assert!(!result.has_error(), "{:?}", result);
let result = harness
Collaborator Author

This is necessary now because we want to read the dataset ledger for our service tests, even if it's empty.

@@ -914,7 +915,11 @@ impl StorageManager {
let result = self.datasets_ensure_internal(&log, &config).await;

let ledger_data = ledger.data_mut();
if *ledger_data == config {
if had_old_ledger && *ledger_data == config {
Collaborator Author

This change is only tangentially related to this PR, but it's necessary for some tests.

Because we're now reading the ledger of datasets when we provision zones, tests that had never created a dataset ledger were failing.

To mitigate: when we create the "empty dataset ledger", we still want to write it to our simulated M.2s.

However, as an optimization, the old version of this code skipped the re-write to the M.2 if nothing had changed. Unfortunately, if there was no prior ledger, we initialized the ledger data with a default value, which meant it wasn't actually possible to write the default value of the ledger to storage.

(This hadn't been an issue before, because our code typically has datasets it cares about inserting when this method gets called.)

Anyway, I'm now skipping the ledger re-write if and only if (1) there was an old ledger, and (2) it hasn't changed.
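The failure mode above can be sketched with a toy ledger (types and names are illustrative, not the real omicron definitions):

```rust
// Toy model of the ledger-commit bug.
#[derive(Default, Clone, PartialEq, Debug)]
struct DatasetsConfig {
    datasets: Vec<String>,
}

// Old logic: commit only when contents differ. With no prior ledger,
// `ledger_data` starts as Default::default(), which compares equal to
// an empty requested config, so a brand-new empty ledger was never
// written to disk.
fn should_commit_old(ledger_data: &DatasetsConfig, config: &DatasetsConfig) -> bool {
    ledger_data != config
}

// New logic: also commit whenever there was no old ledger.
fn should_commit_new(
    had_old_ledger: bool,
    ledger_data: &DatasetsConfig,
    config: &DatasetsConfig,
) -> bool {
    !had_old_ledger || ledger_data != config
}

fn main() {
    let empty = DatasetsConfig::default();
    // First-ever, empty ledger: old logic skips the write...
    assert!(!should_commit_old(&empty, &empty));
    // ...while the new logic commits it.
    assert!(should_commit_new(false, &empty, &empty));
    // An unchanged, pre-existing ledger is still not re-written.
    assert!(!should_commit_new(true, &empty, &empty));
}
```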

Contributor

I think this is probably obscure enough it warrants a (short) comment, and might be worth inverting the boolean to make it easier to understand? Something like

// Commit the requested ledger if we don't have one on disk or the
// one on disk is now out of date.
if !had_old_ledger || *ledger_data != config {
    *ledger_data = config;
    ledger.commit().await?;
}

Ok(result)


@@ -1181,7 +1187,7 @@ impl StorageManager {
self.omicron_physical_disks_ensure_internal(&log, &config).await?;

let ledger_data = ledger.data_mut();
if *ledger_data == config {
if had_old_ledger && *ledger_data == config {
Contributor

Same note about commenting and maybe inverting the boolean


// These datasets are configured to be part of the control plane.
let datasets_config = self.inner.storage.datasets_config_list().await?;
let existing_datasets: HashSet<_> = datasets_config
Contributor

Tiny nit - maybe BTreeSet here so if a client sends the same request multiple times, the error will order the missing datasets the same way each time? (Might require line 3490 to be a BTreeSet too, which seems fine.)

Member

Ah, I commented on the same thing but on line 3490, haha. :)

}

if let Some(pool) = zone.filesystem_pool.as_ref() {
requested_datasets.insert(DatasetName::new(
Contributor

Should this be a helper on zone? I'm wondering if in the future we'll store more than just the zpool in zone, at which point it would be nice to know clients of zone aren't constructing more detailed dataset info based on just the zpool.
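One possible shape for such a helper (hypothetical types and names, not the real omicron definitions):

```rust
// Hypothetical sketch of moving the zpool -> dataset derivation onto
// the zone config itself; all names here are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct DatasetName {
    pool: String,
    kind: String,
}

struct ZoneConfig {
    filesystem_pool: Option<String>,
}

impl ZoneConfig {
    // Callers no longer construct dataset info from the zpool
    // themselves, so if zones later carry more than just the pool,
    // the mapping only has to change in this one place.
    fn filesystem_dataset(&self) -> Option<DatasetName> {
        self.filesystem_pool.as_ref().map(|pool| DatasetName {
            pool: pool.clone(),
            kind: "transient_zone_root".to_string(),
        })
    }
}

fn main() {
    let zone = ZoneConfig { filesystem_pool: Some("oxp_a".to_string()) };
    let ds = zone.filesystem_dataset().unwrap();
    assert_eq!(ds.pool, "oxp_a");

    let no_pool = ZoneConfig { filesystem_pool: None };
    assert!(no_pool.filesystem_dataset().is_none());
}
```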

Storage(#[from] sled_storage::error::Error),

#[error("Missing datasets: {datasets:?}")]
MissingDatasets { datasets: HashSet<DatasetName> },
Member

Since the contents of the set will be printed out (e.g. in logs), perhaps it should be a BTreeSet rather than a HashSet, for consistent ordering? If I see a logged error like "missing datasets: {a, b, c}", it could be desirable to grep for other occurrences of the same missing-datasets error, which is difficult when the iteration order isn't stable. Also, if the names are always alphabetical, it's easier to write grep invocations or other regexes that match any missing-datasets error containing at least some prefix/suffix of datasets.

Probably not a _huge_ deal, but I figured I'd mention it.
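The determinism argument is easy to demonstrate: BTreeSet iterates in sorted (Ord) order, so formatting the same set always yields the same string, whereas HashSet's iteration order depends on a per-process random hash seed. The dataset names below are made up for illustration:

```rust
use std::collections::BTreeSet;

fn main() {
    // Inserted out of order on purpose.
    let datasets: BTreeSet<&str> = [
        "oxp_c/crypt/zone",
        "oxp_a/crypt/zone",
        "oxp_b/crypt/zone",
    ]
    .into_iter()
    .collect();

    // Debug output is alphabetical and stable across runs, so the
    // same missing-datasets error always logs the same string.
    let msg = format!("Missing datasets: {datasets:?}");
    assert_eq!(
        msg,
        "Missing datasets: {\"oxp_a/crypt/zone\", \"oxp_b/crypt/zone\", \"oxp_c/crypt/zone\"}"
    );
    println!("{msg}");
}
```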



Successfully merging this pull request may close these issues.

PUT /omicron-zone -- stop provisioning datasets!