[sled-agent] PUT zones doesn't create datasets #7006
base: main
Conversation
.storage
    .upsert_filesystem(dataset_id, dataset_name)
    .await?;
}
#6991 notes
(Or, at least, replacing it with a check that these datasets exist)
I think we should definitely do that; this should immediately reject the request if it knows up front that not all the zones can be started due to missing datasets.
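A minimal sketch of the up-front check being suggested here, using plain `String`s in place of sled-agent's actual `DatasetName` type (the type alias, function name, and dataset names below are illustrative, not the real sled-agent API):

```rust
use std::collections::BTreeSet;

// Illustrative stand-in for sled-agent's DatasetName type.
type DatasetName = String;

/// Reject the whole request up front if any requested dataset is not
/// already present, instead of creating datasets as a side effect.
fn validate_datasets_exist(
    requested: &BTreeSet<DatasetName>,
    existing: &BTreeSet<DatasetName>,
) -> Result<(), BTreeSet<DatasetName>> {
    let missing: BTreeSet<DatasetName> =
        requested.difference(existing).cloned().collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Collecting *all* missing datasets (rather than failing on the first) lets the error report everything the caller needs to fix in one round trip.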
Yeah, I'm on board with that, mostly just wanted to get a quick PR up to see if this would pass RSS and bring the system up, as it should.
Happy to make that change before getting this out of draft.
Done in 2b73938
@@ -5039,6 +5101,18 @@ mod illumos_tests {
    .await
    .expect("Failed to ensure disks");
assert!(!result.has_error(), "{:?}", result);
let result = harness
This is necessary now because we want to read the dataset ledger for our service tests, even if it's empty.
@@ -914,7 +915,11 @@ impl StorageManager {
let result = self.datasets_ensure_internal(&log, &config).await;

let ledger_data = ledger.data_mut();
if *ledger_data == config {
if had_old_ledger && *ledger_data == config {
This change is only tangentially related to this PR, but it's necessary for some tests.
Because we're reading the ledger of datasets when we provision zones, we were failing in tests that had not created a ledger of datasets.
To mitigate: when we're creating the "empty dataset ledger", we want to still write it to our simulated M.2s.
However, in the old version of this code, as an optimization, we would skip the re-write to the M.2 if nothing had changed. Unfortunately, if there was no prior ledger, we would initialize the ledger data with a "Default value", meaning that it wasn't actually possible to write the "default value" of the ledger to storage.
(This hadn't been an issue before, because our code typically has datasets it cares about inserting when this method gets called).
Anyway, to mitigate, I'm now skipping the ledger re-write if and only if (1) there was an old ledger, and (2) it hasn't changed.
I think this is probably obscure enough it warrants a (short) comment, and might be worth inverting the boolean to make it easier to understand? Something like
// Commit the requested ledger if we don't have one on disk or the
// one on disk is now out of date.
if !had_old_ledger || *ledger_data != config {
*ledger_data = config;
ledger.commit().await?;
}
Ok(result)
@@ -1181,7 +1187,7 @@ impl StorageManager {
self.omicron_physical_disks_ensure_internal(&log, &config).await?;

let ledger_data = ledger.data_mut();
if *ledger_data == config {
if had_old_ledger && *ledger_data == config {
Same note about commenting and maybe inverting the boolean
// These datasets are configured to be part of the control plane.
let datasets_config = self.inner.storage.datasets_config_list().await?;
let existing_datasets: HashSet<_> = datasets_config
Tiny nit - maybe `BTreeSet` here, so if a client sends the same request multiple times, the error will order the missing datasets the same way each time? (Might require line 3490 to be a `BTreeSet` too, which seems fine.)
Ah, I commented on the same thing but on line 3490, haha. :)
}

if let Some(pool) = zone.filesystem_pool.as_ref() {
    requested_datasets.insert(DatasetName::new(
Should this be a helper on `zone`? I'm wondering if in the future we'll store more than just the zpool in `zone`, at which point it would be nice to know clients of `zone` aren't constructing more detailed dataset info based on just the zpool.
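A hedged sketch of what such a helper might look like; the struct, field, and naming scheme below are simplified placeholders for discussion, not the actual sled-agent types or dataset-name construction:

```rust
// Simplified placeholder for the zone config type under discussion.
struct ZoneConfig {
    filesystem_pool: Option<String>,
}

impl ZoneConfig {
    /// Returns the name of the dataset backing this zone's filesystem,
    /// if one is configured. Keeping this logic on the zone type means
    /// callers never rebuild dataset info from the zpool alone, so
    /// adding more fields later only changes this one method.
    fn filesystem_dataset_name(&self) -> Option<String> {
        // Illustrative naming scheme only.
        self.filesystem_pool.as_ref().map(|pool| format!("{pool}/zone"))
    }
}
```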
Storage(#[from] sled_storage::error::Error),

#[error("Missing datasets: {datasets:?}")]
MissingDatasets { datasets: HashSet<DatasetName> },
Since the contents of the set will be printed out (e.g. in logs), perhaps it should be a `BTreeSet` rather than a `HashSet`, for consistent ordering? If I see a logged error like "missing datasets: {a, b, c}", it seems like it could be desirable to be able to grep for other occurrences of the same missing-datasets error, which is difficult when the iteration order isn't stable. Also, if they're always alphabetical, it's easier to write `grep` invocations or other regexes that match any missing-datasets error containing at least some prefix/suffix of datasets.
Probably not a _huge_ deal, but I figured I'd mention it.
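To illustrate the ordering point: `BTreeSet` iterates in sorted order, so its `Debug` output is deterministic across runs (the helper function and dataset names here are made up for illustration):

```rust
use std::collections::BTreeSet;

/// Format a missing-datasets error message. Because BTreeSet iterates
/// in sorted order, the same set always produces the same string,
/// which keeps logged errors stable and greppable. A HashSet would
/// print the same elements in an arbitrary, run-dependent order.
fn missing_datasets_message(datasets: &BTreeSet<&str>) -> String {
    format!("Missing datasets: {datasets:?}")
}
```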
Calls to `PUT` zones will now check that datasets exist -- as they should have been initialized by prior calls -- rather than forcefully creating them.
This PR checks that datasets exist in the "configuration", rather than querying inventory.
Fixes #6991