Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow instance_reconfigure to resize a stopped instance #6774

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Oct 4, 2024

mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated InstanceAndActiveVmm. same size validation as we do for a new instance, tests checking all that as well.

Also fixes #3769.

@hawkw hawkw self-requested a review October 4, 2024 18:05
@karencfv
Copy link
Contributor

karencfv commented Oct 4, 2024

Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅
and I'll be out for 2 weeks

@iximeow iximeow added this to the 12 milestone Oct 4, 2024
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we will currently only perform updates that change both the number of vCPUs and the amount of memory, which I think is wrong. I think we should be able to change either individually.

Besides that, this seems basically fine; it would be nice to avoid the nop boot disk update but if the diesel is too scary, whatever...

Comment on lines 1226 to 1233
// Do not allow setting sizes of running instances to ensure that if an
// instance is running, its resource usage matches what we record in the
// database.
const OK_TO_SET_SIZE_STATES: &'static [InstanceState] = &[
InstanceState::NoVmm,
InstanceState::Failed,
InstanceState::Creating,
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that OK_TO_SET_SIZE_STATES and OK_TO_SET_BOOT_DISK_STATES are the same list of states, and what we really want in both cases is "states in which there is no VMM actively incarnating this instance". I kind of wonder if, rather than duplicating in both the instance-resize and instance-set-boot-disk queries, we should just have a constant on InstanceState that's like NOT_INCARNATED_STATES or similar, so that these queries and any future ones can all use the same set of states, and if we add new states that have "not incarnated" semantics, we can add them to the one definition of that list of states, instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on one hand i agree (and enough to go just do the thing, there's now InstanceState::NO_VMM_STATES*), but from the conversations we've had we've been pretty close to having caveouts for specific not-incarnated states and fields. namely "is it OK to set the boot disk in Creating", and does allowing that allow correctness issues (no) or a possibly-confusing-but-correct API result (yes). otoh "actions we can take on not-incarnated instances" is a much easier concept than having to think through each kind of action and state combination, so it's nicer in an axiomatic way.

*: after writing this i realize why you said NOT_INCARNATED rather than NO_VMM - because there is a NoVmm state which is very distinct from what this list is getting at. so i should go change that name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my thinking is that we should avoid having weird exceptions for setting specific parameters unless it's absolutely necessary. I'd be quite surprised if we don't basically just end up being able to divide every possible instance_reconfigure into either "can do this only if there is no Real Life VMM Process incarnating the instance" and "can do this whenever", and I'd prefer to have to burn that bridge only once we come to it...

nexus/db-queries/src/db/datastore/instance.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/instance.rs Outdated Show resolved Hide resolved
"better" here meaning "if we don't need to update the database, don't
update the database". if we would change the boot disk to the same
value, or we would change it from null to null, this change means we
will instead not even select the row for an update. since we still
`check_if_exists` we'll still fetch an Instance for the operation if
there's no update performed. so, if the database can detect that the
update would have been a no-op and elide it, this change is not
particularly useful. if the database would have to commit some state as
a result of an ineffectual `set`, then this is helpful!

more importantly, this is moderately more legible than smearing the
update logic and consistency checking between the Diesel query and
handling in the `NotUpdatedButExists` arm.

at the same time, Eliza rightfully pointed out that there are now (at
least) two copies of a list of states that are both roughly "there is no
VMM for this instance, so we can do things to its' database record".
extract that out to a const on `InstanceState` and name it
appropriately.

finally, the instance size change logic incorrectly rejected size
changes that only changed one of `(memory, cpus)`. fix that to allow
changing one or the other independently and add tests to catch a
future dingus (me) who might regress that somehow.
avoid ambiguity between "NoVmm the state" and "NO_VMM the states that do not have a VMM"
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall! I had some last nitpicks, but no blocking concerns now. Sorry I didn't get back to you on the latest changes sooner!

nexus/db-model/src/instance.rs Show resolved Hide resolved
Comment on lines -1131 to -1137
// * Allow setting the boot disk in NoVmm because there is no VMM to
// contend with.
// * Allow setting the boot disk in Failed to allow changing the boot
// disk of a failed instance and free its boot disk for detach.
// * Allow setting the boot disk in Creating because one of the last
// steps of instance creation, while the instance is still in
// Creating, is to reconfigure the instance to the desired boot disk.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth keeping a comment similar to this one, but moving it to the filter(instance_dsl::state.eq_any(NOT_INCARNATED_STATES)) line? not a big deal though --- I do think it's nice to document someplace in here "we can't set the boot disk if the instance is incarnated because, you know..."

nexus/db-queries/src/db/datastore/instance.rs Show resolved Hide resolved
Comment on lines +1208 to +1212
// There should be no other reason the update fails on an
// existing instance.
return Err(err.bail(Error::internal_error(
"unable to reconfigure instance boot disk",
)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth adding more details to the internal message for this error, since it indicates pretty clearly that there was a bug --- it would be nice if the instance record was logged if this ever did happen? not a blocker though.

Comment on lines +1221 to +1223
/// Does not allow setting sizes of running instances to ensure that if an
/// instance is running, its resource reservation matches what we record in
/// the database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky, not a big deal: maybe explicitly state that Error::Conflict is returned in that case?

Comment on lines +1267 to +1271
// There should be no other reason the update fails on an
// existing instance.
return Err(err.bail(Error::internal_error(
"unable to reconfigure instance size",
)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, this indicates some kind of bug in nexus, so it might be nice to record additional information about it, in case we end up debugging this in the future? not a big deal though.

assert_eq!(instance.ncpus.0, new_ncpus.0);
assert_eq!(instance.memory, new_memory);

// No harm in reverting to the original size one field at a time though:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for testing the case where only one value differs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Want ability to resize the vcpu and memory of stopped instances
4 participants