-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow instance_reconfigure to resize a stopped instance #6774
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Greg Colombo <[email protected]>
Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we will currently only perform updates that change both the number of vCPUs and the amount of memory, which I think is wrong. I think we should be able to change either individually.
Besides that, this seems basically fine; it would be nice to avoid the nop boot disk update but if the diesel is too scary, whatever...
// Do not allow setting sizes of running instances to ensure that if an | ||
// instance is running, its resource usage matches what we record in the | ||
// database. | ||
const OK_TO_SET_SIZE_STATES: &'static [InstanceState] = &[ | ||
InstanceState::NoVmm, | ||
InstanceState::Failed, | ||
InstanceState::Creating, | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that OK_TO_SET_SIZE_STATES
and OK_TO_SET_BOOT_DISK_STATES
are the same list of states, and what we really want in both cases is "states in which there is no VMM actively incarnating this instance". I kind of wonder if, rather than duplicating in both the instance-resize and instance-set-boot-disk queries, we should just have a constant on InstanceState
that's like NOT_INCARNATED_STATES
or similar, so that these queries and any future ones can all use the same set of states, and if we add new states that have "not incarnated" semantics, we can add them to the one definition of that list of states, instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on one hand i agree (and enough to go just do the thing, there's now InstanceState::NO_VMM_STATES
*), but from the conversations we've had we've been pretty close to having caveouts for specific not-incarnated states and fields. namely "is it OK to set the boot disk in Creating
", and does allowing that allow correctness issues (no) or a possibly-confusing-but-correct API result (yes). otoh "actions we can take on not-incarnated instances" is a much easier concept than having to think through each kind of action and state combination, so it's nicer in an axiomatic way.
*: after writing this i realize why you said NOT_INCARNATED
rather than NO_VMM
- because there is a NoVmm
state which is very distinct from what this list is getting at. so i should go change that name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my thinking is that we should avoid having weird exceptions for setting specific parameters unless it's absolutely necessary. I'd be quite surprised if we don't basically just end up being able to divide every possible instance_reconfigure
into either "can do this only if there is no Real Life VMM Process incarnating the instance" and "can do this whenever", and I'd prefer to have to burn that bridge only once we come to it...
"better" here meaning "if we don't need to update the database, don't update the database". if we would change the boot disk to the same value, or we would change it from null to null, this change means we will instead not even select the row for an update. since we still `check_if_exists` we'll still fetch an Instance for the operation if there's no update performed. so, if the database can detect that the update would have been a no-op and elide it, this change is not particularly useful. if the database would have to commit some state as a result of an ineffectual `set`, then this is helpful! more importantly, this is moderately more legible than smearing the update logic and consistency checking between the Diesel query and handling in the `NotUpdatedButExists` arm. at the same time, Eliza rightfully pointed out that there are now (at least) two copies of a list of states that are both roughly "there is no VMM for this instance, so we can do things to its' database record". extract that out to a const on `InstanceState` and name it appropriately. finally, the instance size change logic incorrectly rejected size changes that only changed one of `(memory, cpus)`. fix that to allow changing one or the other independently and add tests to catch a future dingus (me) who might regress that somehow.
avoid ambiguity between "NoVmm the state" and "NO_VMM the states that do not have a VMM"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall! I had some last nitpicks, but no blocking concerns now. Sorry I didn't get back to you on the latest changes sooner!
// * Allow setting the boot disk in NoVmm because there is no VMM to | ||
// contend with. | ||
// * Allow setting the boot disk in Failed to allow changing the boot | ||
// disk of a failed instance and free its boot disk for detach. | ||
// * Allow setting the boot disk in Creating because one of the last | ||
// steps of instance creation, while the instance is still in | ||
// Creating, is to reconfigure the instance to the desired boot disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth keeping a comment similar to this one, but moving it to the filter(instance_dsl::state.eq_any(NOT_INCARNATED_STATES))
line? not a big deal though --- I do think it's nice to document someplace in here "we can't set the boot disk if the instance is incarnated because, you know..."
// There should be no other reason the update fails on an | ||
// existing instance. | ||
return Err(err.bail(Error::internal_error( | ||
"unable to reconfigure instance boot disk", | ||
))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth adding more details to the internal message for this error, since it indicates pretty clearly that there was a bug --- it would be nice if the instance record was logged if this ever did happen? not a blocker though.
/// Does not allow setting sizes of running instances to ensure that if an | ||
/// instance is running, its resource reservation matches what we record in | ||
/// the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicky, not a big deal: maybe explicitly state that Error::Conflict
is returned in that case?
// There should be no other reason the update fails on an | ||
// existing instance. | ||
return Err(err.bail(Error::internal_error( | ||
"unable to reconfigure instance size", | ||
))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, this indicates some kind of bug in nexus, so it might be nice to record additional information about it, in case we end up debugging this in the future? not a big deal though.
assert_eq!(instance.ncpus.0, new_ncpus.0); | ||
assert_eq!(instance.memory, new_memory); | ||
|
||
// No harm in reverting to the original size one field at a time though: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for testing the case where only one value differs!
mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated
InstanceAndActiveVmm
. same size validation as we do for a new instance, tests checking all that as well.Also fixes #3769.