GET: Allow VPCI_DEVICE_CONTROL to interleave with most host requests #964
base: main
Conversation
```diff
@@ -479,9 +484,9 @@ pub(crate) struct ProcessLoop<T: RingMem> {
     #[inspect(skip)]
     write_recv: mesh::Receiver<WriteRequest>,
     #[inspect(skip)]
-    igvm_attest_requests: VecDeque<Pin<Box<dyn Future<Output = Result<(), FatalError>> + Send>>>,
+    async_host_requests: VecDeque<Pin<Box<dyn Future<Output = Result<(), FatalError>> + Send>>>,
```
Shouldn't we have a queue per message kind? I.e., with this model, would an outstanding IGVM_ATTEST request block a VPCI_DEVICE_CONTROL message?
As I understand the current GET protocol, these "async" or "delegate" host requests are asynchronous with respect to the normal/serial host requests, but they are not asynchronous with respect to one another.
That is, it would be illegal for OpenHCL to have two of these requests outstanding at a given time, so we should have them queue together.
In my opinion, it would be desirable for a future revision of the GET protocol to relax that serialization.
Maybe "async" is a poor name for this "secondary queue of serialized requests"?
I haven't looked too deeply at how the current host code is structured, but if there is indeed a shared delegate for all async requests, then yes, this would be an accurate model of the current protocol revision.
One more question (asked without looking too deeply at the current code): do we guarantee that there is only ever one async request sent from OpenHCL at a time? If not, then we may want to tweak this code to buffer new async requests to the host when one has already been sent (as opposed to sending out multiple requests and then waiting for multiple responses).
That was my goal, but I am not convinced I have guaranteed it. I think it would be super useful to write a test that confirms the behavior but I'm still trying to figure out what test code is available for the GET client. Please let me know if you have any pointers/examples!
Requests sent to the GED will be serialized, but there is no guarantee for the responses if they are handled asynchronously. I added a separate queue so that OpenHCL can check the asynchronous IGVM attest response separately from the main queue (which handles each request and response sequentially), so that we don't block (or get blocked).
Now if we put another request in the same queue as IGVM attest, OpenHCL will handle both responses sequentially; i.e., the responses for the two request types will block each other. If the GED handles these requests asynchronously, we should put the new request in a new queue.
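A rough sketch of that queue-per-kind layout; all names here (`HostRequest`, `QueuePerKind`, `drain`, the standalone `FatalError`) are hypothetical, not the real types:

```rust
// Hypothetical sketch: each request family keeps its own FIFO, so a slow
// IGVM_ATTEST response cannot hold up a completed VPCI_DEVICE_CONTROL
// response (or vice versa).
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;

struct FatalError; // stand-in for the real error type
type HostRequest = Pin<Box<dyn Future<Output = Result<(), FatalError>> + Send>>;

struct QueuePerKind {
    igvm_attest_requests: VecDeque<HostRequest>,
    vpci_device_control_requests: VecDeque<HostRequest>,
}

// Each queue is still serialized internally (one request of that kind in
// flight at a time), but the process loop can race `drain` calls on the two
// queues so that neither kind blocks the other.
async fn drain(queue: &mut VecDeque<HostRequest>) -> Result<(), FatalError> {
    while let Some(request) = queue.front_mut() {
        request.as_mut().await?;
        queue.pop_front();
    }
    Ok(())
}
```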
```rust
        // handled by the `host_requests` queue.
        let run_next_igvm_attest = async {
            while let Some(request) = self.igvm_attest_requests.front_mut() {
                // DEVNOTE: There may only be one outstanding async host request at a time, which means
```
If there can only be one, should `async_host_requests` be made an `Option` instead of a `VecDeque`?
This ties back to the discussion in #964 (comment).
Theoretically, we could just buffer outgoing messages in a queue, and only release them once the host has responded to the last-sent message. That way, you'd still only ever have one message in flight at the protocol level, but OpenHCL could start using a "more reasonable" infallible* API for outgoing requests.
*Well, we might want to cap the queue at some reasonable value, to avoid unbounded growth.
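A minimal sketch of that buffering scheme, with every name here (`OutgoingBuffer`, `MAX_PENDING`, the methods) hypothetical: at most one message is on the wire, later messages wait in a capped local queue, and a send only fails when the cap would be exceeded:

```rust
use std::collections::VecDeque;

// Assumed cap so a misbehaving caller can't grow the buffer without bound.
const MAX_PENDING: usize = 64;

struct OutgoingBuffer<M> {
    pending: VecDeque<M>,
    in_flight: bool, // true while the host owes us a response
}

impl<M> OutgoingBuffer<M> {
    fn new() -> Self {
        Self { pending: VecDeque::new(), in_flight: false }
    }

    /// Accept a message; it is handed back only if the cap is hit.
    fn enqueue(&mut self, msg: M) -> Result<(), M> {
        if self.pending.len() >= MAX_PENDING {
            return Err(msg);
        }
        self.pending.push_back(msg);
        Ok(())
    }

    /// Release the next message iff nothing is currently in flight.
    fn next_to_send(&mut self) -> Option<M> {
        if self.in_flight {
            return None;
        }
        let msg = self.pending.pop_front()?;
        self.in_flight = true;
        Some(msg)
    }

    /// Called when the host responds to the last-sent message.
    fn on_response(&mut self) {
        self.in_flight = false;
    }
}
```

The caller would invoke `next_to_send` after each `enqueue` and after each `on_response`, writing whatever it returns to the host, which keeps exactly one message in flight at the protocol level.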
This change addresses a circular wait loop involving VPCI_DEVICE_CONTROL messages. Specifically, in certain scenarios, serial processing of GET host requests blocks the guest OS in VTL0 from making forward progress.
For example, for a VPCI_DEVICE_CONTROL message revoking a VPCI device to complete, the VTL0 OS may first need to finish accesses to other PCI devices that are themselves proxied over the GET (e.g., `VgaProxyPciReadRequest`); with serial processing, those proxied accesses are stuck behind the outstanding VPCI_DEVICE_CONTROL message.