
GET: Allow VPCI_DEVICE_CONTROL to interleave with most host requests #964

Draft · wants to merge 1 commit into main
Conversation

@jackschefer-msft commented Mar 5, 2025

This change addresses a circular wait involving VPCI_DEVICE_CONTROL messages. Specifically, in certain scenarios, strictly serial processing of GET host requests blocks the guest OS in VTL0 from making forward progress.

For example, a VPCI_DEVICE_CONTROL message revoking a VPCI device may be unable to complete because the VTL0 OS is concurrently accessing other PCI devices that are themselves proxied over the GET (e.g., VgaProxyPciReadRequest).
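
The shape of the fix, as a minimal hypothetical sketch (not the actual openvmm code; `ProcessLoopSketch`, `run_once`, and `serve_serial_requests` are illustrative names): long-running "async" host requests are drained from a secondary queue that is polled concurrently with the ordinary serial request path, so neither can starve the other.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;

use futures::future;
use futures::pin_mut;

type AsyncHostRequest = Pin<Box<dyn Future<Output = ()> + Send>>;

struct ProcessLoopSketch {
    async_host_requests: VecDeque<AsyncHostRequest>,
}

impl ProcessLoopSketch {
    async fn run_once(&mut self) {
        // Drain the secondary queue one request at a time: within this
        // queue, requests remain strictly serialized.
        let run_next_async = async {
            while let Some(request) = self.async_host_requests.front_mut() {
                request.as_mut().await;
                self.async_host_requests.pop_front();
            }
            // Stay pending once the queue is empty so the serial path
            // below decides when this iteration of the loop completes.
            future::pending::<()>().await
        };
        pin_mut!(run_next_async);
        // Race the secondary queue against the ordinary serial request
        // path: a slow VPCI_DEVICE_CONTROL no longer blocks proxied PCI
        // reads, and vice versa.
        future::select(run_next_async, Box::pin(serve_serial_requests())).await;
    }
}

/// Placeholder for the normal, strictly serialized GET request path.
async fn serve_serial_requests() {}
```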

@jackschefer-msft requested a review from a team as a code owner March 5, 2025 21:09
@jackschefer-msft marked this pull request as draft March 5, 2025 21:10
@@ -479,9 +484,9 @@ pub(crate) struct ProcessLoop<T: RingMem> {
     #[inspect(skip)]
     write_recv: mesh::Receiver<WriteRequest>,
     #[inspect(skip)]
-    igvm_attest_requests: VecDeque<Pin<Box<dyn Future<Output = Result<(), FatalError>> + Send>>>,
+    async_host_requests: VecDeque<Pin<Box<dyn Future<Output = Result<(), FatalError>> + Send>>>,
Contributor

shouldn't we have a queue per message kind? I.e., with this model, would an outstanding IGVM_ATTEST request block a VPCI_DEVICE_CONTROL message?

Author

As I understood the current GET protocol here, these "async" or "delegate" host requests are asynchronous with respect to the normal/serial host requests, but they are not asynchronous with respect to one another.

That is, it would be illegal for OpenHCL to have two of these requests outstanding at a given time, and so we should have them queue together.

In my opinion, it would be desirable for a future revision of the GET protocol to relax that serialization.

Maybe "async" is a poor name for this "secondary queue of serialized requests"?

Contributor

I haven't looked too deeply at how the current host code is structured, but if there is indeed a shared delegate for all async requests, then yes, this would be an accurate model of the current protocol revision.

Contributor

One more question (asked without looking too deeply at the current code): do we guarantee that there is only ever one async request sent from OpenHCL at a time? If not, we may want to tweak this code to buffer new async requests to the host when one has already been sent (as opposed to sending out multiple requests and then waiting for multiple responses).

Author

That was my goal, but I am not convinced I have guaranteed it. I think it would be super useful to write a test that confirms the behavior, but I'm still trying to figure out what test code is available for the GET client. Please let me know if you have any pointers/examples!

Contributor

Requests sent to the GED are serialized, but there is no such guarantee for the responses if they are handled asynchronously. I added a separate queue so that OpenHCL can check for the asynchronous IGVM attest response separately from the main queue (which handles each request and response sequentially), so that we neither block it nor get blocked by it.

Now, if we put another request in the same queue as IGVM attest, OpenHCL will handle both responses sequentially; i.e., the responses for the two request types will block each other. If the GED handles these requests asynchronously, we should put the new request in a new queue.
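
For illustration, here is a sketch of the "queue per message kind" alternative discussed above (hypothetical names; it assumes the GED could service the two kinds concurrently, which the current protocol revision does not guarantee):

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;

type HostRequest = Pin<Box<dyn Future<Output = ()> + Send>>;

#[derive(Default)]
struct PerKindQueues {
    igvm_attest: VecDeque<HostRequest>,
    vpci_device_control: VecDeque<HostRequest>,
}

impl PerKindQueues {
    /// Drive both queues concurrently. Within a kind, requests remain
    /// strictly serialized; across kinds, responses never block each other.
    async fn run(&mut self) {
        futures::join!(
            drain(&mut self.igvm_attest),
            drain(&mut self.vpci_device_control),
        );
    }
}

/// Complete the queued requests of one message kind, in FIFO order.
async fn drain(queue: &mut VecDeque<HostRequest>) {
    while let Some(request) = queue.front_mut() {
        request.as_mut().await;
        queue.pop_front();
    }
}
```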

@jackschefer-msft (Author)

@microsoft-github-policy-service agree company="Microsoft"

        // handled by the `host_requests` queue.
        let run_next_igvm_attest = async {
            while let Some(request) = self.igvm_attest_requests.front_mut() {
                // DEVNOTE: There may only be one outstanding async host request at a time, which means
Contributor

If there can only be one, should `async_host_requests` be made an `Option` instead of a `VecDeque`?

Contributor

This ties back to the discussion in #964 (comment).

Theoretically, we could just buffer outgoing messages in a queue, and only release them once the host has responded to the last-sent message. That way, you'd still only ever have one message in flight at the protocol level, but OpenHCL could start using a "more reasonable" infallible* API for outgoing requests.

*Well, we might want to cap the queue at some reasonable value, to avoid unbounded growth.
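
A rough sketch of that buffering scheme (all names hypothetical): callers enqueue infallibly up to a cap, but only one message is ever in flight at the protocol level.

```rust
use std::collections::VecDeque;

struct OutgoingBuffer<M> {
    pending: VecDeque<M>,
    in_flight: bool,
    cap: usize,
}

impl<M> OutgoingBuffer<M> {
    fn new(cap: usize) -> Self {
        Self { pending: VecDeque::new(), in_flight: false, cap }
    }

    /// Enqueue a message, returning it back if the backlog cap would be
    /// exceeded (the cap avoids unbounded growth).
    fn enqueue(&mut self, msg: M) -> Result<(), M> {
        if self.pending.len() >= self.cap {
            return Err(msg);
        }
        self.pending.push_back(msg);
        Ok(())
    }

    /// Release the next message to send, but only if nothing is in flight.
    fn next_to_send(&mut self) -> Option<M> {
        if self.in_flight {
            return None;
        }
        let msg = self.pending.pop_front()?;
        self.in_flight = true;
        Some(msg)
    }

    /// Called once the host has responded to the last-sent message.
    fn on_response(&mut self) {
        self.in_flight = false;
    }
}
```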
