-
Notifications
You must be signed in to change notification settings - Fork 884
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing these bugs and submitting the PR! Please let me know if you agree with the slightly different solutions I proposed. In addition, please revise the commit message of each patch:
- We require a
Signed-off-by:
line at the end: https://github.com/intel/haxm/blob/master/CONTRIBUTING.md - The subject line need to be concise:
- For the first patch, how about "Avoid flushing outdated PDPTE0..3 cache to VMCS"?
- For the second patch, how about "page_walker: Fix PAE PDPT pointer calculation"?
- There are some helpful guidelines for writing a good commit message here: https://chris.beams.io/posts/git-commit/
core/page_walker.c
Outdated
pdpt_gpa >> PG_ORDER_4K, | ||
&pdpt_kmap, NULL); | ||
|
||
uint pdpt_offset_on_page = ((uint)pdpt_gpa) & ((1 << PG_ORDER_4K) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Instead of modifying the CONFIG_HAX_EPT2
branch, how about postponing the offset fix until after line 671:
// Quoting original code from line 671
#endif // CONFIG_HAX_EPT2
if (pdpt_hva == NULL) {
retval = TF_FAILED;
goto out;
}
if (!is_lme) {
// In PAE paging mode, pdpt_gpa is 32-byte aligned, not 4KB-aligned
pdpt_hva += (uint)(pdpt_gpa & (PAGE_SIZE_4K - 1));
}
The !is_lme
check is optional, but makes sure the new logic is only applied to PAE Paging mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, pointer modifications should be done after checking the returned value.
core/vcpu.c
Outdated
@@ -79,7 +79,7 @@ static void vcpu_prepare(struct vcpu_t *vcpu); | |||
static void vcpu_init_emulator(struct vcpu_t *vcpu); | |||
|
|||
static void vmread_cr(struct vcpu_t *vcpu); | |||
static void vmwrite_cr(struct vcpu_t *vcpu); | |||
static void vmwrite_cr(struct vcpu_t *vcpu, bool can_update_pdpte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an explicit Boolean switch works, but I think it slightly hurts code readability: people might be looking at all the vmwrite_cr(vcpu, false);
calls and wonder what the second parameter is about. Since the sole purpose is to hack into the function and skip a few lines of PAE-specific code, I'd like to propose an alternative approach that preserves the signature of vmwrite_cr()
.
The idea is to move the Boolean to struct vcpu_t
(core/include/vcpu.h
), and make sure it's properly updated. In fact, vcpu_t
already contains an unnamed bit field, where some of the existing bits serve a similar purpose:
// Quoting from vcpu.h, line 181
struct {
uint64_t paused : 1;
uint64_t panicked : 1;
// ...
uint64_t debug_control_dirty : 1;
uint64_t dr_dirty : 1;
uint64_t rflags_dirty : 1;
uint64_t rip_dirty : 1;
uint64_t fs_base_dirty : 1;
uint64_t interruptibility_dirty : 1;
uint64_t pcpu_ctls_dirty : 1;
uint64_t pae_pdpt_dirty : 1;
uint64_t padding : 45;
};
Then, we set pae_pdpt_dirty
bit after updating vcpu_t::pae_pdptes[]
(in vcpu_prepare_pae_pdpt()
), and reset it after flushing these dirty values to VMCS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me both solutions are feasible and depend on the code style of the project. For many vmcs field modifications haxm uses dirty bits paradigm, but there are counterexamples - vmwrite_cr directly modifies SECONDARY_PROCESSOR_CONTROLS (PRIMARY_PROCESSOR_CONTROLS are modified using pcpu_ctls_dirty flag).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, #117 added pcpu_ctls_dirty
along with several other dirty flags. It only covered the VMCS fields that would benefit a lot (in terms of performance) from delayed VMWRITE
. There's also #130 (still WIP) which aims to convert more VMCS fields to use the cached VMREAD
/ delayed VMWRITE
paradigm. So I think the "style" is moving towards more prevalent use of dirty bits, and adding pae_pdpt_dirty
would be consistent with that.
core/vcpu.c
Outdated
vmwrite(vcpu, GUEST_PDPTE1, vcpu->pae_pdptes[1]); | ||
vmwrite(vcpu, GUEST_PDPTE2, vcpu->pae_pdptes[2]); | ||
vmwrite(vcpu, GUEST_PDPTE3, vcpu->pae_pdptes[3]); | ||
if( can_update_pdpte ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the alternative approach I proposed above, we just need a simple check here:
if (vcpu->pae_pdpt_dirty) {
// vcpu_prepare_pae_pdpt() has updated vcpu->pae_pdptes
// Note that because we do not monitor guest writes to CR3, the only
// case where vcpu->pae_pdptes is newer than VMCS GUEST_PDPTE{0..3}
// is following a guest write to CR0 or CR4 that requires PDPTEs to
// be reloaded, i.e. the pae_pdpt_dirty case. When the guest is in
// PAE paging mode but !pae_pdpt_dirty, VMCS GUEST_PDPTE{0..3} are
// already up-to-date following each VM exit (see Intel SDM Vol. 3C
// 27.3.4), and we must not overwrite them with our cached values
// (vcpu->pae_pdptes), which may be outdated.
vmwrite(vcpu, GUEST_PDPTE0, vcpu->pae_pdptes[0]);
vmwrite(vcpu, GUEST_PDPTE1, vcpu->pae_pdptes[1]);
vmwrite(vcpu, GUEST_PDPTE2, vcpu->pae_pdptes[2]);
vmwrite(vcpu, GUEST_PDPTE3, vcpu->pae_pdptes[3]);
vcpu->pae_pdpt_dirty = 0;
}
Note that I've also removed the outdated TODO comment and added a new comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Interception of CR3 modifications complicates the solution and should be avoided.
BTW, how did you test the patches? We'd like to add PAE paging mode to our test coverage. If you could share your guest image with us, that would be great. Otherwise, we'll try to build a Linux i386 kernel with PAE enabled. |
I didn't configure my windows guest to use PAE. I was even surprised that PAE mode is on, because I allocated only 600 Mb of RAM to my VM. I googled that windows can decide to use PAE automatically, based on some internal logic. This means that my image isn't acceptable for a test harness. |
|
Ah, sorry, I overlooked the part about your use of a 32-bit Win7 guest. So you have enabled Windows to boot on HAXM for the first time, that's truly awesome! #118 is about booting a x64 Windows guest, but even if it's still broken (likely due to lack of CR8 virtualization, which is only relevant to x64), we know we are now a lot closer to getting it working :) |
I just wanted to chime in with my observations of a quick test of 88ce709 on a 32-bit Ubuntu 16.04 (Xenial) system: I was able to boot a Windows 7 (SP1) x86 system on QEMU (v3.0.0) with '-accel hax'. A further test of a Windows 8.1 x86 system was also successful. The same can't be said when I then moved onto Windows 10 (1809), but this is still quite some progress. And of course any attempts using x64 boot media ended up in failure (e.g. 'VCPU shutdown request') |
FYI, fail when update windows 10. Here is log when launching Windows 10 32bit iso: |
I've now re-done the Win10 (1809) x86 test, and will attach the relevant part from the 'dmesg' output here. Whilst I can't interpret this log, I've also added in this attachment the "cleaned up" versions of the log from the Windows host posted above, and my test from a Linux host. Maybe someone else can "see" what's going on. |
@maronz Thanks a lot. @HaHoYou was actually running the test on macOS, but apparently the host OS doesn't really matter in this case, because this fatal error appears in all the logs:
According to Intel SDM Vol. 3D Appendix C, VM exit reason 9 is:
which refers to the hardware task switching mechanism. Most OSes do not use this mechanism at all (in favor of the more flexible software task switching), and I think it's no longer available in 64-bit mode. But for some reason, Windows 10 x86 uses it, so we'll need to support it at some point. |
|
Thanks, the new patches good! But before we can merge this PR, we need to remove the first two commits (e7d78b7 and 88ce709) from the branch, and only keep the last two. Could you rewrite the git history and force push?
That would be great! #118 can be used to track both issues, but if it makes sense, we can create a new one for Windows 10 x86 as well. |
Signed-off-by: Alexey Romko <[email protected]>
I removed all commits, force-pushed the branch, committed again and pushed. I didn't close the PR. Github thought that by reverting all commits I closed the PR? |
core/page_walker.c
Outdated
@@ -654,22 +654,26 @@ uint32_t pw_perform_page_walk( | |||
pdpt_gpa = first_table; | |||
} | |||
|
|||
char* pdpt_page_hva; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There are a few cosmetic issues with this declaration:
- All local variables should be declared at the beginning of the current scope, i.e. right below line 618 (C coding style).
char* p
=>char *p
(C coding style).- Although
sizeof(char)
is always 1, I'd still preferuint8_t
, which is explicit about the size.
Sorry I overlooked them yesterday. Let me fix them and push a new commit to your branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm having some trouble pushing to your branch, due to an HTTPS authentication error. Hopefully I'll find a solution soon. Meanwhile, please feel free to amend your second commit yourself:
diff --git a/core/page_walker.c b/core/page_walker.c
index 9b0f1c5..ffdea4d 100644
--- a/core/page_walker.c
+++ b/core/page_walker.c
@@ -616,8 +616,6 @@ uint32_t pw_perform_page_walk(
first_table = pw_retrieve_table_from_cr3(cr3, is_pae, is_lme);
if (is_pae) {
- uint8_t *pdpt_page_hva;
-
if (is_lme) {
pml4t_gpa = first_table;
#ifdef CONFIG_HAX_EPT2
@@ -656,6 +654,7 @@ uint32_t pw_perform_page_walk(
pdpt_gpa = first_table;
}
+ char* pdpt_page_hva;
#ifdef CONFIG_HAX_EPT2
pdpt_page_hva = gpa_space_map_page(&vcpu->vm->gpa_space,
pdpt_gpa >> PG_ORDER_4K,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. We'll start testing this PR now. Before I merge it, I'll merge the last two commits (git rebase -i
with fixup
) and force-push to your branch.
Thanks. You didn't have to force-push right after removing all commits, because then the branch would be identical to |
Signed-off-by: Alexey Romko <[email protected]>
Fixed two bugs
Both fixes were compiled only for Windows and were tested on a Win10 x64 host and Win7 x32 guest. I was able to run the guest, so this fix maybe fixes #118.