
Enable PCIe hotplug in dom0 #6620

Open
DemiMarie opened this issue May 19, 2021 · 34 comments
Labels
C: kernel C: Xen hardware support P: major Priority: major. Between "default" and "critical" in severity. waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed.

Comments

@DemiMarie

DemiMarie commented May 19, 2021

The problem you're addressing (if any)

PCIe hotplug is currently disabled in dom0. This causes breakage on some laptops and prevents Thunderbolt from being used, even though a Thunderbolt eGPU on recent hardware is the most secure method I know of to get hardware-accelerated graphics in a qube.

Describe the solution you'd like

We should enable PCIe hotplug in dom0.

Where is the value to a user, and who might that user be?

Many users, including our own @fepitre.

Describe alternatives you've considered

None

Additional context

Previously, having PCIe hotplug enabled in the dom0 kernel was considered a security risk, but Xen developers have indicated that it is not.

Relevant documentation you've consulted

Related, non-duplicate issues

#4353, #5522, #5453

@DemiMarie DemiMarie added T: enhancement P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels May 19, 2021
@brendanhoar

brendanhoar commented May 19, 2021

Do you have links to the Xen discussions regarding the change in security posture around Thunderbolt and/or PCIe hotplug?

Personally, I want this... but only if it's reasonably safe.

@marmarek
Member

Historically, Xen assigned all new devices to dom0 by default (at least IOMMU-wise). Since XSA-302, it has had quarantine IOMMU domain support, which should (theoretically) be used instead. This should indeed make it reasonably safe to re-enable PCI hotplug. What remains to be done is:

  • verifying that newly plugged-in PCI devices are indeed assigned to the IOMMU quarantine domain (xl debug-key Q && xl dmesg should show that), and
  • verifying that no dom0 component (especially the kernel and toolstack) tries to move such a freshly connected device to dom0; preventing the relevant driver from loading automatically could be desirable too (attach the device to xen-pciback instead); see the sketch below
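
For illustration, a rough sketch of those checks from a dom0 shell (the example BDF 0000:3a:00.0 is a placeholder; substitute the freshly plugged device):

```
# Dump IOMMU assignment state into the Xen console buffer, then read it:
xl debug-key Q
xl dmesg | tail -n 40

# Check whether a dom0 driver grabbed the freshly plugged device:
lspci -k -s 0000:3a:00.0

# Hand the device to xen-pciback so no dom0 driver binds to it:
xl pci-assignable-add 0000:3a:00.0
xl pci-assignable-list
```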

@andrewdavidwong
Member

It sounds like enabling this would still be a non-zero increase in security risk, since the intended safety mechanism is yet another thing that could fail in unexpected ways. Shouldn't this be opt-in rather than enabled by default for everyone?

@iamahuman

The decision to disable PCI hotplugging is at: #1673

#3245 is also related, since the dom0 kernel is also used in AppVMs by default. From #3245 (comment):

QEMU notifies the VM of newly device_add-ed PCI devices via the ACPI hotplug mechanism, which is disabled in the dom0 kernel. Maybe use the in-VM kernel instead, or compile another kernel specifically for VMs -- see #5212.
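
For context, a qube can boot the kernel installed inside its own VM image instead of the dom0-provided one; a minimal sketch, assuming a qube named personal (the qube name is a placeholder):

```
# In dom0: switch the qube to HVM mode and clear the dom0-provided kernel,
# so the kernel (and kernel config) installed inside the VM is used instead:
qvm-prefs personal virt_mode hvm
qvm-prefs personal kernel ''
```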

This issue is a duplicate of one I previously reported, which also happens to be among the GitHub issues that disappeared a while ago. 😓

(Seven comments, from @andrewdavidwong, @marmarek, @unman, and @rustybird, have been minimized.)

@aslfv

aslfv commented Sep 2, 2022

I am using a Thunderbolt 4 docking station through which I connect my external displays. These won't be recognized (via xrandr) unless they are cold-plugged (at boot). Is this issue (#6620) the root cause? And are there any known workarounds, perhaps specific to external displays connected via TB4?

@qtpies

qtpies commented Sep 2, 2022

I am using a Thunderbolt 4 docking station through which I connect my external displays. These won't be recognized (via xrandr) unless they are cold-plugged (at boot). Is this issue (#6620) the root cause? And are there any known workarounds, perhaps specific to external displays connected via TB4?

Yes, this issue is the root cause. If there were a workaround, it would be a bug. For me the solution was to use an old-style, non-Thunderbolt ThinkPad Ultra Dock; it works fine with Qubes.

@aslfv

aslfv commented Sep 11, 2022

Yes, this issue is the root cause. If there were a workaround, it would be a bug. For me the solution was to use an old-style, non-Thunderbolt ThinkPad Ultra Dock; it works fine with Qubes.

Thank you. Unfortunately, that is not an option for me, as I have not found a non-Thunderbolt docking station that supplies 130 W over USB-C.

@aslfv

aslfv commented Sep 11, 2022

I now wonder whether using a custom kernel with CONFIG_HOTPLUG_PCI=y would be acceptable in my case despite the risks described above and in #1673. Those risks apply to my setup only to a limited extent: FireWire and ExpressCard are not something I need to worry about, and as described in https://www.kernel.org/doc/html/latest/admin-guide/thunderbolt.html, security levels can be defined for Thunderbolt. In my case, I have already restricted Thunderbolt to video and USB only via the BIOS.
Would using such a custom kernel still be discouraged under these circumstances?
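
For reference, the enforced Thunderbolt security level can be inspected from Linux via the sysfs interface described in that kernel document; a minimal sketch (domain0 and the device name 0-1 depend on your topology):

```
# Show the security level enforced by the firmware (none/user/secure/dponly):
cat /sys/bus/thunderbolt/devices/domain0/security

# Under the "user" level, a newly connected device must be authorized manually:
echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
```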

@Pesicp

Pesicp commented Nov 21, 2022

Will this ever be resolved in a future update?
I have updated to kernel 6.0.8-1, but it is still not fixed.
Please, developers, fix this problem.

@Foosec

Foosec commented Jun 8, 2023

Is this still an issue? I thought it would be and wanted to build a custom ISO, but in the sources I saw it enabled, so I decided to just try the official ISO, and Thunderbolt hotplug works fine!

I really hope I didn't just ruin my own use case and that it was actually changed on purpose!

@andrewdavidwong andrewdavidwong removed this from the Release TBD milestone Aug 13, 2023
@HRio

HRio commented Sep 7, 2023

I decided to just try the official ISO, and Thunderbolt hotplug works fine!

What build was that? I just tested 4.2.0-rc3, and CONFIG_HOTPLUG_PCI is not enabled in the dom0 kernel.
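
A quick way to check this on an installed system; a minimal sketch, assuming the kernel config sits in the usual /boot location:

```
# In dom0: inspect the running kernel's build config for PCI hotplug support
grep CONFIG_HOTPLUG_PCI /boot/config-$(uname -r)
```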

@3hhh

3hhh commented Jan 13, 2024

Some other users report Thunderbolt to be working as well.

So I wonder whether:
a) this issue was silently fixed, and if so, how does it work now?
b) there may be a security issue?

@UndeadDevel

@3hhh it works if you plug in the device (e.g. a dock) before boot; that is not hotplug, however. I don't think anyone reported PCI hotplug to work, including in the thread you linked.

@3hhh

3hhh commented Jan 13, 2024 via email

@DemiMarie
Author

Some other users report Thunderbolt to be working as well.

So I wonder whether: a) this issue was silently fixed, and if so, how does it work now? b) there may be a security issue?

It’s a security issue in either their firmware or how their firmware is configured.

@Foosec

Foosec commented Jan 15, 2024

Granted, this was a while ago, but in my testing it worked as hotplug, i.e. plugging in after booting!

@DemiMarie
Author

Granted, this was a while ago, but in my testing it worked as hotplug, i.e. plugging in after booting!

Interesting!

@marmarek marmarek removed their assignment Mar 6, 2024
@ethnh

ethnh commented Jun 13, 2024

Granted, this was a while ago, but in my testing it worked as hotplug, i.e. plugging in after booting!

Is there any guide or forum post on how to replicate this? I'd love for my eGPU to hotplug.

@DemiMarie
Author

@marmarek: did your testing trust the log output from Xen or dom0, or did it actually try to perform a PCI DMA transaction and see if the operation succeeded?

@duncancmt

duncancmt commented Jun 23, 2024

Chiming in as another user who desperately needs this feature. I'm a software engineer, and AI has become an important part of the skill set. Rather than ship all my keystrokes off to OpenAI/Microsoft, I'd like to be able to run an LLM locally. I want to attach a TH3P4 eGPU to my laptop, but something about the boot process always resets it, and the lack of hotplug means I never actually get to see it.

If there isn't a workaround, I'm probably going to be forced to switch off of Qubes due to the importance of AI-based workflows 😢

@marmarek
Member

There aren't any workarounds in R4.2, but there is hope for some (even if partial) support in R4.3. Partial means it isn't going to be fully security-supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I have new information and/or something in a testable state.

@DemiMarie
Author

There aren't any workarounds in R4.2, but there is hope for some (even if partial) support in R4.3. Partial means it isn't going to be fully security-supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I have new information and/or something in a testable state.

Will a fully supported solution need to wait until R4.4?

@marmarek
Member

Will a fully supported solution need to wait until R4.4?

It isn't a question of whether it "needs" to wait; it's a question of how much we will manage in time for R4.3. I would be more than happy to get everything working perfectly in R4.3 (or, even better, yesterday!), but I try to be realistic.
The minimum plan I mentioned is to make sure hot-plugged devices land in a quarantine IOMMU domain but, once approved, get attached to dom0 (that's why only "trusted devices"), from where they can then be further re-assigned to another domain; see the sketch below.
Ideally, a device would first be assigned to a non-dom0 domain and only then approved, but technically that is significantly more complicated (especially if PCIe bridges or switches are involved, which is the case for many TB4 devices), so that's a further step.
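
As a rough illustration of that minimum plan, the approval step could resemble the existing xl assignable-device flow (a hypothetical sketch, not an implemented feature; the BDF and qube name are placeholders):

```
# A hot-plugged device would show up as quarantined / assignable:
xl pci-assignable-list

# Approving it for dom0 would rebind it to its dom0 driver...
xl pci-assignable-remove -r 0000:3a:00.0

# ...or, while still assignable, it could go straight to another domain:
xl pci-attach work 0000:3a:00.0
```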

@DemiMarie
Author

Ideally, a device would first be assigned to a non-dom0 domain and only then approved, but technically that is significantly more complicated (especially if PCIe bridges or switches are involved, which is the case for many TB4 devices), so that's a further step.

What makes this more complex?

@marmarek
Member

@andrewdavidwong andrewdavidwong added P: major Priority: major. Between "default" and "critical" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Jun 24, 2024
@GC-Molski

There aren't any workarounds in R4.2, but there is hope for some (even if partial) support in R4.3. Partial means it isn't going to be fully security-supported, but I hope to get it working at least for trusted devices. I'll update this ticket when I have new information and/or something in a testable state.

Will this be of any help for ASUS kernel modules such as asus-nb-wmi that rely on PCI hotplug (see #5453)? Currently these systems have no thermal controls. A user on the forums, wahcha, ended up cutting wires to keep the fans at 100%.

@marmarek
Member

There is related development in Xen, PV-IOMMU: https://lore.kernel.org/xen-devel/e80bf868a009425c03ce4589bf8af09fb147a6e3.1737470269.git.teddy.astie@vates.tech/
This allows dom0 (and, in the future, some domUs) to tighten the IOMMU configuration. Right now, Xen manages the IOMMU on a whole-domain basis: a device is allowed to access any memory of the domain it is attached to. With PV-IOMMU, the domain (initially only dom0) will be able to limit that, for example by having a device that is assigned to dom0 but has its DMA blocked.
This is just one of the building blocks for enabling PCIe hotplug, but a significant one.

Currently the patches are at the RFC stage (though already at v5), and quite a few parts are still missing. It won't make it into Xen 4.20 for sure, so it's also unlikely to land in Qubes OS 4.3. But maybe Qubes OS 4.4...

@andrewdavidwong andrewdavidwong added the waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed. label Jan 25, 2025