Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe drop order of GbmDevice and GbmBufferedSurface #1102

Open
PolyMeilex opened this issue Aug 15, 2023 · 11 comments
Open

Unsafe drop order of GbmDevice and GbmBufferedSurface #1102

PolyMeilex opened this issue Aug 15, 2023 · 11 comments

Comments

@PolyMeilex
Copy link
Member

This order of drop causes segfault:

    gbm: gbm::GbmDevice<DrmDeviceFd>,
    surfaces: HashMap<crtc::Handle, GbmBufferedSurface<GbmAllocator<DrmDeviceFd>, ()>>,

This one works fine:

    surfaces: HashMap<crtc::Handle, GbmBufferedSurface<GbmAllocator<DrmDeviceFd>, ()>>,
    gbm: gbm::GbmDevice<DrmDeviceFd>,

Backtrace of said segfault

#0  0x00007fd67b4b7510 in  ()
#1  0x00007fd67dd08d88 in gbm_dri_bo_destroy () at /lib64/libgbm.so.1
#2  0x000055da0563e52f in gbm::buffer_object::{impl#6}::new::{closure#0}<()> (ptr=0x55da06bdc7e0) at /home/poly/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gbm-0.12.0/src/buffer_object.rs:493
#3  0x000055da05654218 in core::ops::function::FnOnce::call_once<gbm::buffer_object::{impl#6}::new::{closure_env#0}<()>, (*mut gbm_sys::gbm_bo)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
#4  0x000055da0574c3f7 in alloc::boxed::{impl#47}::call_once<(*mut gbm_sys::gbm_bo), (dyn core::ops::function::FnOnce<(*mut gbm_sys::gbm_bo), Output=()> + core::marker::Send), alloc::alloc::Global> (self=..., args=...)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/boxed.rs:1985
#5  0x000055da057faa54 in gbm::{impl#0}::drop<gbm_sys::gbm_bo> (self=0x55da06bc7930) at /home/poly/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gbm-0.12.0/src/lib.rs:133
#6  0x000055da057f60e7 in core::ptr::drop_in_place<gbm::PtrDrop<gbm_sys::gbm_bo>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#7  0x000055da057497c0 in alloc::sync::Arc<gbm::PtrDrop<gbm_sys::gbm_bo>>::drop_slow<gbm::PtrDrop<gbm_sys::gbm_bo>> (self=0x55da06bdc6f0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1263
#8  0x000055da057fbfa2 in alloc::sync::{impl#27}::drop<gbm::PtrDrop<gbm_sys::gbm_bo>> (self=0x55da06bdc6f0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1899
#9  0x000055da057f913b in core::ptr::drop_in_place<alloc::sync::Arc<gbm::PtrDrop<gbm_sys::gbm_bo>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#10 0x000055da057f5d5b in core::ptr::drop_in_place<gbm::Ptr<gbm_sys::gbm_bo>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#11 0x000055da057f7927 in core::ptr::drop_in_place<gbm::buffer_object::BufferObject<()>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#12 0x000055da057fa196 in core::ptr::drop_in_place<core::option::Option<gbm::buffer_object::BufferObject<()>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#13 0x000055da05655ef7 in core::ptr::drop_in_place<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#14 0x000055da056853df in alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>::drop_slow<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>
    (self=0x55da06d36628) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1263
#15 0x000055da0565efc1 in alloc::sync::{impl#27}::drop<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>> (self=0x55da06d36628)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1899
#16 0x000055da0565689a in core::ptr::drop_in_place<alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#17 0x000055da05656e18 in core::ptr::drop_in_place<[alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>; 4]> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#18 0x000055da05657062 in core::ptr::drop_in_place<smithay::backend::allocator::swapchain::Swapchain<smithay::backend::allocator::gbm::GbmAllocator<smithay::backend::drm::device::fd::DrmDeviceFd>>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#19 0x000055da05657499 in core::ptr::drop_in_place<smithay::backend::drm::surface::gbm::GbmBufferedSurface<smithay::backend::allocator::gbm::GbmAllocator<smithay::backend::drm::device::fd::DrmDeviceFd>, ()>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#20 0x000055da05658c53 in core::ptr::drop_in_place<rendering::surface::OutputSurface> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#21 0x000055da0565b35e in core::ptr::drop_in_place<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#22 0x000055da056f0484 in core::ptr::mut_ptr::{impl#0}::drop_in_place<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> (self=0x55da06d365c0)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mut_ptr.rs:1449
#23 hashbrown::raw::Bucket<(drm::control::crtc::Handle, rendering::surface::OutputSurface)>::drop<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> (self=0x7ffdf6eb80c0)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:344
#24 0x000055da056f1e4f in hashbrown::raw::RawTable<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global>::drop_elements<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global> (self=0x55da06481640) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:620
#25 0x000055da0566050e in hashbrown::raw::{impl#17}::drop<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global> (self=0x55da06481640)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:1848
#26 0x000055da05655dfa in core::ptr::drop_in_place<hashbrown::raw::RawTable<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#27 0x000055da05656a0a in core::ptr::drop_in_place<hashbrown::map::HashMap<drm::control::crtc::Handle, rendering::surface::OutputSurface, std::collections::hash::map::RandomState, alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#28 0x000055da05655eba in core::ptr::drop_in_place<std::collections::hash::map::HashMap<drm::control::crtc::Handle, rendering::surface::OutputSurface, std::collections::hash::map::RandomState>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#29 0x000055da05658319 in core::ptr::drop_in_place<rendering::Device> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#30 0x000055da0565aece in core::ptr::drop_in_place<(smithay::backend::drm::node::DrmNode, rendering::Device)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#31 0x000055da056f0404 in core::ptr::mut_ptr::{impl#0}::drop_in_place<(smithay::backend::drm::node::DrmNode, rendering::Device)> (self=0x55da064815d0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mut_ptr.rs:1449
#32 hashbrown::raw::Bucket<(smithay::backend::drm::node::DrmNode, rendering::Device)>::drop<(smithay::backend::drm::node::DrmNode, rendering::Device)> (self=0x7ffdf6eb8260)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:344
#33 0x000055da056f1d6f in hashbrown::raw::RawTable<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>::drop_elements<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>
    (self=0x7ffdf6eb8970) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:620
#34 0x000055da0566048e in hashbrown::raw::{impl#17}::drop<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global> (self=0x7ffdf6eb8970)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:1848
#35 0x000055da05655b3a in core::ptr::drop_in_place<hashbrown::raw::RawTable<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
--Type <RET> for more, q to quit, c to continue without paging--jjjjjjjjjjjjjjj
#36 0x000055da056567ca in core::ptr::drop_in_place<hashbrown::map::HashMap<smithay::backend::drm::node::DrmNode, rendering::Device, std::collections::hash::map::RandomState, alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#37 0x000055da05655dea in core::ptr::drop_in_place<std::collections::hash::map::HashMap<smithay::backend::drm::node::DrmNode, rendering::Device, std::collections::hash::map::RandomState>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#38 0x000055da05658133 in core::ptr::drop_in_place<rendering::State> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#39 0x000055da056b1067 in rendering::main () at smithay-drm-extras/examples/rendering/main.rs:120
#40 0x000055da05654c8b in core::ops::function::FnOnce::call_once<fn() -> core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>, ()> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
#41 0x000055da05634a4e in std::sys_common::backtrace::__rust_begin_short_backtrace<fn() -> core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>, core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> (f=0x55da056b08e0 <rendering::main>) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135
#42 0x000055da0571c1e1 in std::rt::lang_start::{closure#0}<core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:166
#43 0x000055da05989b25 in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:284
#44 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:500
#45 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:464
#46 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:142
#47 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148
#48 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:500
#49 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:464
#50 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:142
#51 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#52 0x000055da0571c1ba in std::rt::lang_start<core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> (main=0x55da056b08e0 <rendering::main>, argc=1, argv=0x7ffdf6eb9018, sigpipe=0)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:165
#53 0x000055da056b16ce in main ()
#54 0x00007fd67d954b4a in __libc_start_call_main () at /lib64/libc.so.6
#55 0x00007fd67d954c0b in __libc_start_main_impl () at /lib64/libc.so.6
#56 0x000055da05631ff5 in _start ()
@PolyMeilex
Copy link
Member Author

Full code that reproduces this can be found here: PolyMeilex@39c7066

Just run the smithay-drm-extras rendering example from this commit, and wait 5s for the process to auto exit, when exiting it will drop those types and cause segfault.

@Drakulix
Copy link
Member

Oh interesting. Given that this happens in gbm_bo_destroy, this looks like it is a gbm.rs issue.

@ids1024
Copy link
Member

ids1024 commented Aug 15, 2023

/** Destroy the gbm device and free all resources associated with it.
 *
 * Prior to calling this function all buffers and surfaces created with the
 * gbm device need to be destroyed.
 *
 * \param gbm The device created using gbm_create_device()   
 */
GBM_EXPORT void
gbm_device_destroy(struct gbm_device *gbm)

I guess to offer a safe API, gbm.rs needs to reference count the struct gbm_device.

@ids1024
Copy link
Member

ids1024 commented Aug 17, 2023

Oh, I see it's already using WeakPtr. Using a Ptr there instead should fix this. Either that or some more complicated mechanism that would free all buffers and surfaces when the device is freed. (Probably better to stick to a strong reference, since weak isn't needed to break a reference cycle here.)

@Drakulix
Copy link
Member

Having buffers keep devices open seems potentially very leaky to me.

Closing the device invalidates surfaces and buffers.
So what we need to do is only conditionally call the destructor, if the device ptr is still valid.

@ids1024
Copy link
Member

ids1024 commented Aug 17, 2023

The documentation for gbm_device_destroy says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."

Taken literally, simply not running the buffer and surface destructors wouldn't be valid. Though I don't know how it's implemented. I'd expect that may at least leak resources.

@Drakulix
Copy link
Member

The documentation for gbm_device_destroy says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."

Oh no. I wonder if we should then instead track the buffers and surfaces in the device struct (just by tracking the raw pointers and destroying still registered ones on drop). Then surfaces and buffers could still have Weak references to the device and use those to remove their respective pointers from the lists of surfaces and buffers on drop, if the weak reference is still valid.

@ids1024
Copy link
Member

ids1024 commented Aug 17, 2023

Yeah, storing Mutex<Vec<_>>s of the surfaces and buffers in the Device could also handle this. Fixing the issue while maintaining the current API and its (intended) behavior.

Personally my inclination for APIs like this is to have a strong reference. As long as there's no reference cycle nothing is leaked if the application doesn't leak anything, and the API is easier to use that way. It seems intuitive enough that the device is kept open as long as any buffers or surfaces using it exist.

But either would fix this.

@Drakulix
Copy link
Member

The problem for me is that unintentionally leaving the device open for gbm can cause higher resource usage, e.g. higher power states of the gpu, and block unloading kernel modules. Compositors wishing the free a device should be able to do so easily and buffers might be cached at multiple levels, so cleaning up every reference might be hard, while handling invalid buffers might be much easier.

@ids1024
Copy link
Member

ids1024 commented Aug 17, 2023

Ah, I wasn't thinking of that. Yeah, it could make sense for gbm, if it isn't easy to ensure the surfaces and buffers are freed when the compositor wants to free the device.

@Drakulix
Copy link
Member

Drakulix commented Nov 21, 2024

Given all these types are contained in a DrmCompositor now for most compositors (and certainly for cosmic-comp, which also implements the whole device power down infrastructure now), I'd say strong references are not as bad as I initially expected.

@ids1024 If you ever find the time to update Smithay/gbm.rs#25 and remove the Draft state, we can do a strong counted 0.17 release for gbm.rs. (And then carefully update smithay and cosmic-comp.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants