Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "linux: workaround to avoid deadlock inside dl_iterate_phdr in glibc" #57059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ararslan
Copy link
Member

@ararslan ararslan commented Jan 15, 2025

Reverts #57035, fixes #57058, reopens #57017.

@ararslan ararslan added the bugfix This change fixes an existing bug label Jan 15, 2025
@vtjnash
Copy link
Member

vtjnash commented Jan 15, 2025

It looks like FreeBSD is already aware of this bug in their libc: https://reviews.freebsd.org/D47350, so I don't think we should revert the fix just because it triggers a known bug in FreeBSD.

@vtjnash
Copy link
Member

vtjnash commented Jan 15, 2025

Is the _rtld_atfork_pre function accessible from the freebsd libc? That would be an alternative way to accomplish this (more similar to our Apple code also), although the libc authors might have tried to hide that symbol making this difficult to make correct.

@ararslan
Copy link
Member Author

I don't think we should revert the fix just because it triggers a known bug in FreeBSD.

The changes should be applied only to Linux then, otherwise we're knowingly introducing a regression.

Is the _rtld_atfork_pre function accessible from the freebsd libc?

It is! At least I think it is... Does this suffice as a check?:

julia> libc = dlopen("/lib/libc.so.7");

julia> dlsym(libc, :_rtld_atfork_pre)
Ptr{Nothing}(0x0000000082f76b28)

{
callback_t callback = {f, ctx};
dl_iterate_phdr(with_dl_iterate_phdr_lock, &callback);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On FreeBSD, can you just replace this call with int rtld_locks[MAX_RTLD_LOCKS]; jl_lock_profile(); _rtld_atfork_pre(rtld_locks); f(ctx); _rtld_atfork_post(rtld_locks); jl_unlock_profile();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like even though _rtld_atfork_pre is accessible from libc, there's no corresponding header that's included in a standard system installation that has a prototype for it nor a definition for MAX_RTLD_LOCKS. 😔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here, if you have this rtld_lock.h header, or you can just copy-paste the relevant lines from it:
https://github.com/lattera/freebsd/blob/401a161083850a9a4ce916f37520c084cff1543b/libexec/rtld-elf/rtld_lock.h#L48

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, what I mean is that it's in the system source files (as in your link) but those aren't included in a standard installation, so I don't have the definitions available from an existing file on my machine. I'm compiling now with this change:

diff --git a/src/signals-unix.c b/src/signals-unix.c
index 91d3378068..e1f1d70a80 100644
--- a/src/signals-unix.c
+++ b/src/signals-unix.c
@@ -41,6 +41,12 @@
 #include <sys/event.h>
 #endif

+#if defined(_OS_FREEBSD_)
+#define MAX_RTLS_LOCKS 8
+extern void _rtld_atfork_pre(int*);
+extern void _rtld_atfork_post(int*);
+#endif
+
 // 8M signal stack, same as default stack size (though we barely use this)
 static const size_t sig_stack_size = 8 * 1024 * 1024;

@@ -325,8 +331,17 @@ static int with_dl_iterate_phdr_lock(struct dl_phdr_info *info, size_t size, voi

 void jl_with_stackwalk_lock(void (*f)(void*), void *ctx)
 {
+#if defined(_OS_FREEBSD_)
+    int rtld_locks[MAX_RTLD_LOCKS];
+    jl_lock_profile();
+    _rtld_atfork_pre(rtld_locks);
+    f(ctx);
+    _rtld_atfork_post(rtld_locks);
+    jl_unlock_profile();
+#else
     callback_t callback = {f, ctx};
     dl_iterate_phdr(with_dl_iterate_phdr_lock, &callback);
+#endif
 }

 #if defined(_OS_LINUX_) && (defined(_CPU_X86_64_) || defined(_CPU_X86_))

Any idea what tests need to be run to trigger this specific deadlock or should I just run all of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that didn't fix it, I still hit a deadlock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

freebsd: test x86_64-unknown-freebsd always timeout
2 participants