net: Critical Mutex Deadlock #86499

legoabram · 2025-02-28T18:27:31Z

Describe the bug
On an STM32H753VG with Ethernet, there is a condition where the rx_q[0] thread and the sys_work_q thread (running rs_timeout) can deadlock on a pair of mutexes. The work queue acquires the iface mutex and the rx_q thread acquires the iface TX mutex; each then tries to acquire the mutex the other is holding. See the call stacks and iface provided for more details.
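For anyone unfamiliar with this failure mode, here is a minimal sketch of the lock-order inversion described above, along with one common way to avoid it (always taking the two locks in a fixed order). This is not Zephyr code and not a proposed patch: it uses POSIX threads, and `iface_lock`, `tx_lock`, `work_queue_path`, `rx_path`, `lock_both`, and `run_demo` are hypothetical names chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ITERATIONS 10000

/* Hypothetical stand-ins for the two mutexes named in the report. */
static pthread_mutex_t iface_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

/* Acquire both locks in a fixed global order (lowest address first),
 * regardless of the order in which the caller names them. Without this,
 * one thread taking A then B while another takes B then A can deadlock. */
static void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Models the work-queue side: nominally iface lock first, then TX lock. */
static void *work_queue_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both(&iface_lock, &tx_lock);
        shared_counter++;
        unlock_both(&iface_lock, &tx_lock);
    }
    return NULL;
}

/* Models the RX side: nominally TX lock first, then iface lock. */
static void *rx_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both(&tx_lock, &iface_lock);
        shared_counter++;
        unlock_both(&tx_lock, &iface_lock);
    }
    return NULL;
}

/* Runs both paths concurrently and returns the final counter value. */
int run_demo(void)
{
    pthread_t wq, rx;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&wq, NULL, work_queue_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&rx, NULL, rx_path, NULL);
    assert(rc == 0);
    pthread_join(wq, NULL);
    pthread_join(rx, NULL);
    return shared_counter;
}
```

If `lock_both` were replaced by two plain `pthread_mutex_lock` calls in the order each caller names them, the two threads could each end up holding one lock while waiting for the other, which matches the symptom reported here.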

To Reproduce
Unfortunately, given the nature of the issue, I don't have a reliable way to reproduce it. I do have core dumps from a failed system, however, so I can provide any information needed.

Expected behavior
No mutex deadlock?

Impact
This is a massive showstopper. This disables our primary functionality in a way that our WDT can't detect. And since it locks up the work queue as well, it breaks several other operations our device needs. We can still recover failed prototype devices in the field thanks to additional debug mechanisms, but we can't go into production with this bug in place.

Logs and console output

Environment

OS: Windows 11
Toolchain: Zephyr SDK 0.17
Zephyr: 064fcfc


legoabram · 2025-02-28T18:28:58Z

It's possible that this issue has already been addressed in a newer version of Zephyr, but we can't afford the time right now to upgrade if we don't know for certain it will fix the problem.

legoabram added the bug label Feb 28, 2025

JarmouniA added the platform: STM32 and area: Ethernet labels Feb 28, 2025
