Describe the bug
Running on an STM32H753VG with Ethernet, there is a condition where the rx_q[0] thread and the sys_work_q thread running rs_timeout can deadlock on a pair of mutexes. The work queue acquires the iface mutex and rx_q acquires the iface TX mutex; each then blocks trying to acquire the mutex the other one holds. See the call stacks and iface provided for more details.
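To illustrate the lock-order inversion described above, here is a minimal sketch using portable POSIX threads rather than Zephyr's own mutex API. It is not the Zephyr code path and not a proposed fix; the names `iface_lock`, `iface_tx_lock`, `work_queue_path`, `rx_path`, and `run_both_paths` are hypothetical stand-ins. One thread takes the two locks in the order reported for the work queue. The other takes them in the opposite order, as reported for rx_q, but uses a try-lock with back-off on the second lock, which is one common way to avoid blocking while holding the first.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for the two locks named in the report. */
static pthread_mutex_t iface_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t iface_tx_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

#define ITERATIONS 10000

/* Models the work queue path: iface lock first, then the TX lock. */
static void *work_queue_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&iface_lock);
        pthread_mutex_lock(&iface_tx_lock);
        shared_counter++;
        pthread_mutex_unlock(&iface_tx_lock);
        pthread_mutex_unlock(&iface_lock);
    }
    return NULL;
}

/*
 * Models the RX path: TX lock first, then the iface lock (opposite order).
 * A blocking lock here could deadlock against work_queue_path, so this
 * sketch tries the second lock and backs off if it is already held.
 */
static void *rx_path(void *arg)
{
    (void)arg;
    int done = 0;
    while (done < ITERATIONS) {
        pthread_mutex_lock(&iface_tx_lock);
        if (pthread_mutex_trylock(&iface_lock) != 0) {
            /* Other thread holds iface_lock: release and retry. */
            pthread_mutex_unlock(&iface_tx_lock);
            sched_yield();
            continue;
        }
        shared_counter++;
        pthread_mutex_unlock(&iface_lock);
        pthread_mutex_unlock(&iface_tx_lock);
        done++;
    }
    return NULL;
}

/* Runs both paths concurrently and returns the final counter value. */
int run_both_paths(void)
{
    pthread_t wq, rx;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&wq, NULL, work_queue_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&rx, NULL, rx_path, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(wq, NULL);
    pthread_join(rx, NULL);
    return shared_counter;
}
```

If `rx_path` used a blocking `pthread_mutex_lock` on `iface_lock` instead of the try-lock, the two threads could each end up holding one lock and waiting forever on the other, which is the pattern the call stacks appear to show.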
To Reproduce
Unfortunately, given the nature of the issue, I don't have an easy way to reproduce it. I do have core dumps from a failed system, however, so I can provide any information needed at any time.
Expected behavior
No mutex deadlock?
Impact
This is a massive showstopper. This disables our primary functionality in a way that our WDT can't detect. And since it locks up the work queue as well, it breaks several other operations our device needs. We can still recover failed prototype devices in the field thanks to additional debug mechanisms, but we can't go into production with this bug in place.
It's possible that this issue has already been addressed in a newer version of Zephyr, but we can't afford the time right now to upgrade if we don't know for certain it will fix the problem.
Logs and console output
Environment