-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bluetooth: hci_driver: Fix deadlock in MPSL workq #18953
bluetooth: hci_driver: Fix deadlock in MPSL workq #18953
Conversation
The following west manifest projects have changed revision in this Pull Request:
✅ All manifest checks OK Note: This message is automatically posted and updated by the Manifest GitHub Action. |
CI InformationTo view the history of this post, clich the 'edited' button above Inputs:Sources:sdk-nrf: PR head: e434180985eba1d76383ab3ddfaacce90ebfc0f8 more detailssdk-nrf:
Github labels
List of changed files detected by CI (1)
Outputs:ToolchainVersion: 11349092be Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped;
|
a53bc48
to
6fea7f3
Compare
You can find the documentation preview for this PR at this link. It will be updated about 10 minutes after the documentation build succeeds. Note: This comment is automatically posted by the Documentation Publish GitHub Action. |
DNM until upstream PR is merged and downstreamed: zephyrproject-rtos/zephyr#81646 |
applications/nrf_desktop/Kconfig.ble
Outdated
@@ -15,7 +15,6 @@ config DESKTOP_BT | |||
select BT_SETTINGS | |||
select BT_SIGNING | |||
select BT_SMP | |||
imply BT_CONN_TX_NOTIFY_WQ if SOC_FLASH_NRF_RADIO_SYNC_MPSL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed for testing as the fix makes BT_CONN_TX_NOTIFY_WQ
feature unnecessary. It will be reverted once upstream PR is merged.
west.yml
Outdated
@@ -68,8 +70,9 @@ manifest: | |||
# https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/introduction/index.html | |||
# https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/guides/modules.html | |||
- name: zephyr | |||
remote: pavelvpv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added for testing with the upstream fix. Will be reverted.
Moved from the draft state as no more work is needed for the change and would like to get feedback on the change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this nice change, and also for taking the time to describe how and what problem it solves.
I believe we should add a changelog entry and update known issues. I suggest to do that in a separate PR.
6fea7f3
to
38d5fc9
Compare
0d5f9e9
to
f77cf88
Compare
This pull request has been marked as stale because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 7 days. Note, that you can always re-open a closed pull request at any time. |
f77cf88
to
df0447e
Compare
Memory footprint analysis revealed the following potential issuessample.matter.template.debug[nrf7002dk/nrf5340/cpuapp]: High ROM usage: 911938[B] - link (cc: @kkasperczyk-no @ArekBalysNordic @markaj-nordic) Note: This message is automatically posted and updated by the CI (latest/sdk-nrf/PR-18953/5) |
df0447e
to
56d0b96
Compare
56d0b96
to
552ef10
Compare
This commit fixes deadlock in MPSL workq caused by bt_buf_get_rx with K_FOREVER. Previously, calling bt_buf_get_rx with K_FOREVER could block indefinitely if the requested pool had no available buffers. This blocking affected the MPSL work queue, preventing it from processing other work items, including critical timeslot events like MPSL_TIMESLOT_SIGNAL_CANCELLED and MPSL_TIMESLOT_SIGNAL_BLOCKED. The issue becomes more severe if a background flash operation coincides with this scenario. Flash operations often execute on the system work queue (sysworkq), which can delay host work items responsible for freeing buffers in the RX pool. In such cases: - Flash operations stall, waiting for a timeslot event from the MPSL work queue. - The MPSL work queue remains blocked by bt_buf_get_rx. - This results in a deadlock, causing the flash operation to timeout and triggering a warning. This commit modifies the HCI driver to call bt_buf_get_rx with K_NO_WAIT. If no buffer is immediately available, it relies on a callback to notify when a buffer is freed. This change ensures the MPSL work queue remains unblocked, allowing other work items to execute while waiting for a buffer. Signed-off-by: Pavel Vasilyev <[email protected]>
After replacing K_FOREVER with K_NO_WAIT when requesting a buffer from the host pool, it became possible to call `hci_driver_receive_process` directly from the SDC callback instead of going through `receive_signal_raise` as `hci_driver_receive_process` is now non-blocking. Signed-off-by: Pavel Vasilyev <[email protected]>
552ef10
to
e434180
Compare
Approved. I'm not sure whether or not the second commit e434180 is a good idea or not, but I'm fine with it. I assume it's done for performance. It does raise the complexity slightly, as there are now multiple stack usage scenarios when invoking I assume the SDC callback is also on the MPSL work thread, so there is still only one thread that can call into |
Add a note about fixed issue where a flash operation executed on the system workqueue might result in `-ETIMEDOUT` if there is an active Bluetooth LE connection. Ref: nrfconnect#18953 Signed-off-by: Pavel Vasilyev <[email protected]>
Add a note about fixed issue where a flash operation executed on the system workqueue might result in `-ETIMEDOUT` if there is an active Bluetooth LE connection. Ref: nrfconnect#18953 Signed-off-by: Pavel Vasilyev <[email protected]>
Add a note about fixed issue where a flash operation executed on the system workqueue might result in `-ETIMEDOUT` if there is an active Bluetooth LE connection. Ref: #18953 Signed-off-by: Pavel Vasilyev <[email protected]>
This PR fixes deadlock in MPSL workq caused by bt_buf_get_rx with
K_FOREVER.
Previously, calling bt_buf_get_rx with K_FOREVER could block
indefinitely if the requested pool had no available buffers. This
blocking affected the MPSL work queue, preventing it from processing
other work items, including critical timeslot events like
MPSL_TIMESLOT_SIGNAL_CANCELLED and MPSL_TIMESLOT_SIGNAL_BLOCKED.
The issue becomes more severe if a background flash operation coincides
with this scenario. Flash operations often execute on the system work
queue (sysworkq), which can delay host work items responsible for
freeing buffers in the RX pool. In such cases:
work queue.
triggering a warning.
This PR modifies the HCI driver to call bt_buf_get_rx with
K_NO_WAIT. If no buffer is immediately available, it relies on a
callback to notify when a buffer is freed. This change ensures the MPSL
work queue remains unblocked, allowing other work items to execute while
waiting for a buffer.