Describe the bug
With the latest Zephyr downstream release, we found a critical issue: BLE doesn't work after erasing the board and then flashing the BT shell project. It can be reproduced on any RW612 board if the board is erased before testing.
What target platform are you using?
NXP RW610/RW612
What have you tried to diagnose or workaround this issue?
After further debugging, we found that the issue is caused by the sysworkq thread, which executes bt_ready(), getting blocked while ECDH operations run on the long work queue.
Details:
The sysworkq thread calls bt_init() -> bt_smp_init() -> bt_pub_key_gen(), which hands BT public key generation to the bt_long_wq thread. While bt_long_wq is generating random data in z_impl_sys_csrand_get(), bt_ready(), also running on sysworkq, calls settings_load(), which generates random data through the same interface. Although sysworkq has a higher priority than bt_long_wq, bt_long_wq holds the mutex and completes its operation in z_impl_sys_csrand_get(), so sysworkq stays blocked during this period. As we understand it, once bt_long_wq finishes and releases the mutex, sysworkq should acquire the ctr_lock mutex next. Instead, sysworkq stays blocked forever and never re-enters z_impl_sys_csrand_get() to complete the bt_ready() call trace.
A few pictures are attached for reference. They show that after the issue hits, the call stack unexpectedly switches to the main thread.
Is this a regression? If yes, have you been able to "git bisect" it to a specific commit?
Yes, it's a regression. Our previous v4.1 release (Jan 20) doesn't have this issue, because ECDH was not enabled and the long work queue was not used from bt_init().
To Reproduce
Steps to reproduce the behavior:
west build -p always -b rd_rw612_bga zephyr/tests/bluetooth/shell -d ble_build/bt_shell
Flash zephyr.elf to the RW612 board
Power on the board
Run bt init, then bt scan on
See the error
Expected behavior
bt init and bt scan on should both succeed
Impact
BLE can't be used on any board whose flash has been erased before use
Logs and console output
*** Booting Zephyr OS build nxp-v4.0.0-13647-g0c6e38f51f3c ***
Type "help" for supported commands.
Before any Bluetooth commands you must bt init to initialize the stack.
uart:$ bt init
Bluetooth initialized
[00:00:02.652,169] fs_nvs: 8 Sectors of 4096 bytes
[00:00:02.652,185] fs_nvs: alloc wra: 0, fd0
[00:00:02.652,189] fs_nvs: data wra: 0, 13
[00:00:02.905,481] bt_hci_core: No ID address. App must call settings_load()
uart:$
uart:$
uart:$ bt scan on
Bluetooth set active scan failed (err -11)
uart:~$
Environment (please complete the following information):
Additional context
We haven't yet located the root cause of why the sysworkq thread is blocked during bt init, but for reference see commit c7f3ad6.
Referring to that commit, we switched to performing ECDH operations on the system work queue instead of the long work queue: we added CONFIG_BT_LONG_WQ=n to prj.conf and rebuilt the bt_shell app. With that change the issue is gone, and BLE works normally after every board erase. This is only a workaround we made to unblock use of the BT shell.
Could you please help fix the root cause of this issue? Thanks!
Correct the definition of the ctr_lock mutex. The wrong definition led to
the sysworkq thread not acquiring this mutex during bt init, which caused
BLE to stop working as described in issue zephyrproject-rtos#86444.
Signed-off-by: Ying Zhang <[email protected]>