Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(radio): async call handling to prevent GX12 switch lockups #5949

Merged
merged 7 commits into from
Mar 5, 2025

Conversation

3djc
Copy link
Collaborator

@3djc 3djc commented Feb 26, 2025

First, I would like to thanks Radiomaster devs for their investigation and contribution to bring to surface this issue. Great work guys.

Bottom line is that we have an issue with async call handling based on xTimerPendFunctionCallFromISR(). We need some time to understand why we have that issue.

In the meantime, this workaround allows the switch to be read even if the call to xTimerPendFunctionCallFromISR() was not successful.

Another part of EdgeTX relying heavily on xTimerPendFunctionCallFromISR() is telemetry handling, but the protocol is handling it so this is not critical, so the workaround is not applied there.

@3djc 3djc added bug 🪲 Something isn't working blocks-release Releases shouldn't go out without this fix firmware General radio firmware issue, not colorlcd or B&W specific labels Feb 26, 2025
@3djc 3djc requested a review from raphaelcoeffic February 26, 2025 11:03
@pfeerick pfeerick added the partner Something specifically requested by a sponsor/partner label Feb 26, 2025
@3djc
Copy link
Collaborator Author

3djc commented Feb 26, 2025

nb: the underlying issue is not RM radio related, we have failure xTimerPendFunctionCallFromISR() in all radio, but in place where it is less obvious to see, like per10ms or telemetry

@3djc
Copy link
Collaborator Author

3djc commented Mar 2, 2025

Add anti stacking protection where needed

@3djc
Copy link
Collaborator Author

3djc commented Mar 2, 2025

Spent some time this morning working with Raph on this issue. It turns out, as we hinted earlier, down to some priority issues, that unfortunately wont have any easy fix. We will introduce some improvements as part of 3.0.

However, we discovered a point: the amount of files in directory does critically matter. In any fopen or fcreate operation, the system has to parse some or all of the fat tables, which on this architecture, takes a LOT of time. It is very important for system integrity to keep the number of files in directory to as low as possible, preferably less than a hundred. It is a real issue for SDlogs function which as been changed since Otx days and now generate a file per invocation, leading to huge number of files in LOGS directory and very severely impacting performance. Also, it should be noted that audio packs that number of tracks does also have a negative impact

@philmoz
Copy link
Collaborator

philmoz commented Mar 2, 2025

On another project I worked on formatting the SD as exFat gave a big performance boost. There is an option to enable exFat in fatfs. I wonder if this might help.

@3djc
Copy link
Collaborator Author

3djc commented Mar 2, 2025

Worth a try at the very least, we also need to understand the memory impact, but yes, we should investigate

@gagarinlg
Copy link
Member

gagarinlg commented Mar 2, 2025

exFat is patented by Microsoft. Only the Linux Kernel is licensed for free. Every other use of exFat needs a payed MS license to be legal.

@3djc
Copy link
Collaborator Author

3djc commented Mar 2, 2025

Ok, I guess that rules it out then !

@pfeerick
Copy link
Member

pfeerick commented Mar 2, 2025 via email

raphaelcoeffic
raphaelcoeffic previously approved these changes Mar 4, 2025
Copy link
Member

@raphaelcoeffic raphaelcoeffic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@3djc
Copy link
Collaborator Author

3djc commented Mar 4, 2025

So to summarize:

This is for 2.11 and main.

We keep ensuring ISR are kept extra short, but we are also ensuring the queue does not get filled by duplicated calls. In case long SD event occur, the BSP (and telem and per10ms) call will be certain to have a stacking slot, and those operation will be done when SD unlock.

There will be 2 follow up PR (but only for 3.0) to move sdlogs to UI tack and review per10ms launch

@pfeerick pfeerick changed the title workaround: async call handling to prevent GX12 switch lockups refactor(radio): async call handling to prevent GX12 switch lockups Mar 5, 2025
@pfeerick pfeerick merged commit 466b06c into main Mar 5, 2025
51 checks passed
@pfeerick pfeerick deleted the 3djc/async-workaround branch March 5, 2025 01:08
@3djc
Copy link
Collaborator Author

3djc commented Mar 5, 2025

As a follow up, RM has conducted lab testing on this and found the expander switch issue to be completely fixed with it. They will conduct a larger test on production lines, but feel confident the issues is gone

@Johanl1964
Copy link

Does this PR also apply in general to transmitters with internal memory? I have problems with T-Pro v2 and Bumblebbe that lock up, and the cause seems to be high memory usage when doing "a lot" of telemetry... Lockups fortunately do not affect the RC signal, so you do not lose contact with the model, but nothing else works, and you have to remove the battery to turn off and reset the transmitter!

@3djc
Copy link
Collaborator Author

3djc commented Mar 5, 2025

Does this PR also apply in general to transmitters with internal memory? I have problems with T-Pro v2 and Bumblebbe that lock up, and the cause seems to be high memory usage when doing "a lot" of telemetry... Lockups fortunately do not affect the RC signal, so you do not lose contact with the model, but nothing else works, and you have to remove the battery to turn off and reset the transmitter!

No. And doing a lot of telemetry doesn't consume memory. Lua does however.

@Johanl1964
Copy link

Does this PR also apply in general to transmitters with internal memory? I have problems with T-Pro v2 and Bumblebbe that lock up, and the cause seems to be high memory usage when doing "a lot" of telemetry... Lockups fortunately do not affect the RC signal, so you do not lose contact with the model, but nothing else works, and you have to remove the battery to turn off and reset the transmitter!

No. And doing a lot of telemetry doesn't consume memory. Lua does however.

Yes, when logging telemetry! I fly RC gliders and log vario/altitude, GPS and all ELRS telemetry with 0.3 sec intervals so as not to miss anything! I apply this to all my transmitters, and have problems only with those with internal memory! So is there a problem with this type of memory!?

@3djc
Copy link
Collaborator Author

3djc commented Mar 5, 2025

Memory is not used when doing that, SD is (could be internal SD chip or SDcard depending on radio)

@mha1
Copy link
Contributor

mha1 commented Mar 5, 2025

Does this PR also apply in general to transmitters with internal memory? I have problems with T-Pro v2 and Bumblebbe that lock up, and the cause seems to be high memory usage when doing "a lot" of telemetry... Lockups fortunately do not affect the RC signal, so you do not lose contact with the model, but nothing else works, and you have to remove the battery to turn off and reset the transmitter!

No. And doing a lot of telemetry doesn't consume memory. Lua does however.

Yes, when logging telemetry! I fly RC gliders and log vario/altitude, GPS and all ELRS telemetry with 0.3 sec intervals so as not to miss anything! I apply this to all my transmitters, and have problems only with those with internal memory! So is there a problem with this type of memory!?

I'm flying gliders with a T15/ELRS (no SD card, internal memory) logging ELRS, GPS, Vario and ESC telemetry at 0.1s log period for nearly a year now. No lock ups, in fact no issues at all). Are you running any Lua scripts (lua widgets, function or mixer scripts)?

@Johanl1964
Copy link

Does this PR also apply in general to transmitters with internal memory? I have problems with T-Pro v2 and Bumblebbe that lock up, and the cause seems to be high memory usage when doing "a lot" of telemetry... Lockups fortunately do not affect the RC signal, so you do not lose contact with the model, but nothing else works, and you have to remove the battery to turn off and reset the transmitter!

No. And doing a lot of telemetry doesn't consume memory. Lua does however.

Yes, when logging telemetry! I fly RC gliders and log vario/altitude, GPS and all ELRS telemetry with 0.3 sec intervals so as not to miss anything! I apply this to all my transmitters, and have problems only with those with internal memory! So is there a problem with this type of memory!?

I'm flying gliders with a T15/ELRS (no SD card, internal memory) logging ELRS, GPS, Vario and ESC telemetry at 0.1s log period for nearly a year now. No lock ups, in fact no issues at all). Are you running any Lua scripts (lua widgets, function or mixer scripts)?

The only "active" LUAs used are ELRS and GPSx9L, as I wrote, I have no problems with transmitters with SD cards, only those with internal memory.

@3djc
Copy link
Collaborator Author

3djc commented Mar 6, 2025

Not sure what is GPSx9L lua, but the one I found seems to have some kind of issues https://github.com/moschotto/OpenTX_GPS_Telemetry/blob/7282328f2fdf0f37f782cf187ad0725a0a162cfb/x7_x9lite/GPSx9L.lua#L87 trying to open and close file every 100ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocks-release Releases shouldn't go out without this fix bug 🪲 Something isn't working firmware General radio firmware issue, not colorlcd or B&W specific partner Something specifically requested by a sponsor/partner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants