fdcan: high number of CAN core resets w/ ACK error #2132

Closed
1 task
sshane opened this issue Jan 29, 2025 · 0 comments · Fixed by #2137


sshane commented Jan 29, 2025

On the route in commaai/openpilot#33840, the camera bus spammed CAN core resets at ignition off because of ACK errors combined with a high transmit error count.

PR #1502 switched the logic from resetting only once, when an error counter reached 100, to resetting continually while transmit_error_counter > 127. Once the counter is above that limit, any further ACK error resets the CAN core without needing the old tolerance. This pushes the panda's interrupt load to around 90%, slowing down or hanging SPI communication with pandad.
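To make the difference concrete, here is a minimal host-side sketch of the two reset policies as described above. It is not the panda source: the struct, the function names, and the assumption that the transmit error counter stays above 127 across resets are all hypothetical, chosen only to illustrate why one policy resets once and the other resets on every ACK error.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACK_ERROR_TOLERANCE 100U  // error count named in the issue
#define TEC_LIMIT 127U            // transmit error counter limit named in the issue

// Hypothetical error snapshot; not panda's real data structure.
typedef struct {
  uint32_t ack_error_cnt;       // ACK errors counted so far
  uint32_t transmit_error_cnt;  // transmit error counter reported by the controller
} can_errors_t;

// Earlier behaviour as described: reset exactly once, when the count hits 100.
static bool reset_once_policy(const can_errors_t *e) {
  return e->ack_error_cnt == ACK_ERROR_TOLERANCE;
}

// Later behaviour as described: every ACK error resets while the counter is above 127.
static bool reset_while_tec_high_policy(const can_errors_t *e) {
  return e->transmit_error_cnt > TEC_LIMIT;
}

// Count resets over n consecutive ACK-error interrupts.
// Assumption: the transmit error counter stays at `tec` and is not cleared by a reset.
static uint32_t count_resets(bool (*policy)(const can_errors_t *), uint32_t n, uint32_t tec) {
  can_errors_t e = {0U, tec};
  uint32_t resets = 0U;
  for (uint32_t i = 0U; i < n; i++) {
    e.ack_error_cnt++;
    if (policy(&e)) {
      resets++;  // a real handler would reinitialize the CAN core here
    }
  }
  return resets;
}
```

Under these assumptions, a burst of ACK errors triggers a single reset with the first policy and one reset per error with the second, which is consistent with the interrupt load described above.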

Things that mitigate this:

  • removing this busywait delay significantly lowers the interrupt load and keeps communication alive:
    delay(10000);
  • not sure why, but keeping both CAN 1 and CAN 3 up also keeps SPI communication happy (currently only one is kept up):
    current_board->enable_can_transceivers(enable);
  • giving the CAN IRQs a priority of 1 and prioritizing SPI1 and SPI2 above them (setting the SPI1_IRQn and SPI2_IRQn IRQs to -1):
      NVIC_EnableIRQ(FDCAN1_IT0_IRQn);
      NVIC_EnableIRQ(FDCAN1_IT1_IRQn);
    } else if (FDCANx == FDCAN2) {
      NVIC_EnableIRQ(FDCAN2_IT0_IRQn);
      NVIC_EnableIRQ(FDCAN2_IT1_IRQn);
    } else if (FDCANx == FDCAN3) {
      NVIC_EnableIRQ(FDCAN3_IT0_IRQn);
      NVIC_EnableIRQ(FDCAN3_IT1_IRQn);
  • debounce CAN core resets
  • don't reset CAN cores when their transceivers are off due to ignition/power saving mode.
  • reset the transmit error counter
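Two of the mitigations above (debouncing resets and skipping resets while the transceiver is off) can be combined into one gate in front of the reset call. The following is a rough sketch only: `can_reset_gate_t`, `can_reset_allowed`, and the 100 ms spacing are hypothetical names and values, not taken from panda, and the timestamp is assumed to come from a free-running 32-bit microsecond timer.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_DEBOUNCE_US 100000U  // hypothetical minimum spacing between resets

// Hypothetical per-bus state for gating CAN core resets.
typedef struct {
  uint32_t last_reset_us;    // timestamp of the last permitted reset
  bool has_reset;            // false until the first reset is permitted
  bool transceiver_enabled;  // false when the transceiver is off (ignition off / power saving)
} can_reset_gate_t;

// Returns true if a CAN core reset should be performed now, and records it.
static bool can_reset_allowed(can_reset_gate_t *g, uint32_t now_us) {
  // With the transceiver off nothing can ACK, so a reset cannot help.
  if (!g->transceiver_enabled) {
    return false;
  }
  // Unsigned subtraction stays correct across timer wraparound.
  if (g->has_reset && ((now_us - g->last_reset_us) < RESET_DEBOUNCE_US)) {
    return false;
  }
  g->last_reset_us = now_us;
  g->has_reset = true;
  return true;
}
```

The error handler would call `can_reset_allowed()` before reinitializing the core. This caps the reset rate regardless of how often ACK errors arrive, but it does not address why the transmit error counter stays high in the first place.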

TODOs:

  • By this logic the problem should be common, so figure out why it is rare. On another random Bronco route I see 2 CAN core resets on the camera bus (flipped canState0) due to 100 busOffCnts. Also figure out why it didn't spam CAN core resets in that case. Answering these two questions should make it clearer which fix is appropriate.

Related PRs that touch this code:
