Remove the kludge and fix the race between task and motion #3292

Merged 2 commits on Jan 21, 2025

BsAtHome

This PR addresses the problem that shows itself at startup with these two messages:

 USRMOT: ERROR: command 30 timeout
 emcMotionInit: emcTrajInit failed

Commit 9a45ed9 added a kludge that re-copies the command when one specific command times out. It is only a workaround, though, and it turned out that the timeout always hit the first command. That clearly indicated a race/overwrite situation.

The problem exists between task and motion, usually milltask and motmod, though motion-logger is affected as well. The race is an overwrite scenario with the following sequence:

  1. the writer, milltask, copies a command into the shared memory command structure
  2. the reader, motmod or motion-logger, attaches to shared memory and clears it, overwriting the command structure
  3. the writer never sees its command complete.

The timeout happens consistently because the milltask process starts before the motmod thread (though it is not guaranteed which side attaches to the shared memory first and writes to it). The solution follows directly: the reader must not write to the command structure during initialization, because a command may already be in there.

This PR removes the kludge and removes the initialization of the command structure in the readers. Shared memory is always zeroed when it is created, so things also work out if the reader attaches before the writer. The reader may set up status and other structures to its liking, but the command structure is left alone.
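The overwrite and the fix can be illustrated with a minimal sketch. All names here (`demo_shmem`, `demo_reader_init_racy()`, `demo_reader_init_fixed()` and so on) are hypothetical stand-ins and not the actual LinuxCNC structures or functions; the sketch only assumes a shared block with a command part owned by the writer and a status part owned by the reader.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real shared memory layout. */
struct demo_command { int command_num; int command; };
struct demo_status  { int command_num_echo; int state; };

struct demo_shmem {
    struct demo_command command;  /* owned by the writer (task side) */
    struct demo_status  status;   /* owned by the reader (motion side) */
};

/* Writer: post a command into the shared command structure. */
static void demo_write_command(struct demo_shmem *shm, int num, int cmd)
{
    assert(shm != NULL);
    shm->command.command = cmd;
    shm->command.command_num = num;
}

/* Racy reader init: clears the whole block, including a pending command. */
static void demo_reader_init_racy(struct demo_shmem *shm)
{
    assert(shm != NULL);
    memset(shm, 0, sizeof(*shm));
}

/* Fixed reader init: resets only the part the reader owns. */
static void demo_reader_init_fixed(struct demo_shmem *shm)
{
    assert(shm != NULL);
    memset(&shm->status, 0, sizeof(shm->status));
}

/* Returns 1 while a posted command has not been echoed by the reader. */
static int demo_command_pending(const struct demo_shmem *shm)
{
    assert(shm != NULL);
    return shm->command.command_num != shm->status.command_num_echo;
}
```

If the writer posts a command first and the racy init runs afterwards, `demo_command_pending()` reports nothing to do and the writer waits until it times out. With the fixed init the command survives, and a freshly created (zeroed) block still looks idle when the reader attaches first.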

Additionally, a micro-sleep is added in the split read functions (usrmotReadEmcmotStatus(), usrmotReadEmcmotConfig() and usrmotReadEmcmotInternal()). These were running too tight a loop with three retries. A tight loop is over in a matter of nanoseconds and will most likely time out. Adding a tiny sleep forces a reschedule and a possible yield, giving the other side time to actually catch up. The micro-sleep is no problem because the routines are only called from non-realtime code.
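A rough sketch of such a retry loop follows. It assumes a status block guarded by head/tail counters, where a mismatch means the copy was taken while the other side was still writing; the names (`demo_status`, `demo_read_status()`, `demo_micro_sleep()`) and the sleep duration are illustrative assumptions, not the code from this PR.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical status block; the writer bumps head, updates, then sets tail. */
struct demo_status {
    unsigned char head;
    int value;
    unsigned char tail;
};

/* Sleep very briefly so the scheduler can run the other side.
 * Only suitable for non-realtime callers. The duration is illustrative. */
static void demo_micro_sleep(void)
{
    struct timespec ts = { 0, 1000 };  /* 1 microsecond */
    nanosleep(&ts, NULL);
}

/* Copy *shared into *out, retrying when a split read (head != tail) is seen.
 * Returns 0 on a consistent copy, -1 if every attempt was split. */
static int demo_read_status(const struct demo_status *shared,
                            struct demo_status *out, int retries)
{
    int i;

    assert(shared != NULL && out != NULL);
    for (i = 0; i < retries; i++) {
        *out = *shared;
        if (out->head == out->tail) {
            return 0;            /* consistent snapshot */
        }
        demo_micro_sleep();      /* yield instead of spinning */
    }
    return -1;                   /* still split after all retries */
}
```

Without the sleep, three iterations complete long before the writer could be scheduled again, so a single split read would almost always turn into a failure.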

@snowgoer540

Thank you so much for looking into this and fixing the root of the problem. I really appreciate it!

@snowgoer540 snowgoer540 merged commit 38b9547 into LinuxCNC:master Jan 21, 2025
10 checks passed