Remove the kludge and fix the race between task and motion #3292
This PR addresses the problem that shows itself at startup with these two messages:
Commit 9a45ed9 added a kludge to re-copy the command in case of timeout in a specific command. However, it is a kludge and it turns out that it was always the first command that timed out. That clearly indicated a race/overwrite situation.
The problem exists between task and motion, usually milltask and motmod, but motion-logger is also affected. The race is an overwrite scenario with the following sequence:
The timeout happens consistently because the milltask process starts before the motmod thread (though it is not guaranteed which process attaches first and writes the memory). The solution is obvious. The reader must not write to the command structure during initialization because there may already be a command in there.
This PR removes the kludge and removes the initialization of the command structure in the readers. The initial value of shared memory (when created) is always zero, so that works out fine if the reader attaches before the writer. The reader may set status and other structures to its liking, but the command structure is left alone.
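As a rough illustration of the rule described above, here is a minimal sketch of a reader-side init that sets up its own status but never touches the command area. All names in it (`demo_shm_t`, `demo_writer_post()`, `demo_reader_init()`) are hypothetical stand-ins and do not reflect the actual emcmot structures or functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the shared-memory layout (not the real emcmot structs). */
typedef struct { int command_num; int payload; } demo_command_t;
typedef struct { int command_num_echo; int state; } demo_status_t;
typedef struct { demo_command_t command; demo_status_t status; } demo_shm_t;

/* Writer side (task): post a command into shared memory. */
static void demo_writer_post(demo_shm_t *shm, int num, int payload)
{
    assert(shm != NULL);
    shm->command.payload = payload;
    shm->command.command_num = num;
}

/* Reader side (motion): initialize only what the reader owns. */
static void demo_reader_init(demo_shm_t *shm)
{
    assert(shm != NULL);
    memset(&shm->status, 0, sizeof shm->status);
    shm->status.state = 1; /* arbitrary "ready" marker for this sketch */
    /* Deliberately no write to shm->command: the writer may have
     * attached first and already posted a command there. */
}
```

If the writer posts a command before `demo_reader_init()` runs, the command survives the reader's initialization. If the reader attaches first, it relies on freshly created shared memory being zeroed, as the description states.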
Additionally, a micro-sleep is added in the split read functions (`usrmotReadEmcmotStatus()`, `usrmotReadEmcmotConfig()` and `usrmotReadEmcmotInternal()`). These were running too tight a loop with three retries. A tight loop is over in a matter of nanoseconds and will most likely time out. Adding a tiny sleep forces a reschedule and possible yield, giving the other side time to actually catch up. The micro-sleep is no problem because the routines are only called from non-realtime code.
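The retry-with-sleep idea can be sketched roughly as follows. This is not the code from the PR: the `demo_block_t` layout with `head`/`tail` counters, the function name `demo_split_read()`, and the 100 µs pause are all assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical block guarded by head/tail counters; the writer is assumed
 * to bump head before and tail after updating the payload. */
typedef struct { unsigned head; int value; unsigned tail; } demo_block_t;

#define DEMO_READ_RETRIES 3

/* Copy the block and accept it only if head == tail (no torn read).
 * Returns 0 on success, -1 if every attempt saw a mismatch. */
static int demo_split_read(const demo_block_t *src, demo_block_t *dst)
{
    assert(src != NULL && dst != NULL);
    const struct timespec pause = { 0, 100 * 1000 }; /* 100 us, arbitrary */

    for (int attempt = 0; attempt < DEMO_READ_RETRIES; attempt++) {
        memcpy(dst, src, sizeof *dst);
        if (dst->head == dst->tail) {
            return 0; /* consistent snapshot */
        }
        /* Sleep briefly so the scheduler can run the writer before retrying. */
        nanosleep(&pause, NULL);
    }
    return -1; /* timed out */
}
```

Without the `nanosleep()` call, all three attempts would typically finish before the writer gets a chance to run, so the retries would add almost nothing.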