
Consider extending context timer concept to both job scheduling and podready #187

Open
Mierdin opened this issue Jun 10, 2020 · 0 comments


Mierdin commented Jun 10, 2020

There are a few places in the antidote scheduler code where we need to wait for a set of parallel tasks to complete. In general this is done by spawning goroutines tracked by a waitgroup, and blocking until some exit criterion is met, such as the waitgroup being emptied or a timeout indicating failure.
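For anyone reading this without the code open, that pattern looks roughly like the following. This is a minimal sketch, not the actual scheduler code; `waitWithTimeout` and the durations are made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout blocks until every goroutine tracked by wg has finished,
// or until the timeout elapses, and reports which of the two happened first.
func waitWithTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true // all tasks completed
	case <-time.After(timeout):
		return false // gave up waiting
	}
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			time.Sleep(time.Duration(id*100) * time.Millisecond) // stand-in for real work
		}(i)
	}
	if waitWithTimeout(&wg, 2*time.Second) {
		fmt.Println("all tasks finished")
	} else {
		fmt.Println("timed out waiting for tasks")
	}
}
```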

The pod network health checks use a context with a timeout, which seems to work pretty well and matches what we're trying to do.

However, the basic pod status check, as well as the wait for job success, uses a for loop with a counter, roughly like the sketch below.
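Something along these lines (a hand-written sketch with made-up names like `waitForPodReady`, not code lifted from the scheduler): poll up to `retries` times, sleeping `interval` between checks.

```go
package scheduler // illustrative package name

import (
	"fmt"
	"time"
)

// waitForPodReady polls isReady up to `retries` times, sleeping `interval`
// between attempts. The total wait is retries*interval plus however long
// each isReady call itself takes.
func waitForPodReady(isReady func() (bool, error), retries int, interval time.Duration) error {
	for i := 0; i < retries; i++ {
		ready, err := isReady()
		if err != nil {
			return err
		}
		if ready {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("pod not ready after %d attempts", retries)
}
```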

This seems to work - both instances already have reliable exit conditions and non-spammy spans - but it might be worth adopting a consistent approach across the board.
My main concern with the for-loop approach is that any sleeps we inject between checks compound with the cost of the checks themselves, making it hard to determine exactly how long a given portion of the code will wait. In theory, a check that retries 600 times and pauses for a second between tries should take roughly 600 seconds, but in practice it's usually significantly more than that. A context with a timeout lets us reason in terms of actual time spent waiting rather than number of retries, which I think makes more sense in the antidote config; a sketch of that version follows below.
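For comparison, the same wait expressed against a deadline rather than a retry count might look something like this (again an illustrative sketch; `waitForPodReadyCtx` and its parameters are invented here):

```go
package scheduler // illustrative package name

import (
	"context"
	"time"
)

// waitForPodReadyCtx polls isReady until it succeeds, errors, or the timeout
// elapses. The caller reasons in wall-clock time, and slow isReady calls no
// longer stretch the overall wait past the deadline.
func waitForPodReadyCtx(parent context.Context, isReady func() (bool, error), timeout, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		ready, err := isReady()
		if err != nil {
			return err
		}
		if ready {
			return nil
		}
		select {
		case <-ticker.C:
			// next poll
		case <-ctx.Done():
			return ctx.Err() // context.DeadlineExceeded once the timeout elapses
		}
	}
}
```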

Tbh, this is really not a huge priority - the status quo works well, and the answer may be "no, it's not worth changing" - but I think it's a worthwhile exercise to at least consider the implications, if anyone is curious to poke around the scheduler a bit.
