Rework local cache to address "not yet computed a rollout plan" issue. #48
The rollout dashboard keeps a local cache of all tasks it has seen, because
retrieving all task instances from Airflow is expensive. The full retrieval
happens only during the first loop; in subsequent loops, we retrieve only
tasks that have been updated or modified after the last check's timestamp,
plus tasks that have started after the last check.
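
As a rough illustration of the full-then-incremental refresh described above, here is a minimal sketch. The class name `TaskCache`, the three fetcher callables, and the dict-based task records are all hypothetical stand-ins, not the dashboard's actual code or Airflow's API.

```python
from datetime import datetime, timezone


class TaskCache:
    """Hypothetical cache: full fetch on the first loop, incremental afterwards."""

    def __init__(self, fetch_all, fetch_updated_since, fetch_started_since,
                 now=lambda: datetime.now(timezone.utc)):
        # The three fetchers are assumed callables that wrap the Airflow requests.
        self._fetch_all = fetch_all
        self._fetch_updated_since = fetch_updated_since
        self._fetch_started_since = fetch_started_since
        self._now = now
        self._last_check = None
        self._tasks = {}

    def refresh(self):
        # Record the timestamp before fetching so nothing that changes
        # during the requests is missed on the next loop.
        checked_at = self._now()
        if self._last_check is None:
            tasks = self._fetch_all()
        else:
            tasks = (self._fetch_updated_since(self._last_check)
                     + self._fetch_started_since(self._last_check))
        for task in tasks:
            # Each task is assumed to be a dict carrying a unique "key".
            self._tasks[task["key"]] = task
        self._last_check = checked_at
        return list(self._tasks.values())
```

Here later entries simply overwrite earlier ones; choosing between conflicting copies of the same task is a separate concern.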
In addition, we now linearize the tasks, because the retrieved task lists may
contain the same task more than once if it was updated between requests.
Linearization involves picking, for each task instance, the object with the
latest date (be it execution, start or end date).
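
The linearization step could look roughly like the sketch below. The `TaskInstance` dataclass and its fields are a hypothetical stand-in for whatever record the dashboard keeps, and the sketch assumes all dates are either naive or all timezone-aware so that they compare cleanly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TaskInstance:
    # Hypothetical record; field names are assumptions for illustration.
    dag_id: str
    task_id: str
    run_id: str
    state: str
    execution_date: Optional[datetime] = None
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None

    def key(self):
        return (self.dag_id, self.task_id, self.run_id)

    def latest_date(self):
        # Latest of execution, start and end date, ignoring unset ones.
        dates = [d for d in (self.execution_date, self.start_date, self.end_date)
                 if d is not None]
        return max(dates, default=datetime.min)


def linearize(*task_lists):
    """Keep, for each task instance, the copy with the latest date."""
    newest = {}
    for tasks in task_lists:
        for task in tasks:
            current = newest.get(task.key())
            # On a tie, prefer the copy seen later.
            if current is None or task.latest_date() >= current.latest_date():
                newest[task.key()] = task
    return list(newest.values())
```

The result does not depend on which list is passed first, except when two copies share the same latest date.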
Furthermore, if the rollout plan is retrieved but comes back empty even though
the schedule task is complete, we retrieve it again. This prevents the odd
error where the task has completed but the XCom associated with the task
(containing the plan) is not yet saved to the database (or at least it looks
that way: we race to read the value right after the task finishes, before it
has been stably inserted into the database).
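
A minimal sketch of that re-retrieval, under stated assumptions: `fetch_plan` is a hypothetical callable that reads the plan XCom, and the attempt count and delay are illustrative values, not taken from this PR.

```python
import time


def fetch_plan_with_retry(fetch_plan, schedule_task_done,
                          attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Re-fetch the plan while it is empty although the schedule task is done."""
    plan = None
    for attempt in range(attempts):
        plan = fetch_plan()
        # A non-empty plan is final. An empty plan is only suspicious once
        # the schedule task has completed.
        if plan or not schedule_task_done:
            return plan
        if attempt < attempts - 1:
            sleep(delay_seconds)
    # Still empty after all attempts; the caller decides how to report it.
    return plan
```

Bounding the attempts keeps the dashboard loop from blocking indefinitely if the plan never shows up.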
Finally, this PR parallelizes the multiple requests that take place when
the task retrieval is performed. This reduces incremental update time to
roughly half of what it used to be.
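
One way to issue the retrieval requests concurrently is a thread pool, sketched below. This is an assumption about the approach, not necessarily the mechanism used in this PR; `fetch_in_parallel` and the fetcher callables are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_in_parallel(fetchers):
    """Run each zero-argument fetcher in its own thread, preserving order."""
    if not fetchers:
        return []
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = [pool.submit(fetch) for fetch in fetchers]
        # result() re-raises any exception raised inside a fetcher.
        return [future.result() for future in futures]
```

Threads suit this case because the requests are I/O-bound, so the total time approaches that of the slowest request rather than the sum of all of them.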