Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix various problems with rollout dashboard's cache and task processing.
As per https://dfinity.atlassian.net/browse/DRE-303 and https://dfinity.atlassian.net/browse/DRE-304 , in certain cases where the administrator has intervened in the rollout (via clearing tasks for retry, or marking them as failed, or marking them as successful), the cache gets out of synchronization with the actual Airflow state, because these task changes do not update the task dates (which is what the cache relies on, to prevent transferring tens of megabytes every 5 seconds). This implements the use of an incremental log parser that fishes out tasks updated in the last update window. With that, we can deduce when a rollout needs full task update or just certain tasks. Alas, some tasks do not log updates even when they execute (chiefly the tasks implemented by Airflow logic and null operators), so we still must hit Airflow with an incremental "after this date" query (actually, 3) to obtain that information. We may be able to optimize this to three queries for N rollouts rather than three * N, but that is not in the cards for this PR. This has been verified manually through replication of the error conditions in a local Airflow instance.
- Loading branch information