-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rebaser): change quiescent shutdown to reduce missed activity #4707
base: main
Are you sure you want to change the base?
Conversation
/try [using pr as a dummy for CI testing changes] |
747ce03
to
1b6b50e
Compare
Dependency Review✅ No vulnerabilities or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files |
/try |
Okay, starting a try! I'll update this comment once it's running...\n |
This change alters the logic that helps a change set "process" task to shut down when no Rebaser requests have been seen over our `quiescent_period`. Prior to this change there was a shutdown window period where the `ChangeSetProcessorTask` would not be looking for new Rebaser requests to process while waiting for the `SerialDvuTask` to end. As a hedge against this scenario the process task handler checks the change set subject just before ending to ensure that if there's at least one request message that we don't ack/delete the task. In this altered version of a quiescent shutdown we notice the quiet period as before in the Rebaser requests subscription stream. However, now a `quiesced_notify` `tokio::sync::Notify` is fired to signal the `SerialDvuTask`. Then the `ChangeSetProcessorTask` continues to process any further requests that may show up (remember that after running a "dvu" job, another Rebaser request is often submitted). Meanwhile in the `SerialDvuTask`, it will continue to run "dvu" jobs as long as the `run_dvu_notify` has been set (in effect "draining" any pending runs), and only then will check to see if the `quiesced_notify` has been set. If it has, then it will cancel the `quiesced_token` which cause `SerialDvuTask` to return with an `Ok(Shutdown::Quiesced)` and that same `CancellationToken` will cause the Naxum app in `ChangeSetProcessorTask` to be gracefully shut down. With these changes, the one or two remaining "dvu" jobs will not cause the process task to stop processing further Rebaser requests. For example, let's assuming that the last 2 "dvu" jobs take 8 minutes each. That means that the process task is in a quiescent shutdown for up to the next 8 * 2 = 16 minutes, during which time any further Rebaser requests will also be processed (whereas they may not have been prior to this change). Signed-off-by: Fletcher Nichol <[email protected]> Uncommited changes
1b6b50e
to
e815c31
Compare
Local testing so far includes:
|
e815c31
to
ad53551
Compare
/try |
Okay, starting a try! I'll update this comment once it's running...\n |
I've tested this locally with 2 rebasers and my extremely naughty "sleeper" asset that sleeps for 15 minutes on a qualification. Basically I was unable to make the whole changeset hang anymore and I was able to continuously build out, having attribute setters and create components land in SDF as you would expect while the DVU was hung. The changeset itself was still stuck behind my waiter (due to the single DVU job per changeset) but after the waiter period expired the changeset resumed normal operation and caught itself up correctly. I attempted to delete a rebaser forcefully while it was processing work and after ~20s the rebasers would correctly pick the work back up where it was left off to continue the value propagation etc. This seems a lot better than what we currently have in Production. |
/try |
Okay, starting a try! I'll update this comment once it's running...\n |
This change alters the logic that helps a change set "process" task to shut down when no Rebaser requests have been seen over our
quiescent_period
. Prior to this change there was a shutdown window period where theChangeSetProcessorTask
would not be looking for new Rebaser requests to process while waiting for theSerialDvuTask
to end. As a hedge against this scenario the process task handler checks the change set subject just before ending to ensure that if there's at least one request message that we don't ack/delete the task.In this altered version of a quiescent shutdown we notice the quiet period as before in the Rebaser requests subscription stream. However, now a
quiesced_notify
tokio::sync::Notify
is fired to signal theSerialDvuTask
. Then theChangeSetProcessorTask
continues to process any further requests that may show up (remember that after running a "dvu" job, another Rebaser request is often submitted). Meanwhile in theSerialDvuTask
, it will continue to run "dvu" jobs as long as therun_dvu_notify
has been set (in effect "draining" any pending runs), and only then will check to see if thequiesced_notify
has been set. If it has, then it will cancel thequiesced_token
which causeSerialDvuTask
to return with anOk(Shutdown::Quiesced)
and that sameCancellationToken
will cause the Naxum app inChangeSetProcessorTask
to be gracefully shut down.With these changes, the one or two remaining "dvu" jobs will not cause the process task to stop processing further Rebaser requests. For example, let's assuming that the last 2 "dvu" jobs take 8 minutes each. That means that the process task is in a quiescent shutdown for up to the next 8 * 2 = 16 minutes, during which time any further Rebaser requests will also be processed (whereas they may not have been prior to this change).