log backup: use global checkpoint ts as source of truth #58135
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #58135 +/- ##
================================================
+ Coverage 73.1841% 74.9132% +1.7291%
================================================
Files 1675 1693 +18
Lines 461917 466183 +4266
================================================
+ Hits 338050 349233 +11183
+ Misses 103127 95400 -7727
- Partials 20740 21550 +810
@@ -548,8 +626,10 @@ func TestCheckPointLagged(t *testing.T) {
	})
	adv.StartTaskListener(ctx)
	c.advanceClusterTimeBy(2 * time.Minute)
	// if global ts is not advanced, the checkpoint will not be lagged
Why? If there is a new task and the global checkpoint is never advanced, the task will never be paused, even if it exceeds the limit.
This implies that we would never pause a task that has never advanced.
Actually, the condition should be: the global ts is less than task.start-ts,
which implies there could be some corner cases when starting a new task.
if globalTs <= c.task.StartTs {
// task is not started yet
return false, nil
}
Then maybe this should be < instead of <=?
The case globalTs == c.task.StartTs
only happens when a new task has just been created.
It is not the common case once a task has been running for some time, so I think it's better not to pause by default in that case.
Sometimes the task may be stuck from the moment it is created, say, when the advancer doesn't work or one of the TiKV nodes didn't notice the task.
After talking with @YuJuncen, we agreed to make the check include the case when globalTs == task.StartTs. I changed it.
Additionally, I found improper error-return logic when adding a task. I fixed it in this PR as well, along with the related test cases.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BornChanger, YuJuncen
In response to a cherrypick label: new pull request created to branch
What problem does this PR solve?
Issue Number: close #58031
Problem Summary:
The previous lag calculation relied on c.lastCheckpoint.TS. This approach is unreliable, especially when ownership changes, because c.lastCheckpoint.TS is not guaranteed to increase steadily. This PR addresses the issue by introducing a global checkpoint timestamp that is guaranteed to be non-decreasing.
What changed and how does it work?
The lag calculation now uses a global checkpoint timestamp instead of c.lastCheckpoint.TS. The global timestamp always increases or stays the same, even during ownership transitions, which makes the lag measurement more robust and accurate.
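As a rough illustration of the non-decreasing property described above (not the PR's actual implementation), the sketch below keeps the maximum checkpoint ever observed and ignores any smaller value, such as one reported by a new owner after an ownership change. The type `globalCheckpoint` and its `Advance` method are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// globalCheckpoint is a hypothetical holder for a checkpoint ts that never
// moves backwards, no matter what values callers report.
type globalCheckpoint struct {
	ts atomic.Uint64
}

// Advance records an observed checkpoint and returns the current global value.
// Observations at or below the current value are ignored.
func (g *globalCheckpoint) Advance(observed uint64) uint64 {
	for {
		cur := g.ts.Load()
		if observed <= cur {
			return cur
		}
		// Retry if another goroutine updated the value in between.
		if g.ts.CompareAndSwap(cur, observed) {
			return observed
		}
	}
}

func main() {
	var g globalCheckpoint
	fmt.Println(g.Advance(100)) // 100
	// A stale value, e.g. from a new owner, does not move the checkpoint back.
	fmt.Println(g.Advance(90)) // 100
	fmt.Println(g.Advance(120)) // 120
}
```

A lag computed against such a value cannot shrink or jump because of a stale per-owner checkpoint, which is the property the description relies on.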