[Scheduled Actions V2] Backfiller component #7336

Open · wants to merge 2 commits into sched2_exe_p2

Conversation

lina-temporal (Contributor):

What changed?

Added the HSM Scheduler's Backfiller component, responsible for buffering manual actions.

Key differences between workflow scheduler and HSM backfiller logic:

  • A Backfiller component is 1:1 with each backfill request: it is spawned on request and deleted on completion.
  • Each Backfiller generates its own unique ID at spawn, which is used as part of the generated request IDs for deduplication.
  • When the Invoker's buffer is full, the Backfiller exponentially backs off and retries filling, giving the Invoker a chance to catch up (see the sketch after this list).
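
To illustrate the backoff behavior, here is a minimal sketch; the constant values and function name are hypothetical, not taken from this PR:

package main

import (
	"fmt"
	"time"
)

// Hypothetical tuning values, not taken from this PR.
const (
	initialBackoff = time.Second
	maxBackoff     = 5 * time.Minute
)

// nextRetryDelay doubles the delay per attempt and caps it at maxBackoff,
// giving the Invoker a chance to drain its buffer between fill attempts.
func nextRetryDelay(attempt int) time.Duration {
	delay := initialBackoff << attempt
	if delay <= 0 || delay > maxBackoff {
		return maxBackoff
	}
	return delay
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Printf("attempt %d: retry in %v\n", attempt, nextRetryDelay(attempt))
	}
}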

How did you test it?

Potential risks

  • Not in production yet.
  • The new implementation may miss subtle behavior from the previous backfill mechanism, although I believe the test cases represent it well.

lina-temporal requested a review from a team as a code owner on February 13, 2025.
@bergundy (Member) left a comment:

Overall this looks great. Didn't have a lot of comments here.

// The Backfiller sub state machine is responsible for buffering manually
// requested actions. Each backfill request has its own Backfiller node.
Backfiller struct {
	*schedulespb.BackfillerInternal
bergundy (Member):

nit: you don't need to suffix with Internal IMHO; all of the APIs defined in server/api are considered internal.

Suggested change:
- *schedulespb.BackfillerInternal
+ *schedulespb.Backfiller

func (b Backfiller) SetState(_ BackfillerMachineState) {}

func (b Backfiller) RegenerateTasks(node *hsm.Node) ([]hsm.Task, error) {
	return nil, nil
bergundy (Member):

Looks like the implementation is missing.
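
Given the tasks() helper shown later in this review, one plausible completion (a sketch, not necessarily the PR's final resolution) is to regenerate the single backfill task from the persisted deadline:

func (b Backfiller) RegenerateTasks(node *hsm.Node) ([]hsm.Task, error) {
	// Re-derive the Backfiller's single task from its persisted
	// NextInvocationTime, mirroring the tasks() helper below.
	return b.tasks()
}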

})
}

func (backfillerTaskExecutor) enqueue(schedulerNode *hsm.Node, starts []*schedulespb.BufferedStart) error {
bergundy (Member):

This seems like something that can be shared with the root scheduler's implementation.

Comment on lines +208 to +209
now := env.Now()
nowpb := timestamppb.New(now)
bergundy (Member):

I wonder if you're going to want to be more deterministic than this and take the time from the trigger request. It may come in handy for resolving conflicts when two clusters become active.
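
A hypothetical shape for that, with the request accessor name assumed rather than taken from this PR:

// Hypothetical: prefer a timestamp carried on the trigger request so that
// two clusters replaying the same request derive the same "now".
now := env.Now()
if t := request.GetRequestTime(); t != nil { // GetRequestTime is an assumed field
	now = t.AsTime()
}
nowpb := timestamppb.New(now)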

Comment on lines +241 to +242
err = backfillerNode.Parent.Walk(func(node *hsm.Node) error {
	if node.Key.Type == BackfillerMachineType {
bergundy (Member):

You can use the hsm.Collection abstraction here:

func NewCollection[T any](node *Node, stateMachineType string) Collection[T] {

	return
}
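
A sketch of the replacement, assuming Collection exposes a List method over matching child nodes (method name not verified here):

// Enumerate Backfiller nodes directly instead of walking the whole
// tree and filtering by machine type.
backfillers := hsm.NewCollection[Backfiller](backfillerNode.Parent, BackfillerMachineType)
for _, node := range backfillers.List() { // List is assumed
	// ... per-Backfiller logic previously inside the Walk callback ...
	_ = node
}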

backfillerCount = max(1, backfillerCount)
bergundy (Member):

Want to add a comment that this is to prevent division by zero?
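
For example, the guard with the requested comment could read:

// backfillerCount is used as a divisor below; clamp it to at least 1
// to prevent division by zero when no Backfillers exist.
backfillerCount = max(1, backfillerCount)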

}

func (BackfillTask) Validate(_ *persistencespb.StateMachineRef, node *hsm.Node) error {
	// Backfiller only has a single task/state, so no validation is done here.
bergundy (Member):

You'll need a way to validate the task (probably by checking if there's an expected action for the given deadline). Otherwise, a standby cluster won't be able to discard these tasks when they've been executed in the active cluster.
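
One possible shape for that validation, sketched under the assumption that hsm.MachineData can load the Backfiller state here; this is not confirmed as the PR's resolution:

func (t BackfillTask) Validate(_ *persistencespb.StateMachineRef, node *hsm.Node) error {
	// Sketch: discard the task if its deadline no longer matches the
	// machine's next invocation time, so a standby cluster can drop
	// tasks that were already executed on the active side.
	b, err := hsm.MachineData[Backfiller](node) // assumed accessor
	if err != nil {
		return err
	}
	if !b.NextInvocationTime.AsTime().Equal(t.deadline) {
		return fmt.Errorf("backfill task deadline %v does not match machine state", t.deadline)
	}
	return nil
}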

bergundy (Member):

You can reach out to @yycptt for guidance in case I can't do a follow-up for this review.

Comment on lines +65 to +75
func (b Backfiller) tasks() ([]hsm.Task, error) {
	return []hsm.Task{BackfillTask{deadline: b.NextInvocationTime.AsTime()}}, nil
}

func (b Backfiller) output() (hsm.TransitionOutput, error) {
	tasks, err := b.tasks()
	if err != nil {
		return hsm.TransitionOutput{}, err
	}
	return hsm.TransitionOutput{Tasks: tasks}, nil
}
bergundy (Member):

I was looking for this in backfiller.go; I've gotten used to seeing task generation next to the state machine transition definitions.

@@ -12,6 +12,7 @@ breaking:
# Uncomment this to temporarily ignore specific files or directories:
ignore:
- temporal/server/api/historyservice/v1/
- temporal/server/api/schedule/v1/message.proto
bergundy (Member):

Could you put a comment here to remove this exception once this PR is merged? Also, we should remove the exception for historyservice.
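
In the config file's own style, that annotation might look like:

ignore:
  # TODO: remove this exception once the Backfiller PR is merged.
  - temporal/server/api/schedule/v1/message.proto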
