vecindex: add pacer for vector index operations #137734
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewed 7 of 12 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 19 at r1 (raw file):
// size is below this threshold, throttling will still occur if the queue size
// is increasing.
const allowedQueuedFixups = 5
This number seems low to me. I'm assuming it's just a placeholder value for testing, is that right?
Also, should these numbers eventually be proportional to GOMAXPROCS?
pkg/sql/vecindex/pacer.go
line 134 at r1 (raw file):
p.mu.totalOpCount++
// Compute elapsed time since the last query.
Do we really need to do all this work on every insert/delete? Could we short-circuit in most cases after increasing the op count? For example, if the elapsed time is less than 1 second (or some other interval), we could return immediately without updating p.mu.lastOpAt. Maybe we'd also need to compare queuedFixups against lastQueuedFixups or something, but same idea.
It also seems like the calculations below are very vulnerable to floating-point errors when the duration is short. While clamping values might prevent the pacer from making crazy decisions in that situation, we probably want to avoid getting there in the first place as much as possible.
I'm also suspicious overall of making all this logic happen on every insert or delete. Do we really need to make pacing decisions here? What if we did this instead:
- Have the pacer run token adjustment logic every second (or a similar small interval).
- Have OnInsertOrDelete record just the needed statistics, totalOpCount, and maybe an opCount for the current second.
- As before, have OnInsertOrDelete subtract one token, and delay as needed.
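For illustration, the split suggested above might look roughly like the Go sketch below. Everything here is hypothetical (type, field, and method names included) and is only meant to show the shape of the idea, not the PR's actual code.

```go
package pacersketch

import (
	"math"
	"sync"
	"time"
)

// pacer is a hypothetical sketch of the suggested split: a periodic
// token-adjustment loop plus a lightweight per-operation path.
type pacer struct {
	mu struct {
		sync.Mutex
		totalOpCount     int64
		opCount          int64   // ops observed in the current interval
		currentTokens    float64 // token bucket balance
		allowedOpsPerSec float64
	}
}

// adjustTokens would run on a timer, e.g. once per second.
func (p *pacer) adjustTokens(queuedFixups int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Recompute allowedOpsPerSec from queuedFixups and p.mu.opCount here
	// (elided), then refill the bucket and reset the per-interval counter.
	p.mu.currentTokens = math.Min(
		p.mu.currentTokens+p.mu.allowedOpsPerSec, p.mu.allowedOpsPerSec)
	p.mu.opCount = 0
}

// OnInsertOrDelete only records statistics, takes one token, and returns any
// delay implied by the current token debt.
func (p *pacer) OnInsertOrDelete() time.Duration {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.mu.totalOpCount++
	p.mu.opCount++
	p.mu.currentTokens--
	if p.mu.currentTokens >= 0 {
		return 0 // no delay needed
	}
	// Delay in proportion to the token debt.
	return time.Duration(
		float64(time.Second) * -p.mu.currentTokens / p.mu.allowedOpsPerSec)
}
```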
pkg/sql/vecindex/pacer.go
line 205 at r1 (raw file):
return 0
}
return time.Duration(float64(time.Second) * -p.mu.currentTokens / p.mu.allowedOpsPerSec)
Should we enforce bounds on the delay?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 19 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
This number seems low to me. I'm assuming it's just a placeholder value for testing, is that right?
Also, should these numbers eventually be proportional to GOMAXPROCS?
I'd like to make targetQueuedFixups proportional to actual ops/sec, as that's what drives fixups. However, I want to wait until I have the "real" testing setup so I can tune these values. I also plan to add some cluster settings so we can change these on the fly.
pkg/sql/vecindex/pacer.go
line 134 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Do we really need to do all this work on every insert/delete? Could we short-circuit in most cases after increasing the op count? For example, if the elapsed time is less than 1 second (or some other interval), we could return immediately without updating p.mu.lastOpAt. Maybe we'd also need to compare queuedFixups against lastQueuedFixups or something, but same idea. It also seems like the calculations below are very vulnerable to floating-point errors when the duration is short. While clamping values might prevent the pacer from making crazy decisions in that situation, we probably want to avoid getting there in the first place as much as possible.
I'm also suspicious overall of making all this logic happen on every insert or delete. Do we really need to make pacing decisions here? What if we did this instead:
- Have the pacer run token adjustment logic every second (or a similar small interval).
- Have OnInsertOrDelete record just the needed statistics, totalOpCount, and maybe an opCount for the current second.
- As before, have OnInsertOrDelete subtract one token, and delay as needed.
I've changed the logic to contain a fast path if the token bucket has at least one token in it. Also, I now only recalculate allowed ops/sec when the size of the fixup queue changes, or if it's been at least one second. This seems to work pretty well.
Regarding floating point math, I've been pretty careful to ensure that the algorithms are numerically stable. Even if durations are short, everything should work well. While short durations can cause high rates, we then adjust ops/sec according to the duration, so that has the effect of "canceling out" the high rate (i.e. we multiply and then divide by the same number). The clamping is just an extra safeguard in case there are bugs.
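A rough sketch of that fast path, with assumed field names (currentTokens, lastQueuedFixups, and so on); the real pacer's fields and helpers may well differ:

```go
package pacersketch

import "time"

// pacerFast sketches the fast path described above: skip the rate
// recalculation entirely while the bucket holds at least one token, and
// otherwise only recompute allowed ops/sec when the fixup queue size changed
// or at least a second has elapsed. All names are assumptions.
type pacerFast struct {
	totalOpCount     int64
	currentTokens    float64
	allowedOpsPerSec float64
	lastUpdateAt     time.Time
	lastQueuedFixups int
}

func (p *pacerFast) onInsertOrDelete(now time.Time, queuedFixups int) time.Duration {
	p.totalOpCount++

	// Fast path: at least one token available, so no pacing decision needed.
	if p.currentTokens >= 1 {
		p.currentTokens--
		return 0
	}

	// Slow path: recompute the allowed rate only when something changed.
	elapsed := now.Sub(p.lastUpdateAt)
	if queuedFixups != p.lastQueuedFixups || elapsed >= time.Second {
		p.updateOpsPerSec(elapsed, queuedFixups)
		p.lastUpdateAt = now
		p.lastQueuedFixups = queuedFixups
	}

	p.currentTokens--
	if p.currentTokens >= 0 {
		return 0
	}
	// Delay in proportion to the token debt.
	return time.Duration(float64(time.Second) * -p.currentTokens / p.allowedOpsPerSec)
}

// updateOpsPerSec stands in for the real adjustment logic (elided here).
func (p *pacerFast) updateOpsPerSec(elapsed time.Duration, queuedFixups int) {}
```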
pkg/sql/vecindex/pacer.go
line 205 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we enforce bounds on the delay?
There's a built-in bound: the delay can't be longer than 1 second * number of waiting operations. I've also added a way for the caller to report canceled operations so that we know if a "waiting" operation is no longer waiting.
pkg/sql/vecindex/pacer.go
line 200 at r2 (raw file):
p.mu.delayed = false
// Calculate the desired rate of change in the fixup queue size over the next
I've updated the algorithm to hopefully be more clear. It first calculates how much we want to change the queue size, then calculates how much it's actually changing. The difference is the net change we want to achieve. We change allowed ops/sec to try and achieve that net change.
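For illustration, the shape of that calculation might look something like the sketch below. The constants, the 10% step, and all names are assumptions used only to make the desired-vs-actual-rate idea concrete, not the PR's actual values.

```go
package pacersketch

import "time"

// Assumed tuning constants, purely for illustration.
const (
	targetQueuedFixups = 5
	maxQueueSizeRate   = 5.0 // max considered change in queue size, fixups/sec
)

// illustrateNetRate shows the shape of the logic described above: compute the
// desired change in queue size, compare it with the actual change, and nudge
// allowed ops/sec by the clamped difference.
func illustrateNetRate(
	allowedOpsPerSec float64,
	elapsed time.Duration,
	queuedFixups, lastQueuedFixups int,
) float64 {
	// Desired rate of change: drain gradually when over target, else hold.
	var desiredQueueSizeRate float64
	if queuedFixups > targetQueuedFixups {
		desiredQueueSizeRate = -1 // assumed gradual drain of 1 fixup/sec
	}

	// Actual rate of change (the real code smooths this with an EMA).
	actualQueueSizeRate := float64(queuedFixups-lastQueuedFixups) / elapsed.Seconds()

	// Net change still needed, clamped to a sane range.
	netQueueSizeRate := desiredQueueSizeRate - actualQueueSizeRate
	netQueueSizeRate = max(min(netQueueSizeRate, maxQueueSizeRate), -maxQueueSizeRate)

	// Scale a bounded ops/sec step by the relative size of the net change.
	deltaOpsPerSec := allowedOpsPerSec * 0.1 // assumed max step of 10%
	deltaOpsPerSec *= netQueueSizeRate / maxQueueSizeRate
	return allowedOpsPerSec + deltaOpsPerSec
}
```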
Force-pushed from 862e4b8 to 9e63d05.
Reviewed 6 of 9 files at r2, 9 of 9 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)
pkg/sql/vecindex/pacer_test.go
line 142 at r3 (raw file):
totalOpCount++
delay = p.OnInsertOrDelete(queuedFixups)
// Keep ops/sec <= 1000 (not counting noise).
Would it be interesting to simulate a constant arrival rate?
pkg/sql/vecindex/pacer_test.go
line 161 at r3 (raw file):
fixupsInterval := crtime.Mono(time.Second / time.Duration(fixupsPerSec))
fixupsInterval = crtime.Mono(float64(fixupsInterval) * (1 + noiseFactor))
nextFixup = max(now+fixupsInterval, 1)
Should the max be applied to the interval instead of the time?
pkg/sql/vecindex/pacer.go
line 182 at r3 (raw file):
// since the last call to updateOpsPerSecLocked. Allowing sub-second elapsed
// increments allows the pacer to be significantly more responsive.
func (p *pacer) updateOpsPerSecLocked(elapsed time.Duration, queuedFixups int) {
Since the mu field is gone now, should we change this to just updateOpsPerSec, or alternatively, add the Locked suffix to all the pacer methods? It'd also be nice to mention that pacer is not thread-safe, unless the follow-up is just going to reintroduce the mutex.
pkg/sql/vecindex/pacer.go
line 191 at r3 (raw file):
var desiredQueueSizeRate float64
if queuedFixups > targetQueuedFixups {
// If the fixup queue is too large, reduce it a gradual rate that's
nit: reduce it a -> reduce it at a
pkg/sql/vecindex/pacer.go
line 243 at r3 (raw file):
// Scale the change in ops/sec by the magnitude of desired change with respect
// to the max allowed change.
deltaOpsPerSec *= netQueueSizeRate / maxQueueSizeRate
nit: consider changing this line to:
deltaOpsPerSec *= math.Abs(netQueueSizeRate / maxQueueSizeRate)
so that it's only updating the magnitude like the comment says. Then, we could have the netQueueSizeRate < 0 case above do this instead to make sure deltaOpsPerSec has the correct sign:
deltaOpsPerSec = deltaOpsPerSec/deltaFactor - deltaOpsPerSec
pkg/sql/vecindex/pacer.go
line 274 at r3 (raw file):
// statistical clustering or a backlog of fixups that is suddenly added to
// the queue).
p.queueSizeRate = max(min(p.queueSizeRate, maxQueueSizeRate), -maxQueueSizeRate)
Since this is just our estimate for the size rate, should we avoid clamping here and instead rely on the clamping when calculating the desired rate?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 182 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Since the mu field is gone now, should we change this to just updateOpsPerSec, or alternatively, add the Locked suffix to all the pacer methods? It'd also be nice to mention that pacer is not thread-safe, unless the follow-up is just going to reintroduce the mutex.
I renamed the methods. My plan is to use the fixupProcessor mutex. I added a comment to pacer saying that it's not thread-safe.
pkg/sql/vecindex/pacer.go
line 243 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
nit: consider changing this line to:
deltaOpsPerSec *= math.Abs(netQueueSizeRate / maxQueueSizeRate)
so that it's only updating the magnitude like the comment says. Then, we could have the netQueueSizeRate < 0 case above do this instead to make sure deltaOpsPerSec has the correct sign:
deltaOpsPerSec = deltaOpsPerSec/deltaFactor - deltaOpsPerSec
I changed the initial calculation of deltaOpsPerSec to be negative in the case that it needs to decrease. I then changed the scaling function to be:
deltaOpsPerSec *= math.Abs(netQueueSizeRate) / maxQueueSizeRate
Hopefully that's a less subtle way of getting a negative delta.
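A tiny made-up example of why that keeps the sign intact: starting from a negative deltaOpsPerSec and scaling by the non-negative factor math.Abs(netQueueSizeRate) / maxQueueSizeRate only shrinks the magnitude. The numbers below are illustrative, not the pacer's actual values.

```go
package main

import (
	"fmt"
	"math"
)

// Once deltaOpsPerSec is already negative when a decrease is wanted, scaling
// by the non-negative factor Abs(net)/max preserves the sign.
func main() {
	deltaOpsPerSec := -50.0  // the allowed rate should decrease
	netQueueSizeRate := -2.0 // queue needs to shrink by 2 fixups/sec
	maxQueueSizeRate := 5.0

	deltaOpsPerSec *= math.Abs(netQueueSizeRate) / maxQueueSizeRate
	fmt.Println(deltaOpsPerSec) // -20: still a decrease, scaled by 2/5
}
```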
pkg/sql/vecindex/pacer.go
line 274 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Since this is just our estimate for the size rate, should we avoid clamping here and instead rely on the clamping when calculating the desired rate?
I clamped here as well because this stores the EMA. Say that was calculated to be a really large number, maybe even +INF. I don't want that to poison all future calculations of the rate. In other words, I want to ensure that the actual rate is clamped and the net rate is clamped.
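For illustration, the clamp-the-stored-EMA idea might be sketched like this; the smoothing factor and names are assumptions:

```go
package pacersketch

// emaAlpha is an assumed smoothing factor for illustration only.
const emaAlpha = 0.5

// updateQueueSizeRateEMA folds the observed rate into the EMA, then clamps
// the stored value so that a single extreme (even +Inf) sample cannot poison
// every later estimate.
func updateQueueSizeRateEMA(storedRate, observedRate, maxRate float64) float64 {
	storedRate = emaAlpha*observedRate + (1-emaAlpha)*storedRate
	return max(min(storedRate, maxRate), -maxRate)
}
```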
pkg/sql/vecindex/pacer_test.go
line 142 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Would it be interesting to simulate a constant arrival rate?
You mean if the pacer returns a delay, but before that operation has waited out the delay, another operation arrives? If that's what you mean, it's not too easy to simulate that in this loop. But I did add a new TestMultipleArrival test to explicitly test that case. The delay given to each subsequent arrival should be proportional to the amount of debt in the token bucket, in order to enforce the overall ops/sec rate.
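As a rough numeric illustration of that proportionality (made-up rate and token values, not the test's actual constants):

```go
package main

import (
	"fmt"
	"time"
)

// Each arrival takes one token; once the bucket is in debt, the returned
// delay grows in proportion to that debt, which is what enforces the overall
// ops/sec rate across closely spaced arrivals.
func main() {
	const allowedOpsPerSec = 2.0
	tokens := 1.0
	for i := 1; i <= 4; i++ {
		tokens--
		var delay time.Duration
		if tokens < 0 {
			delay = time.Duration(float64(time.Second) * -tokens / allowedOpsPerSec)
		}
		fmt.Printf("arrival %d: delay %v\n", i, delay)
	}
	// Delays: 0s, 500ms, 1s, 1.5s
}
```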
pkg/sql/vecindex/pacer_test.go
line 161 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should the max be applied to the interval instead of the time?
Done.
Force-pushed from e25911e to d791989.
bors r=drewkimball
The pacer will limit the rate of foreground insert and delete operations in the vector index such that background split and merge operations can keep up. It does this by setting the allowed ops/sec and then delaying operations that would otherwise exceed this limit.
This PR introduces the pacer, but does not change the vector index to use it. That will come in a later PR. However, this PR does simulate usage of the pacer under different interesting conditions, including edge cases.
Epic: CRDB-42943
Release note: None