vecindex: add pacer for vector index operations #137734
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewed 7 of 12 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 19 at r1 (raw file):
// size is below this threshold, throttling will still occur if the queue size
// is increasing.
const allowedQueuedFixups = 5
This number seems low to me. I'm assuming it's just a placeholder value for testing, is that right?
Also, should these numbers eventually be proportional to GOMAXPROCS?
pkg/sql/vecindex/pacer.go
line 134 at r1 (raw file):
p.mu.totalOpCount++
// Compute elapsed time since the last query.
Do we really need to do all this work on every insert/delete? Could we short-circuit in most cases after increasing the op count? For example, if the elapsed time is less than 1 second (or some other interval), we could return immediately without updating p.mu.lastOpAt. Maybe we'd also need to compare queuedFixups against lastQueuedFixups or something, but same idea.
It also seems like the calculations below are very vulnerable to floating-point errors when the duration is short. While clamping values might prevent the pacer from making crazy decisions in that situation, we probably want to avoid getting there in the first place as much as possible.
I'm also suspicious overall of making all this logic happen on every insert or delete. Do we really need to make pacing decisions here? What if we did this instead:
- Have the pacer run token adjustment logic every second (or a similar small interval).
- Have OnInsertOrDelete record just the needed statistics, totalOpCount, and maybe an opCount for the current second.
- As before, have OnInsertOrDelete subtract one token, and delay as needed.
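For illustration, the split suggested above might look roughly like the Go sketch below. Everything here is hypothetical (type, field, and method names included) and is only meant to show the shape of the idea, not the PR's actual code.

```go
package pacersketch

import (
	"math"
	"sync"
	"time"
)

// pacer is a hypothetical sketch of the suggested split: a periodic
// token-adjustment loop plus a lightweight per-operation path.
type pacer struct {
	mu struct {
		sync.Mutex
		totalOpCount     int64
		opCount          int64   // ops observed in the current interval
		currentTokens    float64 // token bucket balance
		allowedOpsPerSec float64
	}
}

// adjustTokens would run on a timer, e.g. once per second.
func (p *pacer) adjustTokens(queuedFixups int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Recompute allowedOpsPerSec from queuedFixups and p.mu.opCount here
	// (elided), then refill the bucket and reset the per-interval counter.
	p.mu.currentTokens = math.Min(
		p.mu.currentTokens+p.mu.allowedOpsPerSec, p.mu.allowedOpsPerSec)
	p.mu.opCount = 0
}

// OnInsertOrDelete only records statistics, takes one token, and returns any
// delay implied by the current token debt.
func (p *pacer) OnInsertOrDelete() time.Duration {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.mu.totalOpCount++
	p.mu.opCount++
	p.mu.currentTokens--
	if p.mu.currentTokens >= 0 {
		return 0 // no delay needed
	}
	// Delay in proportion to the token debt.
	return time.Duration(
		float64(time.Second) * -p.mu.currentTokens / p.mu.allowedOpsPerSec)
}
```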
pkg/sql/vecindex/pacer.go
line 205 at r1 (raw file):
return 0
}
return time.Duration(float64(time.Second) * -p.mu.currentTokens / p.mu.allowedOpsPerSec)
Should we enforce bounds on the delay?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 19 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
This number seems low to me. I'm assuming it's just a placeholder value for testing, is that right?
Also, should these numbers eventually be proportional to GOMAXPROCS?
I'd like to make targetQueuedFixups proportional to actual ops/sec, as that's what drives fixups. However, I want to wait until I have the "real" testing setup so I can tune these values. I also plan to add some cluster settings so we can change these on the fly.
pkg/sql/vecindex/pacer.go
line 134 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Do we really need to do all this work on every insert/delete? Could we short-circuit in most cases after increasing the op count? For example, if the elapsed time is less than 1 second (or some other interval), we could return immediately without updating p.mu.lastOpAt. Maybe we'd also need to compare queuedFixups against lastQueuedFixups or something, but same idea. It also seems like the calculations below are very vulnerable to floating-point errors when the duration is short. While clamping values might prevent the pacer from making crazy decisions in that situation, we probably want to avoid getting there in the first place as much as possible.
I'm also suspicious overall of making all this logic happen on every insert or delete. Do we really need to make pacing decisions here? What if we did this instead:
- Have the pacer run token adjustment logic every second (or a similar small interval).
- Have OnInsertOrDelete record just the needed statistics, totalOpCount, and maybe an opCount for the current second.
- As before, have OnInsertOrDelete subtract one token, and delay as needed.
I've changed the logic to contain a fast path if the token bucket has at least one token in it. Also, I now only recalculate allowed ops/sec when the size of the fixup queue changes, or if it's been at least one second. This seems to work pretty well.
Regarding floating point math, I've been pretty careful to ensure that the algorithms are numerically stable. Even if durations are short, everything should work well. While short durations can cause high rates, we then adjust ops/sec according to the duration, so that has the effect of "canceling out" the high rate (i.e. we multiply and then divide by the same number). The clamping is just an extra safeguard in case there are bugs.
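A rough sketch of that fast path, with assumed field names (currentTokens, lastQueuedFixups, and so on); the real pacer's fields and helpers may well differ:

```go
package pacersketch

import "time"

// pacerFast sketches the fast path described above: skip the rate
// recalculation entirely while the bucket holds at least one token, and
// otherwise only recompute allowed ops/sec when the fixup queue size changed
// or at least a second has elapsed. All names are assumptions.
type pacerFast struct {
	totalOpCount     int64
	currentTokens    float64
	allowedOpsPerSec float64
	lastUpdateAt     time.Time
	lastQueuedFixups int
}

func (p *pacerFast) onInsertOrDelete(now time.Time, queuedFixups int) time.Duration {
	p.totalOpCount++

	// Fast path: at least one token available, so no pacing decision needed.
	if p.currentTokens >= 1 {
		p.currentTokens--
		return 0
	}

	// Slow path: recompute the allowed rate only when something changed.
	elapsed := now.Sub(p.lastUpdateAt)
	if queuedFixups != p.lastQueuedFixups || elapsed >= time.Second {
		p.updateOpsPerSec(elapsed, queuedFixups)
		p.lastUpdateAt = now
		p.lastQueuedFixups = queuedFixups
	}

	p.currentTokens--
	if p.currentTokens >= 0 {
		return 0
	}
	// Delay in proportion to the token debt.
	return time.Duration(float64(time.Second) * -p.currentTokens / p.allowedOpsPerSec)
}

// updateOpsPerSec stands in for the real adjustment logic (elided here).
func (p *pacerFast) updateOpsPerSec(elapsed time.Duration, queuedFixups int) {}
```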
pkg/sql/vecindex/pacer.go
line 205 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we enforce bounds on the delay?
There's a built-in bound: the delay can't be longer than 1 second * number of waiting operations. I've also added a way for the caller to report canceled operations so that we know if a "waiting" operation is no longer waiting.
pkg/sql/vecindex/pacer.go
line 200 at r2 (raw file):
p.mu.delayed = false
// Calculate the desired rate of change in the fixup queue size over the next
I've updated the algorithm to hopefully be more clear. It first calculates how much we want to change the queue size, then calculates how much it's actually changing. The difference is the net change we want to achieve. We change allowed ops/sec to try and achieve that net change.
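For illustration, the shape of that calculation might look something like the sketch below. The constants, the 10% step, and all names are assumptions used only to make the desired-vs-actual-rate idea concrete, not the PR's actual values.

```go
package pacersketch

import "time"

// Assumed tuning constants, purely for illustration.
const (
	targetQueuedFixups = 5
	maxQueueSizeRate   = 5.0 // max considered change in queue size, fixups/sec
)

// illustrateNetRate shows the shape of the logic described above: compute the
// desired change in queue size, compare it with the actual change, and nudge
// allowed ops/sec by the clamped difference.
func illustrateNetRate(
	allowedOpsPerSec float64,
	elapsed time.Duration,
	queuedFixups, lastQueuedFixups int,
) float64 {
	// Desired rate of change: drain gradually when over target, else hold.
	var desiredQueueSizeRate float64
	if queuedFixups > targetQueuedFixups {
		desiredQueueSizeRate = -1 // assumed gradual drain of 1 fixup/sec
	}

	// Actual rate of change (the real code smooths this with an EMA).
	actualQueueSizeRate := float64(queuedFixups-lastQueuedFixups) / elapsed.Seconds()

	// Net change still needed, clamped to a sane range.
	netQueueSizeRate := desiredQueueSizeRate - actualQueueSizeRate
	netQueueSizeRate = max(min(netQueueSizeRate, maxQueueSizeRate), -maxQueueSizeRate)

	// Scale a bounded ops/sec step by the relative size of the net change.
	deltaOpsPerSec := allowedOpsPerSec * 0.1 // assumed max step of 10%
	deltaOpsPerSec *= netQueueSizeRate / maxQueueSizeRate
	return allowedOpsPerSec + deltaOpsPerSec
}
```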
Force-pushed from 862e4b8 to 9e63d05.
Reviewed 6 of 9 files at r2, 9 of 9 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)
pkg/sql/vecindex/pacer_test.go
line 142 at r3 (raw file):
totalOpCount++
delay = p.OnInsertOrDelete(queuedFixups)
// Keep ops/sec <= 1000 (not counting noise).
Would it be interesting to simulate a constant arrival rate?
pkg/sql/vecindex/pacer_test.go
line 161 at r3 (raw file):
fixupsInterval := crtime.Mono(time.Second / time.Duration(fixupsPerSec))
fixupsInterval = crtime.Mono(float64(fixupsInterval) * (1 + noiseFactor))
nextFixup = max(now+fixupsInterval, 1)
Should the max be applied to the interval instead of the time?
pkg/sql/vecindex/pacer.go
line 182 at r3 (raw file):
// since the last call to updateOpsPerSecLocked. Allowing sub-second elapsed
// increments allows the pacer to be significantly more responsive.
func (p *pacer) updateOpsPerSecLocked(elapsed time.Duration, queuedFixups int) {
Since the mu field is gone now, should we change this to just updateOpsPerSec, or alternatively, add the Locked suffix to all the pacer methods? It'd also be nice to mention that pacer is not thread-safe, unless the follow-up is just going to reintroduce the mutex.
pkg/sql/vecindex/pacer.go
line 191 at r3 (raw file):
var desiredQueueSizeRate float64
if queuedFixups > targetQueuedFixups {
// If the fixup queue is too large, reduce it a gradual rate that's
nit: reduce it a -> reduce it at a
pkg/sql/vecindex/pacer.go
line 243 at r3 (raw file):
// Scale the change in ops/sec by the magnitude of desired change with respect
// to the max allowed change.
deltaOpsPerSec *= netQueueSizeRate / maxQueueSizeRate
nit: consider changing this line to:
deltaOpsPerSec *= math.Abs(netQueueSizeRate / maxQueueSizeRate)
so that it's only updating the magnitude like the comment says. Then, we could have the netQueueSizeRate < 0 case above do this instead to make sure deltaOpsPerSec has the correct sign:
deltaOpsPerSec = deltaOpsPerSec/deltaFactor - deltaOpsPerSec
pkg/sql/vecindex/pacer.go
line 274 at r3 (raw file):
// statistical clustering or a backlog of fixups that is suddenly added to
// the queue).
p.queueSizeRate = max(min(p.queueSizeRate, maxQueueSizeRate), -maxQueueSizeRate)
Since this is just our estimate for the size rate, should we avoid clamping here and instead rely on the clamping when calculating the desired rate?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mw5h)
pkg/sql/vecindex/pacer.go
line 182 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Since the mu field is gone now, should we change this to just updateOpsPerSec, or alternatively, add the Locked suffix to all the pacer methods? It'd also be nice to mention that pacer is not thread-safe, unless the follow-up is just going to reintroduce the mutex.
I renamed the methods. My plan is to use the fixupProcessor mutex. I added a comment to pacer saying that it's not thread-safe.
pkg/sql/vecindex/pacer.go
line 243 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
nit: consider changing this line to:
deltaOpsPerSec *= math.Abs(netQueueSizeRate / maxQueueSizeRate)
so that it's only updating the magnitude like the comment says. Then, we could have the netQueueSizeRate < 0 case above do this instead to make sure deltaOpsPerSec has the correct sign:
deltaOpsPerSec = deltaOpsPerSec/deltaFactor - deltaOpsPerSec
I changed the initial calculation of deltaOpsPerSec to be negative in the case that it needs to decrease. I then changed the scaling function to be:
deltaOpsPerSec *= math.Abs(netQueueSizeRate) / maxQueueSizeRate
Hopefully that's a less subtle way of getting a negative delta.
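A tiny made-up example of why that keeps the sign intact: starting from a negative deltaOpsPerSec and scaling by the non-negative factor math.Abs(netQueueSizeRate) / maxQueueSizeRate only shrinks the magnitude. The numbers below are illustrative, not the pacer's actual values.

```go
package main

import (
	"fmt"
	"math"
)

// Once deltaOpsPerSec is already negative when a decrease is wanted, scaling
// by the non-negative factor Abs(net)/max preserves the sign.
func main() {
	deltaOpsPerSec := -50.0  // the allowed rate should decrease
	netQueueSizeRate := -2.0 // queue needs to shrink by 2 fixups/sec
	maxQueueSizeRate := 5.0

	deltaOpsPerSec *= math.Abs(netQueueSizeRate) / maxQueueSizeRate
	fmt.Println(deltaOpsPerSec) // -20: still a decrease, scaled by 2/5
}
```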
pkg/sql/vecindex/pacer.go
line 274 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Since this is just our estimate for the size rate, should we avoid clamping here and instead rely on the clamping when calculating the desired rate?
I clamped here as well because this stores the EMA. Say that was calculated to be a really large number, maybe even +INF. I don't want that to poison all future calculations of the rate. In other words, I want to ensure that the actual rate is clamped and the net rate is clamped.
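For illustration, the clamp-the-stored-EMA idea might be sketched like this; the smoothing factor and names are assumptions:

```go
package pacersketch

// emaAlpha is an assumed smoothing factor for illustration only.
const emaAlpha = 0.5

// updateQueueSizeRateEMA folds the observed rate into the EMA, then clamps
// the stored value so that a single extreme (even +Inf) sample cannot poison
// every later estimate.
func updateQueueSizeRateEMA(storedRate, observedRate, maxRate float64) float64 {
	storedRate = emaAlpha*observedRate + (1-emaAlpha)*storedRate
	return max(min(storedRate, maxRate), -maxRate)
}
```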
pkg/sql/vecindex/pacer_test.go
line 142 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Would it be interesting to simulate a constant arrival rate?
You mean if the pacer returns a delay, but before that operation has waited out the delay, another operation arrives? If that's what you mean, it's not too easy to simulate that in this loop. But I did add a new TestMultipleArrival test to explicitly test that case. The delay given to each subsequent arrival should be proportional to the amount of debt in the token bucket, in order to enforce the overall ops/sec rate.
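As a rough numeric illustration of that proportionality (made-up rate and token values, not the test's actual constants):

```go
package main

import (
	"fmt"
	"time"
)

// Each arrival takes one token; once the bucket is in debt, the returned
// delay grows in proportion to that debt, which is what enforces the overall
// ops/sec rate across closely spaced arrivals.
func main() {
	const allowedOpsPerSec = 2.0
	tokens := 1.0
	for i := 1; i <= 4; i++ {
		tokens--
		var delay time.Duration
		if tokens < 0 {
			delay = time.Duration(float64(time.Second) * -tokens / allowedOpsPerSec)
		}
		fmt.Printf("arrival %d: delay %v\n", i, delay)
	}
	// Delays: 0s, 500ms, 1s, 1.5s
}
```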
pkg/sql/vecindex/pacer_test.go
line 161 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should the max be applied to the interval instead of the time?
Done.
Force-pushed from e25911e to d791989.
bors r=drewkimball
The pacer will limit the rate of foreground insert and delete operations in the vector index such that background split and merge operations can keep up. It does this by setting the allowed ops/sec and then delaying operations that would otherwise exceed this limit.
This PR introduces the pacer, but does not change the vector index to use it. That will come in a later PR. However, this PR does simulate usage of the pacer under different interesting conditions, including edge cases.
Epic: CRDB-42943
Release note: None