Refactor of ActiveDefrag to reduce latencies #1242
base: unstable
Conversation
Codecov Report
Additional details and impacted files:

@@            Coverage Diff             @@
##           unstable    #1242    +/-   ##
===========================================
+ Coverage     70.62%   70.77%    +0.15%
===========================================
  Files           114      114
  Lines         61694    63182     +1488
===========================================
+ Hits          43569    44718     +1149
- Misses        18125    18464      +339
Force-pushed from ffdf32a to 4054b1f
@zvi-code Would you mind also helping take a look at this PR?
@JimB123, I skimmed through the code; at a very high level it looks good (and the results you posted are very promising!). I will follow up with a more detailed review of the code.
@zvi-code The Valkey event loop supports creation of various timer events, and this refactor takes advantage of that. For defrag, there are 2 things that need to happen. First, we need to decide whether we should begin to defrag at all; it makes sense to make that decision from the existing periodic cron. Second, once started, the actual defrag work is driven by its own timer. In the old code, if we targeted 10% of the CPU, that was done by letting defrag run for (a continuous) 10ms every time the (100ms) cron fired. In the new code, with the same 10% CPU target, we run the defrag job for only 500us, but schedule it on its own dedicated timer so that it can run more often. The code modulates the frequency rather than the duration. (In extreme cases, anti-starvation must still modulate the duration, since starvation impacts the frequency.)
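For illustration, here is a minimal sketch of that scheduling idea (not the PR's actual code; activeDefragTimeProc, defragWorkRemaining, and defragDoSomeWork are hypothetical names): the timeproc does ~500us of work and returns the delay until its next invocation, so a lower CPU target just means a longer gap between jobs.

/* Sketch only: frequency-modulated defrag job on a dedicated ae timer.
 * defragWorkRemaining()/defragDoSomeWork() are hypothetical placeholders. */
static int activeDefragTimeProc(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    (void)eventLoop;
    (void)id;
    (void)clientData;

    const long long job_us = 500; /* each job runs ~500us */
    monotime job_end = getMonotonicUs() + job_us;
    while (getMonotonicUs() < job_end && defragWorkRemaining()) defragDoSomeWork();

    if (!defragWorkRemaining()) return AE_NOMORE; /* the event loop removes the timer */

    /* Hit the CPU target by spacing 500us jobs: e.g. 10% -> one job every ~5ms.
     * (Ignores the job's own runtime for simplicity.) */
    long long period_ms = (job_us * 100) / (server.active_defrag_cpu_percent * 1000);
    return (period_ms < 1) ? 1 : (int)period_ms;
}

/* Started (once) from the periodic cron when defrag should begin: */
static void startDefragTimer(void) {
    aeCreateTimeEvent(server.el, 0, activeDefragTimeProc, NULL, NULL);
}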
I haven't looked at the lazy expiration before, but looking at it now, that's exactly what I'm saying. We shouldn't be doing ANYTHING in
OK, wow, this is another thing I haven't looked at before. IMO, this
Mostly looks good, much more readable than before! Some minor comments.
Replied to some of @madolson's comments. Will publish an update.
Force-pushed from 4054b1f to 3f7d0ad
Signed-off-by: Jim Brunner <[email protected]>
Force-pushed from 3f7d0ad to 825da4f
LGTM. @zvi-code, I would still like you to take a look if you have time.
@valkey-io/core-team there is a small but major change here in the form of a config that determines how much time we spend on active defrag. Please take a look and provide 👍 or 👎. You can see the definition here:
Other than that, this change just better amortizes the active-defrag work to reduce latency spikes.
@JimB123, added several comments; will continue the review tomorrow.
        serverLog(LL_VERBOSE, "Starting active defrag, frag=%.0f%%, frag_bytes=%zu, cpu=%d%%",
                  frag_pct, frag_bytes, cpu_pct);
    }
    return cpu_pct;
This code is very weird; I suggest just setting the value of server.active_defrag_cpu_percent internally, as it was before.
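For what it's worth, a sketch of the shape that suggestion implies (hypothetical, not a patch from this PR): the helper writes server.active_defrag_cpu_percent itself instead of returning the value to its caller.

/* Hypothetical sketch of the suggestion: update the field in place. */
static void updateDefragCpuPercent(int cpu_pct, float frag_pct, size_t frag_bytes) {
    if (cpu_pct > 0 && server.active_defrag_cpu_percent == 0) {
        serverLog(LL_VERBOSE, "Starting active defrag, frag=%.0f%%, frag_bytes=%zu, cpu=%d%%",
                  frag_pct, frag_bytes, cpu_pct);
    }
    server.active_defrag_cpu_percent = cpu_pct;
}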
        all_stages_finished = 1;
        defrag_stage = 0;
    }
    // This function shouldn't be called unless there's work to do.
I find this confusing; this function is called from the event loop, so it's not as though someone could call it in the wrong state... why is it needed?
     * or making it more aggressive. */
    run_with_period(1000) {
        computeDefragCycles();
    static defragKeysCtx ctx; // STATIC - this persists
Why do we need all these statics? Can't we avoid them and carry the context in the already existing privdata (or similar)?
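As an illustration of that alternative (a sketch under assumed names; defragKeysCtx and doneStatus come from the PR, the rest is hypothetical): the caller allocates the context once and the stage receives it back through privdata, so no static is needed for the state to persist.

/* Sketch: keep per-stage state in privdata instead of a function static. */
static doneStatus defragStageKeys(monotime endtime, void *target, void *privdata) {
    defragKeysCtx *ctx = privdata; /* persists across invocations via privdata */
    /* ... resume from the cursor saved in ctx, stop when endtime is reached ... */
    (void)endtime;
    (void)target;
    (void)ctx;
    return DEFRAG_NOT_DONE; /* hypothetical status value */
}

static void queueKeysStage(kvstore *kvs) {
    defragKeysCtx *ctx = zcalloc(sizeof(*ctx)); /* owned by the defrag machinery, freed when the stage ends */
    addDefragStage(defragStageKeys, kvs, ctx);  /* hypothetical helper; ctx comes back as privdata */
}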
    defrag.current_stage = listNodeValue(listFirst(defrag.remaining_stages));
    listDelNode(defrag.remaining_stages, listFirst(defrag.remaining_stages));
    // Initialize the stage with endtime==0
    doneStatus status = defrag.current_stage->stage_fn(0, defrag.current_stage->target, defrag.current_stage->privdata);
Can't we just have an explicit init function that is called when the stage is removed from the list and set to start? Calling the stage function with endtime==0 makes the code less clear and mixes 2 totally separate logic paths.
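A sketch of that suggestion (names here are hypothetical): each stage carries an explicit init_fn, called exactly once when the stage is popped from the list, so stage_fn never has to interpret endtime==0 as "initialize".

/* Hypothetical sketch: separate one-shot init from the timed work function. */
typedef struct DefragStage {
    void (*init_fn)(void *target, void *privdata);                          /* called once on start */
    doneStatus (*stage_fn)(monotime endtime, void *target, void *privdata); /* called repeatedly */
    void *target;
    void *privdata;
} DefragStage;

static void beginNextStage(void) {
    listNode *node = listFirst(defrag.remaining_stages);
    defrag.current_stage = listNodeValue(node);
    listDelNode(defrag.remaining_stages, node);
    /* Initialization is now its own code path, separate from the timed stage_fn calls. */
    defrag.current_stage->init_fn(defrag.current_stage->target, defrag.current_stage->privdata);
}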
    } else {
        /* The timeproc will go away when we return AE_NOMORE.
         * Mark this now, indicating a normal termination. */
        defrag.timeproc_id = AE_DELETED_EVENT_ID;
Why is this needed? Don't you override it anyway in endDefragCycle?
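My reading of why the assignment is there (a guess, sketched with assumed shapes): when the timeproc returns AE_NOMORE the event loop deletes the timer itself, so endDefragCycle must be told not to delete it a second time.

/* Hypothetical sketch of the interaction being asked about. */
static void endDefragCycle(void) {
    if (defrag.timeproc_id != AE_DELETED_EVENT_ID) {
        /* Only delete the timer if the timeproc didn't already end via AE_NOMORE. */
        aeDeleteTimeEvent(server.el, defrag.timeproc_id);
    }
    defrag.timeproc_id = AE_DELETED_EVENT_ID;
    /* ... free remaining stages, reset state ... */
}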
     * cycle or adjusting the CPU percentage for an existing cycle. */
    if (hasActiveChildProcess()) return;

    server.active_defrag_cpu_percent = computeDefragCpuPercent();
This will make us query jemalloc (in zmalloc_get_allocator_info) every time databasesCron is called. The previous code tried to avoid this frequent call; is this change intentional?
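For context, the kind of gating being described can be expressed with the existing run_with_period() helper; a sketch (the 1000ms period is illustrative, and computeDefragCpuPercent is the function from this PR):

/* Inside databasesCron(): only recompute (and thus only query the allocator
 * via zmalloc_get_allocator_info) once per second, not on every pass. */
run_with_period(1000) {
    if (!hasActiveChildProcess()) {
        server.active_defrag_cpu_percent = computeDefragCpuPercent();
    }
}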
    long long timeproc_id;
    monotime timeproc_end_time;
    long timeproc_overage_us;
};
Can we have an explanation of the structs and their members?
Can we also have all the structs defined at the beginning, together with the documentation you already have there?
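For example, the documentation being asked for could look something like this (the field descriptions are my reading of the code, not the author's; remaining members elided):

/* Sketch of per-field documentation; descriptions are a reviewer's guess. */
struct DefragContext {
    /* ... stage list / current stage members elided ... */
    long long timeproc_id;      /* ae timer id of the defrag job, or AE_DELETED_EVENT_ID when none is registered */
    monotime timeproc_end_time; /* monotonic deadline for the currently running timed job */
    long timeproc_overage_us;   /* microseconds the previous job ran past its deadline (anti-starvation input) */
};
static struct DefragContext defrag;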
    monotime timeproc_end_time;
    long timeproc_overage_us;
};
static struct DefragContext defrag;
Would also move this to the beginning.
Refer to: #1141
This update refactors the defrag code to:
- Run the defrag work from its own dedicated timer, rather than directly inside the periodic cron
- Use short defrag jobs and hit the CPU target by modulating the frequency of those jobs rather than their duration
With this update, the defrag cycle time is reduced to 500us, with more frequent cycles. This results in much more predictable latencies, with a dramatic reduction in tail latencies.
(See #1141 for more complete details.)
This update is focused mostly on the high-level processing, and does NOT address lower level functions which aren't currently timebound (e.g. activeDefragSdsDict() and moduleDefragGlobals()). These are out of scope for this update and left for a future update.

I fixed kvstoreDictLUTDefrag because it was using up to 7ms on a CME single shard.

During unit tests, the following max latencies were measured (in verbose mode):
In addition, the following test was run on both old and new versions of the software:
Configuration was set so that both old/new clusters would use 10% defrag CPU.
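For reference, pinning both clusters at 10% would look roughly like this, assuming the standard active-defrag settings (fragmentation threshold configs omitted; values here are illustrative of the test setup, not copied from it):

activedefrag yes
active-defrag-cycle-min 10
active-defrag-cycle-max 10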
Defrag time OLD: 120 sec
Defrag time NEW: 105 sec
Times were based on a 5-second poll. (I didn't have logs running.)
The improvement in run time is believed to be due to the unskewed nature of the new code, which provides a more accurate 10% of the CPU.
This is the OLD distribution for the GET benchmark while defragging:
This is the equivalent NEW distribution:
You can see in the new distribution that there is a very slight increase in latencies <= 99.951%, in exchange for a huge reduction in tail latencies.
A CONTROL test WITHOUT Active Defrag running shows a very similar distribution with a variation of roughly 500us in the tail latencies (as expected):