storage: always schedule adjacent segment compaction #24874
Conversation
It is likely that we will be doing extra self compaction work in `do_compact_adjacent_segment()` that ultimately won't remove any bytes at all, in the general case that sliding window compaction has fully de-duplicated the log.
Not a massive concern for now (since our main goal here is leveraging adjacent segment compaction to achieve a reduction in segment numbers), but for future work I think we should be looking at a sliding window compaction procedure that:
- Self compacts segments (already done in the current implementation)
- Performs sliding window compaction over the segments (already done in the current implementation)
- Combines all `M` compatible (contingent on raft terms and `compacted_log_segment_size`) sub-arrays of `N` segments using newly enhanced adjacent segment tools (no "compaction" is done here, just the act of concatenating segments/reducing `num_segments` in sub-array `M_i` from `N -> 1`); a sketch of this step follows below
Then, I think we can fully deprecate the adjacent segment compaction routine as it currently exists.
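To make the shape of that concatenation step concrete, here is a minimal, self-contained sketch of the grouping pass it would need, assuming hypothetical types and a greedy policy keyed on raft term and the `compacted_log_segment_size` budget; none of these names are Redpanda's actual API:

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for segment metadata; not Redpanda's real type.
struct segment_meta {
    int64_t term;        // raft term; segments from different terms can't merge
    uint64_t size_bytes; // on-disk size after self + sliding window compaction
};

// Greedily collect runs of same-term segments whose combined size stays under
// the compacted_log_segment_size budget. Each returned sub-array M_i of N
// segments would then be concatenated N -> 1 with no key de-duplication,
// since the sliding window pass has already de-duplicated the data.
std::vector<std::vector<segment_meta>> plan_concatenations(
  const std::vector<segment_meta>& segments, uint64_t max_merged_bytes) {
    std::vector<std::vector<segment_meta>> groups;
    std::vector<segment_meta> current;
    uint64_t current_bytes = 0;
    auto flush = [&] {
        if (current.size() > 1) {
            groups.push_back(current); // merging only pays off for 2+ segments
        }
        current.clear();
        current_bytes = 0;
    };
    for (const auto& seg : segments) {
        bool term_ok = current.empty() || current.back().term == seg.term;
        bool size_ok = current_bytes + seg.size_bytes <= max_merged_bytes;
        if (!term_ok || !size_ok) {
            flush();
        }
        current.push_back(seg);
        current_bytes += seg.size_bytes;
    }
    flush();
    return groups;
}
```

The hard part is not this planning pass but the swap itself: replacing `N` segments with one has to look atomic across restarts, which is the crash-safety concern raised below.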
+1, this is a much cleaner solution, but it requires a good amount of thought and testing to ensure we're crash-safe. So far, compaction work has incrementally improved on what's been there, but we haven't yet gotten to a point where we have a generic "replace M segments with N new segments" method.
Agreed, this is the biggest new primitive that would have to be added and battle-tested for the proposed solution above.
CI test results:
- test results on build#61009
- test results on build#61023
- test results on build#61135
Force-pushed from 82cf4df to a7ec00a.
How did the CI failure/bug fix in #24880 relate to compaction?
From the commit message.
We recently saw a large segment that caused the need for chunked compaction to be introduced. IIRC, adjacent segment compaction can create multi-GB segments? Is that going to be an issue if sliding window compaction begins interacting with these segments?
We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged. This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction. Along the way, a couple tests needed to be tweaked to handle the fact that housekeeping now may merge segments.
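As a hedged sketch of the control-flow change described here (these function names are illustrative stand-ins, not the exact code touched by this PR):

```cpp
// Illustrative stand-ins; not the actual Redpanda housekeeping functions.
bool do_sliding_window_compact() { /* de-duplicate keys across segments */ return true; }
void do_compact_adjacent_segments() { /* merge eligible neighboring segments */ }

void housekeeping_compact() {
    bool compacted_new_data = do_sliding_window_compact();
    // Before this change, adjacent segment compaction was only a fallback:
    //   if (!compacted_new_data) { do_compact_adjacent_segments(); }
    // Now the merge pass always runs, so segments can't pile up when incoming
    // data outpaces the compaction interval.
    (void)compacted_new_data;
    do_compact_adjacent_segments();
}
```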
Force-pushed from a7ec00a to 08d0433.
I'm not sure I follow the question, but this is what I expect from this change:
Does that answer your question?
@@ -448,11 +442,11 @@ FIXTURE_TEST(
    std::stringstream st;
    stm_manifest.serialize_json(st);
    vlog(test_log.debug, "manifest: {}", st.str());
    verify_segment_request("500-1-v1.log", stm_manifest);
This verification ensured a matching local segment, but that isn't the case for this test anymore, since the desired segment was merged into the previous segment.
@@ -843,19 +844,18 @@ SEASTAR_THREAD_TEST_CASE(test_upload_aligned_to_non_existent_offset) {
        .get();
    }
    auto seg = b.get_log_segments().back();
-   seg->appender().close().get();
-   seg->release_appender().get();
+   seg->release_appender(&b.get_disk_log_impl().readers()).get();
This `release_appender(readers_cache)` call releases both the segment appender and the compacted index appender (this method is what's used in `disk_log_impl`). This is now a requirement for this test because the merge compaction in `gc()` will close a compacted index if it hasn't yet been released, and that would trip up subsequent `segment->close()` calls, which expect to only be called once.
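As a hedged recap of why the diff above swaps the calls, condensed from the test context (the old pattern is shown in comments):

```cpp
auto seg = b.get_log_segments().back();

// Old pattern: closed and released the segment appender but left the
// compacted index appender alive. gc()'s merge compaction could then close
// that index itself, and a later seg->close() would close it a second time.
//   seg->appender().close().get();
//   seg->release_appender().get();

// New pattern: releases both the segment appender and the compacted index
// appender in one call, mirroring what disk_log_impl itself does.
seg->release_appender(&b.get_disk_log_impl().readers()).get();
```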
@andrwng I think my concern is how chunked compaction handles a 5GB segment. I guess that is more of a question for @WillemKauf, and if the answer is "yikes, let's not do that", then maybe we can change the default size or do something else. 5GB seems really big?
LGTM. I responded to that question about large segment size and tagged you and Willem; I don't think the answer blocks this PR.
/backport v24.3.x
/backport v24.2.x
Failed to create a backport PR to v24.2.x branch. I tried:
We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged.
This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction.
Backports Required
Release Notes
Improvements