
storage: always schedule adjacent segment compaction #24874

Merged 1 commit into redpanda-data:dev on Jan 28, 2025

Conversation

@andrwng (Contributor) commented Jan 21, 2025

We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged.

This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction.
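
To make the behavioral change concrete, here is a minimal standalone C++ sketch of the new scheduling. The types and function names are hypothetical stand-ins, not Redpanda's actual housekeeping code:

#include <cstddef>
#include <iostream>

struct compaction_result {
    bool did_compact = false;        // whether the pass deduplicated any new data
    std::size_t segments_merged = 0; // segments combined into larger ones
};

// Hypothetical stand-ins for the real compaction passes.
compaction_result run_sliding_window_compaction() { return {true, 0}; }
compaction_result run_adjacent_segment_compaction() { return {false, 2}; }

compaction_result do_housekeeping_compaction() {
    compaction_result result = run_sliding_window_compaction();

    // Old behavior, for contrast: merge only when the sliding window pass
    // had no new data to work on.
    //   if (!result.did_compact) {
    //       result = run_adjacent_segment_compaction();
    //   }

    // New behavior: always follow up with adjacent segment compaction, so
    // segment counts keep shrinking even when new data arrives faster than
    // the compaction interval.
    auto merge = run_adjacent_segment_compaction();
    result.segments_merged += merge.segments_merged;
    return result;
}

int main() {
    auto r = do_housekeeping_compaction();
    std::cout << "segments merged this round: " << r.segments_merged << '\n';
}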

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Redpanda will now schedule local segment merges of compacted topics, even when windowed compaction has occurred in a given housekeeping round. This ensures progress in reducing segment count in compacted topics with high produce traffic.

@WillemKauf (Contributor) previously approved these changes and left a comment, Jan 21, 2025:

It is likely that we will be doing extra self compaction work in do_compact_adjacent_segment() that ultimately won't remove any bytes at all, in the general case that sliding window compaction has fully de-duplicated the log.

Not a massive concern for now (since our main goal here is leveraging adjacent segment compaction to achieve a reduction in segment numbers), but for future work I think we should be looking at a sliding window compaction procedure that:

  1. Self compacts segments (already done in the current implementation)
  2. Performs sliding window compaction over the segments (already done in the current implementation)
  3. Combines all M compatible (contingent on raft terms and compacted_log_segment_size) sub-arrays of N segments using newly enhanced adjacent segment tools (no "compaction" is done here, just the act of concatenating segments/reducing num_segments in sub-array M_i from N -> 1)

Then, I think we can fully deprecate the adjacent segment compaction routine as it currently exists.
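
For illustration, a rough standalone sketch of step 3 of the procedure proposed above; the segment type and helper names are hypothetical, and crash safety (the hard part, as the discussion below notes) is ignored entirely:

#include <cstdint>
#include <vector>

struct segment {
    std::int64_t term;        // raft term of the segment's batches
    std::uint64_t size_bytes; // on-disk size
};

// Steps 1 and 2, stubbed out: self compaction and sliding window compaction.
void self_compact(std::vector<segment>&) {}
void sliding_window_compact(std::vector<segment>&) {}

// Step 3: greedily concatenate runs of adjacent segments that share a raft
// term and whose combined size stays under the target segment size. No key
// deduplication happens here; the pass only reduces the segment count.
std::vector<segment>
concatenate_compatible(const std::vector<segment>& segs, std::uint64_t max_size) {
    std::vector<segment> out;
    for (const auto& s : segs) {
        if (!out.empty() && out.back().term == s.term
            && out.back().size_bytes + s.size_bytes <= max_size) {
            out.back().size_bytes += s.size_bytes; // fold into the previous run
        } else {
            out.push_back(s); // start a new run
        }
    }
    return out;
}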

@andrwng (Contributor, Author) commented Jan 21, 2025

I think we should be looking at a sliding window compaction procedure that:

+1, this is a much cleaner solution, but requires a good amount of thought and testing to ensure we're crash-safe. So far compaction improvements have incrementally improved on what's been there, but we haven't yet gotten to a point where we have a generic "replace M segments with N new segments" method.

@WillemKauf (Contributor) commented:

but we haven't yet gotten to a point where we have a generic "replace M segments with N new segments" method.

Agreed, this is the biggest new primitive that would have to be added/battle tested for the proposed solution above.

@vbotbuildovich (Collaborator) commented Jan 21, 2025

CI test results

test results on build#61009
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61009#01948ad1-2769-4e47-89b2-502f7919ba7a FLAKY 1/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61009#01948a8d-37e8-4219-94b7-d5a29ba1caf9 FAIL 0/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61009#01948a8d-37e9-4bbe-b4ce-eee77f3400da FAIL 0/2
test results on build#61023
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c30-e558-46d6-b596-c2f9803693c7 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc7-47e4-8f3c-d932fa563242 FLAKY 1/4
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc8-4768-9c68-1139e1080523 FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc7-47e4-8f3c-d932fa563242 FLAKY 1/7
test_archival_service_rpfixture.test_archival_service_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61023#01948bed-0891-401b-8a14-0afc09ce84f8 FAIL 0/2
test_archival_service_rpfixture.test_archival_service_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61023#01948bed-0892-4204-9d5e-24af9f605020 FAIL 0/2
test results on build#61135
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61135#0194966a-e9b6-4501-b6f4-b79f8aa3497c FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61135#01949688-4676-4152-a5ac-b2c022074b5b FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61135#01949688-4676-4152-a5ac-b2c022074b5b FLAKY 1/2

@WillemKauf (Contributor) commented:

How did the CI failure/bug fix in #24880 relate to compaction?

@andrwng (Contributor, Author) commented Jan 22, 2025

How did the CI failure/bug fix in #24880 relate to compaction?

From the commit message of #24880:

With an upcoming change to merge compact after windowed compaction,
test_offset_range_size2_compacted would fail because it would prefix
truncate mid-segment following a merge compaction, and then trip over
this, hitting an unexpected exception when creating a reader:

std::runtime_error: Reader cannot read before start of the log 0 < 887

@dotnwat (Member) left a comment:

We recently saw a large segment that caused the need for chunked compaction to be introduced. IIRC, adjacent segment compaction can create multi-GB segments? Is that going to be an issue if sliding window compaction begins interacting with these segments?

The commit message for the updated revision:

We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged.

This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction.

Along the way, a couple of tests needed to be tweaked to handle the fact that housekeeping may now merge segments.

@andrwng force-pushed the storage-compaction-always-merge branch from a7ec00a to 08d0433 on January 24, 2025 at 02:30
@andrwng (Contributor, Author) commented Jan 24, 2025

We recently saw a large segment that caused the need for chunked compaction to be introduced. IIRC, adjacent segment compaction can create multi-GB segments? Is that going to be an issue if sliding window compaction begins interacting with these segments?

I'm not sure I follow the question, but this is what I expect from this change:

  • We will merge compact more consistently in topics with high throughput.
  • Because of that, we may end up with fewer, larger segments. For large segments whose keys don't fit in the 128 MiB key map, chunked compaction will kick in after vanilla windowed compaction fails (sketched below).

Does that answer your question?
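
A simplified, hypothetical illustration of the chunked compaction fallback mentioned in the second bullet above; the names are not Redpanda's, and the real key-offset map handling is far more involved:

#include <cstdint>

// Default key map budget, matching the 128 MiB mentioned above.
constexpr std::uint64_t key_map_memory_bytes = 128ULL * 1024 * 1024;

enum class compaction_mode { windowed, chunked };

// `estimated_key_bytes` would come from the segment's compaction index in
// practice; here it is just a parameter.
compaction_mode choose_compaction_mode(std::uint64_t estimated_key_bytes) {
    if (estimated_key_bytes <= key_map_memory_bytes) {
        // All keys fit in the in-memory key-offset map: one windowed pass.
        return compaction_mode::windowed;
    }
    // Keys exceed the budget: compact the segment in chunks whose keys do
    // fit, at the cost of extra passes over the data.
    return compaction_mode::chunked;
}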

@@ -448,11 +442,11 @@ FIXTURE_TEST(
std::stringstream st;
stm_manifest.serialize_json(st);
vlog(test_log.debug, "manifest: {}", st.str());
verify_segment_request("500-1-v1.log", stm_manifest);
@andrwng (Contributor, Author) commented:

This verification ensured a matching local segment, but that is no longer the case for this test since the desired segment was merged into the previous segment.

@@ -843,19 +844,18 @@ SEASTAR_THREAD_TEST_CASE(test_upload_aligned_to_non_existent_offset) {
.get();
}
auto seg = b.get_log_segments().back();
- seg->appender().close().get();
- seg->release_appender().get();
+ seg->release_appender(&b.get_disk_log_impl().readers()).get();
@andrwng (Contributor, Author) commented:

This release_appender(readers_cache) call releases both the segment appender and the compacted index appender (this method is what's used in disk_log_impl). It is now a requirement for this test because the merge compaction in gc() will close a compacted index if it hasn't yet been released, and that would trip up subsequent segment->close() calls, which expect to be called only once.
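
A contrived illustration of the close-once constraint described above, using hypothetical types rather than the actual storage code:

#include <cassert>

// An appender that, like the compacted index appender, may only be closed once.
struct index_appender {
    bool closed = false;
    void close() {
        assert(!closed && "close() must only be called once");
        closed = true;
    }
};

struct fake_segment {
    index_appender* index = nullptr;

    // Releasing closes the appender and drops the segment's reference to it.
    void release_appender() {
        if (index != nullptr) {
            index->close();
            index = nullptr;
        }
    }

    // close() only touches what the segment still owns, so it is safe after
    // release_appender() has already run.
    void close() {
        if (index != nullptr) {
            index->close();
        }
    }
};

int main() {
    index_appender idx;
    fake_segment seg{&idx};
    seg.release_appender(); // what the updated test does before housekeeping
    seg.close();            // no double close: the appender was already released
}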

@andrwng requested review from WillemKauf and dotnwat, January 24, 2025 06:51
@andrwng enabled auto-merge, January 24, 2025 07:00
@dotnwat (Member) commented Jan 28, 2025

Because of that, we may end up with fewer, larger segments. For large segments whose keys don't fit in the 128 MiB key map, chunked compaction will kick in after vanilla windowed compaction fails.

@andrwng I think my concern is how chunked compaction handles a 5 GB segment. I guess that is more of a question for @WillemKauf, and if the answer is "yikes, let's not do that," then maybe we can change the default size or do something else. 5 GB seems really big?

@dotnwat (Member) left a comment:

LGTM.

I responded to the question about large segment size and tagged you and Willem; I don't think the answer blocks this PR.

@andrwng merged commit 8fafb35 into redpanda-data:dev on Jan 28, 2025 (18 checks passed)
@vbotbuildovich (Collaborator) commented:

/backport v24.3.x

@vbotbuildovich (Collaborator) commented:

/backport v24.2.x

@vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24874-v24.2.x-816 remotes/upstream/v24.2.x
git cherry-pick -x 08d0433a2a

Workflow run logs.
