
storage: always schedule adjacent segment compaction #24874

Merged 1 commit into redpanda-data:dev on Jan 28, 2025

Conversation

@andrwng (Contributor) commented Jan 21, 2025

We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged.

This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction.
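
To make the behavioral change concrete, here is a minimal standalone C++ sketch of the new scheduling. The types and function names are hypothetical stand-ins, not Redpanda's actual housekeeping code:

#include <cstddef>
#include <iostream>

struct compaction_result {
    bool did_compact = false;        // whether the pass deduplicated any new data
    std::size_t segments_merged = 0; // segments combined into larger ones
};

// Hypothetical stand-ins for the real compaction passes.
compaction_result run_sliding_window_compaction() { return {true, 0}; }
compaction_result run_adjacent_segment_compaction() { return {false, 2}; }

compaction_result do_housekeeping_compaction() {
    compaction_result result = run_sliding_window_compaction();

    // Old behavior, for contrast: merge only when the sliding window pass
    // had no new data to work on.
    //   if (!result.did_compact) {
    //       result = run_adjacent_segment_compaction();
    //   }

    // New behavior: always follow up with adjacent segment compaction, so
    // segment counts keep shrinking even when new data arrives faster than
    // the compaction interval.
    auto merge = run_adjacent_segment_compaction();
    result.segments_merged += merge.segments_merged;
    return result;
}

int main() {
    auto r = do_housekeeping_compaction();
    std::cout << "segments merged this round: " << r.segments_merged << '\n';
}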

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Redpanda will now schedule local segment merges of compacted topics, even when windowed compaction has occurred in a given housekeeping round. This ensures progress in reducing segment count in compacted topics with high produce traffic.

@WillemKauf (Contributor) previously approved these changes and left a comment, Jan 21, 2025:

It is likely that we will be doing extra self compaction work in do_compact_adjacent_segment() that ultimately won't remove any bytes at all, in the general case that sliding window compaction has fully de-duplicated the log.

Not a massive concern for now (since our main goal here is leveraging adjacent segment compaction to achieve a reduction in segment numbers), but for future work I think we should be looking at a sliding window compaction procedure that:

  1. Self compacts segments (already done in the current implementation)
  2. Performs sliding window compaction over the segments (already done in the current implementation)
  3. Combines all M compatible (contingent on raft terms and compacted_log_segment_size) sub-arrays of N segments using newly enhanced adjacent segment tools (no "compaction" is done here, just the act of concatenating segments/reducing num_segments in sub-array M_i from N -> 1)

Then, I think we can fully deprecate the adjacent segment compaction routine as it currently exists.
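
For illustration, a rough standalone sketch of step 3 of the procedure proposed above; the segment type and helper names are hypothetical, and crash safety (the hard part, as the discussion below notes) is ignored entirely:

#include <cstdint>
#include <vector>

struct segment {
    std::int64_t term;        // raft term of the segment's batches
    std::uint64_t size_bytes; // on-disk size
};

// Steps 1 and 2, stubbed out: self compaction and sliding window compaction.
void self_compact(std::vector<segment>&) {}
void sliding_window_compact(std::vector<segment>&) {}

// Step 3: greedily concatenate runs of adjacent segments that share a raft
// term and whose combined size stays under the target segment size. No key
// deduplication happens here; the pass only reduces the segment count.
std::vector<segment>
concatenate_compatible(const std::vector<segment>& segs, std::uint64_t max_size) {
    std::vector<segment> out;
    for (const auto& s : segs) {
        if (!out.empty() && out.back().term == s.term
            && out.back().size_bytes + s.size_bytes <= max_size) {
            out.back().size_bytes += s.size_bytes; // fold into the previous run
        } else {
            out.push_back(s); // start a new run
        }
    }
    return out;
}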

@andrwng (Contributor, Author) commented Jan 21, 2025

I think we should be looking at a sliding window compaction procedure that:

+1, this is a much cleaner solution, but requires a good amount of thought and testing to ensure we're crash-safe. So far compaction improvements have incrementally improved on what's been there, but we haven't yet gotten to a point where we have a generic "replace M segments with N new segments" method.

@WillemKauf (Contributor) commented:

but we haven't yet gotten to a point where we have a generic "replace M segments with N new segments" method.

Agreed, this is the biggest new primitive that would have to be added/battle tested for the proposed solution above.

@vbotbuildovich (Collaborator) commented Jan 21, 2025

CI test results

test results on build#61009
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61009#01948ad1-2769-4e47-89b2-502f7919ba7a FLAKY 1/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61009#01948a8d-37e8-4219-94b7-d5a29ba1caf9 FAIL 0/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61009#01948a8d-37e9-4bbe-b4ce-eee77f3400da FAIL 0/2
test results on build#61023
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c30-e558-46d6-b596-c2f9803693c7 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc7-47e4-8f3c-d932fa563242 FLAKY 1/4
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc8-4768-9c68-1139e1080523 FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61023#01948c4c-4bc7-47e4-8f3c-d932fa563242 FLAKY 1/7
test_archival_service_rpfixture.test_archival_service_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61023#01948bed-0891-401b-8a14-0afc09ce84f8 FAIL 0/2
test_archival_service_rpfixture.test_archival_service_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61023#01948bed-0892-4204-9d5e-24af9f605020 FAIL 0/2
test results on build#61135
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61135#0194966a-e9b6-4501-b6f4-b79f8aa3497c FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61135#01949688-4676-4152-a5ac-b2c022074b5b FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61135#01949688-4676-4152-a5ac-b2c022074b5b FLAKY 1/2

@WillemKauf (Contributor) commented:

How did the CI failure/bug fix in #24880 relate to compaction?

@andrwng (Contributor, Author) commented Jan 22, 2025

How did the CI failure/bug fix in #24880 relate to compaction?

From the commit message of #24880:

With an upcoming change to merge compact after windowed compaction,
test_offset_range_size2_compacted would fail because it would prefix
truncate mid-segment following a merge compaction, and then trip over
this, hitting an unexpected exception when creating a reader:

std::runtime_error: Reader cannot read before start of the log 0 < 887

@dotnwat (Member) left a comment:

We recently saw a large segment that caused the need for chunked compaction to be introduced. IIRC, adjacent segment compaction can create multi-GB segments? Is that going to be an issue if sliding window compaction begins interacting with these segments?

The commit message for the updated revision:

We previously fell back on adjacent segment compaction only if there was no new data to compact. In some situations, we've seen the rate of incoming data outpace the compaction interval, causing segments to pile up without ever being merged.

This change tweaks the logic to always run adjacent segment compaction after running sliding window compaction.

Along the way, a couple of tests needed to be tweaked to handle the fact that housekeeping may now merge segments.

@andrwng force-pushed the storage-compaction-always-merge branch from a7ec00a to 08d0433 on January 24, 2025 at 02:30
@andrwng (Contributor, Author) commented Jan 24, 2025

We recently saw a large segment that caused the need for chunked compaction to be introduced. IIRC, adjacent segment compaction can create multi-GB segments? Is that going to be an issue if sliding window compaction begins interacting with these segments?

I'm not sure I follow the question, but this is what I expect from this change:

  • We will merge compact more consistently in topics with high throughput.
  • Because of that, we may end up with fewer, larger segments. For large segments whose keys don't fit in the 128 MiB key map, chunked compaction will kick in after vanilla windowed compaction fails (sketched below).

Does that answer your question?
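
A simplified, hypothetical illustration of the chunked compaction fallback mentioned in the second bullet above; the names are not Redpanda's, and the real key-offset map handling is far more involved:

#include <cstdint>

// Default key map budget, matching the 128 MiB mentioned above.
constexpr std::uint64_t key_map_memory_bytes = 128ULL * 1024 * 1024;

enum class compaction_mode { windowed, chunked };

// `estimated_key_bytes` would come from the segment's compaction index in
// practice; here it is just a parameter.
compaction_mode choose_compaction_mode(std::uint64_t estimated_key_bytes) {
    if (estimated_key_bytes <= key_map_memory_bytes) {
        // All keys fit in the in-memory key-offset map: one windowed pass.
        return compaction_mode::windowed;
    }
    // Keys exceed the budget: compact the segment in chunks whose keys do
    // fit, at the cost of extra passes over the data.
    return compaction_mode::chunked;
}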

@@ -448,11 +442,11 @@ FIXTURE_TEST(
std::stringstream st;
stm_manifest.serialize_json(st);
vlog(test_log.debug, "manifest: {}", st.str());
verify_segment_request("500-1-v1.log", stm_manifest);
@andrwng (Contributor, Author) commented:

This verification ensured a matching local segment, but that is no longer the case for this test since the desired segment was merged into the previous segment.

@@ -843,19 +844,18 @@ SEASTAR_THREAD_TEST_CASE(test_upload_aligned_to_non_existent_offset) {
.get();
}
auto seg = b.get_log_segments().back();
- seg->appender().close().get();
- seg->release_appender().get();
+ seg->release_appender(&b.get_disk_log_impl().readers()).get();
@andrwng (Contributor, Author) commented:

This release_appender(readers_cache) call releases both the segment appender and the compacted index appender (this method is what's used in disk_log_impl). It is now a requirement for this test because the merge compaction in gc() will close a compacted index if it hasn't yet been released, and that would trip up subsequent segment->close() calls, which expect to be called only once.
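
A contrived illustration of the close-once constraint described above, using hypothetical types rather than the actual storage code:

#include <cassert>

// An appender that, like the compacted index appender, may only be closed once.
struct index_appender {
    bool closed = false;
    void close() {
        assert(!closed && "close() must only be called once");
        closed = true;
    }
};

struct fake_segment {
    index_appender* index = nullptr;

    // Releasing closes the appender and drops the segment's reference to it.
    void release_appender() {
        if (index != nullptr) {
            index->close();
            index = nullptr;
        }
    }

    // close() only touches what the segment still owns, so it is safe after
    // release_appender() has already run.
    void close() {
        if (index != nullptr) {
            index->close();
        }
    }
};

int main() {
    index_appender idx;
    fake_segment seg{&idx};
    seg.release_appender(); // what the updated test does before housekeeping
    seg.close();            // no double close: the appender was already released
}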

@andrwng requested review from WillemKauf and dotnwat, January 24, 2025 06:51
@andrwng enabled auto-merge, January 24, 2025 07:00
@dotnwat (Member) commented Jan 28, 2025

Because of that, we may end up with fewer, larger segments. For large segments whose keys don't fit in the 128 MiB key map, chunked compaction will kick in after vanilla windowed compaction fails.

@andrwng I think my concern is how chunked compaction handles a 5 GB segment. I guess that is more of a question for @WillemKauf, and if the answer is "yikes, let's not do that," then maybe we can change the default size or do something else. 5 GB seems really big?

@dotnwat (Member) left a comment:

LGTM.

I responded to the question about large segment size and tagged you and Willem; I don't think the answer blocks this PR.

@andrwng merged commit 8fafb35 into redpanda-data:dev on Jan 28, 2025 (18 checks passed)
@vbotbuildovich (Collaborator) commented:

/backport v24.3.x

@vbotbuildovich (Collaborator) commented:

/backport v24.2.x

@vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24874-v24.2.x-816 remotes/upstream/v24.2.x
git cherry-pick -x 08d0433a2a

Workflow run logs.
