Add Compactor.MaxLookback Option for Limiting Blocks Loaded in Compaction Cycles #10585

Merged: 22 commits merged into main on Feb 13, 2025

Conversation

@dmwilson-grafana (Contributor) commented Feb 4, 2025

What this PR does

As described in #7664, a Mimir block's meta.json file can become large if its sources list grows due to intermediate blocks from split groups, compactor merge shards, or re-compactions due to OOO blocks. When a bucket contains many blocks with many sources each, this may cause unexpectedly high resource usage on the compactor.

This PR adds an experimental option, -compactor.max-lookback, which can reduce the CPU time and memory spent reading blocks' metadata files by instructing the compactor to read metadata only for blocks uploaded within the max-lookback period. If set, the value should be well above the maximum of -compactor.block-ranges (e.g. 168h) so that it does not interfere with the regular compaction process.
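For illustration only, a minimal sketch of the filtering idea (assuming the oklog/ulid package; the lookback value and block ID are made up, and this is not Mimir's actual fetcher code). Block IDs are ULIDs that encode their creation time, so any ID older than now minus the lookback can be skipped without fetching its meta.json:

package main

import (
	"fmt"
	"time"

	"github.com/oklog/ulid/v2"
)

func main() {
	maxLookback := 720 * time.Hour // illustrative value, well above a 168h block range
	threshold, _ := ulid.New(ulid.Timestamp(time.Now().Add(-maxLookback)), nil)

	blockID := ulid.MustParse("01ARZ3NDEKTSV4RRFFQ69G5FAV") // example block ULID (from 2016)
	if blockID.Compare(threshold) < 0 {
		fmt.Println("outside lookback: skip reading meta.json")
	} else {
		fmt.Println("within lookback: load meta.json")
	}
}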

This PR also adds a metric, cortex_compactor_meta_blocks_synced, which reports the number of blocks discovered in object storage, labelled with their state: loaded, marked for deletion, skipped because they were outside the lookback period, and so on.
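As an aside, a state-labelled gauge of this shape can be sketched with the Prometheus Go client as below; the specific label values are assumptions based on the states listed above, not necessarily the ones the PR registers:

package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Gauge vector partitioned by a "state" label, in the spirit of the new metric.
	synced := prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "cortex_compactor_meta_blocks_synced",
		Help: "Number of block metadata synced, partitioned by state.",
	}, []string{"state"})
	prometheus.MustRegister(synced)

	// Hypothetical state values; the real set lives in the fetcher code.
	synced.WithLabelValues("loaded").Set(42)
	synced.WithLabelValues("marked-for-deletion").Set(3)
	synced.WithLabelValues("max-lookback-exceeded").Set(117)
}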

Which issue(s) this PR fixes or relates to

Related to #7664.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.


@dmwilson-grafana dmwilson-grafana marked this pull request as ready for review February 7, 2025 18:31
@dmwilson-grafana dmwilson-grafana requested review from tacole02 and a team as code owners February 7, 2025 18:31
@seizethedave (Contributor) left a comment:

I'm thinking about this new config setting, ops-wise. The primary concern we've talked about with this new setting is around block upload and/or backfills: If we set max-lookback (a server-level option) to X days, then any tenant-level block upload with a block ID over X days old will go unnoticed. (My cursory scan of the block upload code says that it will honor whatever block ID is given on import, rather than creating new $now ULIDs.)

Two specific bad things I'm thinking about are:

  1. footguns around forgetting to adjust max-lookback in preparation for uploads
  2. operating a large multitenant Mimir cluster where one tenant needs to upload data from 1 year ago. This would force us to set max-lookback to 1y for all tenants and probably blow up compactor resources.

What do you think of a slightly different impl: using IterWithAttributes to learn each block's mtime and instead using that as a filtering device? Then (at least, from my 30,000-foot view) a tenant can perform a block upload and the server doesn't have to be specially reconfigured to match the data.
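A rough sketch of that mtime-based filter; the attribute type below is a hypothetical stand-in for whatever the bucket client's IterWithAttributes yields, not the real objstore API:

package main

import (
	"fmt"
	"time"
)

// objectAttrs is a hypothetical stand-in for per-object attributes.
type objectAttrs struct {
	Name         string
	LastModified time.Time
}

// filterByMtime keeps only objects uploaded within the lookback window,
// regardless of the timestamp encoded in their block ID.
func filterByMtime(objs []objectAttrs, lookback time.Duration, now time.Time) []string {
	cutoff := now.Add(-lookback)
	var keep []string
	for _, o := range objs {
		if o.LastModified.After(cutoff) {
			keep = append(keep, o.Name) // recently uploaded: load its meta.json
		}
	}
	return keep
}

func main() {
	now := time.Now()
	objs := []objectAttrs{
		{Name: "tenant/01OLDBLOCK/", LastModified: now.Add(-400 * 24 * time.Hour)},
		{Name: "tenant/01NEWUPLOAD/", LastModified: now.Add(-time.Hour)}, // old block ID, fresh upload
	}
	fmt.Println(filterByMtime(objs, 720*time.Hour, now)) // keeps only the fresh upload
}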

@seizethedave (Contributor) commented:
RE my 2nd point:

operating a large multitenant Mimir cluster where one tenant needs to upload data from 1 year ago. This would force us to set max-lookback to 1y for all tenants and probably blow up compactor resources.

If the attributes thing doesn't work out, I think we should consider doing something like one of these:

  1. disable max-lookback for a tenant if they have block upload enabled. (Then only the minority of tenants with uploading enabled would contribute to waste in meta loading.)
  2. or expose another setting for a per-tenant max-lookback override, which would probably have to be changed alongside changes to compactor.block_upload_enabled.

@dmwilson-grafana (Contributor, Author) commented:
If we set max-lookback (a server-level option) to X days, then any tenant-level block upload with a block ID over X days old will go unnoticed.

Agreed. It doesn't make sense to have this as a server-level option. I did some research, and it seems it could be configured per-tenant if the option were defined in limits.go with an override (ref: docs). As you mentioned in your second comment, this would probably need some additional work on limits.
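For illustration, a per-tenant override might look roughly like the sketch below; the field name, YAML key, and Overrides shape are all hypothetical, not Mimir's actual limits.go:

package compactor

import (
	"time"
)

// Limits is a hypothetical per-tenant limits struct.
type Limits struct {
	CompactorMaxLookback time.Duration `yaml:"compactor_max_lookback"`
}

// Overrides resolves per-tenant limits, falling back to defaults.
type Overrides struct {
	defaults Limits
	tenants  map[string]Limits
}

func (o *Overrides) CompactorMaxLookback(tenant string) time.Duration {
	if l, ok := o.tenants[tenant]; ok {
		return l.CompactorMaxLookback
	}
	return o.defaults.CompactorMaxLookback
}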

@dmwilson-grafana (Contributor, Author) commented:
What do you think of a slightly different impl: using IterWithAttributes to learn each block's mtime and instead using that as a filtering device.

I believe both Iter and IterWithObjectAttributes are non-recursive (link), which would mean an extra call to check the mtime of each block's meta file at /$tenant/$block/meta.json. I'm also a bit hesitant to use mtime since not all IterOptions are supported by all object storage providers.

Resolved review threads:

  • CHANGELOG.md (outdated)
  • docs/sources/mimir/configure/about-versioning.md (outdated)
  • pkg/storage/tsdb/block/fetcher.go (outdated)
  • pkg/compactor/syncer_metrics_test.go
  • pkg/storage/tsdb/block/fetcher.go (outdated)
  • pkg/compactor/syncer_metrics.go
On the new fetcher test (diff context):

require.Contains(t, actualMetas, olderULID)
})

t.Run("should return no block metas when fetcher lookback is set short", func(t *testing.T) {
A reviewer (Contributor) commented:
Since ulid uses a funky and specific timestamp format, how about a test case that verifies that we're playing nice with it? Something like three blocks at -3G, -2G, -1G, where G is near the timestamp granularity, verifying that a max-lookback of 1.5G does the right thing.

@dmwilson-grafana (Contributor, Author) commented Feb 13, 2025:

ulid is precise down to 1ms (see: docs). It's possible that we will fail to filter out blocks that are less than 1ms older than the threshold (e.g. 168h0m0.0009s), but because of ulid's precision we would never filter out blocks that are just barely within the threshold. A small illustrative example is below.

package main

import (
	"fmt"
	"time"

	"github.com/oklog/ulid/v2"
)

func main() {
	now := time.Now()
	minAllowed, _ := ulid.New(ulid.Timestamp(now.Add(-1*time.Millisecond)), nil)   // threshold: 1ms
	justOlder, _ := ulid.New(ulid.Timestamp(now.Add(-1500*time.Microsecond)), nil) // 1.5ms old
	justNewer, _ := ulid.New(ulid.Timestamp(now.Add(-500*time.Microsecond)), nil)  // 0.5ms old
	// can be (-1, 0) OR (0, 1); fetcher skips if `id.Compare(threshold)` == -1
	fmt.Println(justOlder.Compare(minAllowed), justNewer.Compare(minAllowed))
}

I can't see an easy way to test this in fetcher_test.go without getting very flaky results. I just added a test that checks that we will skip a block that's 1s beyond the threshold.
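For reference, the non-flaky variant could look something like this sketch (not the actual fetcher_test.go code; it only exercises the ULID comparison, with the threshold and offsets assumed):

package main

import (
	"testing"
	"time"

	"github.com/oklog/ulid/v2"
	"github.com/stretchr/testify/require"
)

// A block created 1s beyond the threshold must compare as older; 1s is far
// outside ULID's 1ms timestamp granularity, so this can't flake.
func TestSkipsBlockOneSecondPastThreshold(t *testing.T) {
	now := time.Now()
	threshold, err := ulid.New(ulid.Timestamp(now.Add(-168*time.Hour)), nil)
	require.NoError(t, err)
	older, err := ulid.New(ulid.Timestamp(now.Add(-168*time.Hour-time.Second)), nil)
	require.NoError(t, err)
	require.Equal(t, -1, older.Compare(threshold)) // fetcher skips when Compare == -1
}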

The reviewer (Contributor) replied:
Sounds good! Thanks.

Two further review threads on pkg/storage/tsdb/block/fetcher.go (outdated, resolved).
@seizethedave (Contributor) left a comment:
LGTM!

@seizethedave merged commit b12297b into main on Feb 13, 2025
30 checks passed
@seizethedave deleted the dwilson/set-block-meta-max-sync-age branch on February 13, 2025 20:31
ying-jeanne pushed a commit that referenced this pull request on Feb 19, 2025: Add Compactor.MaxLookback Option for Limiting Blocks Loaded in Compaction Cycles (#10585)