feat: Modify optimized compaction to cover edge cases #25594
Merged
+664
−208
Changes from 3 commits
32 commits
d631314 feat: Modify optimized compaction to cover edge cases (devanbenz)
67849ae feat: Modify the PR to include optimized compaction (devanbenz)
827e859 feat: Use named variables for PlanOptimize (devanbenz)
5387ca3 feat: adjust test comments (devanbenz)
3153596 feat: code removal from debugging (devanbenz)
83d28ec feat: setting BlockCount idx value to 1 (devanbenz)
f896a01 feat: Adjust testing and add sprintf for magic vars (devanbenz)
f15d9be feat: need to use int64 instead of int (devanbenz)
54c8e1c feat: touch (devanbenz)
403d888 feat: Adjust tests to include lower level planning function calls (devanbenz)
23d12e1 feat: Fix up some tests that I forgot to adjust (devanbenz)
d3afb03 feat: fix typo (devanbenz)
cf657a8 feat: touch (devanbenz)
fc6ca13 feat: Call SingleGenerationReason() once by initializing a (devanbenz)
4fc4d55 feat: clarify file counts for reason we are not fully compacted (devanbenz)
c93bdfb feat: grammar typo (devanbenz)
2dd5ef4 feat: missed a test when updating the variable! whoops! (devanbenz)
479de96 feat: Add test for another edge case found; (devanbenz)
c392906 feat: Remove some overlapping tests (devanbenz)
f444518 feat: Adds check for block counts and adjusts tests to use require.Ze… (devanbenz)
5e4e2da feat: Adds test for planning lower level TSMs with block sizes at agg… (devanbenz)
c315b1f chore: rerun ci (devanbenz)
eb0a77d feat: Add a mock backfill test with mixed generations, mixed levels, … (devanbenz)
1bac192 Merge branch 'master-1.x' into db/4201/compaction-bugs (devanbenz)
371f960 feat: Fix a merge conflict where a var was renamed from fs -> fss (devanbenz)
5a614c4 feat: Adding more tests reversing and mixing up some of the (devanbenz)
3748c36 feat: Begin 'compacting' tests in to single test (devanbenz)
0799f00 feat: create loop for tests where there should be no further compaction (devanbenz)
3e69f2d feat: cleanup (devanbenz)
976291a feat: Add test names to the testing struct (devanbenz)
0a2ba1e feat: Use t.Run instead of declaring the test name in the requires (devanbenz)
8c908c5 feat: Reverse block counts (devanbenz)
I'm slightly confused whether this is still what we want to do. We skip a group (i.e., a generation) here if it is large (sum of all files is larger than the largest permissible single file), and the first file has the default maximum points per block and there are no tombstones.
This seems to be mixing metrics from the first file in the generation (points per block) with metrics from the whole generation (combined file size). Do we need to look at the points per block of all the files in the generation? Why are we skipping a generation if it is larger than a single file can be? What's the significance of that?
I understand the original code had this strange mix of conditionals, but do we understand why, and whether we should continue with them? At the very least, the comment `Skip the file if...` is misleading, because we are skipping a generation which may contain more than one file, are we not?
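To make the concern concrete, here is a minimal Go sketch of the skip condition as it is described in this comment. It is not the planner's actual code: `fileStat`, `skipGeneration`, and both constants are hypothetical names and values chosen for illustration, and treating the tombstone check as generation-wide is an assumption.

```go
package main

import "fmt"

// Hypothetical constants for illustration only.
const (
	maxFileSize              uint64 = 2 * 1024 * 1024 * 1024 // assumed 2 GB cap on a single file
	defaultMaxPointsPerBlock        = 1000                   // assumed point count of a full block
)

// fileStat is a hypothetical summary of one TSM file in a generation.
type fileStat struct {
	Size           uint64
	PointsPerBlock int
	HasTombstone   bool
}

// skipGeneration mirrors the mixed condition described above: size and
// tombstones are evaluated over the whole generation, while points per
// block is read from the first file only.
func skipGeneration(files []fileStat) bool {
	if len(files) == 0 {
		return false
	}
	var total uint64
	for _, f := range files {
		if f.HasTombstone {
			return false
		}
		total += f.Size
	}
	return total > maxFileSize && files[0].PointsPerBlock == defaultMaxPointsPerBlock
}

func main() {
	gen := []fileStat{
		{Size: 2 * 1024 * 1024 * 1024, PointsPerBlock: 1000},
		{Size: 512 * 1024 * 1024, PointsPerBlock: 100}, // under-filled blocks
	}
	fmt.Println(skipGeneration(gen)) // prints true
}
```

In this sketch the generation is skipped even though its second file has under-filled blocks, which is the mismatch between per-file and per-generation metrics raised above.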
Yes, I think the comment is a bit misleading. I was mostly just keeping `Plan` and `PlanLevel` as is... I would have no problem with modifying the existing logic in them, though. Perhaps instead of checking individual file block counts and the entire group size against 2 GB, I take the approach of checking all the files in the group and all the block sizes in the group? Some pseudo code:
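The pseudocode from the original comment is not reproduced here. As a stand-in, and not the author's code, the following Go sketch shows one way the "check every file in the group" idea could look. `tsmFile`, `fullyCompacted`, `maxGroupSize`, and `fullBlockPoints` are hypothetical names, and the 1000-point full block is an assumption.

```go
package main

import "fmt"

// Hypothetical constants for illustration only.
const (
	maxGroupSize    uint64 = 2 * 1024 * 1024 * 1024 // the 2 GB threshold mentioned above
	fullBlockPoints        = 1000                   // assumed point count of a full block
)

// tsmFile is a hypothetical summary of one file in a group.
type tsmFile struct {
	Size         uint64
	BlockPoints  int
	HasTombstone bool
}

// fullyCompacted reports whether a group could be skipped when every file
// is inspected: no file may have tombstones or under-filled blocks, and the
// combined size must exceed the threshold.
func fullyCompacted(group []tsmFile) bool {
	if len(group) == 0 {
		return false
	}
	var total uint64
	for _, f := range group {
		if f.HasTombstone || f.BlockPoints < fullBlockPoints {
			return false
		}
		total += f.Size
	}
	return total > maxGroupSize
}

func main() {
	group := []tsmFile{
		{Size: 2 * 1024 * 1024 * 1024, BlockPoints: 1000},
		{Size: 512 * 1024 * 1024, BlockPoints: 100}, // under-filled blocks
	}
	fmt.Println(fullyCompacted(group)) // prints false
}
```

Under these assumptions, a single file with under-filled blocks is enough to keep the group eligible for compaction, whatever the first file looks like.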
After consideration, I think you were right, @devanbenz, to change `Plan` and `PlanLevel` minimally. While their algorithms are obtuse, we shouldn't change them in this PR or at this time, to minimize the risks in what is already a large change to compaction.