Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#12081: "Add deleteFileThreshold parameter to SizeBasedDataRewriter, update logic, and include tests" #12133

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jangalasriramd7
Copy link

This PR refines the SizeBasedDataRewriter class in the Apache Iceberg project, addressing issues related to the constructor initialization and logic for delete file threshold handling. It makes the class more robust by improving its flexibility and functionality in the context of delete-based data file rewriting.

Constructor Updates:
Refactored constructors to properly initialize the deleteRatioThreshold and deleteFileThreshold parameters.
Added an additional constructor that allows specifying the deleteRatioThreshold value explicitly while maintaining backward compatibility with the default constructor.

@github-actions github-actions bot added the core label Jan 29, 2025
@RussellSpitzer
Copy link
Member

Title of this pr looks incorrect? This is a new feature and not a fix. The PR also would require some tests to prove the new parameter is working as expected.

@jangalasriramd7 jangalasriramd7 changed the title Fixing #12081: Fix SizeBasedDataRewriter constructors and related logic #12081: "Add deleteFileThreshold parameter to SizeBasedDataRewriter, update logic, and include tests" Jan 29, 2025
@jangalasriramd7
Copy link
Author

I have changed the PR title.

@jangalasriramd7
Copy link
Author

@RussellSpitzer Addressed your comments:

  • Replaced the magic number with DEFAULT_DELETE_THRESHOLD
  • Moved deleteRatioThreshold to operate as an option
    Let me know if further changes are needed!

@@ -67,6 +68,8 @@ public Set<String> validOptions() {
public void init(Map<String, String> options) {
super.init(options);
this.deleteFileThreshold = deleteFileThreshold(options);
this.deleteRatioThreshold = PropertyUtil.propertyAsDouble(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to validate that 0.0 > threshold <= 1.0. That would also require some additional unit tests that check for valid/invalid parameters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this.deleteRatioThreshold needs to be used in tooHighDeleteRatio()

@jangalasriramd7
Copy link
Author

Could you please help me out in resolving the error.

This build must be run with JDK 11 or 17 or 21 but was executed with JDK 23

image
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants