-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#12081: "Add deleteFileThreshold parameter to SizeBasedDataRewriter, update logic, and include tests" #12133
base: main
Are you sure you want to change the base?
#12081: "Add deleteFileThreshold parameter to SizeBasedDataRewriter, update logic, and include tests" #12133
Conversation
Title of this pr looks incorrect? This is a new feature and not a fix. The PR also would require some tests to prove the new parameter is working as expected. |
I have changed the PR title. |
core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java
Outdated
Show resolved
Hide resolved
@RussellSpitzer Addressed your comments:
|
core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java
Show resolved
Hide resolved
@@ -67,6 +68,8 @@ public Set<String> validOptions() { | |||
public void init(Map<String, String> options) { | |||
super.init(options); | |||
this.deleteFileThreshold = deleteFileThreshold(options); | |||
this.deleteRatioThreshold = PropertyUtil.propertyAsDouble( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also need to validate that 0.0 > threshold <= 1.0
. That would also require some additional unit tests that check for valid/invalid parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also this.deleteRatioThreshold
needs to be used in tooHighDeleteRatio()
This PR refines the SizeBasedDataRewriter class in the Apache Iceberg project, addressing issues related to the constructor initialization and logic for delete file threshold handling. It makes the class more robust by improving its flexibility and functionality in the context of delete-based data file rewriting.
Constructor Updates:
Refactored constructors to properly initialize the deleteRatioThreshold and deleteFileThreshold parameters.
Added an additional constructor that allows specifying the deleteRatioThreshold value explicitly while maintaining backward compatibility with the default constructor.