feat: Overwrite partitions mode #3687
CodSpeed Performance Report
Merging #3687 will improve performances by 42.87%
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3687 +/- ##
==========================================
+ Coverage 77.79% 77.94% +0.14%
==========================================
Files 729 725 -4
Lines 90408 90967 +559
==========================================
+ Hits 70335 70904 +569
+ Misses 20073 20063 -10
all_file_paths = []
if overwrite_partitions:
    # Get all files in ONLY the directories that were written to.

    written_dirs = set(str(pathlib.Path(path).parent) for path in written_file_paths.to_pylist())
    for dir in written_dirs:
        file_selector = pafs.FileSelector(dir, recursive=True)
        try:
            all_file_paths.extend(
                [info.path for info in fs.get_file_info(file_selector) if info.type == pafs.FileType.File]
            )
        except FileNotFoundError:
            continue
@desmondcheongzx new implementation ready for review, where we only look for files to delete IF they are in the partition directories.
Looks good, thanks for making it clearer!
Closes #1768
Overwrite-partitions mode will only overwrite files in the partition directories that were written into as part of the write operation. E.g. partition "A" will be overwritten if and only if partition "A" was written into.
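To make the "if and only if" behavior concrete, here is a minimal sketch using only the Python standard library on a local temporary directory. It is not the PR's implementation, which lists files through a pyarrow filesystem. The helper name `stale_files_in_written_partitions`, the `part=A` / `part=B` directory names, and the assumption that the files to delete are the listed files minus the ones just written are all illustrative.

```python
import pathlib
import tempfile


def stale_files_in_written_partitions(written_file_paths):
    """Hypothetical helper: list pre-existing files that sit under the
    directories touched by this write but were not produced by it."""
    written = {str(pathlib.Path(p)) for p in written_file_paths}
    # Only the parent directories of newly written files are inspected.
    written_dirs = {pathlib.Path(p).parent for p in written_file_paths}
    stale = set()
    for directory in written_dirs:
        if not directory.exists():
            continue
        for candidate in directory.rglob("*"):
            if candidate.is_file() and str(candidate) not in written:
                stale.add(str(candidate))
    return sorted(stale)


def demo():
    """Write into partition A only, then show that partition B is untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        old_a = root / "part=A" / "old.parquet"
        old_b = root / "part=B" / "old.parquet"
        new_a = root / "part=A" / "new.parquet"
        for f in (old_a, old_b, new_a):
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_bytes(b"")
        # Assumption: only new_a was produced by the current write.
        for path in stale_files_in_written_partitions([str(new_a)]):
            pathlib.Path(path).unlink()
        return sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )


if __name__ == "__main__":
    print(demo())
```

In this sketch the old file in `part=A` is removed because `part=A` was written into, while `part=B/old.parquet` survives because nothing was written there.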
This PR also refactors the test code a bit.