[FEAT] daft-connect range use python generator #3308
CodSpeed Performance Report
Merging #3308 will improve performance by 59.54%.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3308      +/-   ##
==========================================
+ Coverage   77.35%   77.39%   +0.03%
==========================================
  Files         678      677       -1
  Lines       83119    83114       -5
==========================================
+ Hits        64296    64323      +27
+ Misses      18823    18791      -32
daft/io/_range.py (Outdated)

def _range_generators(start: int, end: int, step: int) -> Iterator[Callable[[], Iterator[Table]]]:
    def generator_for_value(value: int) -> Callable[[], Iterator[Table]]:
        def generator() -> Iterator[Table]:
This generates one row per table, which will be extremely slow. Instead, calculate which range should go in each partition and then call something like Series.arange to generate the series in Rust. Each partition can just be one table.
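A minimal sketch of the partitioning step suggested above, in plain Python so it runs without Daft. The helper name `partition_ranges` and the `num_partitions` parameter are hypothetical and are not taken from the PR; in the real scan operator each sub-range would presumably be materialized as a single table (for example via something like Series.arange, as suggested) rather than as a Python `range`.

```python
from typing import List


def partition_ranges(start: int, end: int, step: int, num_partitions: int) -> List[range]:
    """Split range(start, end, step) into contiguous sub-ranges, one per partition.

    Hypothetical helper for illustration only; not part of Daft's API.
    """
    full = range(start, end, step)
    total = len(full)
    if total == 0 or num_partitions <= 0:
        return []

    # Never create more partitions than there are values.
    num_partitions = min(num_partitions, total)
    base, extra = divmod(total, num_partitions)

    parts: List[range] = []
    offset = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional value each.
        size = base + (1 if i < extra else 0)
        # Slicing a range yields another range, so nothing is materialized here.
        parts.append(full[offset:offset + size])
        offset += size
    return parts


if __name__ == "__main__":
    for part in partition_ranges(0, 10, 1, 3):
        print(part, list(part))
```

Each returned sub-range describes one partition by its start, stop, and step, so the values can be generated in bulk per partition instead of one row per table.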
What are you thinking the interop will look like in this case? Are you thinking:
- Series.arange is in Rust →
- the table gets sent to Python (Iterator of Tables) →
- get the Python RangeScanOperator →
- then get the Rust binding of RangeScanOperator (let scan_operator_handle = ScanOperatorHandle::from_python_scan_operator(range, py)?;)
@samster25 I added partitions in
I also added #3334 so we can switch over to a pure arange approach.
Resolving this, as there has been no response.
3244faf to fb08980
fb08980 to a67b1bf
a67b1bf to 3901ec2
3901ec2 to 4f1210c
No description provided.