Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] daft-connect range use python generator #3308

Merged
merged 10 commits into from
Nov 20, 2024
Merged

Conversation

andrewgazelka
Copy link
Contributor

No description provided.

@andrewgazelka andrewgazelka changed the title [REFACTOR] range use python generator [REFACTOR] daft-connect range use python generator Nov 18, 2024
@andrewgazelka andrewgazelka changed the title [REFACTOR] daft-connect range use python generator [FEAT] daft-connect range use python generator Nov 18, 2024
@github-actions github-actions bot added the enhancement New feature or request label Nov 18, 2024
Copy link

codspeed-hq bot commented Nov 18, 2024

CodSpeed Performance Report

Merging #3308 will improve performances by 59.54%

Comparing andrew/improve-range (4f1210c) with main (731a73e)

Summary

⚡ 2 improvements
✅ 15 untouched benchmarks

Benchmarks breakdown

Benchmark main andrew/improve-range Change
test_iter_rows_first_row[100 Small Files] 320.4 ms 200.9 ms +59.54%
test_show[100 Small Files] 23 ms 14.9 ms +54.2%

Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 60.37736% with 63 lines in your changes missing coverage. Please review.

Project coverage is 77.39%. Comparing base (7922d2d) to head (fb08980).

Files with missing lines Patch % Lines
daft/io/_range.py 0.00% 23 Missing ⚠️
src/daft-connect/src/translation/logical_plan.rs 63.79% 21 Missing ⚠️
src/daft-connect/src/op/execute.rs 50.00% 12 Missing ⚠️
src/daft-connect/src/lib.rs 0.00% 6 Missing ⚠️
src/daft-connect/src/op/execute/root.rs 97.91% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3308      +/-   ##
==========================================
+ Coverage   77.35%   77.39%   +0.03%     
==========================================
  Files         678      677       -1     
  Lines       83119    83114       -5     
==========================================
+ Hits        64296    64323      +27     
+ Misses      18823    18791      -32     
Files with missing lines Coverage Δ
src/daft-connect/src/translation/schema.rs 0.00% <ø> (ø)
src/daft-connect/src/op/execute/root.rs 97.91% <97.91%> (ø)
src/daft-connect/src/lib.rs 46.27% <0.00%> (-1.27%) ⬇️
src/daft-connect/src/op/execute.rs 80.95% <50.00%> (ø)
src/daft-connect/src/translation/logical_plan.rs 63.79% <63.79%> (ø)
daft/io/_range.py 0.00% <0.00%> (ø)
---- 🚨 Try these New Features:


def _range_generators(start: int, end: int, step: int) -> Iterator[Callable[[], Iterator[Table]]]:
def generator_for_value(value: int) -> Callable[[], Iterator[Table]]:
def generator() -> Iterator[Table]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is generating 1 row per table which will be extremely slow. Instead you should be calculating what range should go in each partition and then call something like Series.arange to generate the series in rust. Each partition can just be 1 table.

Copy link
Contributor Author

@andrewgazelka andrewgazelka Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you are thinking interop will be like in this case. Are you thinking

  • Series.arange is in rust →
  • table gets sent to python (Iterator of Tables) →
  • get python RangeScanOperator →
  • then get rust binding of RangeScanOperator (let scan_operator_handle = ScanOperatorHandle::from_python_scan_operator(range, py)?;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samster25 I added partitions in

f58d2ae

also added #3334 so we can switch over to pure arange approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolving this as no response

src/daft-connect/src/op/execute/root.rs Outdated Show resolved Hide resolved
tests/connect/test_range_simple.py Show resolved Hide resolved
@andrewgazelka andrewgazelka force-pushed the andrew/improve-range branch 2 times, most recently from 3244faf to fb08980 Compare November 20, 2024 00:58
daft/io/_range.py Outdated Show resolved Hide resolved
tests/connect/test_range_simple.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

andrewgazelka commented Nov 20, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

@andrewgazelka andrewgazelka mentioned this pull request Nov 20, 2024
@andrewgazelka andrewgazelka enabled auto-merge (squash) November 20, 2024 03:30
@andrewgazelka andrewgazelka merged commit 066cde1 into main Nov 20, 2024
43 checks passed
@andrewgazelka andrewgazelka deleted the andrew/improve-range branch November 20, 2024 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants