-
-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dask support for first, last, first_n and last_n reductions #1214
Dask support for first, last, first_n and last_n reductions #1214
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1214 +/- ##
==========================================
+ Coverage 84.52% 84.83% +0.31%
==========================================
Files 35 35
Lines 8369 8561 +192
==========================================
+ Hits 7074 7263 +189
- Misses 1295 1298 +3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whew! That didn't look fun, but I'm very glad to see it. I didn't spot any issues, but I may or may not have glazed over when reviewing certain bits. :-)
It hadn't occurred to me to be entertaining. I can try that approach, but it may come at the expense of code accuracy 😁 |
I've rebased it on |
This fixes the Dask parts of issues #1182 and #1207. CUDA support will follow in a separate PR, this is already too big.
Full list of example reductions this supports on CPU with and without Dask:
first('value')
- returns singlevalue
per pixellast('value')
- dittofirst_n('value', n=3)
- returnsn=3
value
s per pixellast_n('value', n=3)
- dittowhere(first('value'))
- returns single row index per pixelwhere(last('value'))
- dittowhere(first_n('value', n=3))
- returnsn=3
row indexes per pixelwhere(last_n('value', n=3))
- dittowhere(first('value'), 'other')
- returns singleother
per pixelwhere(last('value'), 'other')
- dittowhere(first_n('value', n=3), 'other')
- returnsn=3
other
s per pixelwhere(last_n('value', n=3), 'other')
- dittoSummary of changes:
Some
reduction
behaviour is now dependent on whether supplied data is partitioned (dask or CUDA) or not, so extra argumentscuda
andpartitioned
are passed through the class hierarchy.I have factored out some common
first
andlast
code into a private base class_first_or_last
to simplify, and similar applies to_first_n_or_last_n
.The fundamental addition is that
first('value')
using Dask is implemented aswhere(_min_row_index(), 'value')
. This takes advantage of the existingwhere
machinery and a new class_min_row_index
which is very similar to amin
reduction except that it accepts a virtual row index column, not a real column supplied by the user. I could have extended themin
class to cover this, but I thought it better to keep the new functionality in a new private class. It is possible to instantiate the private class of course, although no user code should do this, and I have done this in the test suite. There are equivalent new private classes forlast
,first_n
andlast_n
.I do need to recheck that trying to use CUDA with these reductions raises informative error messages not crashes.