Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dask support for first, last, first_n and last_n reductions #1214

Merged
merged 9 commits into from
May 16, 2023
Merged

Dask support for first, last, first_n and last_n reductions #1214

merged 9 commits into from
May 16, 2023

Conversation

ianthomas23
Copy link
Member

This fixes the Dask parts of issues #1182 and #1207. CUDA support will follow in a separate PR, this is already too big.

Full list of example reductions this supports on CPU with and without Dask:

  • first('value') - returns single value per pixel
  • last('value') - ditto
  • first_n('value', n=3) - returns n=3 values per pixel
  • last_n('value', n=3) - ditto
  • where(first('value')) - returns single row index per pixel
  • where(last('value')) - ditto
  • where(first_n('value', n=3)) - returns n=3 row indexes per pixel
  • where(last_n('value', n=3)) - ditto
  • where(first('value'), 'other') - returns single other per pixel
  • where(last('value'), 'other') - ditto
  • where(first_n('value', n=3), 'other') - returns n=3 others per pixel
  • where(last_n('value', n=3), 'other') - ditto

Summary of changes:

  1. Some reduction behaviour is now dependent on whether supplied data is partitioned (dask or CUDA) or not, so extra arguments cuda and partitioned are passed through the class hierarchy.

  2. I have factored out some common first and last code into a private base class _first_or_last to simplify, and similar applies to _first_n_or_last_n.

  3. The fundamental addition is that first('value') using Dask is implemented as where(_min_row_index(), 'value'). This takes advantage of the existing where machinery and a new class _min_row_index which is very similar to a min reduction except that it accepts a virtual row index column, not a real column supplied by the user. I could have extended the min class to cover this, but I thought it better to keep the new functionality in a new private class. It is possible to instantiate the private class of course, although no user code should do this, and I have done this in the test suite. There are equivalent new private classes for last, first_n and last_n.

I do need to recheck that trying to use CUDA with these reductions raises informative error messages not crashes.

@ianthomas23 ianthomas23 added this to the v0.14.5 milestone May 11, 2023
@codecov
Copy link

codecov bot commented May 11, 2023

Codecov Report

Merging #1214 (25bd257) into main (8092f4d) will increase coverage by 0.31%.
The diff coverage is 95.09%.

@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
+ Coverage   84.52%   84.83%   +0.31%     
==========================================
  Files          35       35              
  Lines        8369     8561     +192     
==========================================
+ Hits         7074     7263     +189     
- Misses       1295     1298       +3     
Impacted Files Coverage Δ
datashader/data_libraries/pandas.py 100.00% <ø> (ø)
datashader/reductions.py 84.68% <93.15%> (+1.56%) ⬆️
datashader/compiler.py 91.07% <100.00%> (+0.50%) ⬆️
datashader/data_libraries/dask.py 95.23% <100.00%> (+0.07%) ⬆️
datashader/data_libraries/dask_xarray.py 98.95% <100.00%> (ø)
datashader/utils.py 81.86% <100.00%> (+2.61%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew! That didn't look fun, but I'm very glad to see it. I didn't spot any issues, but I may or may not have glazed over when reviewing certain bits. :-)

@ianthomas23
Copy link
Member Author

I may or may not have glazed over when reviewing certain bits. :-)

It hadn't occurred to me to be entertaining. I can try that approach, but it may come at the expense of code accuracy 😁

@ianthomas23
Copy link
Member Author

I've rebased it on main to pick up the fast CUDA mutex PR, and added a couple of error messages. There are no functional code changes so I will merge then CI is green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants