fast path for common_chunks where only one disk array #224

Merged: 5 commits into main from fast_path_commonchunks, Feb 10, 2025

Conversation

rafaqz (Collaborator) commented Feb 5, 2025

Closes #222

At least for the case of a single disk array, the profile is back to being all GDAL reads.

However, it can still be slow when there are multiple GDAL-style column-chunked arrays.
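For readers outside the codebase, the idea of the fast path can be sketched as follows. This is an illustrative Python sketch only, not the DiskArrays.jl implementation: `FakeDiskArray`, its `chunks` field, and this `common_chunks` signature are hypothetical names, and returning `None` when chunkings disagree is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDiskArray:
    """Hypothetical stand-in for a chunked, disk-backed array."""
    shape: tuple
    chunks: tuple  # chunk size per dimension


def common_chunks(arrays):
    """Pick one chunking shared by all disk-backed arguments (sketch).

    Non-disk arguments (plain lists, scalars) are ignored.
    """
    disk = [a for a in arrays if isinstance(a, FakeDiskArray)]
    if not disk:
        return None
    if len(disk) == 1:
        # Fast path: a single disk array has nothing to be compared
        # against, so its chunking is returned without further work.
        return disk[0].chunks
    # Slow path: every chunking has to be compared with the first one.
    first = disk[0].chunks
    if all(a.chunks == first for a in disk[1:]):
        return first
    return None  # assumption: disagreement is reported as None here


a = FakeDiskArray(shape=(100, 100), chunks=(100, 1))
b = FakeDiskArray(shape=(100, 100), chunks=(10, 10))
print(common_chunks([a, [1, 2, 3]]))  # single disk array: fast path
print(common_chunks([a, b]))          # chunkings disagree
```

The cost of the slow path depends on how expensive the chunk comparison is, which is what the rest of this thread is about.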

rafaqz (Collaborator, Author) commented Feb 5, 2025

Maybe we need to define == on chunks so it doesn't iterate over all the values?

rafaqz (Collaborator, Author) commented Feb 8, 2025

OK, I added == on chunks. It's a little complicated for RegularChunks, because when there are only one or two chunks, two chunkings can match even if their chunk size and offset differ.

But it should help the performance of common_chunks in the multiple-array case as well.
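To illustrate why the one- and two-chunk cases need special handling, here is a minimal Python sketch of an equality check that never materializes the chunk ranges. It is not the code added in this PR: the class below only borrows the name RegularChunks, its fields (`chunk_size`, `offset`, `array_size`) and the 0-based half-open ranges are assumptions, and it assumes `0 <= offset < chunk_size` and `array_size > 0`.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class RegularChunks:
    """Hypothetical regular chunking of a 1-D axis.

    The first `offset` elements of the first chunk are cut off, so
    chunk i covers [i*chunk_size - offset, (i+1)*chunk_size - offset),
    clipped to [0, array_size).
    """
    chunk_size: int
    offset: int
    array_size: int

    def __len__(self):
        # Number of chunks: ceiling division without floats.
        return -(-(self.array_size + self.offset) // self.chunk_size)

    def ranges(self):
        # Explicit list of chunk ranges; O(number of chunks).
        cs, off, n = self.chunk_size, self.offset, self.array_size
        return [range(max(i * cs - off, 0), min((i + 1) * cs - off, n))
                for i in range(len(self))]

    def __eq__(self, other):
        # O(1) comparison that does not iterate over the chunks.
        if not isinstance(other, RegularChunks):
            return NotImplemented
        if self.array_size != other.array_size or len(self) != len(other):
            return False
        n = len(self)
        if n == 1:
            # One chunk always covers the whole axis.
            return True
        if n == 2:
            # Only the single interior boundary has to agree.
            return (self.chunk_size - self.offset
                    == other.chunk_size - other.offset)
        # Three or more chunks: two interior boundaries pin down both
        # the chunk size and the offset.
        return (self.chunk_size == other.chunk_size
                and self.offset == other.offset)


print(RegularChunks(10, 0, 8) == RegularChunks(20, 5, 8))    # True, one chunk each
print(RegularChunks(10, 0, 15) == RegularChunks(12, 2, 15))  # True, both split at 10
print(RegularChunks(10, 0, 25) == RegularChunks(12, 2, 25))  # False, three chunks
```

In this sketch the comparison costs the same regardless of how many chunks the array has, which is the property wanted for the multiple-array case.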

rafaqz (Collaborator, Author) commented Feb 10, 2025

@meggart can you have a final look at this? I added equality on chunks.

meggart merged commit efcd9bb into main on Feb 10, 2025
10 checks passed
meggart (Collaborator) commented Feb 10, 2025

Looks good!

rafaqz deleted the fast_path_commonchunks branch on February 10, 2025, 13:51

Successfully merging this pull request may close these issues.

common_chunks on SubDiskArray can be a large fraction of io cost