WIP: Add memory efficient meta data summary #1030
base: master
Hello @moloney, Thank you for updating!
To test for issues locally, Comment last updated at 2021-07-13 03:30:41 UTC |
Codecov Report
@@            Coverage Diff             @@
##           master    #1030      +/-   ##
==========================================
- Coverage   92.26%   91.04%   -1.22%
==========================================
  Files         100      101       +1
  Lines       12205    12668     +463
  Branches     2136     2267     +131
==========================================
+ Hits        11261    11534     +273
- Misses        616      781     +165
- Partials      328      353      +25
Continue to review full report at Codecov.
|
Co-authored-by: Chris Markiewicz <[email protected]>
Merged master in to resolve conflicts and get the tests going. Let me know if you'd prefer I didn't do that. |
@matthew-brett / @moloney / @effigies |
@ZviBaratz Can you explain in more detail what you have in mind? I don't see how a cache helps with determining which meta data varies when someone hands us a list of DICOM files we have never seen before (and that could come from multiple DICOM series). |
The idea is that there will be a |
We really don't want to require that all the files live in a single directory. The assumption is that you are passed a list of files you have never seen before, which could be massive even for a single series (e.g. 36K files), and you want to efficiently convert them into an xarray on the fly. My original implementation in dcmstack wasn't totally naive: meta data values that were constant were only stored once. Even so, it required orders of magnitude more memory than this approach (18GB vs ~800MB with 36K files). |
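For readers following along, the "constant values stored once" baseline mentioned above can be sketched roughly as below. This is not the dcmstack code; `split_const_varying` and the example keys are hypothetical names chosen for illustration. It also shows where the memory goes in such a scheme: every varying key keeps one list entry per file.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Sentinel marking "this key is absent from this file's meta data".
_MISSING = object()


def split_const_varying(
    metas: Sequence[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, List[Any]]]:
    """Split per-file meta data into constant and varying parts.

    Keys whose value is present and identical in every dict are stored once
    in ``const``. Every other key keeps a full per-file list in ``varying``,
    which is what makes this baseline expensive for large series.
    """
    if not metas:
        return {}, {}
    keys = set()
    for meta in metas:
        keys.update(meta)
    const: Dict[str, Any] = {}
    varying: Dict[str, List[Any]] = {}
    for key in keys:
        values = [meta.get(key, _MISSING) for meta in metas]
        first = values[0]
        if first is not _MISSING and all(v == first for v in values[1:]):
            const[key] = first
        else:
            varying[key] = values
    return const, varying


# Hypothetical example: four files, one constant key and two varying keys.
example = [
    {"PatientID": "p1", "InstanceNumber": i, "EchoTime": [10, 20][i % 2]}
    for i in range(4)
]
example_const, example_varying = split_const_varying(example)
```

Here `EchoTime` only takes two distinct values but still costs one list slot per file, which is the kind of repetition a value-indexed summary is meant to avoid.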
I see. |
If we want to support using multiprocessing to speed up the parsing of very large series, this would also provide a nice compact representation to pass around. |
Sorry, I lost track of this one. What's the status? Are we still trying to get this into nibabel? |
This is work in progress on data structures that build a memory-efficient summary of a sequence of meta data dictionaries (assuming a large number of keys/values repeat). The summary is then used to determine how to sort the associated images into an nD array.
This approach was inspired by this dcmstack issue.
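As a rough illustration of the idea (not the code in this PR), the sketch below stores, for each key, a mapping from each distinct value to a bitmask of the file indices that carry it, and then looks for a pair of keys that arranges the files into a full 2D grid. The class name `MetaSummary`, its methods, and the use of plain Python integers as bitmasks are all assumptions made for this example. Values are assumed to be hashable and sortable.

```python
from itertools import combinations
from typing import Any, Dict, Hashable, List, Optional, Tuple


class MetaSummary:
    """Hypothetical sketch: per key, map each distinct value to an index bitmask."""

    def __init__(self) -> None:
        self.n = 0  # number of meta data dicts seen so far
        self._index: Dict[str, Dict[Hashable, int]] = {}

    def append(self, meta: Dict[str, Any]) -> None:
        """Record one dict; repeated values only cost one extra bit."""
        bit = 1 << self.n
        for key, value in meta.items():
            by_value = self._index.setdefault(key, {})
            by_value[value] = by_value.get(value, 0) | bit
        self.n += 1

    def varying_keys(self) -> List[str]:
        """Keys that are not present with one single value in every dict."""
        full = (1 << self.n) - 1
        return sorted(
            key
            for key, by_value in self._index.items()
            if len(by_value) > 1 or next(iter(by_value.values())) != full
        )

    def grid_axes(self) -> Optional[Tuple[str, str]]:
        """Find two keys whose value pairs each select exactly one dict."""
        for a, b in combinations(self.varying_keys(), 2):
            masks_a = list(self._index[a].values())
            masks_b = list(self._index[b].values())
            if len(masks_a) * len(masks_b) != self.n:
                continue
            if all(
                bin(ma & mb).count("1") == 1 for ma in masks_a for mb in masks_b
            ):
                return a, b
        return None

    def sort_order(self, a: str, b: str) -> List[List[int]]:
        """Input indices laid out as a grid, rows sorted by ``a``, columns by ``b``.

        Assumes (a, b) came from grid_axes(), so each cell has exactly one bit set.
        """
        grid = []
        for value_a in sorted(self._index[a]):
            row = []
            for value_b in sorted(self._index[b]):
                cell = self._index[a][value_a] & self._index[b][value_b]
                row.append(cell.bit_length() - 1)  # position of the single set bit
            grid.append(row)
        return grid
```

In this sketch a key whose value differs in every file still gets one bitmask per value, so a real implementation would likely want a more compact representation for that case. Only the 2D case is searched here; extending `grid_axes` to more axes follows the same intersection test.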