LVAE dataset: add MultiFileDset #223

veegalinova · 2024-08-24T10:29:30Z

Description

What: Follow-up to updating the datasets from the Disentangle repo (reference commit 57cad67 from the main branch). Adds the MultiFileDset class, TilingMode, and EmptyPatchFetcher.
Why: MultiFileDset is necessary for the datasets in the paper.
How: Copy/paste the code from the Disentangle repo with minimal changes.

Changes Made

Added:
- from the Disentangle repo:
  - copied MultiFileDset class and related classes
  - added TilingMode and updated related code in MultiChDloader
  - copied EmptyPatchFetcher
  - copied several new parameters for MultiChDloader
  - copied code for flips (using Albumentations)
  - added "load_data_fn" parameter to dataset classes to pass the data loading logic from different experiments
- added lists of the dataset types needed for the paper in data_utils.py
- added a simple test for MultiFileDset object initialization
Modified:
- moved dataset configs into lvae_training/dataset/configs
- moved GridIndexManager and IndexSwitcher to separate files in lvae_training/dataset/utils
Removed:
- removed code related to the BioSR dataset from data_utils.py and tests
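
To illustrate the "load_data_fn" item above, here is a minimal sketch of passing experiment-specific loading logic into a dataset class. `ToyDataset`, `LoadDataFn`, and `load_random_two_channel` are hypothetical names made up for this example, and the callable's signature is an assumption; the actual dataset classes in this PR may expect something different.

```python
from typing import Callable

import numpy as np

# Assumed signature for illustration: a path string in, an (N, H, W, C) array out.
LoadDataFn = Callable[[str], np.ndarray]


class ToyDataset:
    """Illustrative stand-in for a dataset class; not the CAREamics implementation."""

    def __init__(self, data_path: str, load_data_fn: LoadDataFn) -> None:
        # All file-format knowledge lives in the injected callable.
        self._data = load_data_fn(data_path)

    def __len__(self) -> int:
        return self._data.shape[0]

    def __getitem__(self, index: int) -> np.ndarray:
        return self._data[index]


def load_random_two_channel(path: str) -> np.ndarray:
    """Hypothetical experiment-specific loader; generates data instead of reading `path`."""
    rng = np.random.default_rng(0)
    return rng.random((4, 8, 8, 2))


dataset = ToyDataset("unused/path", load_data_fn=load_random_two_channel)
```

With this pattern, each experiment can supply its own loader while the dataset class itself stays unchanged.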

Please ensure your PR meets the following requirements:

Code builds and passes tests locally, including doctests
New tests have been added (for bug fixes/features)
Pre-commit passes
PR to the documentation exists (for bug fixes / features)


codecov · 2024-08-24T11:13:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.97%. Comparing base (929b9b8) to head (d22cd2f).
Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
- Coverage   87.20%   86.97%   -0.24%     
==========================================
  Files         131      132       +1     
  Lines        4956     4882      -74     
==========================================
- Hits         4322     4246      -76     
- Misses        634      636       +2


src/careamics/lvae_training/dataset/utils/data_utils.py

src/careamics/lvae_training/dataset/vae_dataset.py

src/careamics/lvae_training/dataset/multifile_dataset.py

#238)

### Description

- **What**: Add an option in `XYFlip` and `XYRandomRotate90` to transform additional arrays identically to the patch and target. Include the ability to compose these transforms. Also fixed some type hint issues, but some remain.
- **Why**: The LVAE training requires the identical transformation of multiple arrays, including images and noise arrays. Previously it was implemented with the `Albumentations` library. Original motivation came from this comment chain #223 (comment).
- **How**: A few options were considered, but ultimately I went with what I thought would require the least amount of the code base to change. Additional arrays can be passed to the transform `__call__` function as kwargs, and they will be returned as a dict after the augmented patch and target. Composition of transforms with additional arrays has been kept as a separate method, `Compose.transform_with_additional_arrays`, mostly to avoid changing too much of the code base. This can be revised and integrated with `Compose.__call__` later if desired.

### Changes Made

- **Added**:
  - Tests for the new functionality.
  - `_chain_transforms_additional_arrays` and `transform_with_additional_arrays` methods to `Compose`.
- **Modified**:
  - Added a `**additional_arrays: NDArray` argument to the `__call__` method of some transforms.
  - A few existing tests that needed to modify the unpacking of transform outputs, now that they output a tuple of 3 rather than 2.

### Breaking changes

For code calling transforms directly, the output is now a tuple of 3 rather than 2.

### Additional Notes and Examples

#### Note

One of the biggest pain points that hasn't been fully resolved is that the `N2VManipulate` transform doesn't have the same function signature in the `__call__` method as the other transforms: it returns 3 arrays rather than 2. We know it gets applied last in the algorithm, but we could make it more explicit somehow, which would resolve some annoying type hinting.

#### Converting LVAE transform code

https://github.com/CAREamics/careamics/blob/b008187292f269455272c92c9009d45a330e7ca0/src/careamics/lvae_training/dataset/vae_dataset.py#L867-L881

To convert the code snippet above, I think (not tested) the following can be done, assuming `self._rotation_transform` is created using the CAREamics `Compose` instead of that from Albumentations:

```python
# Collect every image array under a unique keyword name.
img_kwargs = {}
for i, img in enumerate(img_tuples):
    for k in range(len(img)):
        img_kwargs[f"img{i}_{k}"] = img[k]

# Do the same for the noise arrays.
noise_kwargs = {}
for i, nimg in enumerate(noise_tuples):
    for k in range(len(nimg)):
        noise_kwargs[f"noise{i}_{k}"] = nimg[k]

# All additional arrays receive the same transformation as the first image.
img, _, rot_dic = self._rotation_transform.transform_with_additional_arrays(
    img_tuples[0][0], **img_kwargs, **noise_kwargs
)
```

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [ ] PR to the documentation exists (for bug fixes / features)

---------

Co-authored-by: Joran Deschamps <[email protected]>
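
The calling convention described in the commit message above (extra arrays passed as kwargs, returned as a dict after the patch and target) can be sketched as follows. `ToyXYFlip` is a hypothetical class written only for illustration; it is not the CAREamics `XYFlip`, and its constructor and internals are assumptions.

```python
from typing import Optional

import numpy as np
from numpy.typing import NDArray


class ToyXYFlip:
    """Hypothetical flip transform that only mimics the described calling convention."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = np.random.default_rng(seed)

    def __call__(
        self,
        patch: NDArray,
        target: Optional[NDArray] = None,
        **additional_arrays: NDArray,
    ) -> tuple[NDArray, Optional[NDArray], dict[str, NDArray]]:
        # Draw the flip axis once so every array receives the same transformation.
        axis = int(self._rng.choice([-2, -1]))
        flipped_patch = np.flip(patch, axis=axis)
        flipped_target = None if target is None else np.flip(target, axis=axis)
        flipped_extra = {
            name: np.flip(array, axis=axis)
            for name, array in additional_arrays.items()
        }
        return flipped_patch, flipped_target, flipped_extra


patch = np.arange(16.0).reshape(1, 4, 4)
noise = patch + 100.0
out_patch, out_target, extras = ToyXYFlip(seed=0)(patch, noise0_0=noise)
```

Because the random choice is made once per call, the noise array stays pixel-aligned with the patch after augmentation, which is the property the LVAE training needs.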


veegalinova added 7 commits August 23, 2024 13:19

extract index manager and switcher to separate files; add empty patch fetcher;

74cfcd5

move files around

60c9c95

add missing inits

27f06fe

add changes from Disentangle for vae_dataset and index_manager

73b2261

add multifile dataset

9b54ef0

remove biosr and mrc logic

c7310a3

add multifile dataset test

faad30f

veegalinova requested review from ashesh-0 and melisande-c August 24, 2024 10:29

veegalinova and others added 3 commits August 24, 2024 12:30

Merge branch 'main' into vg/feat/multifile_dataset

386cb2a

move albumentations import to prevent tests from failing

ad6e7de

re-enable tests

8ec586e

ashesh-0 requested changes Aug 26, 2024

src/careamics/lvae_training/dataset/utils/data_utils.py (outdated, resolved)

melisande-c mentioned this pull request Aug 26, 2024

Feature: Function that computes CAREamics TileInformation with GridIndexManager #217

Merged

4 tasks

CatEek reviewed Aug 28, 2024

src/careamics/lvae_training/dataset/vae_dataset.py (resolved)

jdeschamps and others added 3 commits September 2, 2024 15:08

Merge branch 'main' into vg/feat/multifile_dataset

a1b1abc

add load_data_fn parameter to datasets

53047dd

rename vae dataset back to MultiChDataset for compatibility

ccafc1f

jdeschamps requested a review from ashesh-0 September 4, 2024 15:14

ashesh-0 approved these changes Sep 4, 2024

src/careamics/lvae_training/dataset/multifile_dataset.py (outdated, resolved)

melisande-c mentioned this pull request Sep 6, 2024

Feature: Transform multiple arrays identically to patch and target #238

Merged

4 tasks

jdeschamps and others added 2 commits September 9, 2024 09:26

Merge branch 'main' into vg/feat/multifile_dataset

e8e76cb

update lvae_tiled_patching with TilingMode parameter

58c050a

veegalinova and others added 5 commits September 9, 2024 15:53

merge main branch

dc3b5c6

update new tests with pytest tag

7644c37

update multifile dataset with TwoChannelData and MultiChannelData

5dd4d30

Merge branch 'main' into vg/feat/multifile_dataset

9bd8c85

update datasets with target normalization; file merge and reorganization; init imports

7b4cb1d

Merge branch 'main' into vg/feat/multifile_dataset

d22cd2f

veegalinova merged commit 59fa2dd into main Sep 18, 2024
18 checks passed

veegalinova deleted the vg/feat/multifile_dataset branch September 18, 2024 19:02

veegalinova mentioned this pull request Sep 18, 2024

add examples of dataset initialization with new configuration CAREamics/MicroSplit-reproducibility#2

Merged
