Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LVAE dataset: add MultiFileDset #223

Merged
merged 21 commits into from
Sep 18, 2024
Merged

LVAE dataset: add MultiFileDset #223

merged 21 commits into from
Sep 18, 2024

Conversation

veegalinova
Copy link
Collaborator

@veegalinova veegalinova commented Aug 24, 2024

Description

  • What: This is a follow-up on updating the datasets from the Disentangle repo (reference commit 57cad67 from the main branch), adding the MultiFileDset class, TilingMode and EmptyPatchFetcher
  • Why: The MultiFileDset is necessary for the datasets in the paper
  • How: Copy/paste the code from the Disentangle repo with minimal changes

Changes Made

  • Added:
    • from the Disentangle repo:
      • copied MultiFileDset class and related classes
      • added TilingMode and updated related code in MultiChDloader
      • copied EmptyPatchFetcher
      • copied several new parameters for MultiChDloader
      • copied code for flips (using Albumentations)
      • added "load_data_fn" parameter to dataset classes to pass the data loading logic from different experiments
    • added lists of the dataset types needed for the paper in the data_utils.py
    • added a simple test to test the MultiFileDset object initialization
  • Modified:
    • moved dataset configs into lvae_training/dataset/configs
    • moved GridIndexManager and IndexSwitcher to separate files in lvae_training/dataset/utils
  • Removed:
    • removed code related to the BioSR dataset from data_utils.py and tests

Please ensure your PR meets the following requirements:

  • Code builds and passes tests locally, including doctests
  • New tests have been added (for bug fixes/features)
  • Pre-commit passes
  • PR to the documentation exists (for bug fixes / features)

Copy link

codecov bot commented Aug 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.97%. Comparing base (929b9b8) to head (d22cd2f).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
- Coverage   87.20%   86.97%   -0.24%     
==========================================
  Files         131      132       +1     
  Lines        4956     4882      -74     
==========================================
- Hits         4322     4246      -76     
- Misses        634      636       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jdeschamps jdeschamps requested a review from ashesh-0 September 4, 2024 15:14
jdeschamps added a commit that referenced this pull request Sep 9, 2024
#238)

### Description

- **What**: Add option in `XYFilp` and `XYRandomRotate90` to transform
additional arrays identically to the patch and target. Include the
ability to compose these transforms. Also fixed some type hint issues,
but some remain.
- **Why**: The LVAE training requires the identical transformation of
multiple arrays, including images and noise arrays. Previously it was
implemented with the `Albumentations` library. Original motivation came
from this comment chain
#223 (comment).
- **How**: A few options were considered, but ultimately I went with
what I thought would require the least amount of the code base to
change. Additional arrays can passed to the transform `__call__`
function as kwargs, and they will be returned as a dict after the
augmented patch and target. Composition of transforms with additional
arrays has been kept as a separate method,
`Compose.transform_with_additional_arrays`, mostly to not have to change
too much of the code base, this can be revised and integrated with
`Compose.__call__` later if desired.

### Changes Made

- **Added**: 
  - Tests for the new functionality.
- `_chain_transforms_additional_arrays` and
`transform_with_additional_arrays` methods to `Compose`.
- **Modified**: 
- add `**additional_arrays: NDArray` argument to the `__call__` method
of some transforms.
- A few existing tests that needed to modify the unpacking of transform
outputs, now that they output a tuple of 3 rather than 2.

### Breaking changes

Code calling transforms directly, the output is now a tuple of 3 rather
than 2.


### Additional Notes and Examples

#### Note

One of the biggest pain points that hasn't been fully resolved is that
the `N2VManipulate` transform doesn't have the same function signature
in the `__call__` method as the other transforms, it returns 3 arrays
rather than 2. We know it gets applied last in the algorithm but we
could make it more explicit somehow, this would resolve some annoying
type hinting.

#### Converting LVAE transform code


https://github.com/CAREamics/careamics/blob/b008187292f269455272c92c9009d45a330e7ca0/src/careamics/lvae_training/dataset/vae_dataset.py#L867-L881

To convert the code snippet above, I think (not tested) the following
can be done, assuming `self.rotation_transform` is created using the
CAREamics `Compose` instead of that from Albumentations:

```python
img_kwargs = {}
for i, img in enumerate(img_tuples):
    for k in range(len(img)):
        img_kwargs[f"img{i}_{k}"] = img[k]

noise_kwargs = {}
for i, nimg in enumerate(noise_tuples):
    for k in range(len(nimg)):
        noise_kwargs[f"noise{i}_{k}"] = nimg[k]

img, _, rot_dic = self._rotation_transform.transform_with_additional_arrays(
    img_tuples[0][0], **img_kwargs, **noise_kwargs
)
```

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [ ] PR to the documentation exists (for bug fixes / features)

---------

Co-authored-by: Joran Deschamps <[email protected]>
@veegalinova veegalinova merged commit 59fa2dd into main Sep 18, 2024
18 checks passed
@veegalinova veegalinova deleted the vg/feat/multifile_dataset branch September 18, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants