Channel-dependent normalization #134

Merged: 35 commits merged into main on Jun 14, 2024

Conversation

@CatEek CatEek commented Jun 6, 2024

Description

Input patches and targets should be normalized separately, because they may have very different pixel value ranges. In addition, normalization was previously done for all channels at once, which is not correct when two channels should not be mixed.

  • What: Added separate normalization for images and targets, as well as per-channel normalization (see the sketch after this list)
  • Why: Normalizing everything with the same statistics is not correct

Changes Made

  • Normalization function, plus changes to the datasets and the training and prediction loops.
  • Relevant tests.

Please ensure your PR meets the following requirements:

  • Code builds and passes tests locally, including doctests
  • New tests have been added (for bug fixes/features)
  • Pre-commit passes
  • No change to the documentation needed

@CatEek CatEek requested review from jdeschamps and melisande-c June 6, 2024 10:15

@jdeschamps jdeschamps left a comment


I merged the most recent main into this branch, which introduced some errors. I have pushed some fixes, but not all tests pass yet.

Mypy is definitely not passing...

The main points in the review are the following:

  • dataclasses use
  • improving the normalization with broadcasting
  • type declaration for the stats in the Pydantic model (see the sketch below)
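As an illustration of the last point, here is a hedged sketch (not the actual `data_model.py`; `ChannelStats` and the field names are hypothetical) of how per-channel statistics could be declared as typed, optional fields in a Pydantic model:

```python
from typing import Optional
from pydantic import BaseModel

class ChannelStats(BaseModel):
    """Hypothetical model holding per-channel normalization statistics."""
    image_means: Optional[list[float]] = None   # one entry per channel
    image_stds: Optional[list[float]] = None
    target_means: Optional[list[float]] = None
    target_stds: Optional[list[float]] = None

# Stats can be filled in once they have been computed from the data.
stats = ChannelStats(image_means=[0.2, 0.7], image_stds=[0.1, 0.3])
```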

Review comments were left on the following files:

  • src/careamics/config/data_model.py
  • src/careamics/dataset/iterable_dataset.py
  • src/careamics/transforms/normalize.py
  • tests/transforms/test_compose.py
@jdeschamps changed the title from "Iz/feat/separate norm" to "Channel-dependent normalization" on Jun 6, 2024
@jdeschamps

There is still one test not passing, related to the BMZ, potentially because it previously used a tensor in [-1, 1], which meant that the normalization did not matter much for the result.

Now we again have small differences between the CAREamics prediction and the BMZ one. Let's understand where they come from.

@jdeschamps jdeschamps mentioned this pull request Jun 7, 2024
CatEek pushed a commit that referenced this pull request Jun 11, 2024
### Description

> **tldr**:
> - Split prediction datasets into tiled and non-tiled
> - Simplify stitching by passing `TileInformation` all along
> - Fix #125 
> - Change `total` memory check to `available`



When prediction is not tiled, the images are nevertheless forced through the tiling
pipeline (with a `TileInformation` object being passed along with the images),
which makes debugging complex. This PR splits the prediction datasets into tiled
and non-tiled ones.

I also changed the `total` memory check to `available` memory for the switch
between the in-memory and iterable datasets during training, as it better
represents what can actually be loaded into memory (a small sketch follows below).
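For illustration only (the real helper is `get_ram_size` in CAREamics; `get_available_ram_gb` below is a hypothetical name), this is how available rather than total RAM can be queried with `psutil`:

```python
import psutil

def get_available_ram_gb() -> float:
    """Return currently available (not total) system memory in GiB."""
    return psutil.virtual_memory().available / 1024**3

# e.g. choose the in-memory dataset only if the data fits comfortably:
# use_in_memory = dataset_size_gb < get_available_ram_gb()
```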

Finally, I simplified the stitching and prediction pipeline by passing
`TileInformation` further along.

- **What**: Refactor prediction datasets into tiled and non-tiled
datasets, simplify stitching.
- **Why**: Avoids forcing non-tiled prediction through the same complex
pipeline as the tiled one.
- **How**: Split the two features (tiled and non-tiled predictions) into
separate datasets.

### Changes Made

- **Added**: 
    - *dataset/iterable_pred_dataset.py*
    - *dataset/iterable_tiled_pred_dataset.py* 
    - *dataset/in_memory_pred_dataset.py*
    - *dataset/in_memory_tiled_pred_dataset.py* 
- **Modified**: `get_ram_size` now looks at available memory.
- **Removed**: Useless calls to `sort` in the datasets, and `is_tile` in
`TileInformation`.

### Related issues

This PR fixes #125.

### Notes

This PR will create merge issues with #134.

Currently, two issues remain (see the short sketch after this list):
- `extract_tile` returns C(Z)YX, while the non-tiled datasets always
return SC(Z)YX with a singleton dimension
- it is not clear when and where we should cast the `Tensor`s into
`np.ndarray`. I tried to make this happen in a single place (the
prediction loop), but that is not entirely solved.
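A hypothetical illustration of these two points (the variable names and shapes below are made up), showing the singleton sample axis and a single explicit `Tensor` to `np.ndarray` cast:

```python
import numpy as np
import torch

tile = np.zeros((3, 64, 64), dtype=np.float32)     # C(Z)YX, as returned by extract_tile
tile_sczyx = np.expand_dims(tile, axis=0)           # add the singleton sample axis -> SC(Z)YX

prediction = torch.zeros(1, 3, 64, 64)              # hypothetical prediction loop output
prediction_np = prediction.detach().cpu().numpy()   # single, explicit Tensor -> ndarray cast
```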

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [x] PR to the documentation exists (for bug fixes / features)
@jdeschamps jdeschamps self-requested a review June 14, 2024 13:58
@jdeschamps jdeschamps merged commit 07fb84e into main Jun 14, 2024
15 checks passed
@jdeschamps jdeschamps deleted the iz/feat/separate_norm branch June 14, 2024 15:39