Channel-dependent normalization #134
Conversation
I merged with the most recent main, which introduced some errors. I have pushed some fixes, but not all tests pass yet.
Mypy is definitely not passing...
The main points of the review are the following:
- use of dataclasses
- improving the normalization with broadcasting (see the sketch below)
- type declaration for the stats in the Pydantic model
There is still one test not passing, related to the BMZ, potentially because it previously used a tensor in [-1, 1], which meant that the normalization did not matter much for the result. Now we again have small differences between the CAREamics prediction and the BMZ one. Let's understand where they come from.
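As a reference for the broadcasting point above, here is a minimal sketch of per-channel normalization applied via NumPy broadcasting. The layout SC(Z)YX and the function name are illustrative assumptions, not the actual CAREamics API:

```python
import numpy as np


def normalize_per_channel(
    patch: np.ndarray,   # assumed layout: (S, C, ...) with spatial dims last
    means: np.ndarray,   # per-channel means, shape (C,)
    stds: np.ndarray,    # per-channel stds, shape (C,)
) -> np.ndarray:
    """Normalize each channel with its own statistics via broadcasting."""
    # Reshape the stats to (1, C, 1, ..., 1) so they broadcast over the sample
    # and spatial dimensions without an explicit per-channel loop.
    stats_shape = (1, patch.shape[1]) + (1,) * (patch.ndim - 2)
    means = np.asarray(means).reshape(stats_shape)
    stds = np.asarray(stds).reshape(stats_shape)
    return (patch - means) / (stds + 1e-8)  # epsilon avoids division by zero
```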
### Description

> **tldr**:
> - Split prediction datasets into tiled and non-tiled
> - Simplify stitching by passing `TileInformation` all along
> - Fix #125
> - Change `total` memory check to `available`

When not tiling prediction, the images are forced through the tiling pipeline (with a `TileInformation` class being passed along the images), which makes debugging complex. This PR splits the prediction datasets into tiled and non-tiled. I also changed the `total` memory check into `available` for the switch between in-memory and iterable datasets during training, as it better represents what can be loaded in memory (see the memory-check sketch below). Finally, I simplified the stitching and prediction pipeline by passing the `TileInformation` further.

- **What**: Refactor prediction datasets into tiled and non-tiled datasets, simplify stitching.
- **Why**: Avoids forcing non-tiled prediction through the same complex pipeline as the tiled one.
- **How**: Split the two features (tiled and non-tiled predictions) into two datasets.

### Changes Made

- **Added**:
  - *dataset/iterable_pred_dataset.py*
  - *dataset/iterable_tiled_pred_dataset.py*
  - *dataset/in_memory_pred_dataset.py*
  - *dataset/in_memory_tiled_pred_dataset.py*
- **Modified**: `get_ram_size` now looks at available memory.
- **Removed**: Useless calls to `sort` in the datasets, and `is_tile` in `TilingInformation`.

### Related issues

This PR fixes #125.

### Notes

This PR will create merge conflicts with #134. Currently, there are two issues remaining:

- `extract_tile` returns C(Z)YX, while the non-tiled datasets always return SC(Z)YX with a singleton dimension.
- It is not clear when, and where, we should cast the `Tensors` into `np.ndarray`. I tried to make it happen in a single place (in the prediction loop), but that is not entirely solved.

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [x] PR to the documentation exists (for bug fixes / features)
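For the `total` vs `available` memory change, a minimal sketch of what such a check could look like with `psutil`; the helper name `get_ram_size` matches the description above, but the exact signature and unit are assumptions:

```python
import psutil


def get_ram_size() -> int:
    """Return the currently available system RAM in bytes.

    Using `available` rather than `total` better reflects how much data can
    actually be loaded in memory when deciding between the in-memory and
    iterable datasets.
    """
    return psutil.virtual_memory().available
```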
Description
Input patches and targets should be normalized separately, because they may have very different pixel value ranges. In addition, normalization was previously done across all channels at once, which is not correct when channels have very different intensity statistics and should not be mixed. A sketch of the intended behaviour follows below.
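A minimal sketch of that behaviour, computing statistics per channel and separately for inputs and targets. The SC(Z)YX layout, random data, and function name are illustrative assumptions:

```python
import numpy as np


def channel_stats(stack: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Compute per-channel mean and std over the sample and spatial axes of an SC(Z)YX stack."""
    axes = (0,) + tuple(range(2, stack.ndim))  # reduce over S and spatial dims, keep C
    return stack.mean(axis=axes), stack.std(axis=axes)


rng = np.random.default_rng(0)
inputs = rng.normal(100.0, 10.0, size=(8, 2, 64, 64))  # SCYX stack, two channels
targets = rng.normal(0.0, 1.0, size=(8, 2, 64, 64))    # very different value range

# Inputs and targets each get their own per-channel statistics, so neither the
# channels nor the input/target ranges are mixed into a single mean/std pair.
input_means, input_stds = channel_stats(inputs)
target_means, target_stds = channel_stats(targets)

# Each set of statistics is then broadcast over its own stack, as in the
# broadcasting sketch earlier in the conversation.
```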
Changes Made
Please ensure your PR meets the following requirements: