Refactoring: lightning API package and smoke tests #161
Conversation
This PR might interact with #153
### Description

Fixes a bug with stitching multiple prediction outputs.

- **What**: There was a mistake when splitting tiles into groups for their respective images. This only became a problem with > 1 images.
- **How**: Very simple fix; the last tile from the previous group was being added to the next. Just needed to add 1 to the index.

### Changes Made

- **Modified**: `stitch_prediction` function, adding 1 to the index when creating `image_slices`.

### Related Issues

- Fixes #162

### Additional Notes and Examples

I did not add an extra test because I think they are added in #161.

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [ ] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [ ] PR to the documentation exists (for bug fixes/features)
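To make the off-by-one concrete, here is a minimal sketch of the kind of grouping bug described above. The function and variable names are hypothetical, not the actual CAREamics code: the idea is that slicing with `tiles[start:last]` excludes each image's last tile and pushes it into the next group, and the fix is the `+ 1` on the index.

```python
# Illustrative sketch (hypothetical names): splitting a flat list of tiles
# back into per-image groups using the flat index of each image's last tile.
def group_tiles(tiles, last_tile_indices):
    """Split `tiles` into one group per image.

    `last_tile_indices` holds the flat index of the last tile of each image.
    """
    groups = []
    start = 0
    for last in last_tile_indices:
        # The buggy version sliced `tiles[start:last]`, dropping the last
        # tile of the current image and prepending it to the next group.
        groups.append(tiles[start : last + 1])  # the "+1" fix
        start = last + 1
    return groups
```

With a single image the bug is invisible (the dangling tile has no "next group" to leak into), which is consistent with the report that it only appeared with more than one image.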
I have a couple of notes on the lightning API that are maybe not relevant to this PR, since they weren't altered here.
- This is just my opinion so feel free to ignore: I find all the "Wrapper" type classes unnecessary and confusing. These include `TrainingDataWrapper`, `CAREamicsModuleWrapper` and `PredictDataWrapper`. The purpose of these classes is to initialise the classes they claim to wrap with a different set of parameters, and this can easily be achieved with a function. I feel subclassing should primarily be reserved for extending or modifying the functionality of the parent class.
For example, a `CAREamicsTrainData` factory function could reimplement what is in the `__init__` of `TrainingDataWrapper`, but return a `CAREamicsTrainData` object, as such:
```python
from pathlib import Path
from typing import Callable, List, Literal, Optional, Union

import numpy as np


def create_train_datamodule(
    train_data: Union[str, Path, np.ndarray],
    data_type: Union[Literal["array", "tiff", "custom"], SupportedData],
    patch_size: List[int],
    axes: str,
    batch_size: int,
    val_data: Optional[Union[str, Path]] = None,
    transforms: Optional[List[TRANSFORMS_UNION]] = None,
    train_target_data: Optional[Union[str, Path]] = None,
    val_target_data: Optional[Union[str, Path]] = None,
    read_source_func: Optional[Callable] = None,
    extension_filter: str = "",
    val_percentage: float = 0.1,
    val_minimum_patches: int = 5,
    dataloader_params: Optional[dict] = None,
    use_in_memory: bool = True,
    use_n2v2: bool = False,
    struct_n2v_axis: Literal["horizontal", "vertical", "none"] = "none",
    struct_n2v_span: int = 5,
) -> CAREamicsTrainData:
    """Create a `CAREamicsTrainData` object."""
    # logic from `TrainingDataWrapper.__init__`
    ...
    return CAREamicsTrainData(
        data_config=data_config,
        train_data=train_data,
        val_data=val_data,
        train_data_target=train_target_data,
        val_data_target=val_target_data,
        read_source_func=read_source_func,
        extension_filter=extension_filter,
        val_percentage=val_percentage,
        val_minimum_split=val_minimum_patches,
        use_in_memory=use_in_memory,
    )
```
- Not a priority but I will mention it anyway: I think the `CAREamicsTrainData.setup` method is pretty difficult to read with all the nested if/else statements 😅. Maybe a strategy pattern could help here in future, especially if there are more dataset types to come, i.e. zarr!
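For illustration, here is one minimal way a strategy/registry pattern could replace nested if/else dispatch on the data type. All names here are hypothetical, not the actual CAREamics API; the point is that adding a new dataset type (e.g. zarr) becomes a new registered factory rather than another branch in `setup`.

```python
# Hypothetical sketch: a registry of dataset factories keyed by data type,
# replacing nested if/else branches in a `setup`-style method.
from typing import Callable, Dict

DATASET_FACTORIES: Dict[str, Callable[..., object]] = {}


def register_dataset(data_type: str):
    """Decorator that registers a dataset factory for a given data type."""
    def decorator(factory: Callable[..., object]):
        DATASET_FACTORIES[data_type] = factory
        return factory
    return decorator


@register_dataset("array")
def make_array_dataset(**kwargs):
    # Stand-in for constructing an in-memory dataset.
    return ("InMemoryDataset", kwargs)


@register_dataset("zarr")
def make_zarr_dataset(**kwargs):
    # Stand-in for a future zarr-backed dataset.
    return ("ZarrDataset", kwargs)


def create_dataset(data_type: str, **kwargs):
    """Dispatch to the registered factory; new types need no edits here."""
    try:
        return DATASET_FACTORIES[data_type](**kwargs)
    except KeyError:
        raise ValueError(f"Unsupported data type: {data_type!r}") from None
```

A `setup` method would then reduce to a single `create_dataset(...)` call per split, keeping the branching logic out of the data module.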
I absolutely agree; the whole dataset handling has been a mess since the datasets were implemented, and we are slowly refactoring things... I like the idea of a function rather than subclassing. It would also allow us to come up with better names for the classes (since there would now be only a single one).
Update: the wrappers have been replaced by functions. This created a duplication with a method used in CAREamics, which was resolved via some refactoring.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #161       +/-  ##
===========================================
- Coverage   91.30%   41.65%   -49.66%
===========================================
  Files         104      119       +15
  Lines        2633     5786     +3153
===========================================
+ Hits         2404     2410        +6
- Misses        229     3376     +3147
```

☔ View full report in Codecov by Sentry.
Removed. And I corrected the vocab. 😉
```python
if model.architecture == SupportedArchitecture.UNET.value:
    # tile size must be equal to k*2^n, where n is the number of pooling
    # layers (equal to the depth) and k is an integer
```
Ah, I realise now the `InferenceConfig` doesn't have access to the type of architecture, so it can't be validated in there 😅. I guess it might be nice to isolate this check as a separate function somewhere, but otherwise looks good!
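As a sketch of what isolating the check could look like: the comment in the diff says the tile size must equal `k * 2^n`, where `n` is the number of pooling layers (the UNet depth) and `k` is an integer. The function name and signature below are hypothetical, not part of the CAREamics codebase.

```python
# Hypothetical helper isolating the tile-size check discussed above.
def validate_tile_size(tile_size: int, depth: int) -> None:
    """Check that `tile_size` equals k * 2**depth for some integer k >= 1.

    A UNet with `depth` pooling layers halves the spatial size `depth`
    times, so each tile dimension must be divisible by 2**depth.
    """
    factor = 2 ** depth
    if tile_size <= 0 or tile_size % factor != 0:
        raise ValueError(
            f"Tile size {tile_size} must be a non-zero multiple of "
            f"2**{depth} = {factor}."
        )
```

This keeps the architecture-specific constraint in one place, so both the prediction entry point and any future validation in `InferenceConfig` could call it.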
Not only the type of architecture, but also the depth of the UNet... But you had a good point with the double `Inference` instantiation, that was convoluted. Let's see later if we want to move that to some utils package or so.
Description

I moved the lightning modules into their own package to clarify the imports for the different APIs, and added smoke tests for the Lightning API. As a result I needed to implement a new function to retrieve the statistics from the `CAREamicsTrainData`, which led me to refactor the datasets a bit and standardize how the statistics are recorded across the two datasets. The smoke tests check that the Lightning API works for tiled and un-tiled data.

Finally, the lightning module wrappers have disappeared, and convenience functions now make the parameters explicit. This led to a renaming of the lightning modules.

- Moved the lightning modules and `CAREamicsTrainData` into `careamics.lightning`
- Removed `StatsOutput`; both datasets now have a `self.image_stats: Stats` and a `self.target_stats: Stats`
- Added `get_data_statistics` to the datasets and `CAREamicsTrainData`, and a corresponding test

Changes Made

- `test_lightning_api.py`
- `ligthning_data_module.py`
- `Iterable_dataset.py`
- `in_memory_dataset.py`
- `patching.py`
Additional Notes and Examples

To use the Lightning API, users now need to have the following imports (see full example here):

Since the `PredictDataWrapper` requires passing the statistics, users can now simply call:

While building the notebook, I uncovered the following error: #162
Please ensure your PR meets the following requirements: