Merge pull request #6 from szmazurek/documentation
Documentation
sarthakpati authored Oct 5, 2024
2 parents 4810cdd + 77ff86d commit 2ee1419
Showing 26 changed files with 907 additions and 185 deletions.
63 changes: 31 additions & 32 deletions README.md
@@ -2,45 +2,44 @@

Presenting the **G**ener**a**lly **N**uanced **D**eep **L**earning **F**ramework for **Synth**esis (GaNDLF-Synth), a unified abstraction to train various synthesis algorithms in a zero/low code approach.

## Documentation

General documentation is in preparation, it will be **made available soon**.
## Why use this?

- Supports multiple
  - Generative model architectures
  - Data dimensions (2D/3D)
  - Channels/images/sequences
  - Label conditioning schemes
  - Domain modalities (i.e., Radiology Scans and Digitized Histopathology Tissue Sections)
  - Problem types (synthesis, reconstruction)
  - Multi-GPU and multi-node training
- Built-in
  - Support for parallel HPC-based computing
  - Support for training check-pointing
  - Support for [Automatic mixed precision](./docs/usage.md#mixed-precision-training)
- Robust data augmentation and preprocessing (via interfacing [GaNDLF](https://docs.mlcommons.org/GaNDLF/))
- Leverages robust open source software
- No need to write any code to generate robust models

Mixed precision training:
We currently support mixed precision training based on [lightning](https://pytorch-lightning.readthedocs.io/en/latest/advanced/mixed_precision.html). To use mixed precision, please set the "precision" field in the "compute" field. All available precision options can be found under the link above.

```yaml
compute:
  precision: "16"
```
## Documentation

Usage of distributed strategies:
GaNDLF has extensive documentation and it is arranged in the following manner:

- we currently support [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) and [DeepSpeed](https://www.deepspeed.ai/getting-started/).
- To use ddp, just configure the number of nodes and type strategy name "ddp" under "compute" field in the config. Note that this is going to be used as default strategy if not specified and multiple GPUs are available.
- [Home](https://mlcommons.github.io/GaNDLF-Synth/)
- [Installation](./docs/setup.md)
- [Usage](./docs/usage.md)
- [Extension](./docs/extending.md)
- [Frequently Asked Questions](./docs/FAQ.md)
- [Acknowledgements](./docs/acknowledgements.md)

```yaml
compute:
  num_devices: 2 # if not set, all GPUs available will
  num_nodes: 2 # if not set, one node training is assumed
  strategy: "ddp"
  strategy_config: {} # additional strategy specific kwargs, see https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.strategies.DDPStrategy.html#lightning.pytorch.strategies.DDPStrategy
## Contributing

```
Please see the [contributing guide](./CONTRIBUTING.md) for more information.

- For deepspeed, we leverage the original deepspeed library config to set the distributed parameters. To use deepspeed, configure the "compute" field as follows:
## Disclaimer
- The software has been designed for research purposes only and has neither been reviewed nor approved for clinical use by the Food and Drug Administration (FDA) or by any other federal/state agency.
- This code (excluding dependent libraries) is governed by [the Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt) provided in the [LICENSE file](./LICENSE) unless otherwise specified.

```yaml
compute:
  num_devices: 2 # if not set, all GPUs available will
  num_nodes: 2 # if not set, one node training is assumed
  strategy: "deepspeed"
  strategy_config:
    config: "path-to-deepspeed-config.json" # path to the deepspeed config file
```
Details of this config file can be found in the deepspeed documentation here: https://www.deepspeed.ai/docs/config-json/
Please read further details in the Lightning guide: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/deepspeed.html#custom-deepspeed-config.
Note that you will probably need to override the optimizer choice with one of the optimized ones available in deepspeed. This optimizer (a scheduler can be specified here too) will take precedence over the one specified in the base yaml config file.

## Citation

1 change: 1 addition & 0 deletions docs/FAQ.md
@@ -0,0 +1 @@
# Coming soon
Empty file added docs/README.md
Empty file.
25 changes: 25 additions & 0 deletions docs/acknowledgements.md
@@ -0,0 +1,25 @@
## Acknowledgements

This file records the following pieces of information:

- All papers whose results were reproduced for the [GaNDLF manuscript](https://arxiv.org/abs/2103.01006).
- All challenges (links and references) that various developers/users participated in with GaNDLF.
- All papers that used GaNDLF in their analysis.


## Papers

| **Application** | **Lead Author** | **Link** |
|:------------------------:|:---------------:|:------------------------------------------------:|
| arxiv here | Sarthak Pati | |



## People

- [All coders and developers of GaNDLF-Synth](https://github.com/mlcommons/GaNDLF-Synth/graphs/contributors)

# TODO
- Supervisors:

- [Spyridon Bakas](https://www.med.upenn.edu/cbica/sbakas/)
126 changes: 126 additions & 0 deletions docs/customize.md
@@ -0,0 +1,126 @@

This file contains mid-level information regarding various parameters that can be leveraged to customize the training/inference in GaNDLF-Synth. To see the default parameters for certain fields, see the [default configs directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/parameter_defaults/).


## Training Parameters
The training parameters are defined in the configuration file. The following fields are supported:
```yaml
model_config: # Configuration for the model (see below) - required
modality: # Modality of the input data, either 'rad' or 'histo' - required
num_epochs: # Number of epochs to train the model
batch_size: # Batch size for training, validation and test dataloaders
data_preprocessing: # Data preprocessing configuration (see below)
data_augmentation: # Data augmentation configuration (see below)
data_postprocessing: # Data postprocessing configuration (see below)
dataloader_config: # Dataloaders configuration (see below)
inference_parameters:
  batch_size: # Batch size for inference
  n_images_to_generate: # Number of images to generate during inference, unused in image-to-image models that require input images. This field can be a single value or a dictionary containing the number of images to generate for each class, for example {"1": 10, "2": 20}.
save_model_every_n_epochs: # Save checkpoint every n epochs
compute: {} # Distributed training and mixed precision configuration (see below)
```
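
As an illustration of how the fields above might be checked once the YAML is loaded into a dictionary, here is a minimal Python sketch. The helper names `check_required_fields` and `total_images_to_generate` are hypothetical and are not part of GaNDLF-Synth; the only facts taken from the listing are that `model_config` and `modality` are required, that `modality` is either `rad` or `histo`, and that `n_images_to_generate` may be a single value or a per-class dictionary.

```python
from typing import Any, Dict, Union

# Fields marked "required" in the listing above.
REQUIRED_FIELDS = ("model_config", "modality")
ALLOWED_MODALITIES = ("rad", "histo")


def check_required_fields(config: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if a required field is missing or invalid."""
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"Missing required config fields: {missing}")
    if config["modality"] not in ALLOWED_MODALITIES:
        raise ValueError(f"modality must be one of {ALLOWED_MODALITIES}")


def total_images_to_generate(n_images: Union[int, Dict[str, int]]) -> int:
    """Hypothetical helper: total image count for either accepted form of the field."""
    if isinstance(n_images, dict):
        # Per-class form, e.g. {"1": 10, "2": 20}: sum over all classes.
        return sum(int(count) for count in n_images.values())
    # Single-value form.
    return int(n_images)
```

With the per-class example from the listing, `total_images_to_generate({"1": 10, "2": 20})` sums the two class counts, while a plain integer is returned unchanged.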
## Model
Model configuration is expected to be in the following format:
```yaml
model_config:
  model_name: # Name of the model to use
  labeling_paradigm: # Labeling paradigm for the model, either 'unlabeled', 'patient', or 'custom'
  architecture: # Architecture of the model, customizing the given model. Specifics are defined in the config of the given model.
  losses: # Loss functions to use (see below).
    - name: # Name of the loss function
    - some_parameter: some_value
  # For models containing multiple losses (for example GANs), the losses are expected to be in the following format:
  losses:
    - discriminator: # Discriminator loss
      - name: # Name of the loss function
      - some_parameter: some_value
    - generator: # Generator loss
      - name: # Name of the loss function
      - some_parameter: some_value
  # Note that to use multiple losses, the model must be prepared via its config to handle each loss under its subloss name.
  optimizers: # Optimizers to use (see below)
    - name: # Name of the optimizer
    - some_parameter: some_value
  # For models containing multiple optimizers (for example GANs), the optimizers can be defined in the same way as the losses above.
  schedulers: # Schedulers to use (see below)
    - name: # Name of the scheduler
    - some_parameter: some_value
  # For models containing multiple schedulers (for example GANs), the schedulers can be defined in the same way as the losses above.
  n_channels: # Number of input channels
  n_dimensions: # Number of dimensions of the input data (2 or 3)
  tensor_shape: # Shape of the input tensor
  # This model config can support additional parameters that are specific to the model, for example:
  save_eval_images_every_n_epochs: # Save evaluation images every n epochs, useful to assess training progress of generative models. Implemented in, e.g., DCGAN.
```
## Optimizers
GaNDLF-Synth interfaces with the GaNDLF core framework for optimizers. See the [optimizers directory](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/optimizers/__init__.py) for available optimizers. They support optimizer-specific configurable parameters, interfacing with [PyTorch Optimizers](https://pytorch.org/docs/stable/optim.html).
## Schedulers
GaNDLF-Synth interfaces with the GaNDLF core framework for schedulers. See the [schedulers directory](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/schedulers/__init__.py) for available schedulers. They support scheduler-specific configurable parameters, interfacing with [PyTorch Schedulers](https://pytorch.org/docs/stable/optim.html).
## Losses
GaNDLF-Synth supports multiple loss functions. See the [losses directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/losses/__init__.py) for available loss functions. They support loss-specific configurable parameters, interfacing with [PyTorch Loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions).
## Dataloader configs
GaNDLF-Synth supports separate dataloader parameters for the training, validation, test and inference dataloaders. They support configurable parameters, interfacing with the [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html). The following fields are supported:
```yaml
dataloader_config: # Dataloaders configuration (see below)
  shuffle: # Whether to shuffle the data
  num_workers: # Number of processes spawned to load the data
  pin_memory: # Whether to pin the memory for CPU-GPU transfer
  timeout: # Timeout for the dataloader processes
  prefetch_factor: # Number of batches to prefetch by each worker
  persistent_workers: # Whether to keep the worker processes alive between epochs
```
The fields for specific dataloaders are expected to be in the following format:
```yaml
dataloader_config:
  train:
    - some_parameter: some_value
  val:
    - some_parameter: some_value
  test:
    - some_parameter: some_value
  inference:
    - some_parameter: some_value
```
If a given dataloader is not configured explicitly, the default values are used (see above).
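
One way to picture the fallback behaviour is the following Python sketch. It is not the GaNDLF-Synth implementation: the function name `resolve_dataloader_config` is hypothetical, and it assumes that any top-level parameter in `dataloader_config` acts as a shared default that a split-specific entry (`train`, `val`, `test`, `inference`) may override.

```python
from typing import Any, Dict

SPLITS = ("train", "val", "test", "inference")


def _as_mapping(overrides: Any) -> Dict[str, Any]:
    """Accept either a mapping or a list of single-key mappings (as in the YAML above)."""
    if isinstance(overrides, list):
        merged: Dict[str, Any] = {}
        for item in overrides:
            merged.update(item)
        return merged
    return dict(overrides or {})


def resolve_dataloader_config(dataloader_config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: build one flat parameter dict per split.

    Shared (top-level) parameters are used as defaults; values given under a
    split name take precedence for that split only.
    """
    shared = {key: value for key, value in dataloader_config.items() if key not in SPLITS}
    return {
        split: {**shared, **_as_mapping(dataloader_config.get(split))}
        for split in SPLITS
    }
```

For example, with `{"num_workers": 2, "shuffle": False, "train": [{"shuffle": True}]}` only the training split ends up with `shuffle` enabled, while every split keeps `num_workers: 2`.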
## Data preprocessing
GaNDLF-Synth interfaces with the GaNDLF core framework for data preprocessing. To see available data preprocessing options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/preprocessing/__init__.py).
Separate preprocessing parameters can be defined for each dataloader (train, val, test, inference) as follows:
```yaml
data_preprocessing:
  train:
    - some_transform: some_value
  val:
    - some_transform: some_value
  test:
    - some_transform: some_value
  inference:
    - some_transform: some_value
```
## Data Augmentation
GaNDLF-Synth interfaces with the GaNDLF core framework for data augmentation. To see available data augmentation options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/augmentation/__init__.py). Augmentations are applied only to the training dataloader.
```yaml
data_augmentation:
  train:
    - some_transform: some_value
```
## Post processing
GaNDLF-Synth interfaces with the GaNDLF core framework for post-processing. To see available post-processing options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/post_process/__init__.py). Post-processing is applied only to the inference dataloader.
```yaml
data_postprocessing:
  inference:
    - some_transform: some_value
```
## Distributed Training
For details on using distributed training, see the [usage page](./usage.md#distributed-training).
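
To give a rough idea of how a `compute` block could translate into PyTorch Lightning settings, here is a small Python sketch. The function `trainer_kwargs_from_compute` is hypothetical, and the mapping is an assumption rather than the project's actual code; it uses the `compute` field names that appear in the README portion of this commit (`num_devices`, `num_nodes`, `strategy`, `precision`) and the `Trainer` argument names `devices`, `num_nodes`, `strategy` and `precision`. Handling of `strategy_config` is left out on purpose.

```python
from typing import Any, Dict


def trainer_kwargs_from_compute(compute: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate a `compute` config block into Trainer keyword arguments."""
    kwargs: Dict[str, Any] = {}
    if "num_devices" in compute:
        # When absent, the key is omitted so that the Trainer default applies.
        kwargs["devices"] = compute["num_devices"]
    # Single-node training is assumed when `num_nodes` is not set.
    kwargs["num_nodes"] = compute.get("num_nodes", 1)
    if "strategy" in compute:
        kwargs["strategy"] = compute["strategy"]  # e.g. "ddp" or "deepspeed"
    if "precision" in compute:
        kwargs["precision"] = compute["precision"]  # e.g. "16"
    return kwargs
```

The resulting dictionary could then be unpacked into a `Trainer(...)` call; an empty `compute: {}` yields only `num_nodes=1`, leaving everything else at the library defaults.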
117 changes: 117 additions & 0 deletions docs/extending.md
@@ -0,0 +1,117 @@
## Environment

Before starting to work on GaNDLF-Synth at the code level, please follow the [instructions to install GaNDLF-Synth from sources](./setup.md). Once that's done, please verify the installation using the following command:

```bash
# continue from previous shell
(venv_gandlf) $>
# you should be in the "GaNDLF-Synth" git repo
(venv_gandlf) $> gandlf-synth verify-install
```


## Submodule flowcharts

- The following flowcharts are intended to provide a high-level overview of the different submodules in GaNDLF-Synth.
- Navigate to the `README.md` file in each submodule folder for details.
- Some flowcharts are still in development and may not be complete/present.

## Overall Architecture

- Command-line parsing: [gandlf-synth run](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/run.py)
- [Config Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/config_manager.py):
  - Handles configuration parsing
  - Provides configuration to other modules
- [Training Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/training_manager.py):
  - Main entry point from CLI
  - Handles training functionality
- [Inference Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/inference_manager.py):
  - Handles inference functionality
  - Main entry point from CLI
  - Performs actual inference


## Dependency Management

To update/change/add a dependency in [setup](https://github.com/mlcommons/GaNDLF-Synth/blob/main/setup.py), please ensure **at least** the following conditions are met:

- The package is being [actively maintained](https://opensource.com/life/14/1/evaluate-sustainability-open-source-project).
- The new dependency is tested against the **minimum python version** supported by GaNDLF-Synth (see the `python_requires` variable in [setup](https://github.com/mlcommons/GaNDLF-Synth/blob/main/setup.py)).
- It does not clash with any existing dependencies.

## Adding Models

- Create the model in a new file in the [architectures](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/architectures) folder.
- Make sure the new class inherits from the [ModelBase](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/architectures/base_model.py) class.
- Create a new `LightningModule` that implements the training, validation, testing and inference logic. Make sure it inherits from the [SynthesisModule](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/module_abc.py) class and implements the necessary abstract methods.
- Add the new model to the [ModuleFactory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/module_factory.py) `AVAILABE_MODULES` dictionary. Note that the key in this dictionary should follow the naming convention: `labeling-paradigm_model-name`, and needs to match the key in the model config factory that we will create in the next steps.
- Create the model config file in the [configs](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs) folder.
- Implement the model configuration class that will parse the model's configuration when creating a model instance. Make sure it inherits from the [AbstractModelConfig](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs/config_abc.py) class.
- Add the new model configuration class to the [ModelConfigFactory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs/model_config_factory.py) `AVAILABLE_MODEL_CONFIGS` dictionary. Note that the model config key in this dictionary should follow the naming convention: `labeling-paradigm_model-name`, and needs to match the key in the model factory that we created in the previous steps.
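
The key-matching requirement in the steps above can be sketched in Python as follows. This is only an illustration: `ExampleModule`, `ExampleModelConfig`, `registry_key`, `check_registries_match` and the two `EXAMPLE_*` dictionaries are hypothetical stand-ins rather than names from the GaNDLF-Synth code base, and the lowercase `dcgan` model name is assumed for the example.

```python
from typing import Dict, Type


class ExampleModule:
    """Stand-in for a LightningModule subclass inheriting from SynthesisModule."""


class ExampleModelConfig:
    """Stand-in for a configuration class inheriting from AbstractModelConfig."""


def registry_key(labeling_paradigm: str, model_name: str) -> str:
    """Build a key following the `labeling-paradigm_model-name` convention."""
    return f"{labeling_paradigm}_{model_name}"


# Hypothetical registries mirroring the two factory dictionaries described above.
EXAMPLE_MODULES: Dict[str, Type] = {registry_key("unlabeled", "dcgan"): ExampleModule}
EXAMPLE_MODEL_CONFIGS: Dict[str, Type] = {registry_key("unlabeled", "dcgan"): ExampleModelConfig}


def check_registries_match(modules: Dict[str, Type], configs: Dict[str, Type]) -> None:
    """Raise KeyError if a key is present in one registry but not in the other."""
    mismatched = sorted(set(modules) ^ set(configs))
    if mismatched:
        raise KeyError(f"Keys registered in only one factory: {mismatched}")


check_registries_match(EXAMPLE_MODULES, EXAMPLE_MODEL_CONFIGS)
```

A check of this kind makes a mistyped key fail early, instead of surfacing later as a missing model or missing config at training time.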

## Adding Dataloading Functionalities

GaNDLF-Synth handles dataloading for specific labeling paradigms in separate abstractions. If you wish to modify the dataloading functionality, please refer to the following modules:

- [Datasets](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/datasets.py): Contains the dataset classes for different labeling paradigms.
- [Dataset Factories](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/datasets_factory.py): Contains the factory class that creates the dataset instance based on the configuration.
- [Dataloaders Factories](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/dataloaders_factory.py): Contains the factory class that creates the dataloader instance based on the configuration.

Remember to add new datasets and dataloaders to the respective factory classes. For some cases, modifications of the training or inference logic may be required to accommodate the new dataloading functionality (see below).

## Adding Training Functionality


- For changes at the level of single training, validation, or test steps, modify the specific functions of a given module.
- For changes at the level of the entire training loop, modify the [Training Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/training_manager.py). The main training loop is handled via the `Trainer` class of PyTorch Lightning; please refer to the [PyTorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more details.


## Adding Inference Functionality

- For changes at the level of single inference steps, modify the specific functions of a given module. Note that for inference, special dataloaders are used to load the data in the required format.
- For changes at the level of the entire inference loop, modify the [Inference Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/inference_manager.py). The main inference loop is handled via the `Trainer` class of PyTorch Lightning; please refer to the [PyTorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more details.

## Adding new CLI command

Example: `gandlf-synth run` [CLI command](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/construct_csv.py)
- Implement a function and wrap it with `@click.command()` + `@click.option()`
- Add it to the `cli_subommands` [dict](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/subcommands.py)

The command will then be available as the `gandlf-synth your-subcommand-name` CLI command.
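
A minimal sketch of this pattern with `click` is shown below. The `hello` command, its `--name` option, the `example_subcommands` dictionary and the `cli` group are all hypothetical and only stand in for the project's own entry points; the real subcommand dictionary lives in `subcommands.py`.

```python
import click


@click.command()
@click.option("--name", "-n", default="world", help="Who to greet.")
def hello(name: str) -> None:
    """Hypothetical subcommand used only for illustration."""
    click.echo(f"Hello, {name}!")


# Stand-in for the project's subcommand dictionary: maps the CLI name to the command.
example_subcommands = {"hello": hello}


@click.group()
def cli() -> None:
    """Stand-in for the top-level command group."""


# Register every subcommand on the group under its dictionary key.
for command_name, command in example_subcommands.items():
    cli.add_command(command, name=command_name)
```

Calling `cli()` from an entry point would then expose the command as `hello` under the group, in the same way a new entry in the project's dictionary becomes a `gandlf-synth` subcommand.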


## Update parameters
For any new feature that is configurable via config, please ensure the corresponding option is added to the ["customize" section of this documentation](./customize.md), so that others can review/use/extend it as needed.

## Update Tests

Once you have made changes to functionality, it is imperative that the unit tests be updated to cover the new code. Please see the [full testing suite](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/testing/tests/) for details and examples. Note that tests are split into different categories, each having its own file in the aforementioned folder:
- `test_modules.py`: module-specific tests
- `test_generic.py`: global features tests
- `entrypoints/`: tests for specific CLI commands

## Run Tests

### Prerequisites

Tests use [sample data](https://drive.google.com/uc?id=12utErBXZiO_0hspmzUlAQKlN9u-manH_), which is downloaded and prepared automatically when you run the unit tests. Prepared data is stored at `GaNDLF-Synth/testing/data/` the first time the tests are run. However, you may want to download and explore the data yourself.

### Unit tests

Once you have the virtual environment set up, tests can be run using the following command:

```bash
# continue from previous shell
(venv_gandlf) $> pytest --device cuda # can be cuda or cpu, defaults to cpu
```

Any failures will be reported in the file `GaNDLF-Synth/testing/failures.log`.


### Code coverage

The code coverage for the unit tests can be obtained by the following command:

```bash
# continue from previous shell
(venv_gandlf) $> coverage run -m pytest --device cuda; coverage report -m
```