Merge pull request #6 from szmazurek/documentation

Documentation
mlcommons · Oct 5, 2024 · 2ee1419 · 2ee1419
2 parents 4810cdd + 77ff86d
commit 2ee1419
Showing 26 changed files with 907 additions and 185 deletions.
diff --git a/README.md b/README.md
@@ -2,45 +2,44 @@
 
 Presenting the **G**ener**a**lly **N**uanced **D**eep **L**earning **F**ramework for **Synth**esis (GaNDLF-Synth), a unified abstraction to train various synthesis algorithms in a zero/low code approach.
 
-## Documentation
-
-General documentation is in preparation, it will be **made available soon**.
+## Why use this?
+
+- Supports multiple
+  - Generative model architectures
+  - Data dimensions (2D/3D)
+  - Channels/images/sequences 
+  - Label conditioning schemes
+  - Domain modalities (i.e., Radiology Scans and Digitized Histopathology Tissue Sections)
+  - Problem types (synthesis, reconstruction)
+  - Multi-GPU and multi-node training
+- Built-in 
+  - Support for parallel HPC-based computing
+  - Support for training check-pointing
+  - Support for [Automatic mixed precision](./docs/usage.md#mixed-precision-training)
+- Robust data augmentation and preprocessing (via interfacing [GaNDLF](
+  https://docs.mlcommons.org/GaNDLF/))
+- Leverages robust open source software
+- No need to write any code to generate robust models
 
-Mixed precision training:
-We currently support mixed precision training based on [lightning](https://pytorch-lightning.readthedocs.io/en/latest/advanced/mixed_precision.html). To use mixed precision, please set the "precision" field in the "compute" field. All available precision options can be found under the link above. 
-
-```yaml
-compute:
-  precision: "16"        
-```
+## Documentation
 
-Usage of distributed strategies:
+GaNDLF has extensive documentation and it is arranged in the following manner:
 
-- we currently support [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) and [DeepSpeed](https://www.deepspeed.ai/getting-started/). 
-- To use ddp, just configure the number of nodes and type strategy name "ddp" under "compute" field in the config. Note that this is going to be used as default strategy if not specified and multiple GPUs are available.
+- [Home](https://mlcommons.github.io/GaNDLF-Synth/)
+- [Installation](./docs/setup.md)
+- [Usage](./docs/usage.md)
+- [Extension](./docs/extending.md)
+- [Frequently Asked Questions](./docs/FAQ.md)
+- [Acknowledgements](./docs/acknowledgements.md)
 
-```yaml
-compute:
-  num_devices: 2         # if not set, all GPUs available will 
-  num_nodes: 2           # if not set, one node training is assumed
-  strategy: "ddp"
-  strategy_config: {}    # additional strategy specific kwargs, see https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.strategies.DDPStrategy.html#lightning.pytorch.strategies.DDPStrategy
+## Contributing
 
-```
+Please see the [contributing guide](./CONTRIBUTING.md) for more information.
 
-- For deepspeed, we leverage the original deepspeed library config to set the distributed parameters. To use deepspeed, configure the "compute" field as follows:
+## Disclaimer
+- The software has been designed for research purposes only and has neither been reviewed nor approved for clinical use by the Food and Drug Administration (FDA) or by any other federal/state agency.
+- This code (excluding dependent libraries) is governed by [the Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt) provided in the [LICENSE file](./LICENSE) unless otherwise specified.
 
-```yaml
-compute:
-  num_devices: 2         # if not set, all GPUs available will 
-  num_nodes: 2           # if not set, one node training is assumed
-  strategy: "deepspeed"
-  strategy_config: 
-    config: "path-to-deepspeed-config.json"    # path to the deepspeed config file
-```
-Details of this config file can be found in the deepspeed documentation here: https://www.deepspeed.ai/docs/config-json/
-Please read further details in the Lightning guide: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/deepspeed.html#custom-deepspeed-config.
-Note that you will probably need to override the optimizer choice with one of optimized ones available in deepspeed. This optimizer (scheduler can be specified here too) will take precedence over the one specified in the base yaml config file.
 
 ## Citation
 

diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -0,0 +1 @@
+# Coming soon
diff --git a/docs/README.md b/docs/README.md
diff --git a/docs/acknowledgements.md b/docs/acknowledgements.md
@@ -0,0 +1,25 @@
+## Acknowledgements
+
+This file records the following pieces of information:
+
+- All papers whose results were reproduced for the [GaNDLF manuscript](https://arxiv.org/abs/2103.01006).
+- All challenges (links and references) that various developers/users participated in with GaNDLF.
+- All papers that used GaNDLF in their analysis.
+
+
+## Papers
+
+|        **Application**       |   **Lead Author**   |                       **Link**                       |
+|:------------------------:|:---------------:|:------------------------------------------------:|
+|    arxiv here      | Sarthak Pati |  |
+
+
+
+## People
+
+- [All coders and developers of GaNDLF-Synth](https://github.com/mlcommons/GaNDLF-Synth/graphs/contributors)
+
+# TODO
+- Supervisors:
+
+    - [Spyridon Bakas](https://www.med.upenn.edu/cbica/sbakas/)
diff --git a/docs/customize.md b/docs/customize.md
@@ -0,0 +1,126 @@
+
+This file contains mid-level information regarding various parameters that can be leveraged to customize the training/inference in GaNDLF. To see the default parameters for certain fields, see the [default configs directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/parameter_defaults/).
+
+
+## Training Parameters
+The training parameters are defined in the configuration file. The following fields are supported:
+```yaml
+model_config:  # Configuration for the model (see below) - required
+modality:  # Modality of the input data, either 'rad' or 'histo' - required
+num_epochs:  # Number of epochs to train the model
+batch_size:  # Batch size for training, validation and test dataloaders
+data_preprocessing:  # Data preprocessing configuration (see below)
+data_augmentation:  # Data augmentation configuration (see below)
+data_postprocessing:  # Data postprocessing configuration (see below)
+dataloader_config:  # Dataloaders configuration (see below)
+inference_parameters:
+  batch_size:  # Batch size for inference
+  n_images_to_generate:  # Number of images to generate during inference, unused in image-to-image models that require input images. This field can be a single value or a dictionary containing the number of images to generate for each class, for example {"1": 10, "2": 20}.
+save_model_every_n_epochs:  # Save checkpoint every n epochs
+compute: {} # Distributed training and mixed precision configuration (see below)
+```
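+
+The template above could be filled in roughly as follows. This is a sketch only: all values are illustrative rather than project defaults, and `model_config` is left as an empty placeholder to be completed as described in the Model section.
+
+```yaml
+# Illustrative values only; these are not the project defaults.
+model_config: {}               # placeholder, must be filled in (required)
+modality: "rad"                # either 'rad' or 'histo'
+num_epochs: 100
+batch_size: 8
+inference_parameters:
+  batch_size: 4
+  n_images_to_generate:        # dictionary form: images to generate per class
+    "1": 10
+    "2": 20
+save_model_every_n_epochs: 10
+compute: {}                    # empty mapping, as in the template above
+```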
+
+## Model
+Model configuration is expected to be in the following format:
+```yaml
+model_config:
+    model_name:  # Name of the model to use
+    labeling_paradigm:  # Labeling paradigm for the model, either 'unlabeled', 'patient', or 'custom'
+    architecture:  # Architecture of the model, customizing given model. Specifics are defined in the config of the given model.
+    losses: # Loss functions to use (see below).
+        - name:  # Name of the loss function
+        - some_parameter: some_value
+    # For models containing multiple losses (for example GANs), the losses are instead expected to be in the following format:
+    # losses:
+    #     - discriminator: # Discriminator loss
+    #         - name: # Name of the loss function
+    #         - some_parameter: some_value
+    #     - generator: # Generator loss
+    #         - name: # Name of the loss function
+    #         - some_parameter: some_value
+    # Note that to use multiple losses, the model should be prepared via config to handle it via certain subloss name.
+    optimizers: # Optimizers to use (see below)
+        - name:  # Name of the optimizer
+        - some_parameter: some_value
+    # For models containing multiple optimizers (for example GANs), the optimizers can be defined in the same way as the losses above.
+    schedulers: # Schedulers to use (see below)
+        - name:  # Name of the scheduler
+        - some_parameter: some_value
+    # For models containing multiple schedulers (for example GANs), the schedulers can be defined in the same way as the losses above.
+    n_channels:  # Number of input channels
+    n_dimensions:  # Number of dimensions of the input data (2 or 3)
+    tensor_shape:  # Shape of the input tensor
+    # This model config can support additional parameters that are specific to the model, for example:
+    save_eval_images_every_n_epochs:  # Save evaluation images every n epochs, useful to assess training progress of generative models. Implemented in, e.g., DCGAN.
+```
+
+## Optimizers
+GaNDLF-Synth interfaces with the GaNDLF core framework for optimizers. See the [optimizers directory](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/optimizers/__init__.py) for available optimizers. They support optimizer-specific configurable parameters, interfacing with [PyTorch optimizers](https://pytorch.org/docs/stable/optim.html).
+
+## Schedulers
+GaNDLF-Synth interfaces with the GaNDLF core framework for schedulers. See the [schedulers directory](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/schedulers/__init__.py) for available schedulers. They support scheduler-specific configurable parameters, interfacing with [PyTorch schedulers](https://pytorch.org/docs/stable/optim.html).
+
+## Losses
+GaNDLF-Synth supports multiple loss functions. See the [losses directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/losses/__init__.py) for available loss functions. They support loss-specific configurable parameters, interfacing with [PyTorch loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions).
+
+## Dataloader configs
+GaNDLF-Synth supports separate dataloader parameters for the training, validation, test and inference dataloaders. They support configurable parameters, interfacing with the [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html). The following fields are supported:
+```yaml
+dataloader_config:  # Dataloaders configuration (see below)
+  shuffle:  # Whether to shuffle the data
+  num_workers:  # Number of processes spawned to load the data
+  pin_memory:  # Whether to pin the memory for CPU-GPU transfer
+  timeout:  # Timeout for the dataloader processes
+  prefetch_factor:  # Number of batches to prefetch by each worker
+  persistent_workers:  # Whether to keep the worker processes alive between epochs
+```
+The fields for specific dataloaders are expected to be in the following format:
+```yaml
+dataloader_config:
+    train:
+        - some_parameter: some_value
+    val:
+        - some_parameter: some_value
+    test:
+        - some_parameter: some_value
+    inference:
+        - some_parameter: some_value
+```
+If a given dataloader is not configured explicitly, the default values are used (see above).
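+
+As a sketch of how this might look in practice, the fragment below overrides a few of the fields listed above for the training and validation dataloaders only. It mirrors the list-style template shown above; the values are illustrative assumptions, not recommended settings.
+
+```yaml
+dataloader_config:
+    train:
+        - shuffle: true        # reshuffle training data
+        - num_workers: 4       # illustrative value
+    val:
+        - shuffle: false
+    # test and inference are not configured here, so they use the defaults
+```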
+
+## Data preprocessing
+GaNDLF-Synth interfaces with the GaNDLF core framework for data preprocessing. To see available data preprocessing options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/preprocessing/__init__.py).
+Separate preprocessing parameters can be defined for each dataloader (train, val, test, inference) as follows:
+```yaml
+data_preprocessing:
+    train:
+        - some_transform: some_value
+    val:
+        - some_transform: some_value
+    test:
+        - some_transform: some_value
+    inference:
+        - some_transform: some_value
+```
+
+## Data Augmentation
+GaNDLF-Synth interfaces with the GaNDLF core framework for data augmentation. To see available data augmentation options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/augmentation/__init__.py). Augmentations are applied only to the training dataloader.
+```yaml
+data_augmentation:
+    train:
+        - some_transform: some_value
+```
+
+
+## Post processing
+GaNDLF-Synth interfaces with the GaNDLF core framework for post processing. To see available post processing options, see [here](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/data/post_process/__init__.py). Post-processing is applied only to the inference dataloader.
+```yaml
+data_postprocessing:
+    inference:
+        - some_transform: some_value
+```
+
+## Distributed Training
+For details on using distributed training, see the [usage page](./usage.md#distributed-training).
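+
+For orientation, a `compute` block combining DDP with mixed precision might look like the sketch below. It assumes the field names `num_devices`, `num_nodes`, `strategy` and `precision` under the `compute` field; the values are illustrative, and the usage page remains the authoritative reference.
+
+```yaml
+compute:
+  num_devices: 2       # assumed field: number of GPUs to use
+  num_nodes: 1         # assumed field: single-node training
+  strategy: "ddp"      # assumed field: distributed strategy name
+  precision: "16"      # assumed field: mixed precision setting passed to Lightning
+```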
+
+
diff --git a/docs/extending.md b/docs/extending.md
@@ -0,0 +1,117 @@
+## Environment
+
+Before working on GaNDLF at the code level, please follow the [instructions to install GaNDLF-Synth from sources](./setup.md). Once that's done, please verify the installation using the following command:
+
+```bash
+# continue from previous shell
+(venv_gandlf) $> 
+# you should be in the "GaNDLF" git repo
+(venv_gandlf) $> gandlf-synth verify-install
+```
+
+
+## Submodule flowcharts
+
+- The following flowcharts are intended to provide a high-level overview of the different submodules in GaNDLF-Synth. 
+- Navigate to the `README.md` file in each submodule folder for details.
+- Some flowcharts are still in development and may not be complete/present.
+
+## Overall Architecture
+
+- Command-line parsing: [gandlf run](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/run.py)
+- [Config Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/config_manager.py): 
+    - Handles configuration parsing
+    - Provides configuration to other modules
+- [Training Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/training_manager.py): 
+    - Main entry point from CLI
+    - Handles training functionality
+- [Inference Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/inference_manager.py): 
+    - Handles inference functionality 
+    - Main entry point from CLI
+    - Performs actual inference 
+
+
+## Dependency Management
+
+To update/change/add a dependency in [setup](https://github.com/mlcommons/GaNDLF-Synth/blob/main/setup.py), please ensure **at least** the following conditions are met:
+
+- The package is being [actively maintained](https://opensource.com/life/14/1/evaluate-sustainability-open-source-project).
+- The new dependency is tested against the **minimum python version** supported by GaNDLF-Synth (see the `python_requires` variable in [setup](https://github.com/mlcommons/GaNDLF-Synth/blob/main/setup.py)).
+- It does not clash with any existing dependencies.
+
+## Adding Models
+
+- Create the model in a new file in the [architectures](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/architectures) folder.
+- Make sure the new class inherits from the [ModelBase](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/architectures/base_model.py) class.
+- Create a new `LightningModule` that implements the training, validation, testing and inference logic. Make sure it inherits from the [SynthesisModule](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/module_abc.py) class and implements the necessary abstract methods.
+- Add the new model to the [ModuleFactory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/models/module_factory.py) `AVAILABE_MODULES` dictionary. Note that the key in this dictionary should follow the naming convention: `labeling-paradigm_model-name`, and needs to match the key in the model config factory that we will create in the next steps.
+- Create the model config file in the [configs](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs) folder.
+- Implement the model configuration class that will parse the model's configuration when creating a model instance. Make sure it inherits from the [AbstractModelConfig](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs/config_abc.py) class.
+- Add the new model configuration class to the [ModelConfigFactory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/configs/model_config_factory.py) `AVAILABLE_MODEL_CONFIGS` dictionary. Note that the model config key in this dictionary should follow the naming convention: `labeling-paradigm_model-name`, and needs to match the key in the model factory that we created in the previous steps.
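+
+To illustrate the `labeling-paradigm_model-name` convention from the steps above, the fragment below shows two config fields and the factory key they would be expected to correspond to. The model name `dcgan` and the resulting key are assumptions made for illustration; check the factory dictionaries for the keys actually registered.
+
+```yaml
+model_config:
+    model_name: "dcgan"               # assumed model name, for illustration
+    labeling_paradigm: "unlabeled"    # one of 'unlabeled', 'patient', 'custom'
+# Under the stated convention, both factories would use the key: unlabeled_dcgan
+```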
+
+## Adding Dataloading Functionalities
+
+GaNDLF-Synth handles dataloading for specific labeling paradigms in separate abstractions. If you wish to modify the dataloading functionality, please refer to the following modules:
+
+- [Datasets](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/datasets.py): Contains the dataset classes for different labeling paradigms.
+- [Dataset Factories](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/datasets_factory.py): Contains the factory class that creates the dataset instance based on the configuration.
+- [Dataloaders Factories](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/data/dataloaders_factory.py): Contains the factory class that creates the dataloader instance based on the configuration.
+
+Remember to add new datasets and dataloaders to the respective factory classes. For some cases, modifications of the training or inference logic may be required to accommodate the new dataloading functionality (see below).
+
+## Adding Training Functionality
+
+
+- For changes at the level of single training, validation, or test steps, modify the specific functions of a given module.
+- For changes at the level of the entire training loop, modify the [Training Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/training_manager.py). The main training loop is handled via `Trainer` class of `Pytorch Lightning` - please refer to the [Pytorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more details.
+
+
+## Adding Inference Functionality
+
+- For changes at the level of single inference steps, modify the specific functions of a given module. Note that for inference, special dataloaders are used to load the data in the required format.
+- For changes at the level of the entire inference loop, modify the [Inference Manager](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/inference_manager.py). The main inference loop is handled via `Trainer` class of `Pytorch Lightning` - please refer to the [Pytorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more details.
+
+## Adding new CLI command
+
+Example: `gandlf-synth run` [CLI command](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/construct_csv.py)
+- Implement a function and wrap it with `@click.command()` + `@click.option()`
+- Add it to `cli_subommands` [dict](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/entrypoints/subcommands.py)
+
+The command will then be available as the `gandlf-synth your-subcommand-name` CLI command.
+
+
+## Update parameters
+For any new feature that is configurable via config, please ensure the corresponding option in the ["extending" section of this documentation](./extending.md) is added, so that others can review/use/extend it as needed.
+
+## Update Tests
+
+Once you have made changes to functionality, it is imperative that the unit tests be updated to cover the new code. Please see the [full testing suite](https://github.com/mlcommons/GaNDLF-Synth/blob/main/gandlf_synth/testing/tests/) for details and examples. Note that tests are split into different categories, each having its own file in the aforementioned folder:
+- `test_modules.py`: module-specific tests
+- `test_generic.py`: global features tests
+- `entrypoints/`: tests for specific CLI commands
+
+## Run Tests
+
+### Prerequisites
+
+Tests use [sample data](https://drive.google.com/uc?id=12utErBXZiO_0hspmzUlAQKlN9u-manH_), which is downloaded and prepared automatically when you run the unit tests. The prepared data is stored at `GaNDLF-Synth/testing/data/` the first time the tests are run. However, you may want to download and explore the data yourself.
+
+### Unit tests
+
+Once you have the virtual environment set up, tests can be run using the following command:
+
+```bash
+# continue from previous shell
+(venv_gandlf) $> pytest --device cuda # can be cuda or cpu, defaults to cpu
+```
+
+Any failures will be reported in the file `GaNDLF-Synth/testing/failures.log`.
+
+
+### Code coverage
+
+The code coverage for the unit tests can be obtained by the following command:
+
+```bash
+# continue from previous shell
+(venv_gandlf) $> coverage run -m pytest --device cuda; coverage report -m
+```