Move to experiment-based Hydra config in L-MGN example #771

Open
wants to merge 8 commits into
base: main
127 changes: 84 additions & 43 deletions examples/cfd/lagrangian_mgn/README.md
@@ -1,11 +1,9 @@
# MeshGraphNet with Lagrangian mesh

This is an example of MeshGraphNet for particle-based simulation, based on the
[Learning to Simulate](https://sites.google.com/view/learning-to-simulate/)
work. It demonstrates how to use Modulus to train a Graph Neural Network (GNN)
to simulate Lagrangian fluids, solids, and deformable materials.
**Comment (Collaborator):** Do you have an experiment included to simulate solids?

**Reply (Collaborator, Author):** Yes, it's `./conf/experiment/sand.yaml` - the material is sand in this case.

**Comment (Collaborator, @hakhondzadeh, Feb 5, 2025):** Maybe worthwhile to add another experiment with a deformable solid example too, as opposed to discrete granular material (sand) - in a future PR, not this one.


## Problem overview

@@ -22,38 +20,47 @@ steps to maintain physically valid prediction.

## Dataset

For this example, we use [DeepMind's particle physics datasets](https://sites.google.com/view/learning-to-simulate).
Some of these datasets contain particle-based simulations of fluid splashing and bouncing
within a box or cube, while others use materials such as sand or goop.
There are a total of 17 datasets, with some of them listed below:

| Datasets | Num Particles | Num Time Steps | dt | Ground Truth Simulator |
|--------------|---------------|----------------|----------|------------------------|
| Water-3D | 14k | 800 | 5ms | SPH |
| Water-2D | 2k | 1000 | 2.5ms | MPM |
| WaterRamp | 2.5k | 600 | 2.5ms | MPM |
| Sand | 2k | 320 | 2.5ms | MPM |
| Goop | 1.9k | 400 | 2.5ms | MPM |

See section **B.1** of the [original paper](https://arxiv.org/abs/2002.09405) for details.

## Model overview and architecture

This model uses MeshGraphNet to capture the dynamics of the fluid system.
The system is represented as a graph, where vertices correspond to fluid particles,
and edges represent their interactions. The model is autoregressive,
utilizing historical data to predict future states. Input features for the vertices
include current position, velocity, node type (e.g., fluid, sand, boundary),
and historical velocity. The model's output is acceleration, defined as the difference
between current and next velocity. Both velocity and acceleration are derived from
the position sequence and normalized to a standard Gaussian distribution
for consistency.
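The target derivation described above can be sketched with finite differences. This is an illustrative toy, not the datapipe's actual code; the 1-D trajectory, `dt` value, and helper names are invented for the example:

```python
# Toy sketch: derive velocity and acceleration targets from a particle
# position sequence. Velocity is the backward difference of positions;
# acceleration is the difference between the next and current velocity,
# matching the target definition above.
def velocities(positions, dt):
    return [(p1 - p0) / dt for p0, p1 in zip(positions, positions[1:])]

def accelerations(vels):
    return [v1 - v0 for v0, v1 in zip(vels, vels[1:])]

def normalize(xs, mean, std):
    # Normalize to a standard Gaussian using dataset statistics.
    return [(x - mean) / std for x in xs]

positions = [0.0, 0.1, 0.25, 0.45]  # 1-D toy trajectory
v = velocities(positions, dt=0.0025)  # approx. [40.0, 60.0, 80.0]
a = accelerations(v)                  # approx. [20.0, 20.0]
```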

For computational efficiency, we do not explicitly construct wall nodes for
square or cubic domains. Instead, we assign a wall feature to each interior
particle node, representing its distance from the domain boundaries. For a
system dimensionality of $d = 2$ or $d = 3$, the features are structured
as follows:

- **Node features**:
  - position ($d$)
  - historical velocity ($t \times d$), where the number of history steps $t$
    can be set via the `data.num_history` config parameter
  - one-hot encoding of node type (6 types by default)
  - wall feature ($2 \times d$)
- **Edge features**: displacement ($d$), distance (1)
- **Node target**: acceleration ($d$)
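The feature sizes above determine the model's input widths. A hypothetical helper (the function names are illustrative, not from the example code) makes the arithmetic explicit:

```python
def node_feature_dim(d, t, num_node_types):
    # position (d) + historical velocity (t * d)
    # + one-hot node type + wall feature (2 * d)
    return d + t * d + num_node_types + 2 * d

def edge_feature_dim(d):
    # displacement (d) + distance (1)
    return d + 1

# 2-D system, 5 history steps, 9 node types (as in the Goop experiment):
print(node_feature_dim(2, 5, 9))  # 25, matching input_dim_nodes in goop.yaml
```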

We construct edges based on a predefined radius, connecting pairs of particle
nodes if their pairwise distance is within this radius. During training, we
@@ -65,54 +72,88 @@ a small amount of noise is added during training.
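The radius-based edge construction can be sketched as a brute-force pairwise check. The actual datapipe presumably uses a spatial data structure for efficiency; this toy version (function name invented here) only illustrates the rule:

```python
import itertools
import math

def radius_edges(points, radius):
    # Connect every pair of particles whose Euclidean distance is within
    # the given radius; add both directions for message passing.
    edges = []
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            edges.append((i, j))
            edges.append((j, i))
    return edges

points = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0)]
print(radius_edges(points, 0.015))  # [(0, 1), (1, 0)]
```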

The model uses a hidden dimensionality of 128 for the encoder, processor, and
decoder. The encoder and decoder each contain two hidden layers, while the
processor consists of ten message-passing layers. We use a batch size of
20 per GPU (for the Water dataset), and summation aggregation is applied for
message passing in the processor. The learning rate is set to 0.0001 and decays
using a cosine annealing schedule. These hyperparameters can be configured via
the command line or in the config file.
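The decay behavior can be written out directly. A minimal sketch of cosine annealing, using the README's 0.0001 initial rate (`total_steps` here is arbitrary, chosen only for illustration):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-4, lr_min=0.0):
    # The learning rate follows half a cosine wave from lr_max to lr_min.
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )

print(cosine_annealing_lr(0, 1000))     # 1e-4 at the start
print(cosine_annealing_lr(1000, 1000))  # ~0.0 at the end
```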

## Getting Started

This example requires the `tensorflow` library to load the data in the `.tfrecord`
format. Install with:

```bash
pip install "tensorflow<=2.17.1"
```

To download the data from DeepMind's repo, run:

```bash
cd raw_dataset
bash download_dataset.sh Water /data/
```

This example uses [Hydra](https://hydra.cc/docs/intro/) for [experiment](https://hydra.cc/docs/patterns/configuring_experiments/)
configuration. Hydra offers a convenient way to modify nearly any experiment parameter,
such as dataset settings, model configurations, and optimizer options,
either through the command line or config files.

To view the full set of training script options, run the following command:

```bash
python train.py --help
```

If you encounter issues with the Hydra config, you may receive an error message
that isn’t very helpful. In that case, set the `HYDRA_FULL_ERROR=1` environment
variable for more detailed error information:

```bash
HYDRA_FULL_ERROR=1 python train.py ...
```

To train the model with the Water dataset, run:

```bash
python train.py +experiment=water data.data_dir=/data/Water
```

Progress and loss logs can be monitored using Weights & Biases. To activate that,
set `loggers.wandb.mode` to `online` in the command line:

```bash
python train.py +experiment=water data.data_dir=/data/Water loggers.wandb.mode=online
```

**Comment (Collaborator):** Maybe suggest monitoring alternatives if acquiring a W&B license is not possible?

**Reply (Collaborator, Author):** For now we don't have any - but it should be easy to add MLFlow (we already have it in Modulus, just need to properly plug into this code) or TensorBoard. Feel free to take this as a task for yourself.

An active Weights & Biases account is required. You will also need to set your
API key either through the command line option `loggers.wandb.wandb_key`
or by using the `WANDB_API_KEY` environment variable:

```bash
export WANDB_API_KEY=key
python train.py ...
```

## Inference

The inference script, `inference.py`, also supports Hydra configuration, ensuring
consistency between training and inference runs.

Once the model is trained, run the following command:

```bash
python inference.py +experiment=water \
    data.data_dir=/data/Water \
    data.test.num_samples=1 \
    resume_dir=/data/models/lmgn/water \
    output=/data/models/lmgn/water/inference
```

**Comment (Collaborator):** Is there a way to specify a sample id for the inference? So that we can try running the model on different data (including the ones used for training) and see the output?

**Reply (Collaborator, Author):** Not at the moment, but it's a great feature! I'm going to add it to the list of tasks.

Use the `resume_dir` parameter to specify the location of the model checkpoints.

This will save the predictions for the test dataset as `.gif` files in the
`/data/models/lmgn/water/inference/animations` directory.

## References

101 changes: 101 additions & 0 deletions examples/cfd/lagrangian_mgn/conf/config.yaml
@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

defaults:
- /logging/python: default
- override hydra/job_logging: disabled # We use rank-aware logger configuration instead.
- _self_

hydra:
run:
dir: ${output}
output_subdir: hydra # Default is .hydra which causes files not being uploaded in W&B.

# Dimensionality of the problem (2D or 3D).
dim: 2

# Main output directory.
output: outputs
**Comment (Collaborator):** Maybe make outputs a runtime argument to enable running multiple training jobs? Or maybe add a job id like this? `output: outputs/${job.name}`

**Reply (Collaborator, Author, @Alexey-Kamenev, Feb 4, 2025):** With Hydra you can specify it pretty flexibly. For example, output can be set by default to `output: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}`. This will create a new directory on each run based on the current time. Users can do the same via command line. `job.name` does not provide a unique name either - it defaults to the script name (e.g. `train`), so users will still have to override the output via command line/config. Also, using `./outputs` is kind of a default for other Modulus examples (whether it's good or not is a separate discussion).

# The directory to search for checkpoints to continue training.
resume_dir: ${output}

# The dataset directory must be set either in command line or config.
data:
data_dir: ???
num_history: 5
num_node_types: 6
train:
split: train
valid:
split: valid
test:
split: test

# The loss should be set in the experiment.
loss: ???

# The optimizer should be set in the experiment.
optimizer: ???

# The scheduler should be set in the experiment.
lr_scheduler: ???

train:
batch_size: 20
epochs: 20
checkpoint_save_freq: 5
dataloader:
batch_size: ${..batch_size}
shuffle: true
num_workers: 8
pin_memory: true
drop_last: true

test:
batch_size: 1
device: cuda
**Comment (Collaborator):** Do you need to specify cuda device on train as well?

**Reply (Collaborator, Author):** No, for train the DistributedManager is used to set the device properly. If needed, the device can be set using other mechanisms, like `CUDA_VISIBLE_DEVICES`.

dataloader:
batch_size: ${..batch_size}
shuffle: false
num_workers: 1
pin_memory: true
drop_last: false

compile:
enabled: false
args:
backend: inductor

amp:
enabled: false

loggers:
wandb:
_target_: loggers.WandBLogger
project: meshgraphnet
entity: modulus
name: l-mgn
group: l-mgn
mode: disabled
dir: ${output}
id:
wandb_key:
watch_model: false

inference:
frame_skip: 1
frame_interval: 1
29 changes: 29 additions & 0 deletions examples/cfd/lagrangian_mgn/conf/data/lagrangian_dataset.yaml
@@ -0,0 +1,29 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

_target_: modulus.datapipes.gnn.lagrangian_dataset.LagrangianDataset
_convert_: all

name: ${data.name}
data_dir: ${data.data_dir}
split: ???
num_samples: ???
num_history: ${..num_history}
num_steps: 600
num_node_types: ${..num_node_types}
noise_std: 0.0003
radius: 0.015
dt: 0.0025
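The `noise_std` parameter above controls training-noise injection. A hedged sketch of random-walk position noise in the spirit of the original Learning to Simulate setup follows; the function name and seed are invented here, and the actual datapipe implementation may differ:

```python
import random

def add_walk_noise(positions, noise_std, seed=0):
    # Accumulate per-step Gaussian noise so perturbations compound over
    # the history window, mimicking rollout drift during training.
    rng = random.Random(seed)
    noisy, accum = [], 0.0
    for p in positions:
        accum += rng.gauss(0.0, noise_std)
        noisy.append(p + accum)
    return noisy

clean = [0.0, 0.1, 0.2, 0.3]  # 1-D toy trajectory
noisy = add_walk_noise(clean, noise_std=0.0003)
```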
42 changes: 42 additions & 0 deletions examples/cfd/lagrangian_mgn/conf/experiment/goop.yaml
@@ -0,0 +1,42 @@
# @package _global_

# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

defaults:
- /[email protected]: lagrangian_dataset
- /[email protected]: lagrangian_dataset
- /[email protected]: lagrangian_dataset
- /model: mgn_2d
- /loss: mseloss
- /optimizer: fused_adam
- /lr_scheduler: cosine

data:
name: Goop
num_node_types: 9
train:
num_samples: 1000
num_steps: 395 # 400 - ${num_history}
valid:
num_samples: 30
num_steps: 100
test:
num_samples: 30
num_steps: 100

model:
input_dim_nodes: 25 # 9 node types instead of 6.