Official JAX implementation of the ICLR 2025 paper: Songyuan Zhang, Oswin So, Mitchell Black, and Chuchu Fan, "Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control".
Dependencies • Installation • Quickstart • Environments • Algorithms • Usage
We recommend using conda to install the requirements:
conda create -n dgppo python=3.10
conda activate dgppo
Then install the dependencies:
pip install -r requirements.txt
Install DGPPO:
pip install -e .
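Since training runs on JAX, you may want to confirm that JAX can see your accelerator before starting a long run. The snippet below is an optional sanity check and is not part of the repository:

```python
# Optional sanity check: confirm the JAX version and which devices JAX can see.
import jax

print(jax.__version__)   # installed JAX version
print(jax.devices())     # available devices, e.g. a CUDA device if GPU JAX is installed, otherwise CPU
```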
To train a model on the `LidarSpread` environment, run:
python train.py --env LidarSpread --algo dgppo -n 3 --obs 3
To evaluate a model, run:
python test.py --path ./logs/LidarSpread/dgppo/seed0_xxxxxxxxxx_XXXX
We provide environments built on 3 different simulation engines.
We provide the following environments with the MPE (Multi-Agent Particle Environment) simulation engine: `MPETarget`, `MPESpread`, `MPEFormation`, `MPELine`, `MPECorridor`, `MPEConnectSpread`. In MPE, agents, goals/landmarks, and obstacles are represented as particles. Agents can observe other agents or obstacles when they are within their observation range. Agents follow the double integrator dynamics (a minimal sketch of one dynamics step is given after the list below).
- `MPETarget`: The agents need to reach their pre-assigned goals.
- `MPESpread`: The agents need to collectively cover a set of goals without having access to an assignment.
- `MPEFormation`: The agents need to spread evenly around a given landmark.
- `MPELine`: The agents need to form a line between two given landmarks.
- `MPECorridor`: The agents need to navigate through a narrow corridor and cover a set of given goals.
- `MPEConnectSpread`: The agents need to cover a set of given goals while maintaining connectivity.
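For reference, one discretized step of the double-integrator dynamics used by these agents can be written as below. This is only an illustrative sketch: the time step and state layout are assumptions, not the exact values used in the repository.

```python
import jax.numpy as jnp

def double_integrator_step(state, accel, dt=0.03):
    """One Euler step of double-integrator dynamics.

    state: (4,) array [px, py, vx, vy]; accel: (2,) array [ax, ay].
    The state layout and dt=0.03 are illustrative assumptions.
    """
    pos, vel = state[:2], state[2:]
    new_vel = vel + dt * accel
    new_pos = pos + dt * vel
    return jnp.concatenate([new_pos, new_vel])

# Example: an agent at rest accelerating along +x for one step.
print(double_integrator_step(jnp.zeros(4), jnp.array([1.0, 0.0])))
```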
We provide the following environments with the LidarEnv simulation engine: `LidarTarget`, `LidarSpread`, `LidarLine`, `LidarBicycleTarget`. In these environments, agents use LiDAR sensors to observe the environment (an illustrative LiDAR scan is sketched after the list below). Agents can observe other agents or obstacles when they are within their observation range. Agents follow the double integrator dynamics or the bicycle dynamics.
- `LidarTarget`: The agents need to reach their pre-assigned goals.
- `LidarSpread`: The agents need to collectively cover a set of goals without having access to an assignment.
- `LidarLine`: The agents need to form a line between two given landmarks.
- `LidarBicycleTarget`: The agents (following the bicycle dynamics) need to reach their pre-assigned goals.
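For intuition about what a LiDAR observation looks like, the sketch below casts evenly spaced rays from an agent and returns the distance to the nearest circular obstacle along each ray, clipped at a maximum sensing range. It is a simplified stand-in, not the repository's LidarEnv implementation, and the ray count and range below are only example values.

```python
import jax.numpy as jnp

def lidar_scan(agent_pos, obstacle_pos, obstacle_radius, n_rays=32, max_range=0.5):
    """Distances from agent_pos to circular obstacles along n_rays evenly spaced rays.

    agent_pos: (2,); obstacle_pos: (M, 2); obstacle_radius: (M,).
    Returns an (n_rays,) array, equal to max_range where no obstacle is hit.
    """
    angles = jnp.linspace(0.0, 2.0 * jnp.pi, n_rays, endpoint=False)
    dirs = jnp.stack([jnp.cos(angles), jnp.sin(angles)], axis=-1)   # (n_rays, 2) unit ray directions
    rel = obstacle_pos - agent_pos                                  # (M, 2) obstacle centers relative to agent
    b = dirs @ rel.T                                                # (n_rays, M) projection of centers onto rays
    c = jnp.sum(rel ** 2, axis=-1) - obstacle_radius ** 2           # (M,)
    disc = b ** 2 - c                                               # ray-circle intersection discriminant
    t = b - jnp.sqrt(jnp.maximum(disc, 0.0))                        # distance to the first intersection
    t = jnp.where((disc >= 0.0) & (t >= 0.0), t, jnp.inf)           # ignore misses and hits behind the agent
    return jnp.minimum(jnp.min(t, axis=-1), max_range)

# Example: one obstacle of radius 0.1 located 0.3 to the agent's right.
print(lidar_scan(jnp.zeros(2), jnp.array([[0.3, 0.0]]), jnp.array([0.1])))
```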
We provide the following environments with the VMAS (VectorizedMultiAgentSimulator) simulation engine: `VMASReverseTransport`, `VMASWheel`. In these environments, contact dynamics are modeled. Agents follow the double integrator dynamics and have full observation.
- `VMASReverseTransport`: The agents need to push a box from the inside to its pre-assigned goal, while the center of the box must avoid the obstacles.
- `VMASWheel`: The agents need to push a wheel to its pre-assigned angle, while the wheel must avoid a range of dangerous angles.
It is very easy to create a custom environment by yourself! Choose one of the three engines and inherit from one of the existing environments. Define your reward function, graph connection, and dynamics, register the new environment in `env/__init__.py`, and you are good to go!
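As a rough illustration only, a custom environment could look like the sketch below. The import path, base class, and method names here are hypothetical; inspect the existing environment classes and `env/__init__.py` for the actual interface.

```python
# Hypothetical sketch: the module path, base class, and method signature below are
# assumptions. Check the existing environments and env/__init__.py for the real interface.
from dgppo.env.lidar_env.lidar_spread import LidarSpread  # assumed import path

class LidarMyTask(LidarSpread):
    """Custom task reusing LidarSpread's dynamics and graph connection."""

    def get_reward(self, graph, action):
        # Replace the parent reward with your own task reward
        # (the method name and signature are assumed here).
        return super().get_reward(graph, action)

# Finally, register LidarMyTask in env/__init__.py so that
# `python train.py --env LidarMyTask ...` can find it.
```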
We provide the following algorithms:
- `dgppo`: Discrete GCBF Proximal Policy Optimization.
- `informarl`: MAPPO with GNN (Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation).
- `informarl_lagr`: MAPPO-Lagrangian with GNN, with the sum-over-time cost replaced by the max-over-time cost.
- `hcbfcrpo`: DGPPO with the learned GCBF replaced by a hand-crafted CBF (an illustrative example follows the list).
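For intuition, a common hand-crafted CBF candidate for inter-agent collision avoidance is the pairwise clearance below. This is only an illustration of the idea, not the function used in the repository, and the radius is an example value.

```python
import jax.numpy as jnp

def pairwise_clearance_cbf(pos_i, pos_j, agent_radius=0.05):
    """h(x) = distance between agents i and j minus their combined radii.

    h >= 0 means the pair is collision-free; agent_radius is illustrative.
    """
    return jnp.linalg.norm(pos_i - pos_j) - 2.0 * agent_radius

# Two agents 0.2 apart with radius 0.05 each: h = 0.1 > 0, i.e. safe.
print(pairwise_clearance_cbf(jnp.array([0.0, 0.0]), jnp.array([0.2, 0.0])))
```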
To train the `<algo>` algorithm on the `<env>` environment with `<n_agent>` agents and `<n_obs>` obstacles, run:
python train.py --env <env> --algo <algo> -n <n_agent> --obs <n_obs>
The training logs will be saved in `logs/<env>/<algo>/seed<seed>_<timestamp>_<four random letters>`. We provide the following flags:
- `--env`: Environment.
- `--algo`: Algorithm.
- `-n`: Number of agents.
- `--obs`: Number of obstacles.
- `--cbf-schedule`: [For dgppo and hcbfcrpo] Use the CBF schedule, default False. This flag should be used to reproduce the results.
- `--cbf-weight`: [For dgppo and hcbfcrpo] Weight of the CBF loss, default 1.0.
- `--cbf-eps`: [For dgppo and hcbfcrpo] Epsilon of the CBF loss, default 0.01.
- `--alpha`: [For dgppo and hcbfcrpo] The class-$\kappa$ function coefficient, default 10.0.
- `--cost-weight`: [For informarl] Weight of the cost term in the reward, default 0.0.
- `--cost-schedule`: [For informarl] Use the cost schedule, default False.
- `--lagr-init`: [For informarl_lagr] Initial value of the Lagrangian multiplier, default 0.5.
- `--lr-lagr`: [For informarl_lagr] Learning rate of the Lagrangian multiplier, default 1e-7.
- `--clip-eps`: Clipping epsilon, default 0.25.
- `--coef-ent`: Entropy coefficient, default 0.01.
- `--n-rays`: [For LidarEnv] Number of LiDAR rays, default 32.
- `--full-observation`: [For MPE and LidarEnv] Use full observation, default False.
- `--no-rnn`: Do not use RNN, default False. Using this flag in the VMAS environments can accelerate the training.
- `--n-env-train`: Number of environments for training, default 128. Decrease this number together with `--batch-size` if the GPU memory is not enough.
- `--batch-size`: Batch size, default 16384.
- `--n-env-test`: Number of environments for testing (during training), default 32.
- `--log-dir`: Directory to save the logs, default `logs`.
- `--eval-interval`: Evaluation interval, default 50.
- `--eval-epi`: Number of episodes for evaluation, default 1.
- `--save-interval`: Save interval, default 50.
- `--seed`: Random seed, default 0.
- `--steps`: Number of training steps, default 200000.
- `--name`: Name of the experiment, default None.
- `--debug`: Debug mode, in which the logs will not be saved, default False.
- `--actor-gnn-layers`: Number of layers in the actor GNN, default 2.
- `--Vl-gnn-layers`: Number of layers in the $V^l$ GNN, default 2.
- `--Vh-gnn-layers`: Number of layers in the $V^h$ GNN, default 1.
- `--lr-actor`: Learning rate of the actor, default 3e-4. Consider changing to 1e-5 for 1 agent.
- `--lr-Vl`: Learning rate of $V^l$, default 1e-3. Consider changing to 3e-4 for 1 agent.
- `--lr-Vh`: Learning rate of $V^h$, default 1e-3. Consider changing to 3e-4 for 1 agent.
- `--rnn-layers`: Number of layers in the RNN, default 1.
- `--use-lstm`: Use LSTM, default False (use GRU).
- `--rnn-step`: Number of RNN steps in a chunk, default 16.
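For example, the flag descriptions above suggest enabling the CBF schedule to reproduce the results, and disabling the RNN in the VMAS environments to accelerate training. The agent and obstacle counts below are only example values:

python train.py --env LidarSpread --algo dgppo -n 3 --obs 3 --cbf-schedule

python train.py --env VMASWheel --algo dgppo -n 4 --obs 0 --no-rnn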
To test the learned model, use:
python test.py --path <path-to-log>
This should report the reward, min/max reward, cost, min/max cost, and the safety rate of the learned model. It will also generate videos of the learned model in `<path-to-log>/videos`. Use the following flags to customize the test:
- `--path`: Path to the log.
- `--epi`: Number of episodes for testing, default 5.
- `--no-video`: Do not generate videos, default False.
- `--step`: If this is given, evaluate the model at the given step, default None (evaluate the last model).
- `-n`: Number of agents, default the same as in training.
- `--obs`: Number of obstacles, default the same as in training.
- `--env`: Environment, default the same as in training.
- `--full-observation`: Use full observation, default False.
- `--cpu`: Use CPU only, default False.
- `--max-step`: Maximum number of steps for each episode, default None.
- `--stochastic`: Use the stochastic policy, default False.
- `--log`: Log the results, default False.
- `--seed`: Random seed, default 1234.
- `--debug`: Debug mode.
- `--dpi`: DPI of the video, default 100.
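For example, to evaluate a trained model over 10 episodes without generating videos:

python test.py --path <path-to-log> --epi 10 --no-video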