Official Implementation of ICLR2025 Paper: Songyuan Zhang, Oswin So, Mitchell Black, Chuchu Fan: "Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control".

[Demo animations: LidarSpread, LidarLine, VMASReverseTransport, VMASWheel]
[Figure: DGPPO framework]

Dependencies

We recommend using Conda to create the Python environment:

conda create -n dgppo python=3.10
conda activate dgppo

Then install the dependencies:

pip install -r requirements.txt

Installation

Install DGPPO:

pip install -e .

Quickstart

To train a model on the LidarSpread environment, run:

python train.py --env LidarSpread --algo dgppo -n 3 --obs 3

To evaluate a model, run:

python test.py --path ./logs/LidarSpread/dgppo/seed0_xxxxxxxxxx_XXXX

Environments

We provide simulation environments built on three different simulation engines.

MPE

We provide the following environments with the MPE (Multi-Agent Particle Environment) simulation engine: MPETarget, MPESpread, MPEFormation, MPELine, MPECorridor, MPEConnectSpread. In MPE, agents, goals/landmarks, and obstacles are represented as particles. Agents can observe other agents or obstacles within their observation range, and follow double-integrator dynamics.

[Illustrations: MPETarget, MPESpread, MPEFormation, MPELine, MPECorridor, MPEConnectSpread; legend]
  • MPETarget: The agents need to reach their pre-assigned goals.
  • MPESpread: The agents need to collectively cover a set of goals without having access to an assignment.
  • MPEFormation: The agents need to spread evenly around a given landmark.
  • MPELine: The agents need to form a line between two given landmarks.
  • MPECorridor: The agents need to navigate through a narrow corridor and cover a set of given goals.
  • MPEConnectSpread: The agents need to cover a set of given goals while maintaining connectivity.

LidarEnv

We provide the following environments with the LidarEnv simulation engine: LidarTarget, LidarSpread, LidarLine, LidarBicycleTarget. In these environments, agents use LiDAR sensors to observe their surroundings, and can observe other agents or obstacles within their observation range. Agents follow double-integrator dynamics, except in LidarBicycleTarget, where they follow bicycle dynamics.

[Illustrations: LidarTarget, LidarSpread, LidarLine, LidarBicycleTarget]
  • LidarTarget: The agents need to reach their pre-assigned goals.
  • LidarSpread: The agents need to collectively cover a set of goals without having access to an assignment.
  • LidarLine: The agents need to form a line between two given landmarks.
  • LidarBicycleTarget: The agents (following the bicycle dynamics) need to reach their pre-assigned goals.

VMAS

We provide the following environments with the VMAS (Vectorized Multi-Agent Simulator) simulation engine: VMASReverseTransport, VMASWheel. These environments model contact dynamics. Agents follow double-integrator dynamics and have full observation.

[Illustrations: VMASReverseTransport, VMASWheel]
  • VMASReverseTransport: The agents need to push a box from the inside to its pre-assigned goal, while the center of the box must avoid the obstacles.
  • VMASWheel: The agents need to push a wheel to its pre-assigned angle, while the wheel must avoid a range of dangerous angles.

Custom Environments

It is easy to create a custom environment yourself! Choose one of the three engines and inherit one of the existing environments. Define your reward function, graph connection, and dynamics, register the new environment in `env/__init__.py`, and you are good to go!
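
As a rough illustration, a custom LidarEnv task could look like the sketch below. The module path, base class, method name, and registry here are assumptions for illustration only; check the existing environment classes and env/__init__.py for the actual interfaces.

# Hypothetical sketch, not the actual API: the names below are assumptions.
import jax.numpy as jnp

from dgppo.env.lidar_env.lidar_spread import LidarSpread  # assumed module path


class LidarMyTask(LidarSpread):
    # Reuse LidarSpread's graph construction and double-integrator dynamics;
    # only the reward changes.

    def get_reward(self, graph, action):  # assumed method name and signature
        # Keep the parent's goal-covering reward and add a small control penalty.
        reward = super().get_reward(graph, action)
        return reward - 0.01 * jnp.sum(jnp.square(action))


# Finally, register the new class in env/__init__.py, e.g. by adding it to the
# dictionary that maps environment names to classes (name and structure assumed):
# ENV_DICT["LidarMyTask"] = LidarMyTask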

Algorithms

We provide the DGPPO algorithm (dgppo) as well as the baseline algorithms referenced by the flags below, including hcbfcrpo, informarl, and informarl_lagr. Select the algorithm with the --algo flag.

Usage

Train

To train the <algo> algorithm on the <env> environment with <n_agent> agents and <n_obs> obstacles, run:

python train.py --env <env> --algo <algo> -n <n_agent> --obs <n_obs>

The training logs will be saved in logs/<env>/<algo>/seed<seed>_<timestamp>_<four random letters>. We provide the following flags:

Required Flags

  • --env: Environment.
  • --algo: Algorithm.
  • -n: Number of agents.
  • --obs: Number of obstacles.

For algorithms

  • --cbf-schedule: [For dgppo and hcbfcrpo] Use the CBF schedule, default False (this should be used to reproduce the paper's results).
  • --cbf-weight: [For dgppo and hcbfcrpo] Weight of the CBF loss, default 1.0.
  • --cbf-eps: [For dgppo and hcbfcrpo] Epsilon of the CBF loss, default 0.01.
  • --alpha: [For dgppo and hcbfcrpo] Coefficient of the class-$\kappa$ function, default 10.0.
  • --cost-weight: [For informarl] Weight of the cost term in the reward, default 0.0.
  • --cost-schedule: [For informarl] Use the cost schedule, default False.
  • --lagr-init: [For informarl_lagr] Initial value of the Lagrangian multiplier, default 0.5.
  • --lr-lagr: [For informarl_lagr] Learning rate of the Lagrangian multiplier, default 1e-7.
  • --clip-eps: Clip epsilon, default 0.25.
  • --coef-ent: Entropy coefficient, default 0.01.
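
For example, to train DGPPO on LidarSpread with the CBF schedule enabled (the other values shown are the documented defaults, and --cbf-schedule is assumed to be a boolean switch that takes no value):

python train.py --env LidarSpread --algo dgppo -n 3 --obs 3 --cbf-schedule --cbf-weight 1.0 --cbf-eps 0.01 --alpha 10.0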

For environments

  • --n-rays: [For LidarEnv] Number of LiDAR rays, default 32.
  • --full-observation: [For MPE and LidarEnv] Use full observation, default False.
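
For example, to train on LidarLine with a denser LiDAR scan of 64 rays (the agent and obstacle counts here are illustrative):

python train.py --env LidarLine --algo dgppo -n 3 --obs 3 --n-rays 64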

Training options

  • --no-rnn: Do not use RNN, default False. Using this flag in the VMAS environments can accelerate training.
  • --n-env-train: Number of environments for training, default 128. Decrease this number together with --batch-size if GPU memory is insufficient.
  • --batch-size: Batch size, default 16384.
  • --n-env-test: Number of environments for testing (during training), default 32.
  • --log-dir: Directory to save the logs, default logs.
  • --eval-interval: Evaluation interval, default 50.
  • --eval-epi: Number of episodes for evaluation, default 1.
  • --save-interval: Save interval, default 50.
  • --seed: Random seed, default 0.
  • --steps: Number of training steps, default 200000.
  • --name: Name of the experiment, default None.
  • --debug: Debug mode, in which the logs will not be saved, default False.
  • --actor-gnn-layers: Number of layers in the actor GNN, default 2.
  • --Vl-gnn-layers: Number of layers in the $V^l$ GNN, default 2.
  • --Vh-gnn-layers: Number of layers in the $V^h$ GNN, default 1.
  • --lr-actor: Learning rate of the actor, default 3e-4. Consider changing to 1e-5 for 1 agent.
  • --lr-Vl: Learning rate of $V^l$, default 1e-3. Consider changing to 3e-4 for 1 agent.
  • --lr-Vh: Learning rate of $V^h$, default 1e-3. Consider changing to 3e-4 for 1 agent.
  • --rnn-layers: Number of layers in the RNN, default 1.
  • --use-lstm: Use LSTM, default False (use GRU).
  • --rnn-step: Number of RNN steps in a chunk, default 16.
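
For example, to train on VMASWheel without the RNN and with fewer parallel environments and a smaller batch size to fit limited GPU memory (the agent/obstacle counts and batch values here are illustrative, not recommended settings):

python train.py --env VMASWheel --algo dgppo -n 4 --obs 0 --no-rnn --n-env-train 64 --batch-size 8192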

Test

To test the learned model, use:

python test.py --path <path-to-log>

This reports the reward (with min/max), cost (with min/max), and safety rate of the learned model, and generates videos of the rollouts in <path-to-log>/videos. Use the following flags to customize the test:

Required Flags

  • --path: Path to the log.

Optional Flags

  • --epi: Number of episodes for testing, default 5.
  • --no-video: Do not generate videos, default False.
  • --step: If this is given, evaluate the model at the given step, default None (evaluate the last model).
  • -n: Number of agents, defaults to the value used in training.
  • --obs: Number of obstacles, defaults to the value used in training.
  • --env: Environment, defaults to the environment used in training.
  • --full-observation: Use full observation, default False.
  • --cpu: Use CPU only, default False.
  • --max-step: Maximum number of steps for each episode, default None.
  • --stochastic: Use stochastic policy, default False.
  • --log: Log the results, default False.
  • --seed: Random seed, default 1234.
  • --debug: Debug mode.
  • --dpi: DPI of the video, default 100.
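
For example, to evaluate a trained model for 10 episodes with the stochastic policy and without rendering videos (the boolean flags are assumed to be switches that take no value):

python test.py --path ./logs/LidarSpread/dgppo/seed0_xxxxxxxxxx_XXXX --epi 10 --stochastic --no-video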
