CPPO

Official implementation for "Towards Safe Reinforcement Learning via Constraining Conditional Value at Risk" (IJCAI 2022)

Paper: https://arxiv.org/abs/2206.04436

Usage

  • The training code is in the src/ folder.

  • All methods, including the baselines and ours, are built on Spinning Up (we removed unnecessary files to keep the code clean).

  • First, install Spinning Up:

cd src
pip install -e .
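  • To check that the installation succeeded, python -c "import spinup" should run without errors.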
  • Then you can train the baseline agents, including VPG, TRPO, PPO, and PG-CMDP, for example:
python -m spinup.run vpg --hid "[64,32]" --env Walker2d-v3 --exp_name Walker2d/vpg/vpg-seed0 --epochs 750 --seed 0
python -m spinup.run trpo --hid "[64,32]" --env Walker2d-v3 --exp_name Walker2d/trpo/trpo-seed0 --epochs 750 --seed 0
python -m spinup.run ppo --hid "[64,32]" --env Walker2d-v3 --exp_name Walker2d/ppo/ppo-seed0 --epochs 750 --seed 0
python -m spinup.run pg_cmdp --hid "[64,32]" --env Walker2d-v3 --exp_name Walker2d/pg_cmdp/pg_cmdp-seed0 --epochs 750 --seed 0 --delay 0.8 --nu_delay 0.8
  • For PG-CMDP, you can adjust hyperparameters such as --delay and --nu_delay.
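
  • Spinning Up saves training results under its data/ directory by default (here, inside src/); if this trimmed copy keeps Spinning Up's plotting utility, you can plot learning curves with python -m spinup.run plot <path/to/output_directory>.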

  • You can train CPPO agents on all five environments reported in the paper as follows:

python -m spinup.run cppo --hid "[64,32]" --env Ant-v3 --exp_name Ant/cppo/cppo-seed0 --epochs 750 --seed 0 --beta 2800 --nu_start 10.0 --gamma 0.99 --nu_delay 0.2 --delay 0.0024 --cvar_clip_ratio 0.018
python -m spinup.run cppo --hid "[64,32]" --env HalfCheetah-v3 --exp_name HalfCheetah/cvarppo/cppo-seed0 --epochs 750 --seed 0 --beta 2500 --nu_start 10.0 --gamma 0.99 --nu_delay 0.3 --delay 0.0002 --cvar_clip_ratio 0.01
python -m spinup.run cppo --hid "[64,32]" --env Hopper-v3 --exp_name Hopper/cvarppo/cppo-seed0 --epochs 750 --seed 0 --beta 2500 --nu_start 10.0 --gamma 0.999 --nu_delay 0.3 --delay 0.002 --cvar_clip_ratio 0.027
python -m spinup.run cppo --hid "[64,32]" --env Swimmer-v3 --exp_name Swimmer/cvarppo/cppo-seed0 --epochs 750 --seed 0 --beta 122 --nu_start -20.0 --gamma 0.999 --nu_delay 0.3 --delay 0.002 --cvar_clip_ratio 0.03
python -m spinup.run cppo --hid "[64,32]" --env Walker2d-v3 --exp_name Walker2d/cvarppo/cppo-seed0 --epochs 750 --seed 0 --beta 2500 --nu_start 10.0 --gamma 0.99 --nu_delay 0.3 --delay 0.0018 --cvar_clip_ratio 0.01
  • For CPPO, you can adjust hyperparameters such as --beta, --nu_start, --nu_delay, --delay, and --cvar_clip_ratio.
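
  • CPPO constrains the conditional value-at-risk (CVaR) of the return, roughly the expected return over the worst alpha-fraction of episodes. As a quick reference, below is a minimal NumPy sketch of the empirical CVaR; the function and variable names are illustrative and do not come from this repo:

import numpy as np

def empirical_cvar(returns, alpha=0.1):
    # CVaR_alpha of the return distribution: the mean return over
    # the worst alpha-fraction of episodes (the lower tail).
    returns = np.sort(np.asarray(returns, dtype=np.float64))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Example: CVaR_0.1 of 1000 simulated episode returns
rng = np.random.default_rng(0)
print(empirical_cvar(rng.normal(3000.0, 500.0, size=1000), alpha=0.1))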

  • To evaluate performance under transition disturbance (changed body mass):

python test_mass.py --task Hopper --algos "vpg trpo ppo pg_cmdp cppo" --mass_lower_bound 1.0 --mass_upper_bound 4.0 --mass_number 100 --episodes 5
python test_mass.py --task Swimmer --algos "vpg trpo ppo pg_cmdp cppo" --mass_lower_bound 25.0 --mass_upper_bound 55.0 --mass_number 100 --episodes 5
python test_mass.py --task Walker2d --algos "vpg trpo ppo pg_cmdp cppo" --mass_lower_bound 1.0 --mass_upper_bound 7.0 --mass_number 100 --episodes 5
python test_mass.py --task HalfCheetah --algos "vpg trpo ppo pg_cmdp cppo" --mass_lower_bound 3.0 --mass_upper_bound 10.0 --mass_number 100 --episodes 5
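
  • The exact protocol is in test_mass.py. As a rough sketch of how such an evaluation usually works with mujoco-py-based Gym environments (the body name 'torso' and the rollout loop below are illustrative assumptions, not the script's actual code):

import gym
import numpy as np

def eval_with_mass(env_name, policy, mass, episodes=5):
    # Illustrative sketch: override the torso mass of a mujoco-py Gym
    # env, then average the return of `policy` over a few episodes.
    env = gym.make(env_name)  # e.g. 'Hopper-v3'
    body_id = env.unwrapped.model.body_names.index('torso')  # assumed body name
    env.unwrapped.model.body_mass[body_id] = mass
    returns = []
    for _ in range(episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            obs, rew, done, _ = env.step(policy(obs))
            ep_ret += rew
        returns.append(ep_ret)
    return float(np.mean(returns))
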
  • To evaluate performance under observation disturbance (random noise):
python test_state.py --task Walker2d --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.4 --epsilon_num 100 --episodes 5
python test_state.py --task Hopper --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.1 --epsilon_num 100 --episodes 5
python test_state.py --task Swimmer --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.4 --epsilon_num 100 --episodes 5
python test_state.py --task HalfCheetah --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.5 --epsilon_num 100 --episodes 5
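
  • test_state.py injects random observation noise. A minimal sketch of the general idea, assuming uniform noise of magnitude epsilon is added to each observation before it reaches the policy (the script's actual noise model may differ):

import numpy as np

def eval_with_obs_noise(env, policy, epsilon, episodes=5):
    # Illustrative sketch: the policy acts on a noisy copy of the
    # observation while the environment evolves from the true state.
    returns = []
    for _ in range(episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            noisy = obs + np.random.uniform(-epsilon, epsilon, size=obs.shape)
            obs, rew, done, _ = env.step(policy(noisy))
            ep_ret += rew
        returns.append(ep_ret)
    return float(np.mean(returns))
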
  • To evaluate performance under observation disturbance (adversarial noise):
python test_state_adversary.py --task Walker2d --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.2 --epsilon_num 100 --episodes 5
python test_state_adversary.py --task Hopper --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.1 --epsilon_num 100 --episodes 5
python test_state_adversary.py --task Swimmer --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.4 --epsilon_num 100 --episodes 5
python test_state_adversary.py --task HalfCheetah --algos "vpg trpo ppo pg_cmdp cppo" --epsilon_low 0.0 --epsilon_upp 0.5 --epsilon_num 100 --episodes 5
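
  • test_state_adversary.py perturbs the observations adversarially. One common scheme from the literature (not necessarily the one used here) is a critic-based attack that takes an FGSM-style step of size epsilon in the direction that lowers the learned value of the perceived state. A hedged PyTorch sketch, where value_net is assumed to map an observation tensor to a scalar value:

import torch

def critic_attack(value_net, obs, epsilon):
    # Illustrative critic-based attack: shift the observation so the
    # value network rates the perceived state as low as possible
    # within an L-infinity ball of radius epsilon (single FGSM step).
    obs = torch.tensor(obs, dtype=torch.float32, requires_grad=True)
    value_net(obs).sum().backward()
    with torch.no_grad():
        adv_obs = obs - epsilon * obs.grad.sign()
    return adv_obs.numpy()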

Citation

If you find CPPO helpful, please cite our paper:

@inproceedings{ying2022towards,
    title={Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk},
    author={Ying, Chengyang and Zhou, Xinning and Su, Hang and Yan, Dong and Chen, Ning and Zhu, Jun},
    booktitle={International Joint Conference on Artificial Intelligence},
    year={2022},
    url={https://arxiv.org/abs/2206.04436}
}
