This repository contains the code/results corresponding to our ESE650 Final Project on Safe Reinforcement Learning.
Different branches of the repository contain code/results as indicated below:
- CAP adaptation for the safety-gym environment (branch: CAP_original)
- Our implementation of CAP-PG, original version (very slow) (branch: CAP_CCEM)
- Our implementation of CAP-PG, faster version (branch: CAP_CCEM_FAST)
For all questions and issues, please reach out to Swati Gupta ([email protected]), Jasleen Dhanoa ([email protected]), and Kausik Sivakumar ([email protected]).
- Create a conda virtual environment with Python 3.6 (use only Python 3.6.13; the mujoco_py installation fails otherwise).
- Install mujoco_py: `pip3 install -U 'mujoco-py<2.2,>=2.1'`
- Run `import mujoco_py` in a Python shell. If there is no error, proceed to the installation check below. If you hit a GLEW initialization error (especially on AWS), run the following and try again:
```bash
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
sudo apt-get install libglew-dev
sudo apt-get install --reinstall libglvnd0
```
- Use conda to install anything else that turns out to be missing when you run `import mujoco_py` in a Python shell, e.g. numpy, cffi, patchelf, imageio.
- Check that your mujoco_py installation works by creating an environment (see https://github.com/openai/mujoco-py); a smoke test is sketched below.
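For that check, a minimal smoke test along the lines of the mujoco-py README is sketched below (it assumes the default `~/.mujoco/mujoco210` install location; adjust the path otherwise):

```python
# Smoke test for mujoco_py: load a bundled model and step the simulation once.
import os
import mujoco_py

xml_path = os.path.expanduser('~/.mujoco/mujoco210/model/humanoid.xml')
model = mujoco_py.load_model_from_path(xml_path)
sim = mujoco_py.MjSim(model)

print(sim.data.qpos)  # initial joint positions
sim.step()            # advance the simulation by one timestep
print(sim.data.qpos)  # values should change slightly after the step
```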
- Install gym via pip if it is not already installed: `pip install gym`
- Install `safety_gym`:
```bash
git clone https://github.com/openai/safety-gym.git
cd safety-gym
pip install -e .
```
- Edit or comment out the `mujoco_py` requirement in `setup.py` under `safety-gym/` and add `mujoco-py<2.2,>=2.1` instead, since we have mujoco210 (see the excerpt below).
- Check that the installation works by creating an environment (see https://github.com/openai/safety-gym); MuJoCo needs to be installed prior to this. A minimal check is sketched below.
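For reference, the edited requirement in safety-gym's `setup.py` might look roughly like this (an illustrative sketch, not the file's exact contents):

```python
# safety-gym/setup.py -- illustrative excerpt after the edit
# (surrounding entries and exact pins may differ in your checkout)
from setuptools import setup

setup(
    name='safety_gym',
    packages=['safety_gym'],
    install_requires=[
        'gym',
        # 'mujoco_py==2.0.2.7',   # original pin, commented out
        'mujoco-py<2.2,>=2.1',    # matches the mujoco210 install above
        'numpy',
        'xmltodict',
    ],
)
```

And a minimal installation check, using `Safexp-PointGoal1-v0` since that is the environment we use throughout:

```python
# Minimal safety-gym installation check (assumes the steps above succeeded).
import gym
import safety_gym  # noqa: F401 -- importing registers the Safexp-* environments

env = gym.make('Safexp-PointGoal1-v0')
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(obs.shape, reward, info.get('cost'))  # safety-gym reports constraint cost in info
```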
PyTorch needs to be installed as well, and requires the appropriate version of CUDA if running on a GPU.
- To update your existing environment so it can run our adaptation of the CAP code for safety-gym, use the `environment.yml` file provided in the repository, e.g. `conda env update --name <name> --file environment.yml --prune`
- Modify the setup file, i.e. comment out the `mujoco-py==2.0.2.7` requirement, since we already have `mujoco-py<2.2,>=2.1`.
- Install mujoco200 (we used mujoco200_linux) from https://www.roboti.us/download.html alongside mujoco210 (already installed).
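A quick way to confirm both builds are where mujoco_py expects them (a sketch, assuming the default `~/.mujoco/` layout):

```python
# Check that both MuJoCo builds are unpacked under ~/.mujoco/ (default layout assumed).
import os

for build in ('mujoco210', 'mujoco200_linux'):
    path = os.path.expanduser(os.path.join('~/.mujoco', build))
    print(build, '->', path, '(found)' if os.path.isdir(path) else '(MISSING)')
```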
- Run CAP following the commands given below.
The EC2 instance needs all of the above installations.
- Create an EC2 instance and ssh into it using the pem file.
- Create a new tmux session: `tmux new -s [name]`
- Run the CAP code by specifying the hyperparameters as shown below.
- Detach from and attach to the session as required.
- The run generates live train and test plots so you can monitor the progress of your model.
- Install the starter RL kit from https://github.com/openai/safety-starter-agents to get the baselines code.
- The following robots, tasks, and RL algorithms are available for training and plotting in the safety-starter-agents code:
- robot_list = ['point', 'car', 'doggo']
- task_list = ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2']
- algo_list = ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo']
- Choose the robot, task, and algo to train on a suitable environment. In our paper, we ran all experiments on the Point Goal1 environment with all of the above algorithms, e.g. (a sweep over all five algorithms is sketched after this command):
```bash
time python experiment.py --algo cpo --task goal1 --robot point --seed 20 --cpu 8
```
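If you want to run all five baselines back to back, a small driver along these lines should work (a sketch; it assumes `experiment.py` sits in the current directory and reuses the flags from the command above):

```python
# Sketch: sequentially launch all five baseline algorithms on Point Goal1.
import subprocess

robot, task, seed, cpus = 'point', 'goal1', 20, 8
for algo in ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo']:
    subprocess.run(
        ['python', 'experiment.py',
         '--algo', algo, '--robot', robot, '--task', task,
         '--seed', str(seed), '--cpu', str(cpus)],
        check=True,  # stop the sweep if any run fails
    )
```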
- To plot the results, use the plot.py script provided in the safety-starter-agents code, e.g.:
```bash
python plot.py 2022-04-15_ppo_PointGoal1/2022-04-15_21-45-36-ppo_PointGoal1_s0 2022-04-16_trpo_PointGoal1/2022-04-16_00-25-36-trpo_PointGoal1_s0 ppo_lagrangian_PointGoal1/ 2022-04-14_20-51-30-trpo_lagrangian_PointGoal1_s20/ 2022-04-15_01-23-25-cpo_PointGoal1_s20/ --legend 'PPO' 'TRPO' 'PPO Lagrangian' 'TRPO Lagrangian' 'CPO' --value 'CostRate' 'Entropy' 'KL' 'AverageEpRet' 'AverageEpCost' --smooth 10
```
Please switch to the appropriate branch to run the CAP code as described above; the commands are the same on every branch.
```bash
# CPU only (CUDA disabled):
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --cost-limit 0 --binary-cost --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --disable-cuda --symbolic-env

# With CUDA (GPU):
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --cost-limit 0 --binary-cost --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --symbolic-env
```
To monitor a run already going on the EC2 instance:
- Attach over ssh with the given pem file.
- Run `tmux attach-session -t CAP_original`; this is where the CAP code is currently running.
- Press `Ctrl+B` then `D` to detach from the session (this does not stop the tmux session).
- Use `tmux kill-session -t session_name` to stop the session.
A full training command with all hyperparameters spelled out:
```bash
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 0 --state-size 60 --belief-size 60 --hidden-size 60 --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 50 --checkpoint-experience --cost-discount 1
```
We choose a binary cost (0, 1), so a cost limit of C = 0 makes sense. The state size is set to 60 to match the gym observation size, and action repeat happens by default.
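To confirm that 60 matches the environment's observation size (a quick sanity check, assuming safety_gym is installed as above):

```python
# Sanity check: --state-size should match the flattened observation size.
import gym
import safety_gym  # noqa: F401 -- registers the Safexp-* environments

env = gym.make('Safexp-PointGoal1-v0')
print(env.observation_space.shape)  # expect (60,), matching --state-size above
```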
We acknowledge and thank the maintainers of these wonderful repositories; it wouldn't have been possible to do this project without building on top of their efforts!
- GitHub repo for the original CAP implementation: https://github.com/Redrew/CAP
- GitHub repo for safety-starter-agents: https://github.com/openai/safety-starter-agents
- GitHub repo for mujoco_py: https://github.com/openai/mujoco-py