# CAP adaptation for the safety_gym environment

Environment: `Safexp-PointGoal1-v0`

## Installation
- Create a conda virtual environment with Python 3.6 (use Python 3.6.13 only; the mujoco_py installation fails otherwise).
- Copy the `mujoco210` folder (mujoco-py expects it under `~/.mujoco/mujoco210`), then:
```
pip install -U 'mujoco-py<2.2,>=2.1'
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
```
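A quick way to sanity-check the mujoco-py install before going further (a minimal sketch; the first import compiles mujoco-py's Cython bindings, so expect a one-time delay):

```python
# Sanity check: any GL/GLEW errors raised here usually trace back to
# the apt packages and libGL symlink above.
import mujoco_py

print("mujoco_py imported OK from:", mujoco_py.__file__)
```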
- Use `conda install` for everything else: numpy, cffi, patchelf, imageio.
```
sudo apt-get install libglew-dev
sudo apt-get install --reinstall libglvnd0
```
- Install `safety_gym` from https://github.com/openai/safety-gym and MuJoCo by following the steps there. Edit or comment out the `mujoco` requirement in `setup.py` under `safety-gym/` and add `mujoco-py<2.2,>=2.1` instead (see the sketch after this list).
- Install the starter RL kit from https://github.com/openai/safety-starter-agents and apply the same `setup.py` edit to that repository as well.
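The edited `install_requires` ends up looking roughly like this (an illustrative sketch, not the full file; keep whatever other requirements the actual `setup.py` lists):

```python
# setup.py (safety-gym) -- sketch of the mujoco-py pin swap only.
from setuptools import setup

setup(
    name='safety_gym',
    install_requires=[
        'gym',
        'numpy',
        # 'mujoco_py==2.0.2.7',   # original pin, commented out
        'mujoco-py<2.2,>=2.1',    # replacement matching the pip install above
        'xmltodict',
    ],
)
```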
Task: goal1, robot: point, algos: all.

```python
robot_list = ['point', 'car', 'doggo']
task_list = ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2']
algo_list = ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo']
```
Example:

```
time python experiment.py --algo cpo --task goal1 --robot point --seed 20
```

Add `--cpu` to use more than one core, e.g. `--cpu 4` to use 4 cores.
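To run all five algorithms back to back (or the full robot/task grid), a small driver script along these lines works; this is a sketch that assumes it is launched from the directory containing `experiment.py`:

```python
import subprocess

# Lists from above, trimmed to match "task: goal1, robot: point, algos: all".
robot_list = ['point']
task_list = ['goal1']
algo_list = ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo']

for robot in robot_list:
    for task in task_list:
        for algo in algo_list:
            # One sequential run per (algo, task, robot) combination.
            subprocess.run(['python', 'experiment.py',
                            '--algo', algo, '--task', task,
                            '--robot', robot, '--seed', '20'],
                           check=True)
```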
- action_space: `Box(2,)`, bounded in [-1, 1]
- observation_space: `Box(60,)`, bounded in [-inf, inf]
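These can be confirmed directly from Python (importing `safety_gym` registers the `Safexp-*` environments with gym):

```python
import gym
import safety_gym  # noqa: F401 -- registers the Safexp-* environments

env = gym.make('Safexp-PointGoal1-v0')
print(env.action_space)       # Box(2,), bounded in [-1, 1]
print(env.observation_space)  # Box(60,), bounded in [-inf, inf]
```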
To plot results, use the `plot.py` script. Examples:

```
python plot.py 2022-04-15_ppo_PointGoal1/2022-04-15_21-45-36-ppo_PointGoal1_s0 2022-04-16_trpo_PointGoal1/2022-04-16_00-25-36-trpo_PointGoal1_s0 ppo_lagrangian_PointGoal1/ 2022-04-14_20-51-30-trpo_lagrangian_PointGoal1_s20/ 2022-04-15_01-23-25-cpo_PointGoal1_s20/ --legend 'PPO' 'TRPO' 'PPO Lagrangian' 'TRPO Lagrangian' 'CPO' --value 'CostRate' 'Entropy' 'KL' 'AverageEpRet' 'AverageEpCost' --smooth 10
python plot.py data/2022-05-03_ppo_lagrangian_PointGoal1/2022-05-03_14-47-32-ppo_lagrangian_PointGoal1_s20 data/2022-05-03_trpo_lagrangian_PointGoal1/2022-05-03_14-22-20-trpo_lagrangian_PointGoal1_s20 data/2022-05-03_14-17-51-cpo_PointGoal1_s20 data/2022-05-03-CAP data/2022-05-03-fCAP_CCEM/ --xaxis Epoch --legend 'PPO Lagrangian' 'TRPO Lagrangian' 'CPO' 'CAP(Adapted)' 'CAP-PG-FAST(Ours)' --value 'AverageEpRet' 'AverageEpCost' --smooth 20 --paper
```
Create or update the conda environment for CAP from its `environment.yml`:

```
conda env create -f environment.yml
conda env update --name <name> --file environment.yml --prune
```
Then run CAP; the two commands below differ only in `--disable-cuda` (CPU-only vs. GPU):

```
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --cost-limit 0 --binary-cost --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --disable-cuda --symbolic-env
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --cost-limit 0 --binary-cost --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --symbolic-env
```
- Modify the setup file, i.e. comment out the `mujoco-py==2.0.2.7` requirement, since we already have `mujoco-py<2.2,>=2.1`.
- Install the requirements using the `environment.yml` with the steps given above.
- Install mujoco200 from https://www.roboti.us/download.html.
- Create a new session: `tmux new -s [name]`
- SSH into the machine with the given pem file, then run `tmux attach-session -t CAP_original`; this is the session where the CAP code is currently running. Alternatively, just run `tmux attach` to attach to the most recent session.
- Press `Ctrl+B` then `D` to detach from the session (this does not stop the tmux session).
- Scrolling: `Ctrl+B` then `[`.
- Use `tmux kill-session -t session_name` to stop a session.
- To SCP files to your local machine, first create a tar file of the directory, e.g. `tar -zcvf Results_Hyper2.tar.gz results_hyper2_CAP_Jason`
- Then open a local terminal and scp the file down to the folder you want, e.g. `scp -i kausik.pem [email protected]:~/workspace/safe_reinforcement_learning/CAP_orig/Results_Hyper2.tar.gz .`
```
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 0 --state-size 60 --belief-size 60 --hidden-size 60 --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 50 --checkpoint-experience --cost-discount 1
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 50 --state-size 60 --belief-size 60 --hidden-size 60 --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.1 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 50 --checkpoint-experience --cost-discount 1.0 --batch-size 256 --id Safe_gym_experiment2 --action-noise 0.01
```
With a binary cost (each step's cost is 0 or 1), a cost limit of C = 0 makes sense: the episode cost counts unsafe steps, so the constraint demands zero violations. The state size is set to 60 to match the gym observation size. Action repeat happens by default.
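The `--binary-cost` flag amounts to something like the following (a hypothetical wrapper for illustration; the actual CAP code handles this internally): any positive per-step cost reported by safety_gym is clipped to 1.

```python
import gym
import safety_gym  # noqa: F401 -- registers the Safexp-* environments


class BinaryCostWrapper(gym.Wrapper):
    """Illustrative sketch of --binary-cost: clip the per-step cost in
    info['cost'] to {0, 1}, so the episode cost is a count of unsafe
    steps and a cost limit of 0 means zero violations."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info['cost'] = 1.0 if info.get('cost', 0.0) > 0 else 0.0
        return obs, reward, done, info


env = BinaryCostWrapper(gym.make('Safexp-PointGoal1-v0'))
```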
```
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 150 --state-size 60 --belief-size 60 --hidden-size 60 --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.01 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 50 --checkpoint-experience --cost-discount 1.0 --batch-size 256 --id Safe_gym_experiment3 --action-noise 0.01
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 100 --state-size 60 --belief-size 60 --hidden-size 60 --cost-constrained --penalize-uncertainty --learn-kappa --penalty-kappa 0.05 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 50 --checkpoint-experience --cost-discount 1.0 --batch-size 256 --id CAP_original_experiment4 --action-noise 0.01
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 100 --state-size 60 --belief-size 128 --hidden-size 128 --cost-constrained --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 64 --checkpoint-experience --cost-discount 1.0 --batch-size 128 --id CAP_orig_experiment5 --action-noise 0.01
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal2-v0 --binary-cost --cost-limit 220 --state-size 60 --belief-size 128 --hidden-size 128 --cost-constrained --penalty-kappa 0.0 --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 64 --checkpoint-experience --cost-discount 1.0 --batch-size 128 --id CAP_original_exp6 --action-noise 0.01
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 100 --state-size 60 --belief-size 128 --hidden-size 128 --cost-constrained --symbolic-env --max-episode-length 1000 --episodes 1000 --planning-horizon 64 --checkpoint-experience --cost-discount 1.0 --batch-size 128 --id CAP_orig_experiment7 --action-noise 0.01 --penalty-kappa 0.1 --learn-kappa
python3 cap-planet/run_cap_planet.py --env Safexp-PointGoal1-v0 --binary-cost --cost-limit 100 --state-size 60 --belief-size 128 --hidden-size 128 --cost-constrained --symbolic-env --learn-kappa --max-episode-length 1000 --episodes 1000 --planning-horizon 64 --checkpoint-experience --cost-discount 1.0 --batch-size 128 --id CAP_orig_experiment8 --action-noise 0.01
```