If you use this code, please cite the corresponding paper:

```bibtex
@inproceedings{randomist,
  author    = {Alberto Maria Metelli and
               Matteo Papini and
               Pierluca D'Oro and
               Marcello Restelli},
  title     = {Policy Optimization as Online Learning with Mediator Feedback},
  booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021},
  year      = {2021}
}
```
We mainly use PyTorch and NumPy as linear algebra libraries. Run `pip install -r requirements.txt`, preferably in a clean environment, to install all the required packages.
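For instance, a minimal setup sketch using `venv` (assuming Python 3 is available; the environment name `.venv` is just an example):

```sh
# Create and activate a fresh virtual environment, then install dependencies.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```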
The structure of the repository is the following:

```
algorithms/
envs/
mellog/
utils.py
...
```
The `algorithms/` directory contains scripts with classes for each algorithm. Some of them are grouped together (OPTIMIST and FTL, PHE and the discrete version of RANDOMIST). `algorithms/randomist.py` and `algorithms/discrete.py` contain the abstract classes instantiated by the other algorithms to perform policy optimization with a discrete number of policies, with and without importance sampling.
Experiments can be run with three main scripts:

- `run_my_experiment.py` for the illustrative experiments in the MAB setting;
- `run_lqg_experiment.py` for experiments in the Linear Quadratic Gaussian Regulator environment;
- `run_rl_experiment.py` for experiments in the Mountain Car and Cartpole environments.

Each script requires the specification of a log directory and a name for the experiment. The latter should be the same for different runs of the same experiment.
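As a sketch, assuming `run_my_experiment.py` accepts the same `--logdir` and `--exp_name` flags shown for `run_rl_experiment.py` below (the values here are placeholders), multiple runs of one experiment could be launched as:

```sh
# Three runs of the same experiment: sharing --exp_name groups them together,
# while each run still produces its own output directory.
for i in 1 2 3; do
    python run_my_experiment.py --logdir log/mab --exp_name illustrative_mab
done
```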
For each run, a directory is produced, containing files that describe the hyperparameters used and a `metrics.csv` file with the logged metrics.
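To quickly inspect the output of a finished run, something like the following should work (the run directory name is a placeholder, as its exact format is not specified here):

```sh
# List a run's output files and peek at the first lines of its logged metrics.
ls log/mab/illustrative_mab_run/
head log/mab/illustrative_mab_run/metrics.csv
```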
For instance, to run MCMC-RANDOMIST on the Mountain Car environment for 2500 iterations, you can launch:

```sh
python run_rl_experiment.py --algorithm randomistMCMC --logdir log/mountaincar --exp_name mcmcrandomist --logging_freq 25 --pseudo_rewards_per_timestep 1.1 --env car --n_iterations 2500
```
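Here, `--env car` selects the Mountain Car environment and `--n_iterations 2500` sets the number of iterations, as described above; `--logging_freq 25` presumably controls how often metrics are written to `metrics.csv`, while `--pseudo_rewards_per_timestep` appears to be an algorithm-specific parameter of MCMC-RANDOMIST.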
To reproduce the results in the paper, we also provide three `*.sh` scripts that automate the process of launching multiple runs of the experiments using different algorithms. For instance, to reproduce the complete LQG experiment, just use:

```sh
sh run_lqg_experiment.sh
```