This repository contains the code for our paper "Off-Policy Evaluation with Deficient Support Using Side Information", accepted at NeurIPS 2022. The pre-print of the paper is available here.
The experiments were run with the following configuration:
- Ubuntu: version 20.04.
- Python: version 3.8.
- Pip: version 22.1.1.
- RAM: the pre-processing phase relies on the pyIEOE package and, due to the dataset size, may require up to 40 GB of RAM.
The code is mainly built upon the Open Bandit Pipeline package and the pyIEOE package.
First, download the Open Bandit Dataset from https://research.zozo.com/data.html. Then create a directory called "logs" (if not already present) and unzip the dataset inside it. The final structure should look like this:
logs
└── open_bandit_dataset
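If you prefer the command line, the same setup can be sketched as follows (the archive name open_bandit_dataset.zip is an assumption; use the actual name of the file you downloaded):

mkdir -p logs
unzip open_bandit_dataset.zip -d logs/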
Now, we create our virtual environment:
sudo apt update
sudo apt install python3.8-venv
python3 -m venv virtualenv
To activate the virtual environment, run:
source virtualenv/bin/activate
After activation, we can install the requirements:
pip install --upgrade pip
pip install --upgrade "jax[cpu]"
pip install -r requirements.txt
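As an optional sanity check, you can verify that JAX was installed correctly before launching the experiments:

python3 -c "import jax; print(jax.__version__)"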
To start the full set of experiments, run:
python3 def_benchmark_ope.py
WARNING: the PseudoInverse estimator is much slower than the others, so you may want to run the other estimators first (or in parallel) with:
python3 def_benchmark_ope_NO_PI.py
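For example, you can launch the faster script in the background and keep the shell free for the full run (the log file name no_pi.log is only illustrative):

nohup python3 def_benchmark_ope_NO_PI.py > no_pi.log 2>&1 &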
If you want to change the number of deficient actions (the default is 8 out of 80 total actions), you can do so as follows:
python3 def_benchmark_ope_NO_PI.py setting.n_deficient_actions=m
where m can be any integer between 0 and 79.
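For example, to run with 40 deficient actions out of 80 (an illustrative value):

python3 def_benchmark_ope_NO_PI.py setting.n_deficient_actions=40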
If you use this code in your work, please cite our paper:
BibTeX:
@incollection{felicioni2022offpolicy,
  author    = {Nicol{\`{o}} Felicioni and Maurizio {Ferrari Dacrema} and Marcello Restelli and Paolo Cremonesi},
  title     = {Off-Policy Evaluation with Deficient Support Using Side Information},
  booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS)},
  year      = {2022},
}