v0.5.2

Released by @usaito on 13 Jan 12:57 · 81 commits to master since this release · commit 288a5c9

Updates

  • Implement obp.policy.QLearner. (#144)
  • Implement the Balanced IPW estimator as obp.ope.BalancedInverseProbabilityWeighting. See Sondhi et al. (2020) for details. (#146)
  • Implement the Cascade Doubly Robust estimator for OPE with combinatorial actions as obp.ope.CascadeDR. See Kiyohara et al. (2022) for details. (#142)
  • Implement SLOPE, a data-driven hyperparameter tuning method for OPE proposed by Su et al. (2020) and Tucker et al. (2021). (#148)
  • Implement new estimators for standard OPE based on a power-mean transformation of importance weights, proposed by Metelli et al. (2021). (#149)
  • Implement a dataset class for generating synthetic logged bandit data with multiple loggers; the corresponding estimators will be added in the next update. (#150)
  • Implement an argument to control the number of deficient actions in obp.dataset.SyntheticBanditDataset and obp.dataset.MultiClassToBanditReduction. See Sachdeva et al. (2020) for details. (#150)
  • Implement flexible functions for synthesizing reward functions and behavior policies. (#145)
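The Balanced IPW estimator listed above replaces the direct ratio of policy probabilities with the odds of a classifier that distinguishes evaluation-policy samples from logged samples. The following is only an illustrative sketch of that idea, not the obp estimator: it uses an empirical count ratio as the "classifier" over discrete (context, action) pairs, and all function and variable names are hypothetical.

```python
from collections import Counter

def balanced_ipw(logged, eval_samples, logged_rewards):
    """Balanced-IPW-style value estimate on discrete (context, action) pairs.

    Instead of computing pi_e(a|x) / pi_b(a|x) directly, estimate the odds
    p(C=e | x, a) / p(C=b | x, a) of a classifier predicting which policy a
    pair came from; these odds recover the importance weight. Here the
    "classifier" is just an empirical count ratio, which only works for
    discrete pairs with enough samples. (Sketch, not the obp implementation.)
    """
    n_b, n_e = len(logged), len(eval_samples)
    count_b, count_e = Counter(logged), Counter(eval_samples)
    total = 0.0
    for pair, r in zip(logged, logged_rewards):
        odds = (count_e[pair] / n_e) / (count_b[pair] / n_b)  # estimated weight
        total += odds * r
    return total / n_b
```

In the real estimator the count ratio is replaced by a learned classifier, which is exactly what makes the approach usable in general (continuous) action spaces.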
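SLOPE, mentioned above, is a Lepski-style selection rule: order candidate estimates by decreasing confidence-interval width (decreasing variance, increasing bias) and keep walking down the list while each new interval still overlaps all earlier ones. This is a simplified sketch of that rule under assumed symmetric intervals; obp's implementation (and Tucker & Lee's variant, which inflates the widths) differs in details.

```python
def slope_select(estimates, ci_widths):
    """Pick an estimate via a simplified SLOPE interval-overlap rule.

    estimates[i] carries the symmetric confidence interval
    [estimates[i] - ci_widths[i], estimates[i] + ci_widths[i]], and candidates
    are assumed ordered by non-increasing width. Select the last candidate
    whose interval overlaps the intervals of every earlier candidate.
    (Illustrative sketch, not the obp API.)
    """
    selected = estimates[0]
    for i in range(1, len(estimates)):
        lo_i = estimates[i] - ci_widths[i]
        hi_i = estimates[i] + ci_widths[i]
        overlaps_all = all(
            lo_i <= estimates[j] + ci_widths[j]
            and hi_i >= estimates[j] - ci_widths[j]
            for j in range(i)
        )
        if not overlaps_all:  # bias has taken over; stop here
            break
        selected = estimates[i]
    return selected
```

The appeal of the rule is that it needs only valid confidence intervals per candidate, not an unbiased estimate of each candidate's error.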
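The power-mean transformation above shrinks large importance weights to trade a little bias for much lower variance. A minimal sketch, assuming the harmonic special case w_λ = w / ((1 − λ) + λw) from Metelli et al. (2021), which recovers plain IPW at λ = 0 and caps weights at 1/λ otherwise (names here are illustrative, not the obp API):

```python
def power_mean_ipw(rewards, behavior_probs, eval_probs, lam=0.5):
    """Off-policy value estimate with power-mean-shrunk importance weights.

    Applies w_lam = w / ((1 - lam) + lam * w) to each vanilla weight
    w = pi_e / pi_b: plain IPW at lam = 0, weights bounded by 1 / lam
    for lam > 0. (Sketch of the idea, not the obp estimator itself.)
    """
    estimate = 0.0
    for r, pb, pe in zip(rewards, behavior_probs, eval_probs):
        w = pe / pb                            # vanilla importance weight
        w_lam = w / ((1.0 - lam) + lam * w)    # shrunk, variance-reduced weight
        estimate += w_lam * r
    return estimate / len(rewards)
```

Because the shrinkage is differentiable in λ, the transformation also lends itself to data-driven tuning of λ, which is how it is framed in the paper.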
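"Deficient actions" in the Sachdeva et al. (2020) sense are actions the logging policy never takes, which leaves standard IPW weights undefined on part of the action space. A toy helper showing what the new argument controls, by zeroing out some logging probabilities and renormalizing (this is an illustrative function, not the obp implementation):

```python
import random

def make_deficient_policy(action_probs, n_deficient_actions, seed=0):
    """Zero out the logging probability of a few actions and renormalize.

    Actions with zero logging probability ("deficient support") break the
    standard IPW assumption of full support: their importance weights are
    undefined, so OPE becomes biased. (Toy sketch of the idea behind the
    n_deficient_actions argument; not the obp code.)
    """
    rng = random.Random(seed)
    deficient = set(rng.sample(range(len(action_probs)), n_deficient_actions))
    probs = [0.0 if a in deficient else p for a, p in enumerate(action_probs)]
    total = sum(probs)  # renormalize the surviving actions
    return [p / total for p in probs]
```

Varying the number of deficient actions in the synthetic dataset classes lets one measure how quickly each estimator degrades as the full-support assumption is violated.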

Minors

  • Support sklearn>=1.0.0.
  • Fix and update error messages, docstrings, and examples.

References

  • Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, and Yasuo Yamamoto. Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model. WSDM 2022.
  • Arjun Sondhi, David Arbour, and Drew Dimmery. Balanced Off-Policy Evaluation in General Action Spaces. AISTATS 2020.
  • Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. Adaptive Estimator Selection for Off-Policy Evaluation. ICML 2020.
  • George Tucker and Jonathan Lee. Improved Estimator Selection for Off-Policy Evaluation. 2021.
  • Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning. NeurIPS 2021.
  • Noveen Sachdeva, Yi Su, and Thorsten Joachims. Off-policy Bandits with Deficient Support. KDD 2020.
  • Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. KDD 2018.
  • Nathan Kallus, Yuta Saito, and Masatoshi Uehara. Optimal Off-Policy Evaluation from Multiple Logging Policies. ICML 2021.