Code for our AJCAI 2020 paper:
"Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward"
by Baihan Lin (Columbia).
The latest full paper is available at: https://arxiv.org/abs/2009.08457
All the experimental results can be reproduced using the code in this repository. Feel free to contact me by email at [email protected] if you have any questions about our work.
Abstract
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications in which the contexts are nonstationary across episodes and reward feedback is not always available to the decision-making agents. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, covering six scenarios in both stationary and nonstationary environments, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
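The abstract describes BerlinUCB only at a high level. As a rough illustration of the setting (not the code in this repository), the Matlab sketch below runs standard disjoint LinUCB on a synthetic stream in which rewards are revealed only in alternating episodes of fixed length. When rewards are hidden, it substitutes a pseudo-reward derived from a simple nearest-centroid clustering of previously confirmed contexts. That fallback rule, the synthetic data, and the values of `alpha` and `epLen` are assumptions chosen for illustration; the actual BerlinUCB self-supervision step may differ. It assumes Matlab R2016b or later (implicit expansion).

```matlab
% Hedged sketch, NOT the repository's BerlinUCB code: disjoint LinUCB on
% synthetic data where rewards are revealed only in alternating episodes.
% The nearest-centroid pseudo-reward used in hidden episodes is an assumption.
rng(0);                               % reproducible synthetic stream
d = 5; K = 3; T = 2000;               % context dimension, arms, horizon
alpha = 0.5;                          % exploration weight (hypothetical value)
epLen = 100;                          % episode length (hypothetical value)
centers = 3 * randn(K, d);            % synthetic mean context for each arm

A = repmat(eye(d), [1 1 K]);          % per-arm ridge matrices: I + sum x*x'
b = zeros(d, K);                      % per-arm sums of r*x
mu = zeros(K, d);                     % running centroid per arm (self-supervision)
n = zeros(K, 1);                      % contexts absorbed into each centroid
correct = 0;                          % evaluation only; uses hidden labels

for t = 1:T
    kTrue = randi(K);                            % hidden best arm
    x = centers(kTrue, :)' + randn(d, 1);        % observed context (column)

    % Standard LinUCB: ridge estimate plus confidence bonus for each arm
    p = zeros(K, 1);
    for k = 1:K
        Ainv = A(:, :, k) \ eye(d);
        theta = Ainv * b(:, k);
        p(k) = theta' * x + alpha * sqrt(x' * Ainv * x);
    end
    [~, a] = max(p);
    correct = correct + (a == kTrue);

    % Episodes 1, 3, 5, ... (each epLen steps long) reveal rewards
    revealed = mod(floor((t - 1) / epLen), 2) == 0;
    if revealed
        r = double(a == kTrue);
        if r == 1
            % A confirmed correct pick labels x with arm a: update its centroid
            n(a) = n(a) + 1;
            mu(a, :) = mu(a, :) + (x' - mu(a, :)) / n(a);
        end
    elseif any(n > 0)
        % Assumed self-supervision: pseudo-reward 1 if x is nearest to arm a's centroid
        dist = sum((mu - x') .^ 2, 2);
        dist(n == 0) = Inf;                      % ignore centroids never initialised
        [~, c] = min(dist);
        r = double(c == a);
    else
        continue;                                % no reward and no clusters yet: skip update
    end

    A(:, :, a) = A(:, :, a) + x * x';
    b(:, a) = b(:, a) + r * x;
end

fprintf('accuracy over %d steps: %.3f\n', T, correct / T);
```

Replacing the `elseif` branch with a plain skip gives a LinUCB baseline that simply ignores unrewarded episodes, which is a natural point of comparison for any clustering-based fallback.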
Language: Matlab
Platform: macOS, Linux, Windows
by Baihan Lin, Dec 2019
If you find this work helpful, please try out the models and cite our work. Thanks!
@inproceedings{lin2020semi,
title={Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward},
author={Lin, Baihan},
booktitle={Australasian Joint Conference on Artificial Intelligence},
year={2020},
organization={Springer}
}
Requirements:
- Matlab
This work is related to the following projects by the same author and shares code with their repositories. Feel free to check them out, and contact the primary author Baihan Lin ([email protected]) with any questions. Thank you!
- https://github.com/doerlbh/ABaCoDE (Lin et al., "Contextual Bandit with Adaptive Feature Extraction", ICDMW 2018)
- https://github.com/doerlbh/MiniVox (Lin & Zhang, "VoiceID on the fly: A Speaker Recognition System that Learns from Scratch", InterSpeech 2020)
- https://github.com/doerlbh/mentalRL (Lin et al., "A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry", AAMAS 2020; Lin et al., "Split Q Learning: Reinforcement Learning with Two-Stream Rewards", IJCAI 2019)
- https://github.com/doerlbh/dilemmaRL (Lin et al., "Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior", arXiv 2020)