Efficient Bandit Algorithms for Online Multiclass Prediction
The multiclass bandit setting models a wide range of practical supervised learning applications in which the learner receives only partial feedback about the true label (e.g., in many web applications users provide only positive “click” feedback, which does not necessarily disclose the true label). The Banditron learns in a multiclass classification setting with “bandit” feedback, which reveals only whether the algorithm's prediction was correct (but does not necessarily reveal the true label).
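To make the feedback model concrete, the following is a minimal Python sketch of one round of learning under bandit feedback. The update rule is not spelled out in the text above; the sketch assumes the commonly described Banditron form (a greedy prediction, exploration with probability gamma, and an importance-weighted perceptron-style update). The names bandit_feedback, banditron_step, gamma, and rng are illustrative assumptions, not part of the original text.

```python
import random


def bandit_feedback(predicted_label, true_label):
    """Reveal only whether the prediction was correct, never the true label."""
    return predicted_label == true_label


def banditron_step(W, x, true_label, gamma, rng):
    """One round of a Banditron-style update (illustrative sketch).

    W is a list of k weight vectors (one per class) and is updated in place;
    x is the feature vector; gamma is the assumed exploration rate.
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    y_hat = max(range(k), key=lambda r: scores[r])  # greedy prediction

    # Exploration distribution: mostly y_hat, with gamma spread uniformly.
    probs = [(1 - gamma) * (r == y_hat) + gamma / k for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]

    # The learner observes this single bit and nothing else.
    correct = bandit_feedback(y_tilde, true_label)

    for r in range(k):
        # Importance-weighted reward for the sampled label when it was right,
        # minus a perceptron-style penalty on the greedy label.
        reward = 1.0 / probs[r] if (correct and r == y_tilde) else 0.0
        penalty = 1.0 if r == y_hat else 0.0
        coeff = reward - penalty
        if coeff != 0.0:
            W[r] = [w_j + coeff * x_j for w_j, x_j in zip(W[r], x)]
    return y_tilde, correct
```

In this sketch the true label is used only inside bandit_feedback to produce the correctness bit, which mirrors the setting described above: the update itself never reads the true label directly.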