Thesis of Pratik Gajane

Multi-armed bandits with unconventional feedback

The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off in reinforcement learning: over a sequence of trials, the learner chooses an arm from a set of available arms in order to maximise its reward. In the classical MAB problem, the learner receives absolute bandit feedback, i.e. it observes the reward of the arm it selects. In many practical situations, however, a different kind of feedback is more readily available. In this thesis, we study two such kinds of feedback, namely relative feedback and corrupt feedback.
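As an illustration of the classical setting with absolute bandit feedback, the interaction between learner and arms can be sketched as follows. This is a minimal sketch and not taken from the thesis: the Bernoulli reward model, the epsilon-greedy selection rule, and the names `run_bandit`, `arm_means` and `epsilon` are all assumptions made for the example.

```python
import random


def run_bandit(arm_means, n_trials, epsilon=0.1, seed=0):
    """Simulate a learner on Bernoulli arms with absolute bandit feedback.

    Assumptions (illustrative only): rewards are 0/1 with success
    probability arm_means[a], and the learner is epsilon-greedy.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # number of times each arm was selected
    totals = [0.0] * n_arms   # sum of rewards observed for each arm
    total_reward = 0.0

    for _ in range(n_trials):
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            # Try every arm once before comparing empirical means.
            arm = untried[0]
        elif rng.random() < epsilon:
            # Exploration: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the best empirical mean.
            arm = max(range(n_arms), key=lambda a: totals[a] / pulls[a])

        # Absolute bandit feedback: only the selected arm's reward is seen.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        total_reward += reward

    return pulls, total_reward
```

Calling `run_bandit([0.2, 0.5, 0.8], n_trials=1000)` returns the number of pulls per arm and the cumulative reward. The feedback line is the only part that would change under the other kinds of feedback studied here; the rest of the loop is the generic trial-by-trial protocol.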

Jury

Thesis supervisor: Philippe Preux, Professor, Université de Lille, Villeneuve d’Ascq

Co-supervisor: Tanguy Urvoy, researcher, Orange Labs, Lannion

Reviewers: Aurélien Garivier, Professor, Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse; Maarten de Rijke, Professor, University of Amsterdam

Examiners: Alexandra Carpentier, researcher, Institut für Mathematik, Universität Potsdam; Richard Combes, Centrale-Supélec, Saclay; Emilie Kaufmann, researcher, CNRS, CRIStAL; Gabor Lugosi, University Pompeu Fabra, Barcelona

Thesis of the team defended on 14/11/2017