CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille

Thesis of Pratik Gajane

Multi-armed bandits with unconventional feedback

The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off in reinforcement learning: over a sequence of trials, the learner chooses an arm from a set of available arms in order to maximise its cumulative reward. In the classical MAB problem, the learner receives absolute bandit feedback, i.e. it observes the reward of the arm it selects and nothing else. In many practical situations, however, other kinds of feedback are more readily available. In this thesis, we study two such kinds of feedback, namely relative feedback and corrupt feedback.
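The classical setting described above can be illustrated with a minimal sketch. It is not taken from the thesis: the epsilon-greedy strategy, the Bernoulli reward model, and the names `run_bandit`, `arm_means`, and `epsilon` are all assumptions chosen for illustration. The point is only to show absolute bandit feedback, where the learner observes the reward of the arm it pulls and nothing about the other arms.

```python
import random


def run_bandit(arm_means, n_trials, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on Bernoulli arms (illustrative assumption).

    arm_means: hypothetical success probability of each arm, unknown to the learner.
    Returns (pulls per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Absolute bandit feedback: only the chosen arm's reward is observed.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    pulls, estimates, total = run_bandit([0.2, 0.5, 0.8], n_trials=5000)
    print("pulls per arm:", pulls)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

Under relative or corrupt feedback, the line that computes `reward` is what would change: the learner would no longer observe the selected arm's reward directly.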

Jury

Thesis supervisor: Philippe Preux, Professor, Université de Lille, Villeneuve d’Ascq

Co-supervisor: Tanguy Urvoy, researcher, Orange Labs, Lannion

Reviewers: Aurélien Garivier, Professor, Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse; Maarten de Rijke, Professor, University of Amsterdam

Examiners: Alexandra Carpentier, researcher, Institut für Mathematik, Universität Potsdam; Richard Combes, Centrale-Supélec, Saclay; Emilie Kaufmann, researcher, CNRS, CRIStAL; Gabor Lugosi, University Pompeu Fabra, Barcelona

Thesis of the team defended on 14/11/2017
