Thesis of Julien Perolat


This thesis presents our work on reinforcement learning in stochastic games. The first two parts of this manuscript study the problem of learning from batch data. We propose a first approach using approximate dynamic programming in zero-sum two-player Markov games and discuss its limitations for general-sum Markov games. As a second approach, we study the direct minimization of the Bellman residual for zero-sum two-player Markov games. This approach is then generalized to general-sum Markov games. Finally, we study the online setting and propose an actor-critic algorithm that converges both in zero-sum two-player Markov games and in cooperative multistage games.
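To give a flavor of the second approach, the sketch below directly minimizes the Bellman residual by gradient descent on a toy zero-sum game. It is an illustrative simplification, not the thesis's algorithm: the game is turn-based (the maximizing player acts in state 0, the minimizing player in state 1), so the minimax Bellman operator reduces to a max or a min per state, and the transition model, rewards, and finite-difference gradient are all assumptions made for this example.

```python
import numpy as np

# Hypothetical toy turn-based zero-sum game: player 1 (maximizer) acts
# in state 0, player 2 (minimizer) acts in state 1; two actions each.
gamma = 0.9
# R[s, a]: immediate reward to player 1; P[s, a]: next-state distribution.
R = np.array([[1.0, 0.0],
              [0.0, -1.0]])
P = np.array([[[0.2, 0.8], [0.9, 0.1]],
              [[0.5, 0.5], [0.3, 0.7]]])

def bellman(V):
    # Minimax Bellman operator T: max over actions in state 0,
    # min over actions in state 1 (turn-based structure).
    q = R + gamma * P @ V          # q[s, a]
    return np.array([q[0].max(), q[1].min()])

def residual(V):
    # Squared Bellman residual ||V - TV||^2, the quantity minimized.
    return np.sum((V - bellman(V)) ** 2)

# Direct minimization of the Bellman residual by gradient descent,
# using central finite differences for the gradient.
V = np.zeros(2)
eps, lr = 1e-5, 0.2
for _ in range(8000):
    grad = np.array([
        (residual(V + eps * np.eye(2)[i])
         - residual(V - eps * np.eye(2)[i])) / (2 * eps)
        for i in range(2)
    ])
    V -= lr * grad
```

Because T is a gamma-contraction, its fixed point is unique, so driving the residual to zero recovers the minimax value function; in the batch setting studied in the thesis, the expectation under P is replaced by sampled transitions, which is where the real difficulty lies.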


Thesis supervisor: Olivier PIETQUIN. Reviewers: Damien ERNST, Doina PRECUP. Examiners: Laurence DUCHIEN, Ronald ORTNER, Bilal PIOT, Bruno SCHERRER, Karl TUYLS

Thesis defended on 18/12/2017