Learning stochastic models generating sequences has many applications in natural language processing, speech recognitions or bioinformatics. Multiplicity Automata (MA) are graphical latent variable models that encompass a wide variety of linear systems. In particular, they can model stochastic languages, stochastic processes and controlled processes. Traditional learning algorithms such as the one of Baum-Welch are iterative, slow and may converge to local optima. A recent alternative is to use the Method of Moments (MoM) to design consistent and fast algorithms with pseudo-PAC guarantees. However, MoM-based algorithms have two main disadvantages. First, the PAC guarantees hold only if the size of the learned model corresponds to the size of the target model. Second, although these algorithms learn a function close to the target distribution, most do not ensure it will be a distribution. Thus, a model learned from a finite number of examples may return negative values or values that do not sum to one. This thesis addresses both problems. First, we extend the theoretical guarantees for compressed models, and propose a regularized spectral algorithm that adjusts the size of the model to the data. Then, an application in electronic warfare is proposed to sequence of the dwells of a superheterodyne receiver. Finally, we design new learning algorithms based on the MoM that do not suffer the problem of negative probabilities. We show for one of them pseudo-PAC guarantees.
Directeur de Thèse : Olivier PIETQUIN Rapporteurs : Joëlle PINEAU, François DENIS Examinateurs : Cyrille ENDERLI, Odalric-Ambrym MAILLARD, Marc TOMMASI