Thesis of Azadeh Saffarian

Prediction and pattern matching algorithms for RNA multi-structures

RNA (ribonucleic acid) molecules have various functions in cells. They can store and deliver the DNA message for protein synthesis (messenger RNAs), and they can also directly catalyze chemical reactions or act as regulators (functional RNAs, also called non-coding RNAs). Recent sequencing technologies yield billions of genomic sequences (DNA and RNA) at very low cost. However, sequencing is only the first step: the function of each sequence remains open for investigation. The objective of this thesis is to define new computational methods to support sequence and structure analysis of non-coding RNAs. In this perspective, the "secondary structure" of an RNA, formed by its base pairs, provides useful hints for studying its function. Our work focuses on sets of all possible RNA structures for a given sequence, introducing the concept of "RNA multi-structures". The thesis details how such sets can be constructed systematically to generate all locally optimal secondary structures, and how they can be used as patterns to identify non-coding RNAs in genomic sequences. We provide efficient algorithms for these two problems. These algorithms have been implemented in the software tools Alterna and Regliss and tested on real data, providing new insight into RNA structures.
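To make the notions of "secondary structure" and "locally optimal" more concrete, here is a minimal Python sketch. It is not the algorithm implemented in Alterna or Regliss. It assumes the common dot-bracket encoding of a secondary structure, Watson-Crick plus G-U wobble pairing rules, a minimum of three unpaired bases inside a hairpin loop, and one common reading of local optimality (no base pair can be added without crossing an existing pair or reusing a paired base), which may differ in detail from the definition used in the thesis. The names `parse_dot_bracket`, `is_saturated`, and `min_loop` are hypothetical.

```python
# Illustrative sketch only; not the method implemented in Alterna or Regliss.

# Assumption: Watson-Crick pairs plus the G-U wobble pair are allowed.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def is_saturated(sequence, structure, min_loop=3):
    """True if no base pair can be added to the structure.

    A candidate pair (i, j) is rejected if a base is already paired, if the
    bases cannot pair, if fewer than min_loop bases separate them, or if it
    would cross an existing pair.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = parse_dot_bracket(structure)
    paired = {k for pair in pairs for k in pair}
    free = [k for k in range(len(sequence)) if k not in paired]
    for a in range(len(free)):
        for b in range(a + 1, len(free)):
            i, j = free[a], free[b]
            if j - i <= min_loop:
                continue
            if (sequence[i], sequence[j]) not in CAN_PAIR:
                continue
            # (i, j) crosses (p, q) when exactly one of p, q lies strictly inside (i, j).
            if not any((i < p < j) != (i < q < j) for p, q in pairs):
                return False  # (i, j) could be added, so the structure is not saturated
    return True
```

For the sequence `GGGAAACCC`, the full hairpin `(((...)))` is saturated under these assumptions, whereas `((.....))` is not, because the pair between positions 2 and 6 can still be added.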

Jury

Supervisor: Hélène Touzet, CNRS Research Director, LIFL

Co-supervisor: Mathieu Giraud, CNRS Research Scientist, LIFL

Reviewers: Pascal Ferraro, Associate Professor (HDR), Université Bordeaux; Robert Giegerich, Professor, Bielefeld University (Germany)

Members: François Boulier, Professor, Université Lille 1; Yann Ponty, CNRS Research Scientist, LRI, Orsay and LIX, Palaiseau

Thesis of the Bonsai team, defended on 16/11/2011