Thesis of Pierre Pericard

Algorithms for conserved markers sequences reconstruction in metagenomics data

Recent advances in DNA sequencing now make it possible to study the genetic material of microbial communities extracted from natural environmental samples. This new research field, called metagenomics, is driving innovation in many areas such as human health, agriculture, and ecology. To analyse such samples, new bioinformatics methods are still needed to ascertain the taxonomic composition of the studied community, because accurate organism identification is a necessary step in understanding even the simplest ecosystems. However, current sequencing technologies generate short and noisy DNA fragments that only partially cover complete gene sequences, which poses a major challenge for high-resolution taxonomic analysis. We developed MATAM, a new bioinformatics method dedicated to the fast reconstruction of low-error complete sequences of conserved phylogenetic markers, starting from raw sequencing data. This method is a multi-step process that builds and analyses a read overlap graph. We applied MATAM to the reconstruction of the small subunit ribosomal RNA in simulated, synthetic and genuine metagenomes. We obtained high-quality results, improving on the state of the art.

Jury

- Thesis supervisor: Hélène TOUZET
- Reviewers: Claudine MÉDIGUE, Dominique LAVENIER
- Examiners: Laetitia VERMEULEN-JOURDAN, Pierre PEYRET, Samuel BLANQUART

Thesis of the Bonsai team, defended on 27/10/2017