Bonsai Bioinformatics

Postdoctoral position

We have a postdoctoral opportunity in our lab (Bonsai group, Lille, France) in computational biology and high-throughput sequencing data analysis. This position is funded by Inria.

  • Application deadline: August 21, 2015
  • Start date: November 2015 (negotiable)
  • Duration: 16 months

A theoretical and comparative analysis of seed models for DNA sequences

Sequence similarity search is a fundamental way of analyzing nucleotide sequences and has many applications in high-throughput sequencing data analysis, such as read mapping and genome assembly. The most popular methods are typically based on a seed-and-extend approach, which has several variants depending on the underlying combinatorial model used for the seed: spaced seeds allowing for errors at predetermined positions, transition seeds, approximate seeds with arbitrary errors, and combinations of seeds, to cite a few. The choice of a seed model is determined by the trade-off between search speed, sensitivity, and selectivity.
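As an illustration of the seeding step, here is a minimal sketch of spaced seed matching in Python. It is not the implementation used in any of our tools. The function names (spaced_key, seed_hits) and the seed notation ('1' for a position that must match, '0' for a position where an error is tolerated) are assumptions made for this example, and the extension step is left out.

```python
from collections import defaultdict

def spaced_key(seq, pos, seed):
    """Keep only the characters of seq that face a '1' of the seed, starting at pos."""
    return "".join(seq[pos + k] for k, c in enumerate(seed) if c == "1")

def seed_hits(query, target, seed):
    """Return all (query_pos, target_pos) pairs where the seed matches.

    seed is a string over {'1', '0'} (hypothetical notation):
    '1' = the two sequences must agree, '0' = any character is accepted.
    """
    span = len(seed)
    # Index every window of the target by its spaced key.
    index = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[spaced_key(target, j, seed)].append(j)
    # Look up every window of the query in the index.
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(spaced_key(query, i, seed), []):
            hits.append((i, j))
    return hits

query = "ACGTACGT"
target = "ACCTACGT"  # differs from query by one substitution at position 2
contiguous = seed_hits(query, target, "1111")
spaced = seed_hits(query, target, "1101")
```

On this toy input, the spaced seed 1101 reports the hit (0, 0) across the substitution, whereas the contiguous seed 1111 of the same span does not. The spaced seed also reports additional hits such as (4, 0), which illustrates the sensitivity versus selectivity trade-off mentioned above.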

The objective of the research is to conduct a comparative analysis of seed models from both theoretical and practical points of view. Our group has developed several methods that rely on different types of seeds, depending on the application. We would now like to gain better insight into the advantages and drawbacks of each type of seed. For instance, in the presence of indels, approximate seeds should be preferred over spaced seeds. But what type of approximate seeds? More generally, the goal is to determine the method(s) of choice depending on several parameters (size of the pattern, error rate, type of errors…). This will shed new light on seeds and their use in algorithms for DNA sequences.

Your Profile

  • Ph.D. in Theoretical Computer Science or in a similar area
  • Knowledge of one or more of the following subjects is an advantage: formal languages and automata, algorithms on words and trees, analytic combinatorics


To apply, please send a resume, a cover letter and the names of three references to:, and


L. Noé, D. E. K. Martin, A Coverage Criterion for Spaced Seeds and Its Applications to Support Vector Machine String Kernels and k-Mer Distances, Journal of Computational Biology, vol. 21, no. 12, 28 p. (2014)

M. Frith, L. Noé, Improved search heuristics find 20 000 new alignments between human and mouse genomes, Nucleic Acids Research, vol. 42, no. 7, e59 (2014)

C. Vroland, M. Salson, H. Touzet, Lossless seeds for searching short patterns with high error rates, International Workshop on Combinatorial Algorithms (IWOCA), to appear in LNCS (2014)

N. Philippe, M. Salson, T. Commes, É. Rivals, CRAC: an integrated approach to the analysis of RNA-seq reads, Genome Biology, vol. 14 (2013)

E. Kopylova, L. Noé, H. Touzet, SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data, Bioinformatics (2012)
