We are looking for a postdoctoral researcher to join the Full-RNA project for 24 months in Lille, France, to work on indexing huge RNA-seq datasets.

Scientific context

This research project seeks to develop ready-to-use tools for uncovering important transcript variants from massive RNA-seq datasets, as well as for stratifying patients according to their full transcriptional profile. Achieving this requires an index structure that can cope with (hundreds of) thousands of RNA-seq datasets. Our team developed REINDEER [1], the first structure able to index thousands of RNA-seq datasets and provide the quantification of a queried sequence in each indexed dataset. With over one million human RNA-seq datasets now available in public databases, there is a great opportunity to refine this index structure and make it even more powerful.

Purpose of the position

The purpose of this post-doc position is to investigate and implement ways to improve the index, looking at both theoretical and practical aspects. Possible developments include:

  1. Improving the storage itself by adopting more efficient hash functions and optimizing how the quantifications are stored.
  2. Introducing a new representation for variants that offers meaningful information on the sequence and optimized storage.
  3. Designing a more versatile index that can be partitioned in such a way that queries may require access to only a few partitions.
  4. Adapting REINDEER to take advantage of third-generation sequencing data.

Institutional context

The post-doc will be hosted in the Bonsai research group at CRIStAL, Université de Lille, and will be co-supervised by Camille Marchet and Mikaël Salson.

The Full-RNA project is funded by the ANR (French National Research Agency) and is a collaboration with I2BC/Univ Paris-Saclay (coordinator: Daniel Gautheret), IRMB/Univ Montpellier (coordinator: Thérèse Commes) and Institut Pasteur in Paris (coordinator: Rayan Chikhi). We have a long-standing collaboration with all three partners, and together we have published high-impact papers.

Recruitment

The ideal candidate will have a PhD in bioinformatics or computer science, with experience in some or all of the following topics: data structures, algorithms, indexing techniques, programming in C/C++.

The successful candidate will be offered a two-year postdoctoral position. The ANR project includes a budget for equipment, travel, and the supervision of a master's student.

If you are interested in this position, please send your CV and a cover letter detailing your experience and qualifications to Camille Marchet and Mikaël Salson (camille.marchet@univ-lille.fr, mikael.salson@univ-lille.fr).

The position will ideally start on September 1st.


[1] Marchet, C., Iqbal, Z., Gautheret, D., Salson, M., & Chikhi, R. (2020). REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. Bioinformatics, 36(Supplement 1), i177-i185.

The bonsai picture is by Noj Han under the CC BY SA license.