We are looking for a postdoctoral researcher to join the Full-RNA project for 24 months in Lille, France, to work on indexing huge collections of RNA-seq datasets.
Scientific context
This research project seeks to develop ready-to-use tools for uncovering important transcript variants from massive RNA-seq datasets, as well as for stratifying patients according to their full transcriptional profile. Achieving this requires an index structure that can cope with thousands, or even hundreds of thousands, of RNA-seq datasets. Our team developed REINDEER [1], the first structure able to index thousands of RNA-seq datasets and to report the quantification of a queried sequence in each indexed dataset. With over one million human RNA-seq datasets now available in public databases, there is a great opportunity to refine this index structure and make it even more powerful.
Purpose of the position
The purpose of this post-doc position is to investigate and implement ways to improve the index, looking at both theoretical and practical aspects. Possible developments include:
- Improving the storage itself by adopting more efficient hash functions and optimizing the storage of the quantifications.
- Introducing a new representation for variants that offers meaningful information on the sequence and optimized storage.
- Designing a more versatile index that can be partitioned in such a way that queries may require access to only a few partitions.
- Adapting REINDEER to take advantage of third-generation sequencing data.
Institutional context
The post-doc will be hosted in the Bonsai research group at CRIStAL, University of Lille, and will be co-supervised by Camille Marchet and Mikaël Salson.
The Full-RNA project is funded by the ANR (French National Research Agency) and is a collaboration between I2BC/Univ Paris-Saclay (coordinator: Daniel Gautheret), IRMB/Univ Montpellier (coordinator: Thérèse Commes) and Institut Pasteur in Paris (coordinator: Rayan Chikhi). We have long-standing collaborations with I2BC, IRMB and Institut Pasteur, and together we have published high-impact papers.
Recruitment
The ideal candidate will have a PhD in bioinformatics or computer science, with experience in some or all of the following: data structures, algorithms, indexing techniques, and programming in C/C++.
The successful candidate will be offered a two-year postdoctoral position. The ANR project includes a budget for equipment and travel, as well as for the supervision of a master's student.
If you are interested in this position, please send your CV and a cover letter detailing your experience and qualifications to Camille Marchet and Mikaël Salson (camille.marchet@univ-lille.fr, mikael.salson@univ-lille.fr).
The position will ideally start on September 1st.
[1] Marchet, C., Iqbal, Z., Gautheret, D., Salson, M., & Chikhi, R. (2020). REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. Bioinformatics, 36(Supplement 1), i177-i185.