Instrument

Miniaturisation pour la Synthèse, l’Analyse et la Protéomique (MSAP) UAR 3290

Thesis: Computational methods for screening archaeological and palaeontological samples

Recent advances in molecular biology have opened access to the genetic information preserved in samples of ancient organisms, renewing the approaches to species identification used in archaeology and paleontology. Paleogenomics is at the cutting edge of these techniques and has developed spectacularly. However, the field remains limited by the fragility of the DNA molecule. By contrast, the proteome, made up of the residual ancient proteins contained in the material, tends to be more stable over time, making it possible to identify species from samples dating back hundreds of thousands of years [1]. Protein-based analysis is also fast, cost-effective and minimally invasive. Paleoproteomics is thus a valuable complement to paleogenomics, and is currently progressing rapidly.

One of its first achievements was the large-scale, collagen-based screening of fossils from Denisova Cave to find human bones [2]. The method has since been applied to a wide range of animals, periods and ecosystems [3,4,5]. There is now a growing trend to extend paleoproteomics to other materials, such as enamel, eggshell, ivory or hair, using proteins other than collagen as markers, which broadens the range of possible applications.

In this context, reliable and efficient methods for automatic data analysis become crucial. The data in question are mass spectra, acquired on either MS or MS/MS instruments. Existing bioinformatics tools are fragmentary, and the preliminary work presented in [6,7] shows how complex the problem is. There is therefore a real need to provide the scientific community with a comprehensive framework specifically designed to meet the challenges of species identification in ancient proteomic data. This is the aim of this thesis project: to introduce a generic formal framework for manipulating mass spectra and analyzing them under a variety of conditions.

The project has three main objectives:
● Propose algorithms dedicated to the pre-processing of MALDI spectra.
● Introduce the notion of a ‘theoretical model spectrum’, then propose algorithms for generating theoretical spectra from complete or incomplete sequence data, or from mass data.
● Propose species identification algorithms based on the comparison of experimental and theoretical spectra.
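
As a rough illustration of the last two objectives, the sketch below builds a very simple theoretical spectrum from a protein sequence and scores it against a list of experimental peaks. It is not the method of the thesis project: the function names, the simplified trypsin rule (cleave after K or R, not before P, no missed cleavages), the 0.1 Da tolerance and the scoring rule are all assumptions made for the example, and chemical modifications such as proline hydroxylation are ignored.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to each free peptide
PROTON = 1.00728   # mass of the charge carrier for [M+H]+ ions


def tryptic_peptides(sequence, min_length=6):
    """Hypothetical helper: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            if i + 1 - start >= min_length:
                peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def theoretical_spectrum(sequence):
    """Sorted m/z values of the singly protonated tryptic peptides."""
    return sorted(
        sum(RESIDUE_MASS[residue] for residue in peptide) + WATER + PROTON
        for peptide in tryptic_peptides(sequence)
    )


def match_score(experimental_mz, theoretical_mz, tolerance=0.1):
    """Fraction of theoretical peaks found among the experimental peaks."""
    if not theoretical_mz:
        return 0.0
    hits = sum(
        any(abs(exp - theo) <= tolerance for exp in experimental_mz)
        for theo in theoretical_mz
    )
    return hits / len(theoretical_mz)
```

In this toy setting, species identification would amount to computing `match_score` for the marker sequences of each candidate species and ranking the candidates. A realistic approach would also have to handle noise, modifications, incomplete sequences and peak intensities.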

Supervisors:
● Fabrice BRAY (MSAP, CNRS research engineer, HDR)
● Hélène TOUZET (CRIStAL, BONSAI team, CNRS research director, HDR)

Co-supervisor:
● Julie Jacques (CRIStAL, ORKAD team, associate professor)
