Summary

The main focus of the group is the development of novel algorithms for the comparison of multiple biological sequences. Multiple comparisons have the advantage of precisely revealing evolutionary traces, thus allowing the identification of functional constraints imposed on the evolution of biological entities. Most comparisons are currently carried out on the basis of sequence similarity. Our goal is to extend this scope by allowing comparisons based on any relevant biological signal, such as sequence homology, structural similarity, genomic structure, functional similarity and, more generally, any signal that may be identified within biological sequences. Using such heterogeneous signals serves two complementary purposes: (i) producing better models that take advantage of the evolutionary resilience of these signals, and (ii) improving our understanding of the evolutionary processes that lead to the diversification of biological features.

For this purpose, we are developing methods for the comparison of protein sequences, protein structures, RNA sequences and structures, as well as complete genomes. We apply these methods to a wide range of biological questions, including Leishmania donovani resistance, animal and plant domestication, and gene annotation in human and other model systems. We are also applying similar algorithms to longitudinal data analysis in order to mine recordings for predictive patterns, with a special interest in the onset of obesity in murine models.

All the tools we develop are open-source freeware that can either be downloaded for personal use or accessed through dedicated web interfaces at tcoffee.crg.eu.


Research projects

  • Development of multiple sequence aligners: T-Coffee
  • Homology modelling of non-coding RNA
  • Large-scale protein sequence alignments
  • Multiple genome alignments
  • Longitudinal data modelling
  • Structure-based multiple sequence comparison and classification
  • Protein and RNA structural evolution
  • HPC tools: Nextflow and containerization