Anno: 
2018
Nome e qualifica del proponente del progetto: 
sb_p_950997
Abstract: 

A phylogenetic tree is a structure showing the evolutionary relationships among various biological species or other entities (taxa), their phylogeny, based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry, called the Tree of Life.

Several automatic procedures have been designed to produce the phylogenetic tree for a group of species; all of them work on the basis of partial information and complete it using different optimality criteria. Changing the tool or the optimality criteria different trees are produced.
Comparing these trees to find their similarities and dissimilarities (and looking among them for the most probable one) is an important issue in biology.

In the literature, there is plenty of definitions of possible distance metrics that have been proposed to compare phylogenetic trees and extensively studied, but most of them seem to be completely uncorrelated each other.

Moreover, the evolutionary development of more organisms at the same time (e.g. of symbionts) is studied by co-phylogeny and, in order to do this, two or more phylogenetic trees are compared to detect possible evolutive similarities.
So, one of the main issues in this context consists again in comparing trees.

In this research, we will compare different measures of similarity for trees, in order to understand on the one hand if there are connections and correlations, on the other hand if some of them are more suitable than other to extract some biological information about evolution and co-evolution.

Finally, we will deep inside the important concept of reconciliation (that can be seen in some sense as a distance for trees), in order to understand whether it is possible to exploit it to deduce again information about co-evolution.

ERC: 
PE1_15
PE6_13
Innovatività: 

Although it could seem impossible, the many notions of tree similarity/dissimilarity have never been compared each other in a systematic and exhaustive way (we have already cited the very few papers dealing with this issue). So, for the first time, we will try to cover this lack in the literature, both theoretically and by experiments.
We will exploit the results of this research to try to propose some new summarizing distance measures that are fast to compute, and that are supposedly based on a set of operations (i.e. replacement or switch among trees) that transform a tree into another one; in doing this, we will only focus on operations that are biologically meaningful.

Moreover, we will investigate on reconciliations. Note that host-switches only occur between a pair of contemporary species. It may therefore be desirable to enforce this restriction and demand the existence of a temporal order on the nodes of the trees such that all host-switches occur between contemporary species (time feasible scenarios). Unfortunately, it is NP-hard [Oal11, THL11] to determine the reconciliation with maximum number of co-speciations. This difficulty has been traditionally overcome by ignoring the time feasibility requirement [BAK12, Dal11, THL11]; we would like to face the problem in an alternative and novel way, that is allowing only certain tree topolgies that can be exploited to solve the problem efficiently. We foresee that this choice could potentially be used to design efficient heuristics or approximation algorithms that, by locally identifying these tree structures, make optimal local choices.

We have already done some steps in this directions [CMS18]; in particular, we have studied the very special case in which the two trees are identical caterpillars and proven that in this special case the problem of determining an optimal reconciliation can be solved in polynomial time and that the similarity distance based on reconciliations is equivalent to the very well known distance based on the maximum subtree agreement [MT13, SS09]. This is not true for general trees and we strongly believe that this approach is promising. In the future, we would like to analyze tree structures that better fit biology requirements.

We conclude by observing that this research group has variegated competences; this is a richness, because we will have the opportunity to apply the techniques used in our original scientific field to the subject of this project and will be able to retrieve the many results present in the literature that are hidden under several names and contexts and have been designed to solve completely different problems.
We have already experienced that this approach of meet all togehter on a problem, each coming from a different field of computer science, is successfull and leads to strong results.

References:

[BAK12] Bansal MS, Alm E, Kellis M: Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics 28(12) (2012).

[CMS18] Calamoneri T, Monti A, Sinaimeri B: Is Co-Divergence Suitable to Understand Co-Evolution? Manuscript (2018).

[Dal11] Doyon JP, Scornavacca C, Gorbunov KY, Szollosi GJ, Ranwez V, Berry V: An efficient algorithm for gene/species trees parsimonious reconciliation with losses, duplications and transfers. Proc. RECOMB-CG 2010, LNCS 6398, 2011.

[MT13] Martin DM, Thatte BD: The maximum agreement subtree problem. Discrete Applied Mathematics 161(13) (2013).

[Oal11] Ovadia Y, Fielder D, Conow C, Libeskind-Hadas, R: The cophylogeny reconstruction problem is NP-complete. Journal of Computational Biology 18(1) (2011).

[SS09] Steel M, Szkely LA: An improved bound on the maximum agreement subtree problem. Applied Mathematics Letters 22(11) (2009).

[THL11] Tofigh A, Hallett M, Lagergren J: Simultaneous identification of duplications and lateral gene transfers. Journal of IEEE/ACM Transactions on Computational Biology and Bioinformatics 8(2) (2011).

Codice Bando: 
950997

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma