A phylogenetic tree is a rooted tree with its leaves labeled by species, or strains of species, more abstractly known as taxa and it is a model used to exhibit ancestral relations between the species. More recently, it has been observed that, when certain biological events are believed to be involved, phylogenies can be better represented as (possibly cyclic) networks.
A pathway is a series of interactions among molecules in a cell that leads to a certain product or a change in a cell and can trigger the assembly of new molecules, such as a fat or a protein.
Whichever is the structure we decide to use, there are many reasons to wonder whether they are similar or not, and hence many distances have been introduced in the literature in order to quantify the dissimilarity of two structures. Among them there are MAST and MASN, consisting in looking for the Maximum Agreement SubTree and SubNetwork, respectively, that somehow require to search isomorphic subgraphs within the input structures.
In this research, we aim at introducing for the first time a new notion of distance that works for each one of the general structures defined above; this versatility is guaranteed because it is heavily based on the concept of graph isomorphism (GI), on which the experience of the components of the research group is very strong.
Also the idea of introducing the isomorphism is completely novel and we believe we can achieve very interesting results.
Although many distances have been defined and tested in the literature, unfortunately none of them is satisfactory.
Indeed, classical distances are easy to compute but are not able to convey enough information (e.g. Robinson-Foulds distance is overly sensitive to some small changes: just moving a leaf at the end of a caterpillar tree -i.e. a single spine to which all leaves are attached- to the other end will create a tree with the maximum possible RF distance to the original tree, yet the two trees are intuitively very similar).
On the contrary, distances based on transformation operations would be more useful to discriminate structures, but are NP-hard to compute and most of real data-sets are too large to be handled in a reasonable time, so they are in fact unusable.
So, our idea to introduce a new distance based on the concept of graph isomorphism would have the merit of satisfactorily comparing either biological or medical structures because it would act globally, as distances based on transformations are able to do, but it would exploit the tractability of graph isomorphism problem on trees and other special graphs to be computed in polynomial time.
So, it would capture a good balance between effectiveness and efficiency, with immediate gains for all the applications exploiting distances between structures we have already discussed.
Aware that bioinformatics and computer science applied to medicine have a lot in common, most of the members of this research group are within the Interdepartment Center S.T.I.T.C.H. (Sapienza information-based Technology InnovaTion Center for Health https://web.uniroma1.it/stitch/). This is for us a great opportunity to interact with researchers in medicine, who will provide us some interesting real-life scenarios to lead our experiments.
Moreover, we interact with the Evolutionary Biology Lab at INRIA, Grenoble, and this put us in contact with biologists, proposing us many applicative context.
Furthermore, this group is in contact with some of the most expert researchers in the field of GI.
We are sure we will be able to take advantage by the relations with these multidisciplinary research areas, both in terms of gaining real-life problems to solve and in terms of verifying the applicability of our solutions.
Finally, we observe that this research group has variegated competences, from theoretical computer science to algorithmics, from formal methods to bioinformatics, from network medicine to logics, from parallel systems to wireless networks.
This is a richness, because we will have the opportunity to apply the techniques used in our original scientific fields to the subject of this project and will be able to retrieve the many results present in the literature that are hidden under several names and contexts and have been designed to solve completely different problems.
We have already experienced that this approach of meeting all together on a problem, each coming from a different field of computer science, is successful and leads to strong results.