Dissimilarity space representations and automatic feature selection for protein function prediction

04 Publication in conference proceedings
De Santis Enrico, Martino Alessio, Rizzi Antonello, Mascioli Fabio Massimo Frattale

Dissimilarity spaces, along with feature reduction/selection techniques, are among the mainstream approaches to pattern recognition problems in structured (and possibly non-metric) domains. In this work, we investigate dissimilarity space representations in a biology-related application, namely protein function classification, since proteins are a seminal example of structured data given their primary and tertiary structures. Specifically, we propose two analyses, one relying on the complete dissimilarity matrix and one on a dimensionally-reduced version of it. Both cast the pattern recognition problem from the structured domain into real-valued feature vectors, for which any standard classification algorithm can be used. A third, hybrid analysis uses a clustering-based one-class classifier exploiting different representations. First results obtained on a subset of the Escherichia coli proteome are promising, and some of the analyses presented in this work may also be of use to field experts, further bridging the gap between natural sciences and computational intelligence techniques.
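As a rough illustration of the general idea of a dissimilarity space embedding, the minimal sketch below represents each object by its vector of dissimilarities to a set of prototypes, once with all training objects as prototypes (the complete dissimilarity matrix) and once with a smaller subset (a reduced representation). Everything specific here is an assumption made for illustration only: the toy sequences and labels are hypothetical, Levenshtein distance stands in for whatever dissimilarity measure the work actually uses, the reduced prototype set is picked by hand rather than by automatic feature selection, and a 1-nearest-neighbour rule stands in for "any standard classification algorithm".

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; only a stand-in dissimilarity (assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(objects: Sequence[str], prototypes: Sequence[str]) -> List[List[float]]:
    """Map each object to its vector of dissimilarities to the prototypes."""
    return [[float(edit_distance(x, p)) for p in prototypes] for x in objects]


def nearest_neighbour(train_vecs: List[List[float]],
                      train_labels: List[str],
                      query: List[float]) -> str:
    """1-NN in the embedding space, using squared Euclidean distance."""
    def sq_dist(u: List[float], v: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(u, v))
    best = min(range(len(train_vecs)), key=lambda i: sq_dist(train_vecs[i], query))
    return train_labels[best]


# Hypothetical toy sequences and class labels (not data from the paper).
train = ["MKTAYIAK", "MKTAYLAK", "GGSSGGSS", "GGSTGGSS"]
labels = ["A", "A", "B", "B"]
query = "MKTAYIAR"

# Complete dissimilarity matrix: every training object acts as a prototype.
full = embed(train, train)
pred_full = nearest_neighbour(full, labels, embed([query], train)[0])

# Reduced representation: keep only a hand-picked subset of prototypes.
reduced_prototypes = [train[0], train[2]]
reduced = embed(train, reduced_prototypes)
pred_reduced = nearest_neighbour(reduced, labels,
                                 embed([query], reduced_prototypes)[0])

print(pred_full, pred_reduced)
```

In this sketch the embedding step is the only part that touches the structured objects; once the vectors exist, the classifier sees ordinary real-valued features, so the 1-NN rule could be swapped for any other standard classifier.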
