Model-based dimensionality reduction techniques for high-dimensional data

Year
2017
Proposer Francesca Martella - Associate Professor
ERC subsector of the project proposer
Research group members
Member; Category
Maria Brigida Ferraro; Research group member
Member; Position; Affiliation; Category
Jeanine J. Houwing-Duistermaat; Professor; Faculty of Mathematics and Physical Sciences, University of Leeds (United Kingdom); Other Sapienza or external personnel
Caterina Fusilli; PhD; Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza - Mendel, Rome (IT); Other Sapienza or external personnel
Ana Colubi; Professor; Department of Statistics, University of Oviedo (Spain); Other Sapienza or external personnel
Tommaso Mazza; PhD; Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza - Mendel, Rome (IT); Other Sapienza or external personnel
Cristina Mollica; PhD; -; Other Sapienza or external personnel
Abstract

Applications in many domains produce high-dimensional data, which pose the challenge of interpreting a huge mass of information, often consisting of millions of measurements. A first step towards addressing this challenge is the use of data reduction techniques, which are essential in the data mining process to reveal natural structures and identify interesting patterns in the analyzed data. The research project addresses relevant classes of dimensionality reduction techniques designed to account for the complexities of high-dimensional data. We assume that this complexity may be captured via two different approaches to summarizing the two modes (rows and columns) of a data matrix: an asymmetric and a symmetric treatment of the two modes. In the asymmetric approach, the two modes play different roles: the first mode represents objects and is summarized by clustering methods, while the second refers to variables and is reduced by a factorial technique. In the symmetric approach, the two modes play an equal role and both are summarized by clustering techniques. Both approaches will be considered in a finite mixture context because of its potential advantages over non-probabilistic clustering techniques. In particular, attention will focus on the development of clustering/biclustering and simultaneous clustering/factorial reduction approaches for continuous or discrete data matrices. The impact of time occasions on the model specification, and similarities between the finite mixture and fuzzy logic approaches, will also be considered.
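
The two treatments described in the abstract can be illustrated with standard model forms from the finite mixture literature. The formulas below are a sketch under stated assumptions, not the project's own specification: the Gaussian component density, the factor-analytic covariance structure (mixture of factor analyzers) and the latent block model are swapped in as well-known examples, and all symbols (G, K, q, pi_g, rho_k, Lambda_g, Psi_g, tau_ig) are notation introduced here for illustration.

```latex
% Finite mixture for the rows (objects) x_i in R^p, with G components:
\begin{align}
  f(\mathbf{x}_i) &= \sum_{g=1}^{G} \pi_g \,
      \phi\!\left(\mathbf{x}_i;\ \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g\right),
  \qquad \pi_g > 0,\quad \sum_{g=1}^{G} \pi_g = 1 .
\end{align}

% Asymmetric treatment (assumed example: mixture of factor analyzers).
% Rows are clustered; columns are reduced through q << p latent factors.
\begin{align}
  \boldsymbol{\Sigma}_g &= \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top}
      + \boldsymbol{\Psi}_g ,
  \qquad \boldsymbol{\Lambda}_g \in \mathbb{R}^{p \times q},\quad
  \boldsymbol{\Psi}_g \text{ diagonal}.
\end{align}

% Symmetric treatment (assumed example: Gaussian latent block model).
% Row labels z_i in {1,...,G} and column labels w_j in {1,...,K}
% are both latent; the sum runs over all pairs of partitions (z, w).
\begin{align}
  f(\mathbf{X}) &= \sum_{(\mathbf{z},\mathbf{w})}
      \prod_{i=1}^{n} \pi_{z_i}
      \prod_{j=1}^{p} \rho_{w_j}
      \prod_{i=1}^{n}\prod_{j=1}^{p}
      \phi\!\left(x_{ij};\ \mu_{z_i w_j}, \sigma^2_{z_i w_j}\right).
\end{align}

% Posterior membership of object i in component g (soft assignment):
\begin{align}
  \tau_{ig} &= \frac{\pi_g \,\phi\!\left(\mathbf{x}_i;\ \boldsymbol{\mu}_g,
      \boldsymbol{\Sigma}_g\right)}
      {\sum_{h=1}^{G} \pi_h \,\phi\!\left(\mathbf{x}_i;\ \boldsymbol{\mu}_h,
      \boldsymbol{\Sigma}_h\right)} ,
  \qquad \sum_{g=1}^{G} \tau_{ig} = 1 .
\end{align}
```

The posterior weights tau_ig lie in [0, 1] and sum to one over the components, which is one way to see the similarity with fuzzy membership degrees mentioned in the abstract.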


© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma