Clustering rows and columns in a categorical and mixed-type data matrix.

Year
2021
Proponent Francesca Martella - Associate Professor
ERC subsector of the project proponent
PE1_14
Research group members
Member | Category
Monia Ranalli | Permanent member of the research group
Vittoria Carolina Malpassuti | PhD student / research fellow / resident, non-permanent member of the research group
Emiliano Seri | PhD student / research fellow / resident, non-permanent member of the research group
Marco Alfo' | Permanent member of the research group
Donatella Vicari | Permanent member of the research group
Member | Position | Institution | Category
Maria Francesca Marino | Assistant professor | University of Florence | Other personnel affiliated with Sapienza or external, holders of research scholarships
Jeanine Houwing | Full professor | University of Utrecht | Other personnel affiliated with Sapienza or external, holders of research scholarships
Antonello Maruotti | Full professor | University of Rome LUMSA | Other personnel affiliated with Sapienza or external, holders of research scholarships
Caterina Fusilli | Associate Researcher | Merck Group | Other personnel affiliated with Sapienza or external, holders of research scholarships
Abstract

In recent years, applications involving high-dimensional data have increased substantially across several empirical domains. By high-dimensional data we mean hundreds or thousands of variables for each unit in the observed sample. The need for data reduction techniques in such applications has attracted the interest of a growing number of researchers, and several new approaches have been investigated in these areas.
Biclustering techniques have been proposed in several scientific fields, especially to analyze data matrices where the two modes, usually units (rows) and variables (columns), can play the same role. In such cases, subsets of units may be homogeneous only under a limited set of conditions (variables), while showing little similarity outside them.
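As a purely illustrative sketch (not one of the methods proposed in this project), the following Python snippet shows what homogeneity restricted to a subset of variables can look like on a small categorical matrix. The toy data, the `homogeneity` helper, and the choice of the modal-category share as a homogeneity measure are all assumptions introduced here for illustration.

```python
from collections import Counter

# Hypothetical toy matrix: rows are units, columns are categorical variables.
# Units 0-2 agree on variables 0-2 but not on variables 3-4.
data = [
    ["a", "b", "a", "c", "b"],
    ["a", "b", "a", "b", "c"],
    ["a", "b", "a", "a", "a"],
    ["c", "a", "b", "c", "b"],
    ["b", "c", "c", "a", "c"],
]


def homogeneity(matrix, rows, cols):
    """Average, over the given columns, of the share of the selected rows
    that fall in the most frequent category of each column (assumed measure)."""
    shares = []
    for j in cols:
        counts = Counter(matrix[i][j] for i in rows)
        modal_count = counts.most_common(1)[0][1]
        shares.append(modal_count / len(rows))
    return sum(shares) / len(shares)


block_rows = [0, 1, 2]

# Same subset of units, evaluated on two different subsets of variables.
inside = homogeneity(data, block_rows, [0, 1, 2])
outside = homogeneity(data, block_rows, [3, 4])

print(f"homogeneity on variables 0-2: {inside:.2f}")
print(f"homogeneity on variables 3-4: {outside:.2f}")
```

In this toy example the same three units are perfectly homogeneous on the first three variables and close to maximally heterogeneous on the remaining two, which is the kind of row-by-column block structure a biclustering method tries to recover without being told the subsets in advance.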
One of the main tasks for modern statistical approaches to biclustering is to develop techniques for handling categorical (nominal and ordinal) and mixed-type data. Such data are encountered very frequently in practice, for example whenever attitudes, abilities, or opinions are the quantities of interest. However, practitioners often apply techniques developed for continuous data in this context, where they are frequently inappropriate. This can lead to wrong results; it is therefore worth taking the essential characteristics of these data into proper account to develop more appropriate techniques.
Our research project aims to define new biclustering approaches for categorical and mixed-type data. Specifically, we will start by extending clustering methods for categorical or mixed-type data to a two-mode setting, through both heuristic and model-based approaches. We will also consider extensions of these techniques that account for the effect of time in the model specification when longitudinal data (units followed over time) are available.

ERC
PE1_14, PE1_18
Keywords:
CLUSTER ANALYSIS, STATISTICAL DATA ANALYSIS, MULTIVARIATE ANALYSIS, COMPUTATIONAL STATISTICS, BIOSTATISTICS

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma