Ricerc@Sapienza

Clustering rows and columns in a categorical and mixed-type data matrix.

Year

2021

Proposer Francesca Martella - Associate Professor

Department

DIPARTIMENTO DI SCIENZE STATISTICHE (Department of Statistical Sciences)

ERC subsector of the project proposer

PE1_14

Research group members

Member	Category
Monia Ranalli	Permanent member of the research group
Vittoria Carolina Malpassuti	PhD student/Research fellow/Specialization trainee, non-permanent member of the research group
Emiliano Seri	PhD student/Research fellow/Specialization trainee, non-permanent member of the research group
Marco Alfò	Permanent member of the research group
Donatella Vicari	Permanent member of the research group

Member	Position	Institution	Category
Maria Francesca Marino	Assistant professor	University of Florence	Other associated personnel (Sapienza or external), including research grant holders
Jeanine Houwing	Full professor	University of Utrecht	Other associated personnel (Sapienza or external), including research grant holders
Antonello Maruotti	Full professor	University of Rome LUMSA	Other associated personnel (Sapienza or external), including research grant holders
Caterina Fusilli	Associate Researcher	Merck Group	Other associated personnel (Sapienza or external), including research grant holders

Abstract

In recent years, there has been a substantial increase in applications, across several empirical domains, that involve high-dimensional data, by which we mean hundreds or thousands of variables for each unit in the observed sample. Because such applications call for data reduction techniques, they have attracted the interest of a growing number of researchers, and several new approaches have been investigated in these areas.
Biclustering techniques have been proposed in several scientific fields, especially to analyze data matrices whose two modes, usually units (rows) and variables (columns), can play the same role. In such cases, subsets of units may be homogeneous only under a limited set of conditions (variables) while showing little similarity outside them.
One of the main tasks for modern statistical approaches to biclustering is to develop techniques for handling categorical (nominal and ordinal) and mixed-type data. Such data are encountered very frequently in practice, for example whenever attitudes, abilities, or opinions are the quantities of interest. However, practitioners often apply techniques developed for continuous data in this context, and these frequently prove inappropriate, which can lead to wrong results. It is therefore worth properly accounting for the essential characteristics of these data in order to develop more appropriate techniques.
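A minimal sketch of why continuous-data tools can mislead with nominal variables, assuming three hypothetical units described by two nominal variables (eye colour, blood group). If the categories are replaced by arbitrary integer codes and a Euclidean distance is applied, the coding order drives the result; a simple matching dissimilarity, which only checks whether two categories are equal, does not depend on the coding. The data and the function names `integer_codes`, `euclidean` and `simple_matching` are illustrative assumptions, not part of the project.

```python
import numpy as np

# Hypothetical data: three units described by two nominal variables
# (eye colour, blood group). Values are illustrative only.
units = np.array([
    ["blue", "A"],
    ["brown", "B"],
    ["green", "A"],
])


def integer_codes(X):
    """Replace the categories of each column by arbitrary integer codes (alphabetical order)."""
    coded = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        _, inverse = np.unique(X[:, j], return_inverse=True)
        coded[:, j] = inverse
    return coded


def euclidean(C):
    """Pairwise Euclidean distances between the rows of a numeric matrix."""
    diff = C[:, None, :] - C[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def simple_matching(X):
    """Pairwise share of variables on which two units take different categories."""
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)


E = euclidean(integer_codes(units))  # treats the codes as if they were continuous
M = simple_matching(units)           # only checks equality of categories
print(E[0, 1], E[0, 2])  # unit 0 compared with units 1 and 2
print(M[0, 1], M[0, 2])
```

With these codes, the Euclidean distance places unit 0 closer to unit 1 (√2) than to unit 2 (2), even though unit 0 shares its blood group with unit 2 and nothing with unit 1; the matching dissimilarity gives 1.0 and 0.5 respectively, which reflects the shared category.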
Our research project aims to define new biclustering approaches for categorical and mixed-type data. Specifically, we will start by extending clustering methods for categorical or mixed-type data to a two-mode setting, through both heuristic and model-based approaches. We will also investigate extensions of these techniques that evaluate the impact of time in the model specification when longitudinal data (units followed over time) are available.
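As a rough illustration of what a heuristic two-mode approach for nominal data can look like, the sketch below alternates between reassigning rows and columns so that each (row cluster, column cluster) block is summarized by its most frequent category, in the spirit of a double k-modes. It is not the method the project will develop. The function names (`block_modes`, `mismatches`, `double_k_modes`), the mismatch-count objective and the -1 sentinel for empty blocks are assumptions made for this example, and categories are assumed to be coded as non-negative integers used only as labels.

```python
import numpy as np


def block_modes(X, rows, cols, K, L):
    """Most frequent category in each (row cluster, column cluster) block; -1 marks an empty block."""
    modes = np.full((K, L), -1, dtype=int)
    for k in range(K):
        for l in range(L):
            block = X[np.ix_(rows == k, cols == l)].ravel()
            if block.size:
                values, counts = np.unique(block, return_counts=True)
                modes[k, l] = values[np.argmax(counts)]
    return modes


def mismatches(X, rows, cols, modes):
    """Number of cells whose category differs from the mode of the block they fall in."""
    return int(np.sum(X != modes[np.ix_(rows, cols)]))


def double_k_modes(X, K, L, n_iter=50, seed=0):
    """Hypothetical heuristic: alternate row and column reassignments to reduce total mismatches."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(K, size=n)  # random initial row partition
    cols = rng.integers(L, size=p)  # random initial column partition
    loss = None
    for _ in range(n_iter):
        # Row step: cost of putting each row in each row cluster, columns held fixed.
        modes = block_modes(X, rows, cols, K, L)
        row_cost = np.stack([(X != modes[k][cols]).sum(axis=1) for k in range(K)], axis=1)
        rows = row_cost.argmin(axis=1)
        # Column step: cost of putting each column in each column cluster, rows held fixed.
        modes = block_modes(X, rows, cols, K, L)
        col_cost = np.stack([(X != modes[rows, l][:, None]).sum(axis=0) for l in range(L)], axis=1)
        cols = col_cost.argmin(axis=1)
        new_loss = mismatches(X, rows, cols, block_modes(X, rows, cols, K, L))
        if loss is not None and new_loss >= loss:
            break  # no further improvement: a local optimum has been reached
        loss = new_loss
    return rows, cols, loss


# Toy matrix with a planted 2 x 2 block structure (illustrative data only).
X = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])
rows, cols, loss = double_k_modes(X, K=2, L=2)
print(rows, cols, loss)
```

Each step can only keep or lower the mismatch count, so the procedure stops at a local optimum; in practice one would run it from several seeds and keep the best solution. A model-based counterpart would instead maximize a likelihood, for example that of a latent block model, rather than count mismatches.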

ERC

PE1_14, PE1_18

Keywords:

CLUSTER ANALYSIS, STATISTICAL DATA ANALYSIS, MULTIVARIATE ANALYSIS, COMPUTATIONAL STATISTICS, BIOSTATISTICS