Ricerc@Sapienza

New statistical learning methods for model-based unsupervised classification of complex and high dimensional data

Year

2020

Proponent: Roberto Rocci - Full Professor

Department

DIPARTIMENTO DI SCIENZE STATISTICHE (Department of Statistical Sciences)

ERC subsector of the project proponent

PE1_14

Research group members

Member	Category
Marco Alfò	Structured participant in the research project
Monia Ranalli	Structured participant in the research project
Donatella Vicari	Structured participant in the research project
Vittoria Carolina Malpassuti	PhD student / research fellow / specialization trainee, non-structured member of the research group
Francesca Martella	Structured participant in the research project
Irene Cozzolino	PhD student / research fellow / specialization trainee, non-structured member of the research group
Maurizio Vichi	Structured participant in the research project
Paolo Giordani	Structured participant in the research project

Member	Position	Institution	Category
Giovanni Trovato	Full professor	University of Rome "Tor Vergata"	Other associated personnel from Sapienza or other institutions, holders of research scholarships
Fabrizio Mattesini	Full professor	University of Rome "Tor Vergata"	Other associated personnel from Sapienza or other institutions, holders of research scholarships
Silvia D'Angelo	Postdoc research fellow	University College Dublin, Ireland	Other associated personnel from Sapienza or other institutions, holders of research scholarships
Michael Fop	Assistant professor	University College Dublin, Ireland	Other associated personnel from Sapienza or other institutions, holders of research scholarships
Alfonso Russo	Ph.D. student	University of Rome "Tor Vergata"	Other associated personnel from Sapienza or other institutions, holders of research scholarships
Isabella Corazziari	Researcher	Istat	Other associated personnel from Sapienza or other institutions, holders of research scholarships

Abstract

Nowadays, in almost every research field, data are inherently complex and high dimensional, owing to the growing availability of information made possible by new technologies. Classical approaches to supervised and unsupervised statistical learning are inadequate for these "modern" data and cannot be applied to them directly. The inadequacy is due to the increased complexity of:
a) data schemes. The classical data scheme "statistical units x quantitative variables" is often obsolete. Newer and more informative schemes are in use: mixed data, where some variables are quantitative and others are categorical; matrix data, where the same units (persons or objects) are measured on the same variables on different occasions; multidimensional networks, where multiple relations are observed among a set of units (nodes); etc.;
b) models. New technologies make a huge number of features (variables) available for the same unit. Classical methods cannot model such high dimensional data properly, because they would produce models with a large number of parameters that cannot be estimated efficiently, especially when the number of observations is small compared to the number of parameters and/or variables;
c) algorithms. On high dimensional data, the estimation and/or selection of classical models becomes computationally infeasible and/or does not deliver results in the required time.
The aim of the research group is to address these three sources of complexity by proposing new statistical learning methods for unsupervised classification that are able to:
1) handle new complex data schemes: mixed data, matrix data and multidimensional networks;
2) model parsimoniously. The idea is to introduce new flexible parameterizations that are sufficiently parsimonious and allow us to select the relevant information when the number of variables is very large;
3) reduce the computational complexity of the algorithms by making use of new methods of estimation and/or model selection.
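As a rough illustration of point b), the sketch below counts the free parameters of a Gaussian mixture model, a standard tool for model-based clustering, under two textbook covariance structures. The function name `gmm_free_parameters` and the choice of "full" versus "diag" covariances are assumptions made for this example only; they are not the parameterizations proposed by the project.

```python
def gmm_free_parameters(n_components: int, n_features: int,
                        covariance: str = "full") -> int:
    """Count free parameters of a Gaussian mixture (hypothetical helper).

    Assumes one mean vector and one covariance matrix per component.
    """
    # Mixing weights sum to one, so only n_components - 1 are free.
    weights = n_components - 1
    # One mean vector of length n_features per component.
    means = n_components * n_features
    if covariance == "full":
        # A symmetric p x p matrix has p * (p + 1) / 2 distinct entries.
        cov = n_components * n_features * (n_features + 1) // 2
    elif covariance == "diag":
        # A diagonal matrix has only p variances.
        cov = n_components * n_features
    else:
        raise ValueError(f"unknown covariance structure: {covariance!r}")
    return weights + means + cov


if __name__ == "__main__":
    # 3 clusters, 100 variables: unconstrained vs. diagonal covariances.
    print(gmm_free_parameters(3, 100, "full"))  # 15452
    print(gmm_free_parameters(3, 100, "diag"))  # 602
```

With 3 components and 100 variables, the unconstrained model already has 15452 free parameters, which a sample of a few hundred units cannot estimate reliably, whereas the diagonal restriction needs 602. Intermediate, more flexible restrictions are the kind of parsimonious parameterization that point 2) refers to.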

ERC

PE1_14, SH1_6, LS2_14

Keywords:

STATISTICAL DATA ANALYSIS, STATISTICAL MODELS, COMPUTATIONAL STATISTICS, ECONOMETRICS, BIOSTATISTICS