New statistical learning methods for model-based unsupervised classification of complex and high dimensional data

Year
2020
Proposer Roberto Rocci - Full Professor
ERC subsector of the project proposer
PE1_14
Research group members
Member | Category
Marco Alfo' | Structured participant in the research project
Monia Ranalli | Structured participant in the research project
Donatella Vicari | Structured participant in the research project
Vittoria Carolina Malpassuti | PhD student / research fellow / postgraduate trainee, non-structured member of the research group
Francesca Martella | Structured participant in the research project
Irene Cozzolino | PhD student / research fellow / postgraduate trainee, non-structured member of the research group
Maurizio Vichi | Structured participant in the research project
Paolo Giordani | Structured participant in the research project
Member | Position | Institution | Category
Giovanni Trovato | Full professor | University of Rome "Tor Vergata" | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Fabrizio Mattesini | Full professor | University of Rome "Tor Vergata" | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Silvia D'Angelo | Postdoc research fellow | University College Dublin, Ireland | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Michael Fop | Assistant professor | University College Dublin, Ireland | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Alfonso Russo | Ph.D. student | University of Rome "Tor Vergata" | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Isabella Corazziari | Researcher | Istat | Other affiliated personnel (Sapienza or other institution), holders of research scholarships
Abstract

In almost every research field, data are now inherently complex and high dimensional, owing to the growing availability of information made possible by new technologies. Classical approaches to supervised and unsupervised statistical learning are inadequate for these "modern" data and cannot be applied to them directly. The inadequacy stems from the increased complexity of:
a) data schemes. The classical data scheme "statistical units x quantitative variables" is often obsolete. Newer and more informative schemes are in use: mixed data, where some variables are quantitative and others are categorical; matrix data, where the same units (persons or objects) are measured on the same variables on different occasions; multidimensional networks, where multiple relations are observed among a set of units (nodes); etc.;
b) models. New technologies make a huge number of features (variables) available for each unit. Classical methods cannot model such high dimensional data properly because they produce models with a large number of parameters that cannot be estimated efficiently, especially when the number of observations is small compared to the number of parameters and/or variables;
c) algorithms. On high dimensional data, the estimation and/or selection of classical models becomes computationally infeasible and/or does not deliver results in the required time.
The aim of the research group is to address these three sources of complexity by proposing new statistical learning methods for unsupervised classification that are able to:
1) handle new complex data schemes: mixed data, matrix data and multidimensional networks;
2) model parsimoniously. The idea is to introduce new flexible parameterizations that are sufficiently parsimonious and allow us to select the relevant information when the number of variables is very large;
3) reduce the computational complexity of the algorithms by using new methods of estimation and/or model selection.
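
As a rough illustration of point b), the sketch below counts the free parameters of a finite Gaussian mixture, a model commonly used for model-based clustering. The Gaussian form, the function name `gmm_free_parameters` and the two covariance structures compared are assumptions made for illustration only; the abstract does not name a specific model or parameterization.

```python
def gmm_free_parameters(n_components: int, n_variables: int,
                        covariance: str = "full") -> int:
    """Count the free parameters of a Gaussian mixture model.

    Hypothetical helper for illustration; not part of the project.
    """
    g, p = n_components, n_variables
    weights = g - 1            # mixing proportions sum to one
    means = g * p              # one mean vector per component
    if covariance == "full":
        # one unrestricted symmetric p x p matrix per component
        cov = g * p * (p + 1) // 2
    elif covariance == "diagonal":
        # one variance per variable and component
        cov = g * p
    else:
        raise ValueError("covariance must be 'full' or 'diagonal'")
    return weights + means + cov


if __name__ == "__main__":
    for p in (10, 100, 1000):
        full = gmm_free_parameters(3, p, "full")
        diag = gmm_free_parameters(3, p, "diagonal")
        print(f"p={p}: full={full}, diagonal={diag}")
```

With 3 components and 1000 variables, the unrestricted model has 1,504,502 free parameters, against 6,002 for the diagonal one. The count grows quadratically in the number of variables in the first case and linearly in the second, which is the kind of gap a parsimonious parameterization is meant to close.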

ERC
PE1_14, SH1_6, LS2_14
Keywords:
STATISTICAL DATA ANALYSIS, STATISTICAL MODELS, COMPUTATIONAL STATISTICS, ECONOMETRICS, BIOSTATISTICS

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma