New statistical learning methods for model-based unsupervised classification of complex and high dimensional data
Member | Category |
---|---|
Marco Alfo' | Structured participant in the research project |
Monia Ranalli | Structured participant in the research project |
Donatella Vicari | Structured participant in the research project |
Vittoria Carolina Malpassuti | PhD student / research fellow / specialty trainee, non-structured member of the research group |
Francesca Martella | Structured participant in the research project |
Irene Cozzolino | PhD student / research fellow / specialty trainee, non-structured member of the research group |
Maurizio Vichi | Structured participant in the research project |
Paolo Giordani | Structured participant in the research project |
Member | Position | Institution | Category |
---|---|---|---|
Giovanni Trovato | Full professor | University of Rome "Tor Vergata" | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Fabrizio Mattesini | Full professor | University of Rome "Tor Vergata" | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Silvia D'Angelo | Postdoc research fellow | University College Dublin, Ireland | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Michael Fop | Assistant professor | University College Dublin, Ireland | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Alfonso Russo | Ph.D. student | University of Rome "Tor Vergata" | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Isabella Corazziari | Researcher | Istat | Other affiliated personnel from Sapienza or external institutions, holders of research scholarships |
Nowadays, in almost every research field, data are inherently complex and high dimensional, owing to the increasing availability of information made possible by new technologies. Classical approaches to supervised and unsupervised statistical learning are inadequate and cannot be applied directly to these "modern" data. The inadequacy stems from the increased complexity of:
a) data schemes. The classical data scheme "statistical units x quantitative variables" is often obsolete. Newer and more informative schemes are in use: mixed data, where some variables are quantitative and others are categorical; matrix data, where the same units (persons or objects) are measured on the same variables on different occasions; multidimensional networks, where multiple relations are observed among a set of units (nodes); and so on;
b) models. New technologies make available a huge number of features (variables) for the same unit. Classical methods do not allow us to model such high dimensional data properly, because they would produce models with a large number of parameters that cannot be efficiently estimated, especially when the number of observations is small compared to the number of parameters and/or variables;
c) algorithms. On high dimensional data, the estimation and/or selection of classical models becomes computationally infeasible or does not deliver results within the required time.
The aim of the research group is to address these three sources of complexity by proposing new statistical learning methods for unsupervised classification that are able to:
1) handle new complex data schemes: mixed data, matrix data and multidimensional networks;
2) model parsimoniously. The idea is to introduce new flexible parameterizations that are sufficiently parsimonious and allow us to select the relevant information when the number of variables is very large;
3) reduce the computational complexity of the algorithms by making use of new estimation and/or model selection methods.
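As a rough illustration of why classical models become over-parameterized in high dimensions, and of what a parsimonious parameterization buys, the sketch below counts the free parameters of a Gaussian mixture model. The choice of Gaussian mixtures, the function name `gmm_free_parameters`, and the values of 3 components and 10, 100 or 1000 variables are assumptions made here for illustration only; they are not taken from the project description.

```python
def gmm_free_parameters(n_components: int, n_features: int, covariance: str = "full") -> int:
    """Count the free parameters of a Gaussian mixture (illustrative assumption).

    covariance="full": one unrestricted covariance matrix per component.
    covariance="diag": one diagonal covariance matrix per component,
                       a simple example of a parsimonious parameterization.
    """
    weights = n_components - 1          # mixing proportions sum to one
    means = n_components * n_features   # one mean vector per component
    if covariance == "full":
        # a symmetric p x p matrix has p(p + 1)/2 distinct entries
        cov = n_components * n_features * (n_features + 1) // 2
    elif covariance == "diag":
        cov = n_components * n_features
    else:
        raise ValueError(f"unknown covariance structure: {covariance!r}")
    return weights + means + cov


if __name__ == "__main__":
    n_components = 3  # hypothetical number of clusters
    for n_features in (10, 100, 1000):
        full = gmm_free_parameters(n_components, n_features, "full")
        diag = gmm_free_parameters(n_components, n_features, "diag")
        print(f"p = {n_features:>4}: full = {full:>9,}  diagonal = {diag:>6,}")
```

Under these assumptions, with 3 components and 1000 variables the unrestricted model has 1,504,502 free parameters against 6,002 for the diagonal one, which is why the former cannot be estimated reliably from a small sample. The diagonal restriction is only the simplest possible example and is not meant to represent the parameterizations the group proposes.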