Name and title of the project proponent: 
sb_p_2624012
Year: 
2021
Abstract: 

Nowadays, vast amounts of complex, frequently unstructured data are stored and easily accessible.
Complex data arise in many fields, such as economics, finance, health, and the social or environmental sciences, to name but a few. One of the main challenges for modern statistics is to jointly establish and implement new clustering methods to reduce and synthesize such information. Complex data cannot be fully analysed with existing standard methodologies.
The research project aims to introduce new clustering methods for innovative sources of information, such as text or functions.
The main objective is twofold. On the one hand, we will focus on text data. New document clustering techniques, with a fuzzy approach, will be introduced through the use of appropriate dissimilarity measures that take into account the intrinsic nature of the data. In addition, double clustering methods able to jointly partition documents and terms will be studied.
On the other hand, we will analyse functional data. First, (fuzzy) clustering methods for such data will be addressed and then we will focus on clustering in a subspace of reduced dimension to improve the classification performance.
All the proposed techniques will be implemented in (open-source) software macros/libraries to be shared so that practitioners working in the field can freely use them.

ERC: 
PE1_14
PE1_18
SH1_6
Research group members: 
sb_cp_is_3334094
sb_cp_is_3388145
sb_cp_is_3387563
sb_cp_is_3516068
sb_cp_is_3351216
sb_cp_is_3386754
sb_cp_es_462992
sb_cp_es_462993
sb_cp_es_462994
sb_cp_es_462995
sb_cp_es_462996
sb_cp_es_462997
Innovativeness: 

Advances in data collection and storage have led to an increasing amount of complex data, such as text, functional or high-dimensional data.
In the past few years, this has put terms such as "data science" and "big data" in the spotlight and has made statisticians and data scientists among the most sought-after professionals. The area is evolving fast in many seemingly unrelated directions.

Standard clustering methods cannot handle complex data structures involving text or functional data, so they cannot be applied directly.
Our research project aims to define new (fuzzy) clustering methods for such complex data. On the one hand, we will focus on document clustering; on the other hand, we will address data reduction for functional data. Besides the methodological contribution, all the acquired scientific knowledge will be made available to stakeholders and end-users through efficient, free and open-source software (mainly for the R software platform). An online repository giving access to the methodological developments, data and documented software will be created so that end-users can learn how to apply the results to real data and fully benefit from them.
In addition, the results will be presented at national and international conferences and submitted to international statistical journals that have an impact factor and belong to the A-class.

The proposed methodological advances will be applied to data from economics, health, and the environmental and social sciences. Beyond these domains of application, any empirical context characterized by complex information, whether text or functional, could benefit from the new proposals and solutions. For example, in the case of the current COVID-19 crisis, written information can provide a better understanding of the problem than numbers alone. This information could be exploited to create better early-warning indicators.

Furthermore, the project will create a transnational scientific network, composed of our team and other experts from Italy and abroad, with the aim of establishing a collaborative framework able to produce statistically well-founded tools for complex data, with emphasis on text and functional data. This is also consistent with the Horizon Europe framework programme.

Call code: 
2624012

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma