Year:
2017
Name and qualification of the project proposer:
sb_p_500826
Abstract: 

Following Chris Anderson's 2008 Wired article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", a broad discussion arose about the role of modelling schemes in the digital era. The Big Data paradigm, with its overwhelming impact on technology and science, proposes, in some sense, a purely inductive alternative to the physical, model-based description of reality. It is thus natural and important to ask about the limits of such a description. In what circumstances can one learn (predict, extract features, etc.) efficiently from data without the use of models, theories or hierarchical hypotheses? What lies behind the apparent success of tools like deep learning, and what is its link with well-known theoretical tools, e.g., the renormalisation group? It is even more important to ask about the positive synergies that theoretical schemes and data can jointly trigger. The notable examples of weather forecasting and epidemic spreading have proved that suitable data-driven computational schemes can effectively tame the high dimensionality embedding complex phenomena. Still, the whole matter is far from settled. This project aims at addressing this set of problems by blending several tools and approaches into a single effort: dynamical systems and information theory, neural networks and machine learning, and data-driven modelling schemes. Several case studies will be considered across several areas, e.g., modelling and predicting social dynamics (opinions, mobility, information dynamics), statistical mechanics in "non-standard" situations (i.e., systems far from equilibrium and/or without a Hamiltonian structure), modelling innovation dynamics, textual analysis and classification, extraction of features from images, etc.

Research group members:
sb_cp_is_626497
sb_cp_is_622087
sb_cp_is_621703
sb_cp_is_621247
sb_cp_is_623694
sb_cp_is_624808
sb_cp_es_107533
sb_cp_es_107534
sb_cp_es_107535
sb_cp_es_107536
sb_cp_es_107537
sb_cp_es_107538
sb_cp_es_107539
sb_cp_es_107540
sb_cp_es_107541
Innovativeness:

(WP1) Interplay of theoretical modelling and inference methods

Statistical models trained on data often perform surprisingly well, in the sense that they "generalise well", i.e., they make sufficiently accurate predictions on yet unseen data, and they do so without a priori knowledge of the fundamental laws governing the studied systems, possibly inferring them, at some level of description, as a byproduct of the inference process (Carrasquilla et al., Nat. Phys. 13, 431, 2017; MacKay, Cambridge Univ. Press, 2003). Models, by contrast, provide a description of a particular set of experimental observations, relying on (postulated) assumptions about the fundamental laws governing the observed data, of which the data are a particular realisation. In this perspective, the big-data deluge has opened an unprecedented opportunity to close a virtuous loop between data, information extracted by inference methods, and modelling. Here we aim at clarifying the role models play in the interpretation of data and at finding a rationale for the effectiveness of machine learning techniques (deep learning in particular), possibly improving the state of the art of inference methods. To this end, several techniques will be adopted from the statistical mechanics of disordered systems (e.g., the replica method, stochastic stability, cavity fields, message passing, etc.).
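As an illustration of the message-passing toolbox mentioned above, the following is a minimal sketch, under our own toy assumptions (a small Ising chain with hand-picked couplings and fields), of belief propagation estimating local magnetisations. It is a baseline instance of the technique, not the project's algorithm.

```python
# Minimal sketch: belief propagation ("message passing") for a pairwise Ising
# model on a graph. Couplings J and fields h below are toy values.
import numpy as np

def belief_propagation(J, h, beta=1.0, iters=200, tol=1e-8):
    """Estimate single-spin magnetisations of an Ising model via BP.
    J: dict {(i, j): coupling} with i < j; h: array of local fields."""
    n = len(h)
    edges = list(J)
    # Cavity messages u[(i, j)]: effective field passed from spin i to spin j.
    u = {(i, j): 0.0 for (i, j) in edges}
    u.update({(j, i): 0.0 for (i, j) in edges})
    neigh = {k: [] for k in range(n)}
    for (i, j) in edges:
        neigh[i].append(j); neigh[j].append(i)
    for _ in range(iters):
        diff = 0.0
        for (i, j) in list(u):
            Jij = J.get((i, j), J.get((j, i)))
            # Sum of incoming cavity fields on i, excluding the one from j.
            hcav = h[i] + sum(u[(k, i)] for k in neigh[i] if k != j)
            new = np.arctanh(np.tanh(beta * Jij) * np.tanh(beta * hcav)) / beta
            diff = max(diff, abs(new - u[(i, j)]))
            u[(i, j)] = new
        if diff < tol:
            break
    # Local magnetisations from the converged cavity fields.
    total = np.array([sum(u[(k, i)] for k in neigh[i]) for i in range(n)])
    return np.tanh(beta * (h + total))

# Toy usage: a 4-spin chain with uniform couplings and small boundary fields.
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
print(belief_propagation(J, h=np.array([0.1, 0.0, 0.0, -0.1])))
```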

More specifically, regarding the case studies:

[IMAGES] The potential role and existence of two-distance correlations in the set of attractive faces, and the question of dimensionality reduction, have already been discussed [Pallett et al. 2010, Eisenthal et al. 2006]. However, a systematic study of these issues from a rigorous, information-theoretical point of view is still lacking. Here we aim at inferring the "rules" determining perceived facial beauty. The inference process will be performed through neural networks that (i) automatically extract the relevant features for the evaluation of attractiveness (and the properties of attractive faces in terms of such features) and (ii) infer the number n of relevant n-distance interactions and estimate the amount of complexity contained in the data that cannot be captured in terms of the selected variables.
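A hypothetical sketch of step (i): regress attractiveness ratings on the set of pairwise landmark distances with a small neural network, then rank the distances by permutation importance. All data here (landmark coordinates, ratings) are synthetic stand-ins; the network size and the importance measure are illustrative choices, not the project's method.

```python
import numpy as np
from itertools import combinations
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_faces, n_landmarks = 500, 10
coords = rng.normal(size=(n_faces, n_landmarks, 2))   # (x, y) per landmark
pairs = list(combinations(range(n_landmarks), 2))
# Feature vector: all pairwise distances between facial landmarks.
X = np.array([[np.linalg.norm(f[i] - f[j]) for i, j in pairs] for f in coords])
# Fake ratings: a planted linear rule plus noise, to be "rediscovered" below.
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=n_faces)

net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000).fit(X, y)
imp = permutation_importance(net, X, y, n_repeats=10, random_state=0)
for k in np.argsort(imp.importances_mean)[::-1][:5]:
    print(f"distance {pairs[k]}: importance {imp.importances_mean[k]:.3f}")
```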

[TEXTS] In the area of text analysis we aim at inferring the basic principles underlying the construction of texts and, more generally, of discrete sequences of characters. Through a systematic comparison between more traditional text-analysis tools (bag of words, n-grams, etc.), agnostic tools such as data-compression techniques (whose level of understanding is already very good), and pure machine learning tools, the project aims at raising the bar of our understanding of these phenomena so as to further improve language-related applications.
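To make the compression-based ("agnostic") route concrete, here is a minimal sketch classifying a text by its normalized compression distance to labelled reference texts, using off-the-shelf zlib. The reference corpora are placeholder strings; real applications would use full labelled documents.

```python
import zlib

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

references = {
    "physics": b"energy entropy particle field equilibrium temperature",
    "finance": b"market price asset portfolio risk volatility return",
}
query = b"the entropy of the particle field grows toward equilibrium"
# Assign the label of the closest reference corpus in NCD.
label = min(references, key=lambda k: ncd(query, references[k]))
print(label)  # expected: "physics"
```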

[BIOLOGICAL SEQUENCES] This research would lead to the development of an optimisation algorithm that extracts relevant variables from datasets describing problems in which cell-to-cell variability of gene expression is a prominent feature. This method could become a standard tool for the bioinformatics analysis of RNA-sequencing data, but it could easily be extended to datasets from completely different fields of research. Furthermore, coupling it with the theoretical investigation could elucidate the causes of the variability observed in the data at the network level, providing an additional tool for understanding the aetiology of complex diseases.
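As a hedged illustration of what "extracting relevant variables" could look like in this setting, the sketch below ranks the genes of a synthetic single-cell counts matrix by their cell-to-cell overdispersion (Fano factor). The envisaged optimisation algorithm would go well beyond this simple proxy; the data here are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_genes = 200, 1000
# Baseline Poisson counts; the first 20 genes get extra cell-to-cell variability.
counts = rng.poisson(lam=5.0, size=(n_cells, n_genes)).astype(float)
counts[:, :20] *= rng.gamma(2.0, 1.0, size=(n_cells, 1))

mean = counts.mean(axis=0)
fano = counts.var(axis=0) / np.maximum(mean, 1e-12)   # variance-to-mean ratio
top = np.argsort(fano)[::-1][:20]                     # most overdispersed genes
print("candidate relevant genes:", top)
```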

(WP2) Data-grounded modelling of complex dynamical processes

As already mentioned above, one of the great challenges of our era is acquiring satisfactory descriptive and predictive power for social systems. The possibility of combining, in a unique framework, clever modelling techniques with cutting-edge inference tools has the potential to unravel the dynamics of social systems: from information dynamics (information pollution, misinformation, opinion dynamics, confirmation bias and echo chambers) to urban dynamics (mobility, accessibility, exclusion, etc.) to planetary-scale challenges (climate change, security, financial crises, etc.). Here it will be interesting to combine already existing datasets about human activities with suitably conceived games/experiments (through the www.xtribe.eu platform, conceived and deployed within the Physics Dept.).
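By way of illustration only, a bounded-confidence model in the Deffuant style is the kind of minimal opinion dynamics one could calibrate against such datasets or XTribe experiments. The confidence threshold eps and convergence rate mu below are free parameters of our choosing, to be fitted to data.

```python
import numpy as np

def deffuant(n=500, eps=0.2, mu=0.5, steps=100_000, seed=0):
    """Pairwise bounded-confidence opinion dynamics on [0, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)               # initial opinions
    for _ in range(steps):
        i, j = rng.integers(n, size=2)
        if abs(x[i] - x[j]) < eps:         # interact only within the threshold
            shift = mu * (x[j] - x[i])
            x[i] += shift                  # opinions move toward each other
            x[j] -= shift
    return x

# Echo chambers emerge as clusters of near-identical opinions.
final = deffuant()
clusters = np.unique(np.round(final, 2))
print(f"{len(clusters)} distinct opinion values after convergence")
```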

(WP3) Modelling out-of-equilibrium physical systems

We shall build and investigate models for non-equilibrium statistical mechanics going beyond "stochastic thermodynamics", e.g., the dynamics of a heavy particle in cases with negative temperature, or systems with non-trivial diffusive behaviour. The main ingredients adopted, beyond numerical computations, will be:

* the analysis of experimental works (e.g., S. Braun et al., Science 339, 52, 2013) for the selection of the proper model;
* Langevin equations obtained by means of kinetic-theory considerations, in the spirit of the Smoluchowski approach.
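A minimal numerical sketch, under standard textbook assumptions: Euler-Maruyama integration of an underdamped Langevin equation for a heavy particle, dv = -gamma*v*dt + sqrt(2*gamma*T/m)*dW, the generic form an effective kinetic-theory derivation would yield. Parameters and the equipartition check are illustrative, not results.

```python
import numpy as np

def langevin(gamma=1.0, T=1.0, m=1.0, dt=1e-3, steps=100_000, seed=0):
    """Integrate dv = -gamma*v*dt + sqrt(2*gamma*T/m)*dW (Euler-Maruyama)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(steps)
    for t in range(1, steps):
        noise = np.sqrt(2 * gamma * T / m * dt) * rng.normal()
        v[t] = v[t - 1] - gamma * v[t - 1] * dt + noise
    return v

v = langevin()
# Equipartition check: <v^2> should approach T/m in the stationary state.
print(f"measured <v^2> = {np.mean(v[50_000:]**2):.3f}  (expected 1.000)")
```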

Call code:
500826
Keywords: 
