Year:
2018
Name and position of the project proposer:
sb_p_941364
Abstract: 

DSS4FI aims to develop and implement a methodology and a toolchain to support policy-makers in identifying core technologies, emerging research trends, novel concepts and policy needs for the establishment of a European-led Open Internet initiative. To ensure accurate forecasting of key future technologies, DSS4FI will rely on automatic support tools, implemented by:
a) an optimized toolchain of novel and state-of-the-art technologies in text mining, topic detection and cognitive analysis, for automatically extracting, modeling and analyzing emerging research and technology trends from a variety of web sources;
b) a validation interface through which internal and external experts assess the outcomes of point a) above.

DSS4FI will dynamically extract information from a number of unstructured and structured sources in order to inform the stakeholders (domain experts, policy-makers and economists) about relevant emerging research trends, open problems, challenges, needs and technologies in the field of the Next Generation Internet (NGI). The results will be available through a Visual Analytics Interface that supports users in answering tactical and strategic questions, such as:
* What are the emerging research trends?
* Is a given research topic likely to grow, decline, stay stable or split?

The decision support system will also help answer operational questions, for example:
* Where exactly are the main open problems that, if solved, could significantly impact industry and society?
* How can policy-makers foster progress in these research areas?
* Are there research communities that work on similar or complementary topics and could profitably cooperate?
* More generally, which support actions are needed?

ERC: 
PE6_2
PE6_7
PE6_11
Innovativeness:

Forecasting the future applicability and impact of new studies and topics in research networks (RNs) contributes to a thorough and accurate identification of key future technologies.
The proposed research falls within the area of topic prediction and diffusion, which in turn relies on two sub-processes: topic detection and topic propagation. For topic detection, besides keyword extraction, two techniques are very popular: probabilistic latent semantic indexing (PLSI) [1] and latent Dirichlet allocation (LDA) [2]. PLSI defines a robust generative model in which the probability of each co-occurrence (w, d), where w is a word and d a document, is a mixture of conditionally independent multinomial distributions. However, because the number of topics in PLSI is a hyperparameter, this strategy cannot be used for real-time inference. LDA is a generative model that relates words and documents through latent topics: each document is a mixture of topics, and each topic is a distribution over words, with Dirichlet priors on both. LDA has also been applied to the specific task of topic detection in research communities [3]. More recently, deep learning methods have been used to improve the quality of latent models. For example, lda2vec [4] learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors.
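To make the PLSI mixture concrete, the following is a minimal sketch of the symmetric formulation, P(w, d) = sum over z of P(z) P(d|z) P(w|z), fitted with expectation-maximization. It is an illustration only and not the toolchain proposed here: the function names `fit_plsi` and `joint_probability`, the toy count matrix, and the fixed number of iterations are all assumptions made for the example.

```python
import random

def fit_plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit symmetric PLSI by EM; counts[d][w] is the count of word w in document d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(size):
        # Random strictly positive distribution used to break symmetry at start.
        raw = [rng.random() + 0.1 for _ in range(size)]
        total = sum(raw)
        return [x / total for x in raw]

    p_z = [1.0 / n_topics] * n_topics                        # P(z)
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # P(d | z)
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # P(w | z)

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w), weighted by the observed count.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                for z in range(n_topics):
                    resp = counts[d][w] * joint[z] / norm
                    acc_z[z] += resp
                    acc_d[z][d] += resp
                    acc_w[z][w] += resp
        # M-step: renormalize the accumulated expected counts.
        total = sum(acc_z)
        for z in range(n_topics):
            mass = acc_z[z] or 1.0  # guard against a topic with no mass
            p_d_z[z] = [v / mass for v in acc_d[z]]
            p_w_z[z] = [v / mass for v in acc_w[z]]
            p_z[z] = acc_z[z] / total
    return p_z, p_d_z, p_w_z

def joint_probability(model, d, w):
    """P(w, d) as a mixture over latent topics."""
    p_z, p_d_z, p_w_z = model
    return sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(len(p_z)))

# Toy corpus (hypothetical): documents 0-1 use words 0-1, documents 2-3 use words 2-3.
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
model = fit_plsi(counts, n_topics=2)
```

Note that `n_topics` must be chosen before fitting, which is the hyperparameter dependence mentioned above.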
Standard topic models such as PLSI and LDA pioneered offline topic learning from a text corpus. To model the dynamics of the emergence and decay of topics in social networks, several models have been introduced, including temporal mining algorithms such as SAX* [5]. SAX* uses sliding temporal windows to detect clusters of co-occurring tokens whose temporal signals have the same or a similar shape. Shapes are represented by discretized strings extracted from the temporal signals using Symbolic Aggregate Approximation.
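The discretization step can be sketched as follows. This is plain Symbolic Aggregate Approximation (z-normalization, piecewise aggregate averaging, then mapping to letters with Gaussian breakpoints), not the full SAX* algorithm of [5], which adds sliding windows and clustering on top. The function name `sax_word`, the toy series, and the restriction to series whose length is a multiple of the number of segments are assumptions of this sketch.

```python
import bisect
import statistics

# Standard breakpoints that split a unit Gaussian into equiprobable regions.
BREAKPOINTS = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67]}
LETTERS = "abcd"

def sax_word(series, n_segments, alphabet_size=3):
    """Turn a numeric time series into a short symbolic string."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    # z-normalize so that only the shape matters, not the scale or offset.
    z = [0.0 if std == 0 else (x - mean) / std for x in series]
    # Piecewise aggregate approximation: average each equal-length segment.
    seg_len = len(series) // n_segments
    paa = [statistics.mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Map each segment mean to the letter of the region it falls in.
    cuts = BREAKPOINTS[alphabet_size]
    return "".join(LETTERS[bisect.bisect_left(cuts, v)] for v in paa)

# Two hypothetical token-frequency signals with the same shape at different scales.
burst_small = [1, 1, 1, 1, 9, 9, 9, 9, 1, 1, 1, 1]
burst_large = [10 * x for x in burst_small]
word_small = sax_word(burst_small, n_segments=3)
word_large = sax_word(burst_large, n_segments=3)
```

Because both signals are normalized before discretization, they map to the same string, which is what allows tokens with similar temporal shapes to be grouped.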

Once a mechanism to detect and extract topics has been designed, the subsequent task is to explain and predict the diffusion of a topic in the network. The literature describes two categories of information diffusion models: explanatory and predictive models [6]. The goal of explanatory models is to determine the underlying spreading cascade given a particular activation sequence, while predictive models focus on predicting how a diffusion process would unfold in a given network, by learning from the past. Alternatives to information diffusion approaches include community detection algorithms. Among the many proposed methods, Wang et al. [7] recently proposed combining structural perturbation similarity with the resource allocation index to perform evolutionary community detection.
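As a small illustration of one ingredient mentioned above, the resource allocation index of two nodes x and y is the sum, over their common neighbours z, of 1 / degree(z). The sketch below computes only this index on an undirected graph stored as a dictionary of neighbour sets; it does not reproduce the structural perturbation similarity or the community detection procedure of [7]. The function name and the toy graph are assumptions made for the example.

```python
def resource_allocation_index(adj, x, y):
    """Sum of 1/degree over the common neighbours of x and y.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    common = adj[x] & adj[y]
    return sum(1.0 / len(adj[z]) for z in common)

# Hypothetical graph with edges a-c, b-c, c-d, a-e, b-e.
adj = {
    "a": {"c", "e"},
    "b": {"c", "e"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": {"a", "b"},
}
score_ab = resource_allocation_index(adj, "a", "b")  # 1/3 via c plus 1/2 via e
score_ad = resource_allocation_index(adj, "a", "d")  # 1/3 via c only
```

Common neighbours with few connections contribute more to the score, so a pair linked through a low-degree node is considered more similar than a pair linked through a hub.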

Building on, and going beyond, state-of-the-art methodologies, in this research we propose a multilayer graph framework to model topic dissemination among researchers. Unlike previous approaches, our aim is to obtain both high-quality topics and an effective prediction of their future impact, through the joint application of a deep semantic topic extraction technique and a topic propagation methodology based on a multilayer graph structure.
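The project does not specify how the multilayer graph is represented. Purely as an illustration of the general idea, the sketch below stores one undirected adjacency structure per relation type over a shared set of researcher nodes. The class name, the layer names "coauthor" and "cocitation", and the researcher names are all hypothetical.

```python
from collections import defaultdict

class MultilayerGraph:
    """Undirected graph with one adjacency structure per named layer."""

    def __init__(self):
        # layer name -> node -> set of neighbouring nodes in that layer
        self.layers = defaultdict(lambda: defaultdict(set))

    def add_edge(self, layer, u, v):
        self.layers[layer][u].add(v)
        self.layers[layer][v].add(u)

    def neighbors(self, node, layer=None):
        """Neighbours of node in one layer, or in all layers if layer is None."""
        names = [layer] if layer is not None else list(self.layers)
        found = set()
        for name in names:
            found |= self.layers[name].get(node, set())
        return found

# Hypothetical researchers connected through two kinds of relation.
graph = MultilayerGraph()
graph.add_edge("coauthor", "alice", "bob")
graph.add_edge("cocitation", "alice", "carol")
all_contacts = graph.neighbors("alice")
coauthors = graph.neighbors("alice", layer="coauthor")
```

Keeping the layers separate lets a propagation model treat each kind of relation differently, for example by weighting co-authorship links more heavily than co-citation links.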

The application of this methodology is made possible by the preliminary acquisition of a large-scale research network (RN) with rich information on paper abstracts and keywords, co-authors and co-citations, impact factors, and publication sources. The acquisition of this dataset from Semantic Scholar, DBLP and other web resources will itself be a relevant result of the project.

Finally, following an observation by the reviewers of the original H2020 project, specific attention will be paid to the usability of the tool, through the use of state-of-the-art visualization interfaces that let human experts make effective use of the system's results.

References

[1] Thomas Hofmann. Probabilistic latent semantic indexing. In 22nd ACM SIGIR, 1999.
[2] David Blei, Andrew Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
[3] Binling Nie and Shouqian Sun. Using text mining techniques to identify research trends: A case study of design research. Applied Sciences, 2017.
[4] Christopher Moody. "lda2vec: introducing a new hybrid algorithm". 2016. http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing...
[5] G. Stilo and P. Velardi. Efficient temporal mining of micro-blog texts and its application to event discovery. Data Mining and Knowledge Discovery, 2016.
[6] Adrien Guille, Hakim Hacid, Cecile Favre, and Djamel A. Zighed. Information diffusion in online social networks: A survey. SIGMOD Rec., July 2013.
[7] Peizhuo Wang, Lin Gao, and Xiaoke Ma. Dynamic community detection based on network structural perturbation and topological similarity. Journal of Statistical Mechanics: Theory and Experiment, 2017.

Call code:
941364

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma