Ricerc@Sapienza

Developing State-of-the-Art Natural Language Processing systems for Semantics-first Multilingual Text Understanding

Year

2021

Proposer Simone Conia - Researcher

Department

DIPARTIMENTO DI INGEGNERIA INFORMATICA, AUTOMATICA E GESTIONALE -ANTONIO RUBERTI-

ERC subsector of the project proposer

PE6_9

Research group members

Member	Category
Roberto Navigli	Reference tutor (Professor or Researcher affiliated with the same Department as the Proposer)

Abstract

In recent years, Natural Language Processing (NLP) has witnessed tremendous growth, especially thanks to the advent of modern language models such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019), and XLM-RoBERTa (Conneau et al., 2020). Such language models are so effective that they have now become the de facto input representation method in virtually every technique presented at top-tier NLP conferences and journals.

Among the fields that have benefited from the knowledge captured by these language models, two of the most important are Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL), tasks that are often considered fundamental to enabling Natural Language Understanding, that is, the ability of machines not only to read but also to understand text (Navigli, 2018). In particular, WSD is the lexical-level task of understanding the meaning of a word in context, whereas SRL is the sentence-level task of understanding the semantic structure of a text. Together, these two tasks can provide semantically rich information about documents, aiding downstream tasks such as Question Answering, Information Retrieval, and Machine Translation.

However, while the research community has made great efforts to propose increasingly better Deep Learning systems, their intrinsic complexity is also an entry barrier that makes such systems difficult for end users to understand and use. Indeed, research systems are often far from ready to use out-of-the-box, as they have to be repackaged and inserted into long pipelines. This means that end users are effectively discouraged from using high-performing WSD and SRL systems, which hinders their diffusion in real-world applications.

In this project, we aim to develop tools for WSD and SRL that are i) easy to use out-of-the-box, ii) always available online, and iii) state-of-the-art.

ERC

PE6_9, PE6_7, PE6_11

Keywords:

ARTIFICIAL INTELLIGENCE, COMPUTATIONAL LINGUISTICS, MACHINE LEARNING