Natural Language Processing

InVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue, we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pre-trained implementation of a neural, span-based architecture for SRL.

Quasi bidirectional encoder representations from transformers for Word Sense Disambiguation

While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not benefited from them yet. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a coattentive layer to produce more deeply bidirectional representations, better suited to the WSD task.

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Mainstream computational lexical semantics embraces the assumption that word senses can be represented as discrete items of a predefined inventory. In this paper, we show this need not be the case, and propose a unified model that is able to produce contextually appropriate definitions. In our model, Generationary, we employ a novel span-based encoding scheme which we use to fine-tune an English pre-trained Encoder-Decoder system to generate glosses.

Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information

Neural architectures are the current state of the art in Word Sense Disambiguation (WSD). However, they make limited use of the vast amount of relational information encoded in Lexical Knowledge Bases (LKB). We present Enhanced WSD Integrating Synset Embeddings and Relations (EWISER), a neural supervised architecture that is able to tap into this wealth of knowledge by embedding information from the LKB graph within the neural architecture, and to exploit pretrained synset embeddings, enabling the network to predict synsets that are not in the training set.

Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation

Exploiting syntagmatic information is a promising research direction for closing the gap between knowledge-based and supervised Word Sense Disambiguation (WSD) performance. We follow this direction in our next-generation knowledge-based WSD system, SyntagRank, which we make available via a Web interface and a RESTful API. SyntagRank leverages the disambiguated pairs of co-occurring words included in SyntagNet, a lexical-semantic combination resource, to perform state-of-the-art knowledge-based WSD in a multilingual setting.

Extracting declarative process models from natural language

Process models are an important means to capture information on organizational operations and often represent the starting point for process analysis and improvement. Since the manual elicitation and creation of process models is a time-intensive endeavor, a variety of techniques have been developed that automatically derive process models from textual process descriptions. However, these techniques, so far, only focus on the extraction of traditional, imperative process models.

An ecology-based index for text embedding and classification

Natural language processing and text mining applications have gained growing attention and adoption in the computer science and machine learning communities. In this work, a new embedding scheme is proposed for solving text classification problems. The embedding scheme relies on a statistical assessment of relevant words within a corpus using a compound index originally proposed in ecology: this makes it possible to identify relevant parts of the overall text (e.g., words), on top of which the embedding is performed following a Granular Computing approach.

An infoveillance system for detecting and tracking relevant topics from Italian tweets during the COVID-19 event

The year 2020 opened with a dramatic epidemic caused by a new species of coronavirus, which the WHO soon declared a pandemic owing to the high number of deaths and the critical mass of hospitalized patients worldwide, on the order of millions. The COVID-19 pandemic has forced the governments of hundreds of countries to impose heavy restrictions on citizens’ socio-economic life. Italy was one of the most affected countries, with long-term restrictions that affected its socio-economic fabric.
