word sense disambiguation

Game theory meets embeddings: a unified framework for Word Sense Disambiguation

Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph and the payoff as an embedding similarity function, that the players try to maximize.

Huge automatically extracted training sets for multilingual Word Sense Disambiguation

We release to the community six large-scale sense-annotated datasets in multiple language to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low- resourced languages and providing competitive results for English, where manually annotated training sets are accessible.

Two knowledge-based methods for High-Performance Sense Distribution Learning

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets.

With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation

Contextualized word embeddings have been employed effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. However, it is still hard to link them to structured sources of knowledge. In this paper we present ARES (context-AwaRe Embeddings of Senses), a semi-supervised approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lie in a space that is comparable to that of contextualized word vectors.

The Knowledge Acquisition Bottleneck Problem in Multilingual Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word in a given context. It lies at the base of Natural Language Processing as it provides semantic information for words. In the last decade, great strides have been made in this field and much effort has been devoted to mitigate the knowledge acquisition bottleneck problem, i.e., the problem of semantically annotating texts at a large scale and in different languages. This issue is ubiquitous in WSD as it hinders the creation of both multilingual knowledge bases and manually-curated training sets.

CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages

Knowing the Most Frequent Sense (MFS) of a word has been proved to help Word Sense Disambiguation (WSD) models significantly. However, the scarcity of sense-annotated data makes it difficult to induce a reliable and high-coverage distribution of the meanings in a language vocabulary. To address this issue, in this paper we present CluBERT, an automatic and multilingual approach for inducing the distributions of word senses from a corpus of raw sentences.

SenseDefs: a multilingual corpus of semantically annotated textual definitions: exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking

Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale.

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma