The Knowledge Acquisition Bottleneck Problem in Multilingual Word Sense Disambiguation

2020

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Survey Track.

04 Publication in conference proceedings

Tommaso Pasini

DOI: 10.24963/ijcai.2020/687

Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word in a given context. It lies at the base of Natural Language Processing, as it provides semantic information about words. In the last decade, great strides have been made in this field, and much effort has been devoted to mitigating the knowledge acquisition bottleneck problem, i.e., the problem of semantically annotating texts at a large scale and in different languages. This issue is ubiquitous in WSD, as it hinders the creation of both multilingual knowledge bases and manually curated training sets. In this work, we first introduce the reader to the task of WSD through a short historical digression and then take stock of the advancements made to alleviate the knowledge acquisition bottleneck problem. In doing so, we survey the literature on manual, semi-automatic and automatic approaches to creating English and multilingual corpora tagged with sense annotations, and present a clear overview of supervised models for WSD. Finally, we offer our view on the future directions we foresee for the field.

Keywords: survey; word sense disambiguation; Natural Language Processing; word sense distribution