Quasi bidirectional encoder representations from transformers for Word Sense Disambiguation
While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not yet benefited from them. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings that uses a coattentive layer to produce more deeply bidirectional representations, which are better suited to the WSD task.
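The abstract does not describe how the coattentive layer is built. The following Python sketch is a rough, hypothetical illustration of the general idea. It assumes that a model of this kind keeps separate left-to-right and right-to-left token states and fuses them with generic scaled dot-product coattention. Each direction attends over the other, and every token's output concatenates its own states with the attended contexts. Several details are assumptions chosen for illustration and are not the paper's method: the helper names `coattention`, `attend`, and `softmax`, the concatenation scheme, and the absence of masking.

```python
import math
from typing import List

Matrix = List[List[float]]  # n tokens x d features, as plain nested lists


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row receives a softmax-weighted
    average of the value rows, weighted by its similarity to the key rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def coattention(h_fwd: Matrix, h_bwd: Matrix) -> Matrix:
    """Hypothetical coattentive fusion of left-to-right (h_fwd) and
    right-to-left (h_bwd) token states, both of shape n x d.

    Forward states attend over backward states and vice versa. Each token's
    output is [h_fwd_i; c_fwd_i; h_bwd_i; c_bwd_i], a vector of size 4d.
    No masking is applied, so this sketch does not model how a pre-training
    objective would stop a token from seeing its own identity.
    """
    if len(h_fwd) != len(h_bwd):
        raise ValueError("both directions must cover the same tokens")
    c_fwd = attend(h_fwd, h_bwd, h_bwd)  # forward queries over backward states
    c_bwd = attend(h_bwd, h_fwd, h_fwd)  # backward queries over forward states
    return [f + cf + b + cb for f, cf, b, cb in zip(h_fwd, c_fwd, h_bwd, c_bwd)]


# Usage: two tokens with 2-dimensional states in each direction.
h_fwd = [[1.0, 0.0], [0.0, 1.0]]
h_bwd = [[0.5, 0.5], [1.0, -1.0]]
fused = coattention(h_fwd, h_bwd)
print(len(fused), len(fused[0]))  # 2 8
```

Concatenation keeps both directional views intact for a downstream sense classifier. A trained model would likely also add learned projections, and a language-modeling objective would need masking so that no token can attend to information about itself.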