Attention Word Embedding

Shashank Sonkar, Andrew E. Waters, Richard G. Baraniuk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as a context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform the state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
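The sketch below is a minimal illustration (not the authors' implementation) of the idea the abstract describes: plain CBOW averages the context-word vectors uniformly, while an attention-weighted variant lets more predictive context words contribute more. The single learned query vector used for scoring is an illustrative assumption; the paper's exact attention parameterization may differ.

```python
# Minimal sketch, assuming a simple learned-query attention over context words.
# Not the authors' code; parameter shapes and the scoring function are illustrative.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100

W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # context (input) embeddings
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # output embeddings
query = rng.normal(scale=0.1, size=dim)                # hypothetical attention query

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cbow_hidden(context_ids):
    """Plain CBOW: every context word gets the same weight."""
    return W_in[context_ids].mean(axis=0)

def attention_hidden(context_ids):
    """Attention-weighted context vector: weights sum to 1 but are not uniform."""
    ctx = W_in[context_ids]        # (n_ctx, dim)
    alpha = softmax(ctx @ query)   # (n_ctx,) attention weights
    return alpha @ ctx             # weighted average of context embeddings

def target_distribution(hidden):
    """Softmax over the vocabulary for the masked (predicted) word."""
    return softmax(W_out @ hidden)

context_ids = np.array([5, 42, 7, 99])               # ids of the surrounding words
p_cbow = target_distribution(cbow_hidden(context_ids))
p_att = target_distribution(attention_hidden(context_ids))
print(p_cbow.argmax(), p_att.argmax())               # predicted masked-word ids
```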

Original language: English (US)
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics (ACL)
Pages: 6894-6902
Number of pages: 9
ISBN (Electronic): 9781952148279
State: Published - 2020
Event: 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: Dec 8, 2020 - Dec 13, 2020

Publication series

Name: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Conference

Conference: 28th International Conference on Computational Linguistics, COLING 2020
Country/Territory: Spain
City: Virtual, Online
Period: 12/8/20 - 12/13/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Theoretical Computer Science
