TY - GEN
T1 - Attention Word Embedding
AU - Sonkar, Shashank
AU - Waters, Andrew E.
AU - Baraniuk, Richard G.
N1 - Publisher Copyright:
© 2020 COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
AB - Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
UR - http://www.scopus.com/inward/record.url?scp=85149661321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149661321&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85149661321
T3 - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
SP - 6894
EP - 6902
BT - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
A2 - Scott, Donia
A2 - Bel, Nuria
A2 - Zong, Chengqing
PB - Association for Computational Linguistics (ACL)
T2 - 28th International Conference on Computational Linguistics, COLING 2020
Y2 - 8 December 2020 through 13 December 2020
ER -