TY - GEN
T1 - MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages
T2 - 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
AU - Sonkar, Shashank
AU - Wang, Zichao
AU - Baraniuk, Richard G.
N1 - Funding Information:
This work was supported by NSF grant 1842378, ONR grant N00014-20-1-2534, AFOSR grant FA9550-22-1-0060, and a Vannevar Bush Faculty Fellowship (ONR grant N00014-18-1-2047).
Publisher Copyright:
© 2023 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All rights reserved.
PY - 2023
Y1 - 2023
N2 - This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. A critical enabler of most of the progress in NER is the readily available, large-scale training data for languages such as English and French. However, NER for low-resource languages remains relatively underexplored, leaving much room for improvement. We propose Mask Augmented Named Entity Recognition (MANER), a simple yet effective method that leverages the distributional hypothesis of pre-trained masked language models (MLMs) to significantly improve NER performance for low-resource languages. MANER repurposes the [mask] token in MLMs, which encodes valuable semantic contextual information, for NER prediction. Specifically, we prepend a [mask] token to every word in a sentence and predict the named entity for each word from its preceding [mask] token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on the state-of-the-art by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best suited to MANER.
AB - This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. A critical enabler of most of the progress in NER is the readily available, large-scale training data for languages such as English and French. However, NER for low-resource languages remains relatively underexplored, leaving much room for improvement. We propose Mask Augmented Named Entity Recognition (MANER), a simple yet effective method that leverages the distributional hypothesis of pre-trained masked language models (MLMs) to significantly improve NER performance for low-resource languages. MANER repurposes the [mask] token in MLMs, which encodes valuable semantic contextual information, for NER prediction. Specifically, we prepend a [mask] token to every word in a sentence and predict the named entity for each word from its preceding [mask] token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on the state-of-the-art by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best suited to MANER.
UR - http://www.scopus.com/inward/record.url?scp=85175852687&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175852687&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175852687
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 219
EP - 226
BT - 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Proceedings of the Workshop
A2 - Moosavi, Nafise Sadat
A2 - Gurevych, Iryna
A2 - Hou, Yufang
A2 - Kim, Gyuwan
A2 - Kim, Young Jin
A2 - Schuster, Tal
A2 - Agrawal, Ameeta
PB - Association for Computational Linguistics (ACL)
Y2 - 13 July 2023
ER -