Distant supervision and noisy label learning for low resource named entity recognition: A study on Hausa and Yorùbá

Abstract : The lack of labeled training data has limited the development of natural language processing tools, such as named entity recognition, for many languages spoken in developing countries. Techniques such as distant and weak supervision can be used to create labeled data in a (semi-)automatic way. Additionally, noise-handling methods can be integrated to alleviate some of the negative effects of errors in the automatic annotation. Pretrained word embeddings are another key component of most neural named entity classifiers. With the advent of more complex contextual word embeddings, an interesting trade-off between model size and performance arises. While these techniques have been shown to work well in high-resource settings, we want to study how they perform in low-resource scenarios. In this work, we perform named entity recognition for Hausa and Yorùbá, two languages that are widely spoken in several developing countries. We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario, where it can more than double a classifier's performance.
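To make the idea of distant supervision concrete, here is a minimal sketch that assigns BIO entity tags to a tokenized sentence by greedy longest-match lookup in a gazetteer (a list of known entity names). This is an illustrative assumption, not the authors' implementation: the gazetteer `GAZETTEER`, the function `distant_labels`, and the example tokens are all hypothetical. Labels produced this way are typically noisy (entities missing from the list are tagged `O`, and ambiguous names may receive the wrong type), which is the kind of error the noise-handling methods mentioned above aim to mitigate.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lower-cased entity token sequences mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("abuja",): "LOC",
    ("addis", "ababa"): "LOC",
    ("ahmadu", "bello"): "PER",
}


def distant_labels(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match gazetteer lookup; unmatched tokens get 'O'."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token entries beat shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    print(distant_labels(["Ahmadu", "Bello", "ya", "je", "Abuja"], GAZETTEER))
```

Running the example prints `['B-PER', 'I-PER', 'O', 'O', 'B-LOC']`. Longest-match-first is one common design choice for gazetteer matching; it prevents a shorter entry from splitting a longer multi-token name.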
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03359111
Contributor : Emmanuel Vincent
Submitted on : Wednesday, September 29, 2021 - 10:18:01 PM
Last modification on : Wednesday, November 3, 2021 - 7:05:34 AM
Long-term archiving on : Thursday, December 30, 2021 - 7:59:04 PM

File

adelani_AfricaNLP2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03359111, version 1

Citation

David Ifeoluwa Adelani, Michael A Hedderich, Dawei Zhu, Esther van den Berg, Dietrich Klakow. Distant supervision and noisy label learning for low resource named entity recognition: A study on Hausa and Yorùbá. ICLR Workshops (AfricaNLP & PML4DC 2020), Apr 2020, Addis Ababa, Ethiopia. ⟨hal-03359111⟩
