MasakhaPOS: Part-of-Speech Tagging for 20 African Languages
The code is based on the HuggingFace implementation (License: Apache 2.0).
The POS dataset is licensed under CC-BY-4.0-NC; the monolingual data have different licenses, depending on the license of the source news website.
- python
- transformers : state-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
- seqeval : evaluation framework for sequence labeling.
- ptvsd : remote debugging server for Python support in Visual Studio and Visual Studio Code.
pip install transformers seqeval ptvsd
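After installation, a quick check can confirm that the packages are visible to the Python interpreter you plan to use. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes each package is imported under the same name it is installed with.

```python
from importlib.util import find_spec

# Package names taken from the pip command above.
REQUIRED = ("transformers", "seqeval", "ptvsd")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found by the current interpreter.

    Hypothetical helper for illustration; it only checks that a top-level
    module with the given name can be located, not that its version is
    compatible with this code base.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If the script reports missing packages, re-run the pip command in the same environment that runs the training code.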
If you make use of this dataset, please cite us:
@inproceedings{Dione2023MasakhaPOSPT,
title={MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages},
author={Cheikh M. Bamba Dione and David Adelani and Peter Nabende and Jesujoba Alabi and Thapelo Sindane and Happy Buzaaba and Shamsuddeen Hassan Muhammad and Chris Chinenye Emezue and Perez Ogayo and Anuoluwapo Aremu and Catherine Gitau and Derguene Mbaye and Jonathan Mukiibi and Blessing Sibanda and Bonaventure F. P. Dossou and Andiswa Bukula and Rooweither Mabuya and Allahsera Auguste Tapo and Edwin Munkoh-Buabeng and Victoire Memdjokam Koagne and Fatoumata Ouoba Kabore and Amelia Taylor and Godson Kalipe and Tebogo Macucwa and Vukosi Marivate and Tajuddeen Gwadabe and Mboning Tchiaze Elvis and Ikechukwu Onyenwe and Gratien Atindogbe and Tolulope Adelani and Idris Akinade and Olanrewaju Samuel and Marien Nahimana and Th{\'e}ogene Musabeyezu and Emile Niyomutabazi and Ester Chimhenga and Kudzai Gotosa and Patrick Mizha and Apelete Agbolo and Seydou Traore and Chinedu Uchechukwu and Aliyu Yusuf and Muhammad Abdullahi and Dietrich Klakow},
year={2023}
}