masakhane-pos's Introduction

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

The code is based on the HuggingFace implementation (License: Apache 2.0).

The POS dataset is licensed under CC-BY-4.0-NC. The monolingual data have different licenses, depending on the license of the news website each text comes from. The monolingual data used for annotation can be found here

Required dependencies

  • python
    • transformers : state-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
    • seqeval : evaluation framework for sequence labeling.
    • ptvsd : remote debugging server for Python support in Visual Studio and Visual Studio Code.
pip install transformers seqeval ptvsd
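
Before running the code, it can help to confirm that the three packages listed above are visible to the current Python environment. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches its pip name (true for `transformers`, `seqeval`, and `ptvsd`).

```python
from importlib.util import find_spec

# Package names taken from the dependency list above.
REQUIRED = ("transformers", "seqeval", "ptvsd")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only checks that a top-level
    module with the given name can be located, not its version.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

The check only locates each module without importing it, so it is quick even for a large package such as `transformers`.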

If you make use of this dataset, please cite us:

BibTeX entry and citation info

@inproceedings{dione-etal-2023-masakhapos,
    title = "{M}asakha{POS}: Part-of-Speech Tagging for Typologically Diverse {A}frican languages",
    author = "Dione, Cheikh M. Bamba  and Adelani, David Ifeoluwa  and Nabende, Peter  and Alabi, Jesujoba  and Sindane, Thapelo  and Buzaaba, Happy  and Muhammad, Shamsuddeen Hassan  and Emezue, Chris Chinenye  and Ogayo, Perez  and Aremu, Anuoluwapo  and Gitau, Catherine  and Mbaye, Derguene  and Mukiibi, Jonathan  and Sibanda, Blessing  and Dossou, Bonaventure F. P.  and Bukula, Andiswa  and Mabuya, Rooweither  and Tapo, Allahsera Auguste  and Munkoh-Buabeng, Edwin  and Memdjokam Koagne, Victoire  and Ouoba Kabore, Fatoumata  and Taylor, Amelia  and Kalipe, Godson  and Macucwa, Tebogo  and Marivate, Vukosi  and Gwadabe, Tajuddeen  and Elvis, Mboning Tchiaze  and Onyenwe, Ikechukwu  and Atindogbe, Gratien  and Adelani, Tolulope  and Akinade, Idris  and Samuel, Olanrewaju  and Nahimana, Marien  and Musabeyezu, Th{\'e}og{\`e}ne  and Niyomutabazi, Emile  and Chimhenga, Ester  and Gotosa, Kudzai  and Mizha, Patrick  and Agbolo, Apelete  and Traore, Seydou  and Uchechukwu, Chinedu  and Yusuf, Aliyu  and Abdullahi, Muhammad  and Klakow, Dietrich",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.609",
    doi = "10.18653/v1/2023.acl-long.609",
    pages = "10883--10900",
    abstract = "In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.",
}
