This project forked from antonisa/unimorph_inflect

A python library for easily querying morphological inflection models trained on Unimorph

License: Apache License 2.0

Unimorph_Inflect: A Python NLP Library for Generating Morphological Inflection in Many Human Languages

Setup

Unimorph_inflect supports Python 3.7. We strongly recommend that you install Unimorph_inflect from source following the steps below. A PyPI release is forthcoming.

Installing from source from this git repository gives you more flexibility in developing on top of unimorph_inflect and training your own models. For this option, run:

git clone https://github.com/antonisa/unimorph_inflect.git
cd unimorph_inflect
python setup.py install
# or, for an editable (development) install: pip install -e .
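
After installing, it can help to confirm that the interpreter meets the stated Python requirement and that the package can be found. The following is a minimal sketch using only the standard library. The helper names `python_ok` and `is_installed` are illustrative and not part of unimorph_inflect, and it assumes that "supports Python 3.7" means 3.7 or newer.

```python
import importlib.util
import sys


def python_ok(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`.

    Assumption: "supports Python 3.7" is read as "3.7 or newer".
    """
    return sys.version_info[:2] >= tuple(minimum)


def is_installed(package_name):
    """Return True if `package_name` can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("unimorph_inflect installed:", is_installed("unimorph_inflect"))
```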

Getting Started

You can get started by simply following these steps in your Python interactive interpreter:

>>> import unimorph_inflect
>>> unimorph_inflect.download('eng')   # This downloads the English models, if you don't have them already
>>>
>>> from unimorph_inflect import inflect
>>> result = inflect("laugh", "V;PST", language='eng')
>>> print(result[0])
laughed

Note: inflect() returns a list of outputs, hence the "[0]" there.

You do not need to explicitly download the models for each language (as shown in the second line above); the inflect() function will ask you about downloading the model for a language if it is not downloaded already.
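
Because inflect() returns a list, code that inflects several lemmas typically picks the first output and has to cope with an empty list. A minimal sketch follows. The helper `first_inflections` is hypothetical, and it takes the inflection function as an argument so the same code can be run without downloading any models. Only the call form shown above, inflect(word, tag, language=...), is assumed.

```python
def first_inflections(lemmas, tag, language, inflect_fn):
    """Map each lemma to its first inflected form, or None if no output.

    `inflect_fn` is expected to behave like unimorph_inflect.inflect as
    shown in this README: inflect_fn(word, tag, language=...) -> list.
    """
    results = {}
    for lemma in lemmas:
        outputs = inflect_fn(lemma, tag, language=language)
        results[lemma] = outputs[0] if outputs else None
    return results


if __name__ == "__main__":
    # Stand-in for the real function so the sketch runs offline
    # (hypothetical: it simply appends "ed").
    def fake_inflect(word, tag, language="eng"):
        return [word + "ed"]

    print(first_inflections(["laugh", "walk"], "V;PST", "eng", fake_inflect))
```

With the library installed, the real function can be passed instead: import inflect from unimorph_inflect and call first_inflections(["laugh"], "V;PST", "eng", inflect).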

Trained Models for unimorph_inflect

We currently provide models trained on all Unimorph data (except 1000 examples used as a development set) for some high-resource languages, trained in a monolingual setting.

You can list the available languages/models with:

>>> unimorph_inflect.supported_languages
['ady', 'ang', 'ast', 'bel', 'bul', 'cat', 'dan', ...]
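
Since supported_languages is shown as a plain list of ISO codes, it can be used to validate a language code before calling inflect(). A minimal sketch, where `require_supported` is a hypothetical helper and the list passed in stands for unimorph_inflect.supported_languages:

```python
def require_supported(language, supported):
    """Return `language` if it is in `supported`, else raise ValueError."""
    if language not in supported:
        raise ValueError(
            f"{language!r} is not supported; choose one of: "
            + ", ".join(sorted(supported))
        )
    return language


if __name__ == "__main__":
    # Sample taken from the (truncated) listing above; in real use, pass
    # unimorph_inflect.supported_languages instead.
    sample = ["ady", "ang", "ast", "bel", "bul", "cat", "dan"]
    print(require_supported("bul", sample))
```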

The accuracies on the development sets are as follows:

| Language | ISO | Supported PoS | Dev Accuracy (%) |
| --- | --- | --- | --- |
| Adyghe | ady | N, ADJ | 90.0 |
| Ancient Greek | grc | N, ADJ | 89.0 |
| Armenian | hye | V, N, ADJ | 98.9 |
| Albanian | sqi | V, N | 69.0 |
| Asturian | ast | V, N, ADJ | 99.0 |
| Arabic | ara | V, N, ADJ | 23.0 |
| Bashkir | bak | N, ADJ | 81.0 |
| Basque | eus | V | 48.0 |
| Belarusian | bel | V, N, ADJ | 91.0 |
| Bulgarian | bul | V, N, ADJ | 99.0 |
| Catalan | cat | V | 100 |
| Czech | ces | V, N, ADJ | 94.0 |
| Danish | dan | N, ADJ | 82.0 |
| Dutch | nld | N, ADJ | 98.0 |
| English | eng | V | 97.0 |
| Estonian | est | V, N | 84.0 |
| Faroese | fao | V, N, ADJ | 95.0 |
| Farsi | fas | V | 93.0 |
| French | fra | V | 97.0 |
| Galician | gal | V | 100 |
| German | deu | V, N | 100 |
| Georgian | kat | V, N, ADJ | 100 |
| Greek | ell | V, N, ADJ | 84.0 |
| Hebrew | heb | V, N | 90.0 |
| Hindi | hin | V | 78.0 |
| Hungarian | hun | V, N | 97.2 |
| Irish | gle | V | 85.6 |
| Icelandic | isl | V, N | 93.0 |
| Italian | ita | V | 99.2 |
| Latvian | lav | V, N, ADJ | 99.0 |
| Lithuanian | lit | V, N, ADJ | 96.0 |
| Lower Sorbian | dsb | V, N, ADJ | 94.0 |
| Macedonian | mkd | V, N, ADJ | 100 |
| Navajo | nav | N, ADJ | 90.0 |
| North Sami | sme | V, N, ADJ | 95.0 |
| Norwegian Bokmål | nob | V, N, ADJ | 77.0 |
| Old English | ang | V, N, ADJ | 84.0 |
| Old Saxon | osx | V, N, ADJ | 93.0 |
| Polish | pol | V, N, ADJ | 95.0 |
| Portuguese | por | V, N, ADJ | 100 |
| Quechua | que | V, N, ADJ | 32.0 |
| Romanian | ron | V, N, ADJ | 83.0 |
| Russian | rus | V, N, ADJ | 94.0 |
| Sanskrit | san | N, ADJ | 79.0 |
| Serbo-Croatian | hbs | V, N, ADJ | 92.7 |
| Slovenian | slv | V, N, ADJ | 97.0 |
| Spanish | spa | V | 100 |
| Swahili | swc | V, N, ADJ | 66.0 |
| Swedish | swe | V, N, ADJ | 96.0 |
| Turkish | tur | V, N, ADJ | 84.2 |
| Ukrainian | ukr | V, N, ADJ | 97.0 |
| Urdu | urd | V, N | 71.0 |
| Welsh | cym | V | 97.0 |
| Venetian | vec | V | 98.0 |
| Zulu | zul | V, N, ADJ | 87.0 |
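
Dev accuracy varies widely across languages (from 23.0 for Arabic to 100 for several others), so it may be worth checking a language's score before relying on its model. A minimal sketch, with a handful of rows copied from the table into a dict. The helper name `languages_at_least` and the threshold are illustrative assumptions, not part of the library.

```python
# A few (ISO code -> dev accuracy) pairs copied from the table above.
DEV_ACCURACY = {
    "ara": 23.0,
    "eus": 48.0,
    "eng": 97.0,
    "fra": 97.0,
    "que": 32.0,
    "spa": 100.0,
    "tur": 84.2,
}


def languages_at_least(threshold, accuracy=DEV_ACCURACY):
    """Return ISO codes whose dev accuracy is >= threshold, sorted by code."""
    return sorted(code for code, acc in accuracy.items() if acc >= threshold)


if __name__ == "__main__":
    print(languages_at_least(95.0))  # ['eng', 'fra', 'spa']
```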

A simple call to the inflect function with your desired language should download the necessary models, but you can also download them from here.

References

If you use our models in your research, please cite our EMNLP 2019 paper along with the necessary Unimorph datasets:

@inproceedings{anastasopoulos19emnlp,
    title = {Pushing the Limits of Low-Resource Morphological Inflection},
    author = {Anastasopoulos, Antonios and Neubig, Graham},
    booktitle = {Proc. EMNLP},
    address = {Hong Kong},
    month = {November},
    year = {2019},
}

This release is not the same as CMU's SIGMORPHON 2019 Shared Task system. This system is a cleaned-up version of the shared task code, and its models are trained on almost all Unimorph data for each language, whereas in the competition we used the designated datasets.

Issues and Usage Q&A

Please use the GitHub Issue Tracker for bug reports, language/feature requests, and other questions.

LICENSE

Unimorph_inflect is released under the Apache License, Version 2.0. See the LICENSE file for more details.
