apertium-tyv's Introduction

Tuvan

                            apertium-tyv
===============================================================================

This is an Apertium monolingual language package for Tuvan. You can
use this language package for:

* Morphological analysis of Tuvan
* Morphological generation of Tuvan
* Part-of-speech tagging of Tuvan

Requirements
===============================================================================

You will need the following software installed:

* lttoolbox (>= 3.3.0)
* apertium (>= 3.3.0)
* vislcg3 (>= 0.9.9.10297)
* hfst (>= 3.8.2)

If you are unsure how to install these, see apertium.org.

Compiling
===============================================================================

Once the requirements are installed, you should be able to just run:

$ ./configure
$ make

You can use ./autogen.sh instead of ./configure if you're compiling
from SVN.

If you're doing development, you don't have to install the data, you
can use it directly from this directory.

If you are installing this language package as a prerequisite for an
Apertium translation pair, then run (typically as root or with sudo):

# make install

You can give a --prefix to ./configure to install as a non-root user,
but make sure to use the same prefix when installing the translation
pair and any other language packages.

Testing
===============================================================================

If you are in the source directory after running make, the following
commands should work:

$ echo "TODO: test sentence" | apertium -d . tyv-morph
TODO: test analysis result

$ echo "TODO: test sentence" | apertium -d . tyv-tagger
TODO: test tagger result

Files and data
===============================================================================

* apertium-tyv.tyv.dix           - Monolingual dictionary
* apertium-tyv.tyv.lexc          - Morphotactic dictionary
* apertium-tyv.tyv.twol          - Morphophonological rules
* apertium-tyv.tyv.rlx           - Constraint Grammar disambiguation rules
* apertium-tyv.post-tyv.dix      - Post-generator
* tyv.prob                       - Tagger model
* modes.xml                      - Mode definitions (e.g. tyv-morph, tyv-tagger)

For more information
===============================================================================

* https://wiki.apertium.org/wiki/Installation
* https://wiki.apertium.org/wiki/apertium-tyv
* https://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary

Help and support
===============================================================================

If you need help using this language package or data, you can contact:

* Mailing list: [email protected]
* IRC: #apertium on irc.oftc.net

See also the file AUTHORS included in this distribution.

apertium-tyv's People

Contributors

ftyers, ilnarselimcan, jonorthwash, mr-martian, sushain97, tinodidriksen, unhammer

apertium-tyv's Issues

More mismatches and missing inflections

Two more forms in Iskhakov & Pal'mbakh are not being generated:

  • Past narrative tense in -п-тыр, also called the historical / non-witnessed / unexpected past; in Tuvan, эрткен үэниң медээ хевири (I&P 373). The book says it is a past tense used to describe a sudden occurrence.

    кээп-тир мен
    кээп-тир сен
    кээп-тир
    кээп-тир бис
    кээп-тир силер
    кээп-тирлер
    

    Without the hyphen, the analyzer can parse кээптир мен as кел<v><iv><perf><aor><p1><sg>, but it
    generates келиптир мен for that analysis.

  • Past-present tense in -пышаан (I&P 379). It appears to denote an action that started in the past and is still ongoing; Anderson & Harrison annotate it as durative.

    келбишаан мен
    келбишаан сен
    келбишаан
    келбишаан бис
    келбишаан силер
    келбишааннар
    

    There is an analysis for келбишаан as a verbal adverb though: кел<v><iv><gna_still>. Are these forms
    considered analytic?

Installed modes are missing files

modes.xml includes some modes with install="yes", but the required
files aren't installed.

Some generic suggestions:

  • -lexc and -twol modes probably aren't useful to users

  • -spell modes should depend on --enable-ospell

  • .deps files are never installed, so any modes using them shouldn't be
    installed.

Messages for package app-dicts/apertium-tyv-9999:

    Failed to find '/usr/share/apertium/apertium-tyv/.deps/tyv.twol.hfst' in install image.
    QA: missing files required for mode tyv-twol.
    Failed to find '/usr/share/apertium/apertium-tyv/.deps/tyv.LR.lexc.hfst' in install image.
    QA: missing files required for mode tyv-lexc.
    Failed to find '/usr/share/apertium/apertium-tyv/tyv.zhfst' in install image.
    QA: missing files required for mode tyv-spell.
    Failed to find '/usr/share/apertium/apertium-tyv/.deps/acceptor.default.hfst' in install image.
    QA: missing files required for mode tyv-tokenise.
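Until the mode definitions are fixed, a packager could check a staged install for the files that the installed modes need. This is a minimal sketch. It assumes the install-image directory is passed as the first argument and that the required paths, taken from the QA messages above, are supplied by hand. `check_mode_files` is a hypothetical helper and is not part of the build system.

```shell
# Hypothetical helper (not part of apertium-tyv): report which files required
# by installed modes are missing from an install image.
# Usage: check_mode_files DIR RELATIVE_PATH...
# Prints one "missing: ..." line per absent file; returns non-zero if any are missing.
check_mode_files() {
  cmf_dir=$1
  shift
  cmf_missing=0
  for cmf_file in "$@"; do
    if [ ! -e "$cmf_dir/$cmf_file" ]; then
      echo "missing: $cmf_dir/$cmf_file"
      cmf_missing=$((cmf_missing + 1))
    fi
  done
  [ "$cmf_missing" -eq 0 ]
}

# Example (paths from the QA report; $DESTDIR is the staging directory):
# check_mode_files "$DESTDIR/usr/share/apertium/apertium-tyv" \
#   .deps/tyv.twol.hfst .deps/tyv.LR.lexc.hfst tyv.zhfst \
#   .deps/acceptor.default.hfst
```

In practice, the list of required paths would have to be derived from the modes marked install="yes" in modes.xml. Passing the paths by hand keeps the sketch independent of that file's structure.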

чор + -Iр form

Opening an issue per @ftyers' request.

чор:чор<v><iv><aor><p3><sg>

Is this inflection correct, or should it be чоор?

Double <perf> tag

In my generated paradigm for кел, some forms have a doubled <perf> tag:

келиптипкен:кел<v><iv><perf><perf><ger_perf><nom>
келивитпепкен:кел<v><iv><perf><neg><perf><ger_perf><nom>
келиптипкеш:кел<v><iv><perf><perf><gna_perf>
келивитпепкеш:кел<v><iv><perf><neg><perf><gna_perf>
келиптиптер:кел<v><iv><perf><perf><p3><sg>
келивитпептер:кел<v><iv><perf><neg><perf><p3><sg>

plus all of their inflections by case, person, etc. This looks like a doubled -{I}pt{I} suffix; I'm not sure whether Tuvan allows this.
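To list every affected form instead of spotting them by eye, one could scan the generated paradigm for analyses in which a tag repeats. This is a minimal sketch. It assumes the paradigm is saved with one `form:analysis` pair per line, as above. `find_repeated_tags` is a hypothetical helper. It checks each `+`-joined part of a compound analysis separately, so a tag in the copula part is not compared against tags in the verb part.

```shell
# Hypothetical helper: read "form:analysis" lines on stdin and print every
# line whose analysis contains the same tag twice within one "+"-joined part
# (adjacent like <perf><perf> or not, like <perf><neg><perf>).
find_repeated_tags() {
  awk '
  {
    analysis = substr($0, index($0, ":") + 1)
    n = split(analysis, parts, "[+]")
    for (i = 1; i <= n; i++) {
      split("", seen)                  # reset tag counts for this part
      rest = parts[i]
      while (match(rest, /<[^>]+>/)) {
        tag = substr(rest, RSTART, RLENGTH)
        if (seen[tag]++) { print; next }   # second occurrence: report line
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
  }'
}
```

Given the paradigm listing above, this should print both the <perf><perf> lines and the <perf><neg><perf> lines.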

Possible inflection errors

I've been comparing Apertium-generated paradigms with the ones in the Iskhakov & Pal'mbakh (1961) grammar (Ф. Г. Исхаков, А. А. Пальмбах. Грамматика тувинского языка: Фонетика и морфология [A Grammar of the Tuvan Language: Phonetics and Morphology]) and found some mismatches.
Disclaimer: I am not a speaker of Tuvan.

  1. Some Apertium-generated imperative forms for кел:

    келеалыңар:кел<v><iv><imp><p1><pl>
    келейн:кел<v><iv><imp><p1><sg>
    келеалы:кел<v><iv><imp><p1><du>
    

    I&P book has келиилиңер, келийн, келиили respectively (pp. 391-392).

  2. Some <p3><pl> forms have a double -лер. I haven't seen this in the literature and it looked suspicious.

    келдилер:кел<v><iv><ifi><p3><pl>
    келдилерлер:кел<v><iv><ifi><p3><pl>
    

    I&P has келдилер for this analysis (I&P 365), and Harrison (2000) has keldi(ler). The same pattern appears in other tenses:

    келгеннер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
    келгеннерлер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
    келгендирлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
    келгендирлерлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
    ...
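One quick way to find all such cases is to list the analyses that the generator realises with more than one surface form. This is a minimal sketch that makes the same `form:analysis` assumption and uses a hypothetical `find_multiple_forms` helper. It will also report any legitimate variant forms, so its output needs manual review.

```shell
# Hypothetical helper: read "form:analysis" lines on stdin and print each
# analysis that is generated with more than one surface form, as
# "analysis => form1, form2, ...", sorted by analysis.
find_multiple_forms() {
  awk '
  {
    i = index($0, ":")
    if (i == 0) next                   # skip lines that are not form:analysis
    form = substr($0, 1, i - 1)
    analysis = substr($0, i + 1)
    count[analysis]++
    forms[analysis] = (count[analysis] == 1) ? form : (forms[analysis] ", " form)
  }
  END {
    for (a in count)
      if (count[a] > 1) print a " => " forms[a]
  }' | sort
}
```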
    

Generator errors identified through the shared task

This is a list of generator errors that Aziyana Bayyr-ool identified while working on the error analysis for the shared task.

Incorrect inflections:

Generated form:analysis → correct form

    ижиарлар:ижик<v><TD><aor><p3><pl> → ижигерлер
    көрдүнүүлү:көрдүн<v><iv><imp><p1><du> → көрдүнээли
    садырлар:сад<v><tv><aor><p3><pl> → садарлар
    садыылы:сад<v><tv><imp><p1><du> → садаалы
    тырылыйн:тырыл<v><TD><imp><p1><sg> → тырлыйн
    тырылырлар:тырыл<v><TD><aor><p3><pl> → тырлырлар
    ужуаалы:ужук<v><TD><imp><p1><du> → ужаалы
    холужуптур бис:холуш<v><iv><perf><aor><p1><pl> → холужуптар бис
    холужуптурлар:холуш<v><iv><perf><aor><p3><pl> → холужуптарлар
    хоорулур:хоорул<v><iv><aor><p3><sg> → хоорлур
    хоорулур мен:хоорул<v><iv><aor><p1><sg> → хоорлур мен
    хоорулур сен:хоорул<v><iv><aor><p2><sg> → хоорлур сен
    чыглыңар:чыыл<v><iv><imp><p2><pl> → чыглыылыӊар (see Note below)
    шымыныр силер:шымын<v><TD><aor><p2><pl> → шымныр силер
    мөгеейн:мөгей<v><iv><imp><p1><sg> → мөгейээйн (rare/unusual)
    мөгееалы:мөгей<v><iv><imp><p1><du> → мөгейээли (rare/unusual)

Note: чыглыңар:чыыл<v><iv><imp><p2><pl>: Aziyana says this form exists (meaning '(you pl.) gather!') but does not correspond to this lemma. The correct form for чыыл should be чыглыылыӊар ('let's gather').
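Once confirmed, corrections like these could be kept as a regression list. This is a minimal sketch. It assumes that both the corrected forms and the generator output are stored with one `form:analysis` pair per line, so the table above would become lines such as `ижигерлер:ижик<v><TD><aor><p3><pl>`. `compare_forms` is hypothetical, and it keeps only one expected form per analysis.

```shell
# Hypothetical helper: compare generator output against expected forms.
# Both files hold one "form:analysis" pair per line. For every analysis
# listed in EXPECTED, each generated form that differs is reported.
# Usage: compare_forms EXPECTED GENERATED
compare_forms() {
  awk '
  {
    i = index($0, ":")
    form = substr($0, 1, i - 1)
    analysis = substr($0, i + 1)
  }
  FNR == NR { expected[analysis] = form; next }   # first file: reference
  (analysis in expected) && form != expected[analysis] {
    print analysis ": generated " form ", expected " expected[analysis]
  }' "$1" "$2"
}
```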

Incorrect lemmas:

Lemma in the lexicon → correct lemma

    номчун<v> → номчуттун
    өпей<v> → өпейле (Aziyana says өпей exists too, but as a name)

Forms that are plausible but rarely or never used, so Aziyana has doubts about them:

мөгеейн:мөгей<v><iv><imp><p1><sg>
мөгееалы:мөгей<v><iv><imp><p1><du>
аржаяйн:аржай<v><TD><imp><p1><sg>
арзаяйн:арзай<v><TD><imp><p1><sg>
мажаяйн:мажай<v><TD><imp><p1><sg>

some remaining imperative forms

Some imperatives are still broken.

This includes the following regressions caused by #2:

>       2 ^чугаалаваайн/*чугаалаваайн$
>       2 ^сагындырбаайн/*сагындырбаайн$
>       1 ^кортпаайн/*кортпаайн$
>       1 ^чугаалаваайн/*чугаалаваайн$
>       1 ^чорбаайн/*чорбаайн$
>       1 ^чажырбаайн/*чажырбаайн$
>       1 ^узуткаваайн/*узуткаваайн$
>       1 ^барбаайн/*барбаайн$
>       1 ^чажырбаайн/*чажырбаайн$
>       1 ^тайылбырлаваайн/*тайылбырлаваайн$
>       1 ^адаваайн/*адаваайн$

And the following form from tests/verbs.yaml:

[1/3][FAIL] саг<v><tv><imp><p1><du> => Missing results: саалы
[1/3][FAIL] саг<v><tv><imp><p1><du> => Unexpected results: сааалы
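The counts in the regression list above look like `uniq -c` output over unknown words, which the Apertium stream format marks with `*`. The leading `>` suggests a `diff` between two such lists. The following sketch shows how such a frequency list might be produced. It assumes analyser output in stream format on stdin. `count_unknowns` and `corpus.txt` are hypothetical. `grep -o` is not POSIX, but it is available in GNU and BSD grep.

```shell
# Hypothetical helper: read Apertium stream-format analyser output on stdin
# and list unknown words (^form/*form$) with their frequencies, most
# frequent first, in the same "count ^form/*form$" shape as the list above.
count_unknowns() {
  grep -o '\^[^/^$]*/\*[^$]*\$' | sort | uniq -c | sort -rn
}

# Example:
# apertium -d . tyv-morph < corpus.txt | count_unknowns
```

To spot regressions like the ones listed above, you could run it before and after a change and compare the two lists with `diff`.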
