
Comments (4)

jonorthwash commented on August 23, 2024

As far as I can tell, they are getting marked in the analysis.

$ echo абитуриент | apertium -d . tat-morph
^абитуриент/абитуриент<n><attr>/абитуриент<n><nom>/абитуриент<n><nom>+и<cop><aor><p3><pl>/абитуриент<n><nom>+и<cop><aor><p3><sg>$^./.<sent>$

$ echo Курил | apertium -d . tat-morph
^Курил/Курил<np><top><attr>/Курил<np><top><nom>/Курил<np><top><attr><err_orth>/Курил<np><top><nom><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><pl>/Курил<np><top><nom>+и<cop><aor><p3><sg>/Курил<np><top><nom>+и<cop><aor><p3><pl><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><sg><err_orth>$^./.<sent>$

Or else, maybe I don't understand the problem.
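To check mechanically which analyses carry <err_orth>, you could split the Apertium stream output shown above into lexical units. Here is a minimal Python sketch. It assumes the simple `^surface/analysis1/analysis2$` format seen in these examples and ignores escaped characters and superblanks. The function names are hypothetical.

```python
import re

# Each lexical unit looks like ^surface/analysis1/analysis2/...$
LU = re.compile(r"\^(.*?)\$")

def parse_stream(line):
    """Return a list of (surface, [analyses]) pairs.

    Simplified sketch: ignores backslash escapes and superblanks.
    """
    units = []
    for body in LU.findall(line):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units

def err_orth_analyses(line):
    """Return every analysis on the line that carries an <err_orth> tag."""
    return [a for _, analyses in parse_stream(line) for a in analyses
            if "<err_orth>" in a]
```

Given the Курил output above, `err_orth_analyses` returns the four <err_orth> readings. Given the абитуриент output, it returns an empty list.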

Also, note that you don't need %{☭%} on the right side of a given entry if it's categorised as N1-RUS. That is, you should change a line like

абитуриент:абитуриент%{☭%} N1-RUS ; ! ""

to just

абитуриент:абитуриент N1-RUS ; ! ""

The reason is that the N1-RUS definition already contains %{☭%}, so keeping it on the entry as well results in two %{☭%}s in the lexc transducer, e.g.,

$ echo "абитуриент<n><dat>" | hfst-lookup .deps/tat.LR.lexc.hfst 
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> абитуриент<n><dat>	абитуриент{☭}{☭}>{G}{A}	0.000000

This could break some of the phonological rules.
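The duplicated marker could be caught before it causes trouble. Below is a rough Python lint sketch, and its function names are hypothetical:

- `redundant_marker` flags lexc entries that hard-code %{☭%} while continuing to a -RUS category.
- `doubled_marker` flags hfst-lookup result lines like the one above.

The sketch assumes that every -RUS continuation class adds %{☭%} the way N1-RUS does. Its regular expression only covers simple one-line entries.

```python
import re

MARKER_LEXC = "%{☭%}"  # how the archiphoneme is escaped in lexc source
MARKER_OUT = "{☭}"     # how it appears in hfst-lookup output

# Simplified lexc entry: "upper:lower CONT ;" or "form CONT ;" (comments ignored)
ENTRY = re.compile(r"^\s*(?P<form>\S+)\s+(?P<cont>[^\s;]+)\s*;")

def redundant_marker(lexc_line):
    """True if the entry hard-codes %{☭%} AND continues to a *-RUS lexicon.

    Assumption: every *-RUS continuation class already adds %{☭%},
    as N1-RUS does.
    """
    m = ENTRY.match(lexc_line)
    if m is None:
        return False
    lower = m.group("form").split(":", 1)[-1]  # right-hand side, if any
    return MARKER_LEXC in lower and m.group("cont").endswith("-RUS")

def doubled_marker(lookup_line):
    """True if an hfst-lookup result line has more than one {☭} in its output."""
    fields = lookup_line.rstrip("\n").split("\t")
    return len(fields) >= 2 and fields[1].count(MARKER_OUT) > 1
```

In this sketch, the original абитуриент entry is flagged, the corrected one is not, and the hard-coded `N1` form is left alone because it does not continue to a -RUS class.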

from apertium-tat.

mansayk commented on August 23, 2024

It seems that adjectives don't have an A1-RUS category, so how should I mark them? And what about NP-TOP, NP-ANT-M, NP-COG-OB...? Maybe it is better to keep the following form?

абитуриент:абитуриент%{☭%} N1 ; ! ""


jonorthwash commented on August 23, 2024

I would say it's better to use N1-RUS for nouns. For other parts of speech, you can either make separate categories modelled on N1-RUS or hard-code the marker as you have done.

One big advantage of having a separate category, besides not having to type or copy %{☭%} so often, is that it makes it much easier to implement <err_orth> tags for forms that are spelled as if the words were not from Russian (like абитуриентне). In fact, we could simply add the following line to N1-RUS to achieve this:

N1 ; ! Err/Orth

On the other hand, perhaps not every word in this category is consistently misspelled that way, so we might want to exclude some of them from getting <err_orth> tags. We could then either handle those manually or create a separate N1-RUS-ALWAYS category (or similar). I favour more categories over hard-coding the phonology on a word-by-word basis.
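Here is a toy Python model of lexc continuation classes that illustrates the idea. The lexicon contents are hypothetical and heavily cut down. Only <nom> and <dat> are included, with the dative suffix written as in the hfst-lookup output earlier in this thread. How a `! Err/Orth` comment becomes an <err_orth> tag is an assumption: the sketch simply appends the tag to the analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    upper: str         # analysis-side string added by this entry
    lower: str         # lexical-side string added by this entry
    cont: str          # continuation lexicon; "#" ends the word
    comment: str = ""  # text after "!" in the lexc source

# Hypothetical, cut-down lexicons: NOT the real apertium-tat definitions.
LEXICONS = {
    "N1": [
        Entry("<n><nom>", "", "#"),
        Entry("<n><dat>", ">{G}{A}", "#"),
    ],
    "N1-RUS": [
        Entry("", "{☭}", "N1"),                   # normal Russian-loan path
        Entry("", "", "N1", comment="Err/Orth"),  # proposed "N1 ; ! Err/Orth"
    ],
}

def expand(upper_stem, lower_stem, lexicon):
    """Yield every (analysis, lexical form) pair reachable from `lexicon`.

    Assumption: a path through an Err/Orth entry gets <err_orth>
    appended at the end of the analysis.
    """
    def walk(name, upper, lower, err):
        if name == "#":
            yield upper + ("<err_orth>" if err else ""), lower
            return
        for e in LEXICONS[name]:
            yield from walk(e.cont, upper + e.upper, lower + e.lower,
                            err or e.comment == "Err/Orth")
    yield from walk(lexicon, upper_stem, lower_stem, False)

for analysis, form in expand("абитуриент", "абитуриент", "N1-RUS"):
    print(analysis, form, sep="\t")
```

With the extra line in place, each case comes out twice: once with {☭} on the lexical side, and once without it and tagged <err_orth>. Calling `expand("абитуриент", "абитуриент{☭}", "N1-RUS")` instead reproduces the doubled {☭}{☭} that the hard-coded entry produces.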


mansayk commented on August 23, 2024

I did that: I added -RUS versions of many categories, for example A1-RUS, NP-TOP-RUS... Please take a look; I hope everything is correct.
