Loanwords after marking them (apertium-tat issue, closed)

apertium commented on August 23, 2024

Loanwords after marking them

from apertium-tat.

Comments (4)

jonorthwash commented on August 23, 2024

As far as I can tell, they are getting marked in the analysis.

$ echo абитуриент | apertium -d . tat-morph
^абитуриент/абитуриент<n><attr>/абитуриент<n><nom>/абитуриент<n><nom>+и<cop><aor><p3><pl>/абитуриент<n><nom>+и<cop><aor><p3><sg>$^./.<sent>$

$ echo Курил | apertium -d . tat-morph
^Курил/Курил<np><top><attr>/Курил<np><top><nom>/Курил<np><top><attr><err_orth>/Курил<np><top><nom><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><pl>/Курил<np><top><nom>+и<cop><aor><p3><sg>/Курил<np><top><nom>+и<cop><aor><p3><pl><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><sg><err_orth>$^./.<sent>$

Or maybe I don't understand the problem.
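To check a claim like "they are getting marked in the analysis" without reading long lines by eye, it can help to pull the analyses out of the `tat-morph` output programmatically. The following is a minimal Python sketch. It assumes the plain stream format shown above (`^surface/analysis1/analysis2$`) with no escaped `^`, `$` or `/` characters. The helper names `parse_stream` and `analyses_with_tag` are hypothetical and not part of Apertium. Apertium also maintains a separate streamparser library that handles the full format, but this sketch does not use it.

```python
import re
from typing import List, Tuple

# One lexical unit "^...$" in Apertium stream format.
# Assumption: no escaped \^, \$ or \/ inside the units.
LU_RE = re.compile(r"\^(.*?)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_stream(text: str) -> List[Tuple[str, List[str]]]:
    """Split stream output into (surface form, [analyses]) pairs."""
    units = []
    for body in LU_RE.findall(text):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def analyses_with_tag(analyses: List[str], tag: str) -> List[str]:
    """Keep only the analyses that carry the given tag, e.g. "err_orth"."""
    return [a for a in analyses if tag in TAG_RE.findall(a)]


# The tat-morph output for "Курил" quoted above.
KURIL = (
    "^Курил/Курил<np><top><attr>/Курил<np><top><nom>/"
    "Курил<np><top><attr><err_orth>/Курил<np><top><nom><err_orth>/"
    "Курил<np><top><nom>+и<cop><aor><p3><pl>/"
    "Курил<np><top><nom>+и<cop><aor><p3><sg>/"
    "Курил<np><top><nom>+и<cop><aor><p3><pl><err_orth>/"
    "Курил<np><top><nom>+и<cop><aor><p3><sg><err_orth>$^./.<sent>$"
)

if __name__ == "__main__":
    surface, analyses = parse_stream(KURIL)[0]
    marked = analyses_with_tag(analyses, "err_orth")
    print(f"{surface}: {len(marked)} of {len(analyses)} analyses carry <err_orth>")
```

For the Курил output this prints `Курил: 4 of 8 analyses carry <err_orth>`. The same call on the абитуриент output finds none.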

Also, note that you don't need %{☭%} on the right-hand side of an entry if it's categorised as N1-RUS. That is, you should change a line like

абитуриент:абитуриент%{☭%} N1-RUS ; ! ""

to just

абитуриент:абитуриент N1-RUS ; ! ""

The reason is that the N1-RUS definition already contains %{☭%}, so keeping it in the entry as well results in two {☭} symbols in the lexc transducer, e.g.,

$ echo "абитуриент<n><dat>" | hfst-lookup .deps/tat.LR.lexc.hfst 
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> абитуриент<n><dat>	абитуриент{☭}{☭}>{G}{A}	0.000000

This can break some of the phonological rules.
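The cleanup described here is mechanical: drop the marker from entries whose continuation class is already a -RUS category. It could be scripted roughly as follows. This is a minimal Python sketch, not an Apertium tool, and it rests on several assumptions:

* There is one entry per line in the form `upper:lower CONT ; comment`.
* The forms contain no escaped spaces, colons or semicolons.
* The marker sits at the end of the lower side, as in the абитуриент example.

`fix_redundant_marker` and `lint` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

MARKER = "%{☭%}"  # the loanword marker as written in the .lexc source
# Simplified pattern for one entry per line: "upper:lower CONT ; rest".
# Assumes no escaped spaces, colons or semicolons inside the forms.
ENTRY_RE = re.compile(
    r"^(?P<indent>\s*)(?P<forms>\S+)\s+(?P<cont>[^\s;]+)\s*;(?P<rest>.*)$")


def fix_redundant_marker(line: str) -> Optional[str]:
    """Drop a trailing %{☭%} from the lower side when the continuation class
    is a -RUS category (N1-RUS already contains the marker).
    Returns None when the line needs no change."""
    if line.lstrip().startswith("!"):
        return None  # comment line
    m = ENTRY_RE.match(line)
    if not m or not m.group("cont").endswith("-RUS"):
        return None
    upper, sep, lower = m.group("forms").partition(":")
    if not sep or not lower.endswith(MARKER):
        return None  # entries without ":" or without a trailing marker are left alone
    lower = lower[: -len(MARKER)]
    return f"{m.group('indent')}{upper}:{lower} {m.group('cont')} ;{m.group('rest')}"


def lint(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, original line, fixed line) for each redundant marker."""
    for no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        fixed = fix_redundant_marker(line)
        if fixed is not None:
            yield no, line, fixed


if __name__ == "__main__":
    sample = ['абитуриент:абитуриент%{☭%} N1-RUS ; ! ""',
              'абитуриент:абитуриент N1-RUS ; ! ""']
    for no, old, new in lint(sample):
        print(f"{no}: {old}  =>  {new}")
```

The script only reports or rewrites lines and never touches the N1-RUS lexicon itself. Checking the result with `hfst-lookup` afterwards is still the real test.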

mansayk commented on August 23, 2024

It seems that adjectives don't have an A1-RUS category, so how should I mark them? And what about NP-TOP, NP-ANT-M, NP-COG-OB, and so on? Maybe it is better to keep the following form?

абитуриент:абитуриент%{☭%} N1 ; ! ""

jonorthwash commented on August 23, 2024

I would say it's better to use N1-RUS for nouns. For other parts of speech you can either make separate categories in the same way as N1-RUS or hard-code the marker as you have done.

One big advantage of having a separate category (besides not having to type or copy %{☭%} a lot) is that it makes it much easier to implement <err_orth> tags for forms that are spelled as if the word were not from Russian (like абитуриентне). In fact, we could simply add the following line to N1-RUS to achieve this:

N1 ; ! Err/Orth

On the other hand, perhaps not all words in this category are consistently misspelled that way, so we might want to exclude some of them from getting <err_orth> tags. We could then either handle those words manually or make a separate N1-RUS-ALWAYS category or similar. I favour more categories over hard-coding the phonology on a word-by-word basis.
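To make the Err/Orth proposal concrete, here is a toy Python model of lexc-style continuation classes. It is only an illustration under loud assumptions. All of the following are hypothetical:

* the lexicon contents;
* the word assigned to N1-RUS-ALWAYS;
* the rule "append <err_orth> to any path that passes through an entry commented Err/Orth".

This is not how hfst-lexc or the apertium-tat build actually produce the tag. The model also uses the compiled symbol {☭} rather than the escaped %{☭%} of the source file.

```python
from typing import Dict, Iterator, List, Tuple

# Toy model of lexc continuation classes, for illustration only: NOT how
# hfst-lexc or the apertium-tat build actually produce <err_orth> tags.
# Entry = (upper side, lower side, continuation class, comment); "#" ends a word.
Entry = Tuple[str, str, str, str]

# Hypothetical lexicons; the real N1 and N1-RUS are much larger, and
# N1-RUS-ALWAYS is only the category name floated in the discussion.
LEXICONS: Dict[str, List[Entry]] = {
    "Root": [
        ("абитуриент", "абитуриент", "N1-RUS", ""),
        ("абзац", "абзац", "N1-RUS-ALWAYS", ""),  # hypothetical assignment
    ],
    "N1-RUS": [
        ("", "{☭}", "N1", ""),         # loanword spelling
        ("", "", "N1", "Err/Orth"),    # the proposed "N1 ; ! Err/Orth" line
    ],
    "N1-RUS-ALWAYS": [
        ("", "{☭}", "N1", ""),         # no Err/Orth path
    ],
    "N1": [
        ("<n><nom>", "", "#", ""),
        ("<n><dat>", ">{G}{A}", "#", ""),
    ],
}


def expand(lexicon: str = "Root", upper: str = "", lower: str = "",
           err: bool = False) -> Iterator[Tuple[str, str]]:
    """Enumerate (analysis, intermediate form) pairs by following continuations.
    Any path through an entry commented Err/Orth gets <err_orth> appended."""
    for up, low, cont, comment in LEXICONS[lexicon]:
        is_err = err or comment == "Err/Orth"
        if cont == "#":
            tag = "<err_orth>" if is_err else ""
            yield upper + up + tag, lower + low
        else:
            yield from expand(cont, upper + up, lower + low, is_err)


if __name__ == "__main__":
    for analysis, form in expand():
        print(f"{analysis}\t{form}")
```

In this toy, the dative path through N1-RUS gives абитуриент{☭}>{G}{A}, which is the lookup output quoted earlier minus the duplicated {☭}. The Err/Orth path gives an <err_orth> analysis whose intermediate form has no marker. The N1-RUS-ALWAYS word gets no <err_orth> analyses at all.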

mansayk commented on August 23, 2024

I did that: I added -RUS variants of many categories, for example A1-RUS, NP-TOP-RUS, and so on. Please take a look; I hope everything is correct.
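When reviewing a change that introduces many new -RUS categories, a quick consistency pass over the .lexc source can catch two easy mistakes:

* continuation classes ending in -RUS that are used but never defined with a `LEXICON` header;
* entries that still hard-code %{☭%} even though a -RUS variant of their class now exists.

This is a minimal Python sketch with simplified parsing. It assumes one entry per line, that comments start at the first `!` (no escaped `%!`), and that the marker ends the entry's form. `review_rus_classes` is a hypothetical name.

```python
import re
from typing import List, Set, Tuple

MARKER = "%{☭%}"  # loanword marker as written in the .lexc source
LEXICON_RE = re.compile(r"^\s*LEXICON\s+(\S+)")
# Simplified entry: an optional "upper:lower" token, a continuation class, ";".
ENTRY_RE = re.compile(r"^\s*(?:(?P<forms>\S+)\s+)?(?P<cont>[^\s;]+)\s*;")


def review_rus_classes(lexc: str) -> Tuple[Set[str], List[Tuple[int, str]]]:
    """Return (-RUS classes used but never defined,
    [(line number, entry)] still hard-coding the marker although a -RUS
    variant of their class exists)."""
    lines = lexc.splitlines()
    defined = {m.group(1) for m in map(LEXICON_RE.match, lines) if m}
    undefined: Set[str] = set()
    candidates: List[Tuple[int, str]] = []
    for no, line in enumerate(lines, start=1):
        code = line.split("!", 1)[0]  # drop comments; assumes no escaped %!
        if LEXICON_RE.match(code):
            continue
        m = ENTRY_RE.match(code)
        if not m:
            continue
        forms, cont = m.group("forms") or "", m.group("cont")
        if cont.endswith("-RUS") and cont not in defined:
            undefined.add(cont)
        if forms.endswith(MARKER) and f"{cont}-RUS" in defined:
            candidates.append((no, line.strip()))
    return undefined, candidates


if __name__ == "__main__":
    # Illustrative fragment only; lexicon bodies are omitted.
    sample = "\n".join([
        "LEXICON N1",
        "LEXICON N1-RUS",
        "LEXICON Nouns",
        'абитуриент:абитуриент N1-RUS ; ! ""',
        'компьютер:компьютер%{☭%} N1 ; ! ""',
        'Курил:Курил NP-TOP-RUS ; ! ""',
    ])
    undefined, candidates = review_rus_classes(sample)
    print("undefined -RUS classes:", sorted(undefined))
    for no, entry in candidates:
        print(f"line {no}: could use a -RUS class: {entry}")
```

This only inspects the source text. Whether a given word should really move to a -RUS class (or to something like N1-RUS-ALWAYS) remains a linguistic decision.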
