Comments (4)
As far as I can tell, they are getting marked in the analysis.
$ echo абитуриент | apertium -d . tat-morph
^абитуриент/абитуриент<n><attr>/абитуриент<n><nom>/абитуриент<n><nom>+и<cop><aor><p3><pl>/абитуриент<n><nom>+и<cop><aor><p3><sg>$^./.<sent>$
$ echo Курил | apertium -d . tat-morph
^Курил/Курил<np><top><attr>/Курил<np><top><nom>/Курил<np><top><attr><err_orth>/Курил<np><top><nom><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><pl>/Курил<np><top><nom>+и<cop><aor><p3><sg>/Курил<np><top><nom>+и<cop><aor><p3><pl><err_orth>/Курил<np><top><nom>+и<cop><aor><p3><sg><err_orth>$^./.<sent>$
Or else, maybe I don't understand the problem.
Also, note that you don't need %{☭%} on the right side of a given entry if it's categorised as N1-RUS. That is, you should change a line like
абитуриент:абитуриент%{☭%} N1-RUS ; ! ""
to just
абитуриент:абитуриент N1-RUS ; ! ""
The reason is that the N1-RUS definition already contains %{☭%}. Keeping it in the entry as well results in two %{☭%}s in the lexc transducer, e.g.,
$ echo "абитуриент<n><dat>" | hfst-lookup .deps/tat.LR.lexc.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> абитуриент<n><dat> абитуриент{☭}{☭}>{G}{A} 0.000000
This has the potential to break a certain amount of phonology.
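If there are many such entries, the change described above could be applied mechanically. The following is only a rough sketch, not something from the repository: `strip_redundant_rus_marker` is a hypothetical helper name, and it assumes entries look exactly like the example line (marker at the end of the right side, then whitespace, then a continuation class whose name ends in -RUS, then `;`). It reads lexc lines on stdin and leaves everything else untouched.

```shell
# Hypothetical helper (not part of apertium-tat): drop a redundant %{☭%}
# from the right-hand side of a lexc entry when the continuation class
# name already ends in -RUS. Lines that do not match pass through unchanged.
strip_redundant_rus_marker() {
  sed -E 's/%\{☭%\}([[:space:]]+[A-Za-z0-9-]+-RUS[[:space:]]*;)/\1/'
}

# Example: the entry from this thread.
printf '%s\n' 'абитуриент:абитуриент%{☭%} N1-RUS ; ! ""' | strip_redundant_rus_marker
```

An entry that still points at plain N1 is not modified, since there the marker is the only thing flagging the stem as Russian. It is probably worth re-running the hfst-lookup check above afterwards to confirm that only one {☭} remains.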
from apertium-tat.
It seems that adjectives don't have an A1-RUS category, so how should I mark them? And what about NP-TOP, NP-ANT-M, NP-COG-OB, and so on? Maybe it is better to keep the following form?
абитуриент:абитуриент%{☭%} N1 ; ! ""
from apertium-tat.
I would say it's better to use N1-RUS for nouns. For other parts of speech you can either make separate categories in the same way as N1-RUS or hard-code them like you have them.
One big advantage of having a separate category, besides not having to type/copy %{☭%} a lot, is that it will make it a lot easier to implement <err_orth> tags for forms that are spelled as if the words were not from Russian (like абитуриентне). In fact, we could simply add the following line to N1-RUS to achieve this:
N1 ; ! Err/Orth
On the other hand, perhaps not all words in this category are misspelled that way consistently, so it's possible we'd want to exclude them from getting <err_orth> tags. We could then either do everything manually or make a separate N1-RUS-ALWAYS category or similar. I favour more categories over hard-coding the phonology on a word-by-word basis.
from apertium-tat.
I did that: I added -RUS to many categories, for example A1-RUS and NP-TOP-RUS. Please take a look. I hope everything is correct.
from apertium-tat.