Comments (16)
The same thing here:
echo "ательены" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательены/*ательены$
root@apertium:~# echo "ательене" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательене/ателье<n><sg><acc>$
from apertium-tat.
According to the Tatar orthographical dictionary, it should be "асфальтны", not "асфальтне".
So we should definitely generate асфальтны, but should we analyse both forms? That is, is асфальтне attested commonly enough?
(Btw, the dictionary link doesn't show any relevant information when I click on it.)
Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?
> but should we analyse both forms? That is, is асфальтне attested commonly enough?
Some people can of course write "асфальтне", but it will be a spelling mistake. If we analyze both forms, then it will also affect apertium's spellchecker.
Although that spellchecker already doesn't work as expected because of the many archaic and dialect words in the dictionary, that's why I think we should add some 'Orth' tag for "good" words in the dictionary and spellchecker would use only them...
Maybe here we should analyze both forms but add some additional tag that means that it is not orthographically correct. If I remember correctly @IlnarSelimcan already used one a couple of times...
> Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?
Most of them have affixes with front vowels, but there might be exceptions. For example, correct ones:
рольдән
рульдән
автомобильдән
ансамбльдән
but
акропольдан (I don't know why, but http://suzlek.antat.ru/words.php?txtW=%D0%B0%D0%BA%D1%80%D0%BE%D0%BF%D0%BE%D0%BB%D1%8C&submit=%D0%AD%D0%B7%D0%BB%D3%99%D2%AF)
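The pattern described above (front-vowel ablative by default, with listed exceptions) could be sketched roughly as follows. This is only an illustration in Python, not how apertium-tat encodes it in its lexicon; the function name `ablative` and the exception set are hypothetical, and the only data used are the forms quoted in this comment.

```python
# Illustrative sketch only: apertium-tat handles this in its lexicon and
# phonology files, not in Python. The names below are hypothetical.

# Stems in ль that take the back-vowel ablative, per the forms cited above.
BACK_ABLATIVE_EXCEPTIONS = {"акрополь"}


def ablative(stem, back_exceptions=BACK_ABLATIVE_EXCEPTIONS):
    """Return the ablative form of a loanword stem ending in ль.

    Default is the front-vowel suffix -дән; stems listed in
    `back_exceptions` get the back-vowel suffix -дан instead.
    """
    if not stem.endswith("ль"):
        raise ValueError("this sketch only covers stems ending in ль")
    suffix = "дан" if stem in back_exceptions else "дән"
    return stem + suffix


if __name__ == "__main__":
    for word in ["роль", "руль", "автомобиль", "ансамбль", "акрополь"]:
        print(word, "->", ablative(word))
```

Keeping exceptions in an explicit set mirrors the idea of assigning such words to a separate word class rather than changing the default rule.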
And some more:
фасоль, фасолена
декольте, декольтесы
кольт, кольты
вольт, вольты
The dictionary URLs aren't giving me any information of the sort you seem to be describing.
> ^ательены/*ательены$
Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?
Related issue: we have the lexicon set up to do both ноябрьдә and ноябрьда. Which is correct?
Also, is it январенда or январендә? Once I got фасоленда working, январендә is now being produced as январенда. I'll hack it to only work with оль words for now, but this will need to be investigated.
> I think we should add some 'Orth' tag for "good" words in the dictionary and spellchecker would use only them...
Actually, we do the reverse. We add a tag <err_orth> for words that are attested but are considered orthographic errors, and we just automatically remove them when we generate the spell checker. So what we want (and as of eb360c7 now get) is the following:
$ echo "асфальтны" | apertium -d . tat-morph
^асфальтны/асфальт<n><acc>$^./.<sent>$
$ echo "асфальтне" | apertium -d . tat-morph
^асфальтне/асфальт<n><acc><err_orth>$^./.<sent>$
Have a look at the commit. With knowledge of how the word-class categorisation works, it's pretty simple to do for many words.
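To make the effect of the tag concrete, here is a rough Python sketch that reads analyser output like the lines above and keeps only surface forms with at least one analysis lacking <err_orth>. It is an assumption-laden illustration: the real spell checker is built by removing tagged entries when the speller is generated, not by filtering output, and the helper names `parse_units` and `accepted_by_speller` are hypothetical. Escaped characters in the stream format are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped "/", "^" or "$" inside the unit.
LU_RE = re.compile(r"\^(.*?)\$")


def parse_units(stream):
    """Split analyser output into (surface, [analyses]) pairs."""
    units = []
    for body in LU_RE.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def accepted_by_speller(stream, err_tag="<err_orth>"):
    """Return surface forms that have at least one non-error analysis.

    Analyses starting with "*" mark unknown words and are rejected too.
    """
    accepted = []
    for surface, analyses in parse_units(stream):
        good = [a for a in analyses
                if not a.startswith("*") and err_tag not in a]
        if good:
            accepted.append(surface)
    return accepted


if __name__ == "__main__":
    print(accepted_by_speller("^асфальтны/асфальт<n><acc>$^./.<sent>$"))
    print(accepted_by_speller("^асфальтне/асфальт<n><acc><err_orth>$^./.<sent>$"))
```

With this approach "асфальтне" is still analysed for translation purposes, but would not count as a correct spelling.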
"Акрополь" is strange. You can search for that word here:
http://suzlek.antat.ru
And it finds it.
According to the aforementioned website, the correct one is "ноябрьдә".
It also says the correct one is "январенда".
"фасоль"
- "фасолена" is correct according to the orthographical dictionary.
- "фасольгә" is correct according to the explanatory dictionary.
So it turns out both of them can be treated as correct?
> Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?
I can't say for certain right now, but I think you are right. All the words that come to mind have endings with back vowels: ришельесы, ательесы, льесы, подпольесы.