Comments (16)

mansayk commented on June 23, 2024

The same thing here:

echo "ательены" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательены/*ательены$

root@apertium:~# echo "ательене" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательене/ателье<n><sg><acc>$
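
For checking many forms at once, the output above can be scanned for unknown words. This is only a minimal sketch: it assumes the usual Apertium stream format (`^surface/analysis$`, with a leading `*` on the analysis marking an unknown word), it ignores backslash escapes, and the function names `parse_units` and `unknown_words` are made up for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: backslash-escaped characters are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_units(stream):
    """Return (surface, [analyses]) pairs found in an Apertium-format string."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


def unknown_words(stream):
    """Return surface forms whose analyses are all '*'-prefixed (unknown)."""
    return [
        surface
        for surface, analyses in parse_units(stream)
        if analyses and all(a.startswith("*") for a in analyses)
    ]


# The two outputs quoted in this comment, joined into one string.
sample = "^ательены/*ательены$ ^ательене/ателье<n><sg><acc>$"
print(unknown_words(sample))
```

On the two outputs quoted here, only ательены is reported as unknown.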

from apertium-tat.

jonorthwash commented on June 23, 2024

> According to Tatar orthographical dictionary it should be "асфальтны", not "асфальтне":

So we should definitely generate асфальтны, but should we analyse both forms? That is, is асфальтне attested commonly enough?

(Btw, the dictionary link doesn't show any relevant information when I click on it.)

jonorthwash commented on June 23, 2024

Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?

mansayk commented on June 23, 2024

> but should we analyse both forms? That is, is асфальтне attested commonly enough?

Some people can of course write "асфальтне", but it would be a spelling mistake. If we analyze both forms, then it will also affect Apertium's spellchecker.

That said, the spellchecker already doesn't work as expected because of the many archaic and dialect words in the dictionary; that's why I think we should add some 'Orth' tag for "good" words in the dictionary and spellchecker would use only them...

Maybe here we should analyze both forms but add an additional tag meaning that the form is not orthographically correct. If I remember correctly, @IlnarSelimcan has already used one a couple of times...

mansayk commented on June 23, 2024

> Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?

Most of them have affixes with front vowels, but there might be exceptions. For example, correct ones:
рольдән
рульдән
автомобильдән
ансамбльдән
but
акропольдан (I don't know why, but see http://suzlek.antat.ru/words.php?txtW=%D0%B0%D0%BA%D1%80%D0%BE%D0%BF%D0%BE%D0%BB%D1%8C&submit=%D0%AD%D0%B7%D0%BB%D3%99%D2%AF)

mansayk commented on June 23, 2024

And some more:
фасоль, фасолена
декольте, декольтесы
кольт, кольты
вольт, вольты

jonorthwash commented on June 23, 2024

The dictionary urls aren't giving me any information of the sort you seem to be describing:
screenshot from 2019-01-17 23-30-42

jonorthwash commented on June 23, 2024

> ^ательены/*ательены$

Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?

jonorthwash commented on June 23, 2024

Related issue: we have the lexicon set up to do both ноябрьдә and ноябрьда. Which is correct?

jonorthwash commented on June 23, 2024

Also, is it январенда or январендә? Now that I have фасоленда working, январендә is being produced as январенда. I'll hack it to only work with оль words for now, but this will need to be investigated.

jonorthwash commented on June 23, 2024

> I think we should add some 'Orth' tag for "good" words in the dictionary and spellchecker would use only them...

Actually, we do the reverse. We add a tag <err_orth> for words that are attested but are considered orthographic errors, and we just automatically remove them when we generate the spell checker. So what we want (and as of eb360c7 now get) is the following:

$ echo "асфальтны" | apertium -d . tat-morph
^асфальтны/асфальт<n><acc>$^./.<sent>$

$ echo "асфальтне" | apertium -d . tat-morph
^асфальтне/асфальт<n><acc><err_orth>$^./.<sent>$

Have a look at the commit—with knowledge of how the word-class categorisation works, it's pretty simple to do for many words.
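
To illustrate the effect on the analyses shown above, here is a small sketch of how a downstream script might tell the two cases apart. It is not how the spell checker is actually built (there the `<err_orth>` entries are removed when the spell checker is generated); it only mimics the outcome on analyser output. The helper names `split_by_orthography` and `acceptable_for_spelling` are hypothetical.

```python
ERR_ORTH = "<err_orth>"


def split_by_orthography(analyses):
    """Split analyses into (standard, flagged-as-orthographic-error) lists."""
    standard = [a for a in analyses if ERR_ORTH not in a]
    flagged = [a for a in analyses if ERR_ORTH in a]
    return standard, flagged


def acceptable_for_spelling(analyses):
    """True if at least one analysis is known and not tagged <err_orth>."""
    return any(
        ERR_ORTH not in a and not a.startswith("*")
        for a in analyses
    )


# Analyses copied from the two tat-morph outputs above.
good = ["асфальт<n><acc>"]
bad = ["асфальт<n><acc><err_orth>"]

print(acceptable_for_spelling(good))
print(acceptable_for_spelling(bad))
```

With this check, асфальтны counts as acceptable for spelling purposes, while асфальтне is still analysed but rejected.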

mansayk commented on June 23, 2024

"Акрополь" is strange. You can search for that word here:
http://suzlek.antat.ru
And it finds it.

mansayk commented on June 23, 2024

According to the aforementioned website the correct one is "ноябрьдә".

mansayk commented on June 23, 2024

It also says that the correct form is "январенда".

mansayk commented on June 23, 2024

"фасоль"

  • "фасолена" is correct according to the orthographical dictionary.
  • "фасольгә" is correct according to the explanatory dictionary.

So it turns out that both of them can be treated as correct?

mansayk commented on June 23, 2024

> Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?

I can't say for certain right now, but I think you are right. All the words that come to mind take endings with back vowels: ришельесы, ательесы, льесы, подпольесы.
