The cree-sro-syllabics from eddieantonio

Erroneously fails to transcribe valid examples

See the attached file. Not every example in it is a false positive (that is, something that sro2syllabics() should transcribe but does not).

untranslatedlemmas.txt

Handle macrons in SRO→Syllabics

Both «ē» and «ê» should be valid.
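One way to accept both spellings is to fold macrons into circumflexes before transcription. The following is only a sketch under that assumption; `normalize_long_vowels` is a hypothetical helper name, not part of the library.

```python
import unicodedata

# Macron vowels mapped to their circumflex equivalents (precomposed forms).
MACRON_TO_CIRCUMFLEX = str.maketrans({
    "\u0113": "\u00ea",  # ē → ê
    "\u012b": "\u00ee",  # ī → î
    "\u014d": "\u00f4",  # ō → ô
    "\u0101": "\u00e2",  # ā → â
    "\u0112": "\u00ca",  # Ē → Ê
    "\u012a": "\u00ce",  # Ī → Î
    "\u014c": "\u00d4",  # Ō → Ô
    "\u0100": "\u00c2",  # Ā → Â
})


def normalize_long_vowels(sro: str) -> str:
    """Hypothetical helper: compose to NFC, then rewrite macrons as circumflexes."""
    return unicodedata.normalize("NFC", sro).translate(MACRON_TO_CIRCUMFLEX)
```

Normalizing to NFC first means that a vowel typed as a base letter plus a combining macron is handled the same way as the precomposed character.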

Optionally apply Sandhi rule

Sandhi merges syllables across morpheme boundaries.

For example, “miyw-âyâw” is written as ᒥᔼᔮᐤ and not as ᒥᕀᐤ-ᐋᔮᐤ, and especially not as miyw-ᐋᔮᐤ!
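A rough sketch of the rule on the SRO side: drop a hyphen that sits between a consonant and a vowel, so that the two are transcribed as one syllable. The name `apply_sandhi` and the letter sets are assumptions made for illustration. The sketch deliberately leaves "h" out of the consonant set, so h-V sequences are not merged.

```python
import re

LONG_VOWELS = "\u00e2\u00ee\u00f4\u00ea"   # â î ô ê
VOWELS = "aio" + LONG_VOWELS
CONSONANTS = "ptkcmnsyw"                   # "h" is left out on purpose

# A hyphen directly between a consonant and a vowel.
SANDHI_HYPHEN = re.compile(f"(?<=[{CONSONANTS}])-(?=[{VOWELS}])")


def apply_sandhi(sro_word: str) -> str:
    """Hypothetical helper: remove hyphens where a consonant meets a vowel."""
    return SANDHI_HYPHEN.sub("", sro_word)
```

With this sketch, miyw-âyâw becomes miywâyâw before transcription, while a hyphen after a vowel (as in paskwâwi-mosotos) is left alone.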

Handle look-alike characters in Syllabics→SRO

Wolvengrey.csv uses the wrong character for syllabics final "m".

The correct character for "m" is 'ᒼ' <U+14BC, CANADIAN SYLLABICS WEST-CREE M>
Wolvengrey.csv uses 'ᑦ' <U+1466, CANADIAN SYLLABICS T>

Wolvengrey.csv uses the wrong character for "hk"!

The correct character for "hk" is 'ᕽ' <U+157D, CANADIAN SYLLABICS HK>.
Wolvengrey.csv uses 'ᕁ' <U+1541, CANADIAN SYLLABICS SAYISI YI>.

Additionally, U+1429 "ᐩ" may be used when U+1540 "ᕀ" should be used instead.
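The three substitutions above could be applied as a pre-processing step before Syllabics→SRO conversion. This is a minimal sketch; `fix_lookalikes` is a hypothetical name, and the blanket replacement assumes Western Cree text in which the look-alike code points never occur legitimately.

```python
# Look-alike code point → intended Western Cree code point.
LOOKALIKE_FIXES = str.maketrans({
    "\u1466": "\u14bc",  # CANADIAN SYLLABICS T → WEST-CREE M
    "\u1541": "\u157d",  # CANADIAN SYLLABICS SAYISI YI → HK
    "\u1429": "\u1540",  # CANADIAN SYLLABICS FINAL PLUS → WEST-CREE Y
})


def fix_lookalikes(syllabics: str) -> str:
    """Hypothetical helper: swap look-alike characters for the intended ones."""
    return syllabics.translate(LOOKALIKE_FIXES)
```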

Add remaining nwV syllabics

The rare sequences should still be converted.

Syl->SRO: treat <U+1427 CANADIAN SYLLABICS FINAL MIDDLE DOT> as a "w" dot

Syllabics followed by <U+1427 CANADIAN SYLLABICS FINAL MIDDLE DOT> should be replaced with their "composed" (w-series) counterparts.
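One possible way to do the composition without a hand-written table is to derive the target from Unicode character names: "CANADIAN SYLLABICS I" plus the dot becomes "CANADIAN SYLLABICS WEST-CREE WI". This is a sketch only; `compose_w_dots` is a hypothetical name, and it assumes the West Cree composed forms are the ones wanted. Where no such character exists, the sequence is left as it is.

```python
import re
import unicodedata

PREFIX = "CANADIAN SYLLABICS "
FINAL_MIDDLE_DOT = "\u1427"
# Syllable names such as "I", "PE" or "KAA": optional consonant, then a vowel.
SYLLABLE = re.compile(r"([PTKCMNSY]?)(E|II|I|OO|O|AA|A)")


def _compose(match: re.Match) -> str:
    name = unicodedata.name(match.group(1), "")
    if not name.startswith(PREFIX):
        return match.group(0)
    parts = SYLLABLE.fullmatch(name[len(PREFIX):])
    if parts is None:
        return match.group(0)
    consonant, vowel = parts.groups()
    try:
        return unicodedata.lookup(f"{PREFIX}WEST-CREE {consonant}W{vowel}")
    except KeyError:
        # No precomposed character with that name: keep base + dot.
        return match.group(0)


def compose_w_dots(syllabics: str) -> str:
    """Hypothetical helper: replace <syllabic, U+1427> with the composed form."""
    return re.sub(f"(.){FINAL_MIDDLE_DOT}", _compose, syllabics)
```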

write test
implement
document

Create command line transcriptors (sro2syllabics) and (syllabics2sro)

Write initial implementation
Write --version
Support --macrons/--circumflexes options for syllabics2sro
Support --sandhi/--no-sandhi options for sro2syllabics
Write documentation in README
~~Write basic documentation in doc/cli.rst~~
Write full documentation in --help
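As an illustration of how the syllabics2sro options in the checklist might fit together, here is an argparse sketch. The function name `build_parser`, the help strings, and the version placeholder are all assumptions, not the project's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a syllabics2sro command."""
    parser = argparse.ArgumentParser(
        prog="syllabics2sro",
        description="Convert Cree syllabics on standard input to SRO.",
    )
    # The version string is a placeholder, not a real release number.
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    # --macrons and --circumflexes cannot be combined.
    style = parser.add_mutually_exclusive_group()
    style.add_argument("--macrons", dest="produce_macrons",
                       action="store_true",
                       help="write long vowels with macrons")
    style.add_argument("--circumflexes", dest="produce_macrons",
                       action="store_false",
                       help="write long vowels with circumflexes (default)")
    parser.set_defaults(produce_macrons=False)
    return parser
```

The same pattern (two flags sharing one `dest`) would work for --sandhi/--no-sandhi on the sro2syllabics side.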

Change name to something that is not "crk orthography"

Proposed name: cree_sro_syllabics:

Pros

gets rid of overly technical word: "orthography"
It's intended for all Western Cree dialects, not just Plains Cree (crk)
explains that the module will deal with SRO and syllabics

Cons

kinda generic

Change internal references to crk[-_]orthography
rename GitHub repository
redirect crk-orthography PyPI to new name
Change links in crk-orthography demo
~~Change ReadTheDocs things~~ leave this as "crk-orthography" for now

SRO->Syl: Transliterate hyphens into spaces

According to Arok:

...since we don't use hyphens in Syllabics, we have largely adopted spacing as the means to [indicate breaks of meaningful elements] in Syllabics. This means that wherever a (non-sandhi) hyphen appears in the SRO, there needs to be some indication of space in the Syllabics.

The two methods that have been used are: a) using a single space in Syllabics where SRO hyphens occur and using a double-space for word breaks; OR b) using a half-space in Syllabics where SRO hyphens occur and using a single-space for word breaks. The unicode font we are using for Syllabics does not appear to use much space at all for spaces, so single-spacing for hyphens and double-spacing between words would seem to work.

There are two methods that could be explored:

translating non-linebreak whitespace into double spaces and converting hyphens into a single U+0020 SPACE.
keeping spaces the same and transliterating the hyphens into a <U+2009 THIN SPACE>.

Arok again:

kâ-mahihkani-pimohtêt would (and should) display as ᑳ ᒪᐦᐃᐦᑲᓂ ᐱᒧᐦᑌᐟ, not ᑳᒪᐦᐃᐦᑲᓂᐱᒧᐦᑌᐟ

Edit: perhaps <U+202F NARROW NO-BREAK SPACE> is more appropriate, as Arok is concerned about line breaks occurring within words. This roughly follows Mongolian usage, where NNBSP separates words from grammatical suffixes; in Mongolian, however, it matters mostly for rendering and font-shaping rules.

From The Unicode Standard, Version 11.0, chapter 13.5, "Narrow No-Break Space":

Because separated suffixes are usually considered an integral part of the word as a whole, a
line break opportunity does not normally occur before a separated suffix. The whitespace
preceding the suffix is often narrower than an ordinary space, although the width may
expand during justification.
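A sketch of the NNBSP variant: replace a hyphen that sits between two word characters with <U+202F NARROW NO-BREAK SPACE>. The name `hyphens_to_nnbsp` is hypothetical, and swapping in U+0020 or U+2009 for the other proposals only means changing the constant.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# A hyphen with a word character on both sides (works for SRO or syllabics).
INTERNAL_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def hyphens_to_nnbsp(text: str) -> str:
    """Hypothetical helper: turn word-internal hyphens into NNBSP."""
    return INTERNAL_HYPHEN.sub(NNBSP, text)
```

Hyphens that are not surrounded by word characters, such as a leading list dash, are left untouched.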

Does not properly handle h-V in sandhi cases

There's a bug in handling the following cases:

['âh-ayinânêw', 'âh-ayîtaw', 'mistah-âya']

Optionally produce macrons in Syllabics→SRO

Add a keyword argument to syllabics2sro:

syllabics2sro(produce_macrons=False):
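The keyword argument could be implemented as a final pass over the SRO output. This sketch shows only that last step; `finish_sro` is a hypothetical stand-in for the end of the conversion, not the library's function.

```python
# Circumflex vowels mapped to their macron equivalents (precomposed forms).
CIRCUMFLEX_TO_MACRON = str.maketrans({
    "\u00ea": "\u0113",  # ê → ē
    "\u00ee": "\u012b",  # î → ī
    "\u00f4": "\u014d",  # ô → ō
    "\u00e2": "\u0101",  # â → ā
})


def finish_sro(sro: str, produce_macrons: bool = False) -> str:
    """Hypothetical last step: optionally rewrite circumflexes as macrons."""
    return sro.translate(CIRCUMFLEX_TO_MACRON) if produce_macrons else sro
```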

Move all of the implementation into one file

...so that people can just copy the one file without having to use pip.

Incorporate files into __init__.py
- sro.py
- syllabics.py
- __version__.py
Document how to "install" by copying the single file
~~Figure out what will happen with __main__.py~~ it got deleted!

Handle alternate -y final in Syllabics→SRO

<U+141D CANADIAN SYLLABICS Y-CREE W> (decimal 5149) should be converted to simply "y".

From Wikipedia:

Some Plains Cree communities use a final for y which is different from the usual western final. This is a superposed dot ᐝ, instead of the usual ᐩ, as in ᓰᐱᐩ (ᓰᐱᐝ) sīpiy “river”. When the dot y-final is placed after a syllabic which has a w-dot, the two dots combine to form a colon-like symbol, as in ᓅᐦᑖᐏᐩ (ᓅᐦᑖᐃ᛬) nōhtāwiy “my father”.

sro2syllabics cannot transcribe words with internal hyphens

e.g.,

$ sro2syllabics
paskwâwi-mosotos
Traceback (most recent call last):
  File "/Users/eddieantonio/.local/share/virtualenvs/crk_orthography-FFf2rRmW/bin/sro2syllabics", line 11, in <module>
    load_entry_point('crk-orthography', 'console_scripts', 'sro2syllabics')()
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/__main__.py", line 46, in sro2syllabics_cli
    convert_with(sro2syllabics)
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/__main__.py", line 38, in convert_with
    print(converter(line), end='')
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 176, in sro2syllabics
    return word_pattern.sub(transcode_match, nfc(sro))
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 175, in transcode_match
    return transcode_sro_word_to_syllabics(match.group(0), sandhi)
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 232, in transcode_sro_word_to_syllabics
    assert to_transcribe == '', 'could not transcribe %r' % (to_transcribe)
AssertionError: could not transcribe '-mosotos'

Support th-Cree

Add option to write unpointed text

SRO->Syl: Implement <U+166E CANADIAN SYLLABICS FULL STOP>!

Full-stops should be transliterated into full-stops!

Write test
Implement
Document
~~Refactor~~
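A minimal sketch of the full-stop rule, applied after the words themselves have been converted: a period that directly follows a syllabics character becomes <U+166E CANADIAN SYLLABICS FULL STOP>. The name `convert_full_stops` is hypothetical, and restricting the rule to periods right after syllabics is an assumption meant to leave decimals and abbreviations in Latin text alone.

```python
import re

SYLLABICS_FULL_STOP = "\u166e"

# A period right after a character from the Unified Canadian Aboriginal
# Syllabics block (U+1400 to U+167F).
PERIOD_AFTER_SYLLABIC = re.compile(r"(?<=[\u1400-\u167f])\.")


def convert_full_stops(text: str) -> str:
    """Hypothetical helper: replace sentence-final periods in syllabics text."""
    return PERIOD_AFTER_SYLLABIC.sub(SYLLABICS_FULL_STOP, text)
```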

Make word matching stricter

A feature of crk_orthography is that it attempts to match only Cree words; words that are "obviously" not Cree words are not converted.

What "obviously not Cree" means is words that do not conform to Cree phonotactics (as written in SRO).

However, the current algorithm will match words like "I'm": the apostrophe (') is treated as the vowel short-i, so the sequence short-i, short-i, final-m counts as "valid Cree". Since vowels are generally not found adjacent to each other in Cree, this match is wrong.
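A stricter check could reject any candidate word that contains two adjacent vowels, counting the apostrophe as short-i as the current algorithm does. This is a simplified sketch, not a full model of Cree phonotactics; `plausibly_cree` and the letter sets are assumptions made for illustration.

```python
import re
import unicodedata

LONG_VOWELS = "\u00e2\u00ee\u00f4\u00ea"   # â î ô ê
VOWELS = "aio'" + LONG_VOWELS              # the apostrophe stands for short-i
CONSONANTS = "ptkcmnsywh"

ONLY_SRO_LETTERS = re.compile(f"[{VOWELS}{CONSONANTS}]+")
ADJACENT_VOWELS = re.compile(f"[{VOWELS}]{{2}}")


def plausibly_cree(word: str) -> bool:
    """Hypothetical check: SRO letters only, and no two vowels in a row."""
    word = unicodedata.normalize("NFC", word).lower()
    if not ONLY_SRO_LETTERS.fullmatch(word):
        return False
    return ADJACENT_VOWELS.search(word) is None
```

Under this sketch "I'm" is rejected because the apostrophe and the "i" form a vowel pair, while a word such as mosotos still passes.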
