eddieantonio / cree-sro-syllabics

Convert between nêhiyawêwin SRO and syllabics!

Home Page: https://crk-orthography.readthedocs.io/en/stable/?badge=stable

License: MIT License

Python 100.00%
cree syllabics transliteration transliterator sro python transcriptor canadian-aboriginal-syllabics converter nehiyawewin

cree-sro-syllabics's Introduction

Cree SRO/Syllabics

Python 3 library to convert between Western Cree Standard Roman Orthography (SRO) and syllabics, and back again!

Can be used for:

  • nêhiyawêwin/ᓀᐦᐃᔭᐍᐏᐣ/Cree Y-dialect
  • nīhithawīwin/ᓃᐦᐃᖬᐑᐏᐣ/Cree Th-dialect
  • nēhinawēwin/ᓀᐦᐃᓇᐍᐏᐣ/Cree N-dialect

Install

Using pip:

pip install cree-sro-syllabics

Or, you can copy-paste or download cree_sro_syllabics.py into your own Python 3 project!

Usage

Visit the full documentation here! Wondering about words like "syllabics", "transliterator", or "orthography"? Visit the glossary!

Convert SRO to syllabics:

>>> from cree_sro_syllabics import sro2syllabics
>>> sro2syllabics('nêhiyawêwin')
'ᓀᐦᐃᔭᐍᐏᐣ'
>>> sro2syllabics('write nêhiyawêwin')
'write ᓀᐦᐃᔭᐍᐏᐣ'

Convert syllabics to SRO:

>>> from cree_sro_syllabics import syllabics2sro
>>> syllabics2sro('ᐊᒋᒧᓯᐢ')
'acimosis'
>>> syllabics2sro(' → ᒪᐢᑫᑯᓯᕽ  ᑎᕒᐁᐩᓬ ')
' → maskêkosihk  tirêyl '

See also

nêhiyawêwin syllabics

License

Copyright © 2018–2021 National Research Council Canada.

Licensed under the MIT license.

cree-sro-syllabics's People

Contributors

dependabot[bot], eddieantonio

cree-sro-syllabics's Issues

Optionally apply Sandhi rule

Sandhi merges syllables across morpheme boundaries.

For example, “miyw-âyâw” is written as ᒥᔼᔮᐤ and not as ᒥᕀᐤ-ᐋᔮᐤ, and especially not as miyw-ᐋᔮᐤ!
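One way to picture the rule is as a pre-processing step on the SRO text: drop a hyphen when a consonant precedes it and a vowel follows it, so the two halves form one syllable. The sketch below is only an illustration under that assumption; `apply_sandhi`, `CONSONANTS`, and `VOWELS` are hypothetical names, not part of the library.

```python
import re
import unicodedata

# Assumed, simplified SRO alphabet (circumflex and macron long vowels).
CONSONANTS = "chkmnpstwy"
VOWELS = "aioâîôêāīōē"

# A hyphen that sits between a consonant and a vowel.
SANDHI_HYPHEN = re.compile(
    f"(?<=[{CONSONANTS}])-(?=[{VOWELS}])", re.IGNORECASE
)


def apply_sandhi(sro: str) -> str:
    """Remove hyphens that separate a consonant from a following vowel."""
    # NFC so that 'a' + combining circumflex becomes the single letter 'â'.
    sro = unicodedata.normalize("NFC", sro)
    return SANDHI_HYPHEN.sub("", sro)


print(apply_sandhi("miyw-âyâw"))
print(apply_sandhi("kâ-mahihkani-pimohtêt"))
```

With this rule, "miyw-âyâw" loses its hyphen, while hyphens that follow a vowel are left for the converter to handle separately.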

Handle look-alike characters in Syllabics→SRO

Wolvengrey.csv uses the wrong character for syllabics final "m".

The correct character for "m" is 'ᒼ' <U+14BC, CANADIAN SYLLABICS WEST-CREE M>
Wolvengrey.csv uses 'ᑦ' <U+1466, CANADIAN SYLLABICS T>

Wolvengrey.csv uses the wrong character for "hk"!

The correct character for "hk" is 'ᕽ' <U+157D, CANADIAN SYLLABICS HK>.
Wolvengrey.csv uses 'ᕁ' <U+1541, CANADIAN SYLLABICS SAYISI YI>

Additionally, U+1429 "ᐩ" may be used when U+1540 "ᕀ" should be used instead.
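The three substitutions listed above could be applied as a normalization pass before converting syllabics to SRO. This is a minimal sketch, not the library's implementation; `LOOKALIKES` and `fix_lookalikes` are hypothetical names. Note the assumption that U+1466 always stands for final "m": that only holds for sources known to contain this error, such as Wolvengrey.csv.

```python
# Look-alike code point -> intended code point, as described in this issue.
LOOKALIKES = {
    0x1466: 0x14BC,  # CANADIAN SYLLABICS T      -> WEST-CREE M (assumes the source misuses it)
    0x1541: 0x157D,  # CANADIAN SYLLABICS SAYISI YI -> HK
    0x1429: 0x1540,  # CANADIAN SYLLABICS FINAL PLUS -> WEST-CREE Y
}


def fix_lookalikes(syllabics: str) -> str:
    """Replace each look-alike character with the intended one."""
    return syllabics.translate(LOOKALIKES)


fixed = fix_lookalikes("\u1466\u1541\u1429")
print([f"U+{ord(ch):04X}" for ch in fixed])
```

`str.translate` accepts a dictionary keyed by code point, so the table doubles as documentation of which characters are being swapped.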

Change name to something that is not "crk orthography"

Proposed name: cree_sro_syllabics

Pros

  • gets rid of overly technical word: "orthography"
  • It's intended for all Western Cree dialects–not just Plains Cree (crk)
  • explains that the module will deal with SRO and syllabics

Cons

  • kinda generic

  • Change internal references to crk[-_]orthography
  • rename GitHub repository
  • redirect crk-orthography PyPi to new name
  • Change links in crk-orthography demo
  • Change ReadTheDocs things (leave this as "crk-orthography" for now)

SRO->Syl: Transliterate hyphens into spaces

According to Arok:

...since we don't use hyphens in Syllabics, we have largely adopted spacing as the means to [indicate breaks of meaningful elements] in Syllabics. This means that wherever a (non-sandhi) hyphen appears in the SRO, there needs to be some indication of space in the Syllabics.

The two methods that have been used are: a) using a single space in Syllabics where SRO hyphens occur and using a double-space for word breaks; OR b) using a half-space in Syllabics where SRO hyphens occur and using a single-space for word breaks. The unicode font we are using for Syllabics does not appear to use much space at all for spaces, so single-spacing for hyphens and double-spacing between words would seem to work.

There are two methods that could be explored:

  • translating non-linebreak whitespace into double spaces and converting hyphens into a single U+0020 SPACE.
  • keeping spaces the same and transliterating the hyphens into a <U+2009 THIN SPACE>.
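Both options can be prototyped on the SRO text before it is converted. The following is a rough sketch under that assumption; the function names are hypothetical, and neither function tries to tell sandhi hyphens apart from ordinary ones.

```python
import re

THIN_SPACE = "\u2009"

# Any run of whitespace that is not a line break.
_INLINE_WHITESPACE = re.compile(r"[^\S\r\n]+")


def hyphens_as_single_space(sro: str) -> str:
    """Option 1: word breaks become double spaces, hyphens become U+0020."""
    # Widen the word breaks first, so the spaces created from hyphens
    # in the next step are not doubled as well.
    widened = _INLINE_WHITESPACE.sub("  ", sro)
    return widened.replace("-", " ")


def hyphens_as_thin_space(sro: str) -> str:
    """Option 2: spaces are untouched, hyphens become U+2009 THIN SPACE."""
    return sro.replace("-", THIN_SPACE)


print(repr(hyphens_as_single_space("kâ-mahihkani-pimohtêt ana")))
print(repr(hyphens_as_thin_space("kâ-mahihkani-pimohtêt ana")))
```

The order of the two steps in the first function matters: replacing hyphens first would turn every morpheme break into a double space too.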

Arok again:

kâ-mahihkani-pimohtêt would (and should) display as ᑳ ᒪᐦᐃᐦᑲᓂ ᐱᒧᐦᑌᐟ, not ᑳᒪᐦᐃᐦᑲᓂᐱᒧᐦᑌᐟ

  • come up with sensible API
  • write tests
  • implement
  • augment documentation
  • [later] implement in online syllabics converter

Edit: perhaps <U+202F NARROW NO-BREAK SPACE> is more appropriate, as Arok is concerned about line breaks occurring within words. This roughly follows the usage in Mongolian, where NNBSP separates words from grammatical suffixes. In Mongolian, however, it is mostly useful for rendering and font-shaping rules.

From The Unicode Standard, Version 11.0, chapter 13.5, "Narrow No-Break Space":

Because separated suffixes are usually considered an integral part of the word as a whole, a
line break opportunity does not normally occur before a separated suffix. The whitespace
preceding the suffix is often narrower than an ordinary space, although the width may
expand during justification.

Move all of the implementation into one file

...so that people can just copy the one file without having to use pip.

  • Incorporate files into __init__.py
    • sro.py
    • syllabics.py
    • __version__.py
  • Document how to "install" by copying the single file
  • Figure out what will happen with __main__.py (it got deleted!)

Handle alternate -y final in Syllabics→SRO

U+141D (decimal 5149) CANADIAN SYLLABICS Y-CREE W should be converted to simply "y".

From Wikipedia:

Some Plains Cree communities use a final for y which is different from the usual western final. This is a superposed dot ᐝ, instead of the usual ᐩ, as in ᓰᐱᐩ (ᓰᐱᐝ) sīpiy “river”. When the dot y-final is placed after a syllabic which has a w-dot, the two dots combine to form a colon-like symbol, as in ᓅᐦᑖᐏᐩ (ᓅᐦᑖᐃ᛬) nōhtāwiy “my father”.

sro2syllabics cannot transcribe words with internal hyphens

e.g.,

$ sro2syllabics
paskwâwi-mosotos
Traceback (most recent call last):
  File "/Users/eddieantonio/.local/share/virtualenvs/crk_orthography-FFf2rRmW/bin/sro2syllabics", line 11, in <module>
    load_entry_point('crk-orthography', 'console_scripts', 'sro2syllabics')()
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/__main__.py", line 46, in sro2syllabics_cli
    convert_with(sro2syllabics)
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/__main__.py", line 38, in convert_with
    print(converter(line), end='')
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 176, in sro2syllabics
    return word_pattern.sub(transcode_match, nfc(sro))
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 175, in transcode_match
    return transcode_sro_word_to_syllabics(match.group(0), sandhi)
  File "/Users/eddieantonio/Projects/crk_orthography/crk_orthography/sro.py", line 232, in transcode_sro_word_to_syllabics
    assert to_transcribe == '', 'could not transcribe %r' % (to_transcribe)
AssertionError: could not transcribe '-mosotos'
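The assertion fires because the word transcoder is handed the whole hyphenated word and cannot consume "-mosotos". One possible way around this, sketched here purely as an illustration, is to split on hyphens first and convert each part on its own. `convert_hyphenated` is a hypothetical helper, and `str.upper` stands in for the real per-word converter so that the example runs on its own.

```python
from typing import Callable


def convert_hyphenated(
    word: str,
    convert_part: Callable[[str], str],
    joiner: str = "-",
) -> str:
    """Convert each hyphen-separated part separately, then rejoin the parts."""
    parts = word.split("-")
    return joiner.join(convert_part(part) for part in parts)


# str.upper is only a placeholder for a real transcoding function.
print(convert_hyphenated("paskwâwi-mosotos", str.upper))
print(convert_hyphenated("paskwâwi-mosotos", str.upper, joiner=" "))
```

Making the joiner a parameter leaves room for the choice of hyphen replacement discussed in the "Transliterate hyphens into spaces" issue.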

Support th-Cree

  • test sro2syllabics
  • implement in sro2syllabics
  • refactor
  • test syllabics2sro
  • implement syllabics2sro
  • document
  • publish

Make word matching stricter

A feature of crk_orthography is that it attempts to match only Cree words; words that are "obviously" not Cree words are not converted.

What "obviously not Cree" means is words that do not conform to Cree phonotactics (as written in SRO).

However, the current algorithm will match words like "I'm" because the ' is considered the vowel short-i, which makes the sequence short-i, short-i, final-m count as "valid Cree". Since vowels are generally not found adjacent to each other in Cree, this is wrong.
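A stricter matcher could require that vowels never touch, so every pair of vowels is separated by at least one consonant. The sketch below assumes a deliberately simplified picture of SRO phonotactics (consonant clusters of up to three letters, single vowels) and is not the library's actual pattern; `looks_like_cree` and `STRICT_WORD` are hypothetical names.

```python
import re
import unicodedata

# Assumed, simplified SRO alphabet.
CONSONANTS = "chkmnpstwy"
VOWELS = "aioâîôêāīōē"

# Optional onset, a vowel, then (consonant cluster + vowel) repeated,
# then an optional final cluster. Two vowels can never be adjacent.
STRICT_WORD = re.compile(
    f"[{CONSONANTS}]{{0,3}}[{VOWELS}]"
    f"(?:[{CONSONANTS}]{{1,3}}[{VOWELS}])*"
    f"[{CONSONANTS}]{{0,3}}"
)


def looks_like_cree(word: str) -> bool:
    """Return True if the word fits the simplified pattern above."""
    normalized = unicodedata.normalize("NFC", word).lower()
    # Mirror the behaviour described in the issue: ' stands for short-i.
    normalized = normalized.replace("'", "i")
    return STRICT_WORD.fullmatch(normalized) is not None


for candidate in ["nêhiyawêwin", "maskêkosihk", "I'm", "write"]:
    print(candidate, looks_like_cree(candidate))
```

Under these assumptions "I'm" becomes "iim" and is rejected for its adjacent vowels, while "write" is rejected because "r" and plain "e" are outside the assumed alphabet.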
