This project forked from codyrobbins/syllabify

A Ruby port of the Penn Phonetics Toolkit (P2TK) syllabifier.

Home Page: http://codyrobbins.com/software/syllabify

License: MIT License

Syllabify

A Ruby port of the Penn Phonetics Toolkit (P2TK) syllabifier. Unlike the P2TK syllabifier, this implementation works on transcriptions in IPA rather than Arpabet. Given a phonemic transcription in IPA, it automatically segments the phonemes into syllables.

As with the P2TK syllabifier, you must supply a phoneme inventory listing the legal consonants, nuclei (typically the language’s vowels), and onsets of the transcribed language. The inventory is specified as plain-text YAML, and a default inventory for English, taken from the P2TK syllabifier, is included. If you create inventories for other languages, please submit a pull request (or simply email it to me if you’re not a techie) and I will include it in subsequent releases of the gem.
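As a rough illustration of what such an inventory might look like, the sketch below loads a tiny YAML document with Ruby's standard library and runs a basic consistency check. The key names (consonants, nuclei, onsets) and the phoneme lists are assumptions made up for this example; they are not the layout or contents of the English inventory bundled with the gem.

```ruby
require 'yaml'

# Hypothetical inventory layout. The key names and the tiny ASCII phoneme
# lists are illustrative assumptions, not the gem's bundled English file.
INVENTORY_YAML = <<~YAML
  consonants: ['p', 't', 'k', 's', 'l']
  nuclei: ['a', 'i', 'u']
  onsets: ['p', 't', 'k', 's', 'l', 'pl', 'kl', 'sp', 'st', 'sk']
YAML

inventory = YAML.safe_load(INVENTORY_YAML)

# Sanity check: every onset should be built only from listed consonants.
bad_onsets = inventory['onsets'].reject do |onset|
  onset.chars.all? { |c| inventory['consonants'].include?(c) }
end

puts "Onsets using unknown consonants: #{bad_onsets.inspect}"
```

A check like this only works character by character for untied, single-glyph phonemes; an inventory containing tied digraphs would need to split on phonemes rather than characters.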

Full documentation is at RubyDoc.info.

Transcription constraints

Any phoneme represented in IPA by a digraph (such as affricates, doubly articulated consonants, and diphthongs) must be transcribed using a tie. Otherwise there is no way to distinguish it from the sequence of its individual components, and syllabification will be incorrect in some cases.

For example, in English transcriptions the voiceless postalveolar affricate is customarily transcribed without the tie. For the purposes of syllabification, however, this is problematic because in English /t͡ʃ/ is a legal onset but /tʃ/ is not. If the affricate were transcribed without the tie as /tʃ/, it would have to be included in the inventory of onsets, but doing so would cause the phoneme sequence /t/ followed by /ʃ/ to be interpreted as an onset as well, which it isn’t. Without the tie, transcriptions in which /tʃ/ represents two phonemes rather than one, such as nutshell /nʌtʃɛl/, will be incorrectly syllabified as /nʌ.tʃɛl/ rather than /nʌt.ʃɛl/. Similarly, without a tie it’s not possible to determine whether /ɔɪ/ in clawing /klɔɪŋ/ is a single diphthong or two separate vowels.

In other words, tie glyphs together if they represent a single phoneme. The only digraphs requiring ties in English are the voiceless and voiced postalveolar affricates /t͡ʃ, d͡ʒ/ and the diphthongs /a͡ʊ, a͡ɪ, e͡ɪ, o͡ʊ, ɔ͡ɪ/.
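To make the distinction concrete, here is a minimal sketch of how a tie lets code tell one phoneme from two. The split_phonemes helper is hypothetical and written only for this illustration; it is not part of the gem's API and makes no claim about how the gem tokenizes its input.

```ruby
# The tie character, COMBINING DOUBLE INVERTED BREVE.
TIE = "\u0361"

# Hypothetical helper: split an IPA string into phoneme-sized units,
# keeping any two characters joined by the tie together as one unit.
def split_phonemes(transcription)
  transcription.scan(/.#{TIE}.|./)
end

untied = split_phonemes('nʌtʃɛl')      # nutshell: /t/ and /ʃ/ stay separate
tied   = split_phonemes("t#{TIE}ʃɪp")  # chip: the affricate is a single unit

puts untied.length
puts tied.length
```

Without the tie there is simply nothing in the string to mark /tʃ/ as one unit, which is why the transcription itself has to carry that information.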

How to enter ties

The tie is represented in Unicode by Combining Double Inverted Breve (U+0361). This character is entered between the two characters to be tied.
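If typing the character directly is awkward, one option in Ruby source is the \u escape. The short sketch below builds the tied affricate that way and lists its code points to confirm the tie sits between the two letters; the constant and variable names are only for illustration.

```ruby
# COMBINING DOUBLE INVERTED BREVE, written with a Unicode escape.
TIE_MARK = "\u0361"

# The tie goes between the two characters it joins.
affricate = "t#{TIE_MARK}ʃ"

# List the code points to verify the order: letter, tie, letter.
codepoints = affricate.codepoints.map { |cp| format('U+%04X', cp) }

puts codepoints.join(' ')  # prints "U+0074 U+0361 U+0283"
```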

Ligatures

Phonemes that require ties could alternatively be transcribed using their respective ligatures, but Unicode does not define glyphs for all the potential ligatures, and ligatures are no longer official IPA usage in any case.

Example

transcription = CodyRobbins::Syllabify.new(:en, 'dɪˌsɔrgənəˈze͡ɪʃən')

transcription.to_s      #=> 'dɪ.ˌsɔr.gə.nə.ˈze͡ɪ.ʃən'
transcription.syllables #=> [dɪ, ˌsɔr, gə, nə, ˈze͡ɪ, ʃən]

syllable = transcription.syllables[4]
syllable.stress  #=> 'ˈ'
syllable.onset   #=> 'z'
syllable.nucleus #=> 'e͡ɪ'
syllable.coda    #=> ''

syllable = transcription.syllables.last
syllable.stress  #=> nil
syllable.onset   #=> 'ʃ'
syllable.nucleus #=> 'ə'
syllable.coda    #=> 'n'

Colophon

See also

If you like this gem, you may also want to check out transliterate.

Tested with

  • Ruby 1.9.2-p290 — 18 October 2011

Contributing

To send patches, please fork on GitHub and submit a pull request.

Credits

© 2011 Cody Robbins. See LICENSE for details.
