
quadrismegistus / prosodic

Stars: 268 · Watchers: 14 · Forks: 41 · Size: 52.49 MB

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

License: GNU General Public License v3.0

Python 5.66% Shell 0.01% Jupyter Notebook 0.31% CSS 2.93% JavaScript 90.93% HTML 0.09% SCSS 0.07%
metrical-parser linguistics nlp finnish-language-analysis poetry rhythm

prosodic's Introduction

Prosodic

Prosodic is a metrical-phonological parser written in Python. It can currently parse English and Finnish text, and other languages are easy to add with a pronunciation dictionary or a custom Python function. Prosodic was built by Ryan Heuser, Josh Falk, and Arto Anttila. Josh also maintains a separate repository in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. Sam Bowman has also contributed to the codebase by adding several new metrical constraints.

This version, "Prosodic 2", is a near-total rewrite of the original Prosodic.

Supports Python>=3.8.

Install

1. Install python package

For now, pip-install directly from github:

pip install git+https://github.com/quadrismegistus/prosodic

2. Install espeak

Install espeak, a free text-to-speech (TTS) program, so that Prosodic can ‘sound out’ unknown words.
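The sketch below checks whether espeak can be found before you parse text containing unknown words. It assumes Prosodic looks for the executable on PATH under the name espeak or espeak-ng (the maintained fork). This README does not say which name Prosodic actually calls, so `find_espeak` is a hypothetical helper.

```python
import shutil
from typing import Optional


def find_espeak() -> Optional[str]:
    """Return the path of an espeak executable on PATH, or None.

    Hypothetical helper: tries 'espeak' first, then 'espeak-ng'.
    """
    for name in ("espeak", "espeak-ng"):
        path = shutil.which(name)  # None if the name is not on PATH
        if path:
            return path
    return None


print(find_espeak() or "espeak not found on PATH")
```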

Usage

Web app

Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:

prosodic

Then navigate to http://127.0.0.1:5000/. It should look like this:

[Screenshot: prosodic-gui2, the Prosodic web app]

Python

Read texts

# import prosodic
import prosodic

# load a text
sonnet = prosodic.Text("""
Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet.
""")

# can also load by filename
shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')

Stanzas, lines, words, syllables, phonemes

Texts in Prosodic are organized as a tree. The .children of a Text object is a list of Stanza objects, and the .parent attribute of each Stanza points back to the Text. In turn, each stanza's .children is a list of Line objects whose .parent attributes point back to the stanza. The same pattern continues down the tree.
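The parent/child links described above can be illustrated with a small stand-alone sketch. `Node` and `descendants` are hypothetical names and are not part of Prosodic. They only mimic the `.children`/`.parent` pattern, so the idea can be run without installing anything.

```python
from typing import Iterator, List, Optional


class Node:
    """Hypothetical stand-in for a Prosodic entity (Text, Stanza, Line, ...)."""

    def __init__(self, kind: str, txt: str = "", children: Optional[List["Node"]] = None):
        self.kind = kind
        self.txt = txt
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        for child in children or []:
            self.add(child)

    def add(self, child: "Node") -> None:
        child.parent = self  # each child points back to its parent
        self.children.append(child)

    def descendants(self, kind: str) -> Iterator["Node"]:
        """Depth-first walk yielding every descendant of the given kind."""
        for child in self.children:
            if child.kind == kind:
                yield child
            yield from child.descendants(kind)


text = Node("Text", children=[
    Node("Stanza", children=[
        Node("Line", "Those hours, that with gentle work did frame"),
        Node("Line", "The lovely gaze where every eye doth dwell,"),
    ])
])
lines = list(text.descendants("Line"))
print(len(lines))                      # 2
print(lines[0].parent.parent is text)  # True
```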

# Take a peek at this tree structure 
# and the features particular entities have
sonnet.show(maxlines=30, incl_phons=True)
Text()
|   Stanza(num=1)
|       Line(num=1, txt='Those hours, that with gentle work did frame')
|           WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='Those', lang='en', num_forms=1)
|                   WordForm(num=1, txt='Those')
|                       Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)
|                           Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='hours', lang='en', num_forms=2)
|                   WordForm(num=1, txt='hours')
|                       Syllable(ipa="'aʊ", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                       Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)
|                           Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                   WordForm(num=2, txt='hours')
|                       Syllable(ipa="'aʊrz", num=1, txt='hours', is_stressed=True, is_heavy=True)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)
|           WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='that', lang='en', num_forms=3)
# take a peek at it in dataframe form
sonnet.df   # by-syllable dataframe representation
sonnet      # ...also shown when the text object is displayed in a notebook
word_num_forms syll_is_stressed syll_is_heavy syll_is_strong syll_is_weak word_is_punc
stanza_num line_num line_txt sent_num sentpart_num wordtoken_num wordtoken_txt word_lang wordform_num syll_num syll_txt syll_ipa
1 1 Those hours, that with gentle work did frame 1 1 1 Those en 1 1 Those ðoʊz 1 0 1
2 hours en 1 1 ho 'aʊ 2 1 1 1 0
2 urs ɛːz 2 0 1 0 1
2 1 hours 'aʊrz 2 1 1
3 , en 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14 Leese but their show; their substance still lives sweet. 1 1 7 substance en 1 2 tance stəns 1 0 1 0 1
8 still en 1 1 still 'stɪl 1 1 1
9 lives en 1 1 lives 'lɪvz 1 1 1
10 sweet en 1 1 sweet 'swiːt 1 1 1
11 . en 0 0 0 1

195 rows × 6 columns

# you can loop over this directly if you want
for stanza in shaksonnets.stanzas:
    for line in stanza:          # iterate the lines of *this* stanza
        for wordtoken in line:
            for wordtype in wordtoken:
                for wordform in wordtype:
                    for syllable in wordform:
                        for phoneme in syllable:
                            # ...
                            pass
# or directly access components
print(f'''
Shakespeare's sonnets have:
  * {len(shaksonnets.stanzas):,} "stanzas"        (in this text, each one a sonnet)
  * {len(shaksonnets.lines):,} lines
  * {len(shaksonnets.wordtokens):,} wordtokens    (including punctuation)
  * {len(shaksonnets.wordtypes):,} wordtypes     (each token has one wordtype object)
  * {len(shaksonnets.wordforms):,} wordforms     (a word + IPA pronunciation; no punctuation)
  * {len(shaksonnets.syllables):,} syllables
  * {len(shaksonnets.phonemes):,} phonemes
''')
Shakespeare's sonnets have:
  * 154 "stanzas"        (in this text, each one a sonnet)
  * 2,155 lines
  * 20,317 wordtokens    (including punctuation)
  * 20,317 wordtypes     (each token has one wordtype object)
  * 17,601 wordforms     (a word + IPA pronunciation; no punctuation)
  * 21,915 syllables
  * 63,614 phonemes
# access lines

# text.line{num} will return text.lines[num-1]
assert sonnet.line1 is sonnet.lines[0]
assert sonnet.line10 is sonnet.lines[9]
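The `line{num}` shortcut works like a 1-indexed alias for `lines`. One way to implement such an alias in plain Python is with `__getattr__`, which runs only when normal attribute lookup fails. The sketch below rests on that assumption and is not Prosodic's actual code. `LineContainer` is a hypothetical name.

```python
import re
from typing import List


class LineContainer:
    """Hypothetical sketch: expose lines[n-1] as the attribute `line{n}`."""

    _alias = re.compile(r"line(\d+)$")

    def __init__(self, lines: List[str]):
        self.lines = lines

    def __getattr__(self, name: str):
        # Only reached when `name` is not a regular attribute.
        match = self._alias.match(name)
        if match:
            num = int(match.group(1))
            if 1 <= num <= len(self.lines):
                return self.lines[num - 1]
        raise AttributeError(name)


poem = LineContainer(["first line", "second line", "third line"])
print(poem.line1)                    # first line
print(poem.line3 is poem.lines[2])   # True
```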

# show the line
sonnet.line1
word_num_forms syll_is_stressed syll_is_heavy syll_is_strong syll_is_weak word_is_punc
line_num line_txt sent_num sentpart_num wordtoken_num wordtoken_txt word_lang wordform_num syll_num syll_txt syll_ipa
1 Those hours, that with gentle work did frame 1 1 1 Those en 1 1 Those ðoʊz 1 0 1
2 hours en 1 1 ho 'aʊ 2 1 1 1 0
2 urs ɛːz 2 0 1 0 1
2 1 hours 'aʊrz 2 1 1
3 , en 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ...
6 gentle en 1 2 tle təl 1 0 1 0 1
7 work en 1 1 work 'wɛːk 1 1 1
8 did en 1 1 did dɪd 2 0 1
2 1 did 'dɪd 2 1 1
9 frame en 1 1 frame 'freɪm 1 1 1

15 rows × 6 columns

# build lines directly
line_from_richardIII = prosodic.Line('A horse, a horse, my kingdom for a horse!')
line_from_richardIII
tokenizing @ 2023-12-15 14:14:17,991
⎿ 0 seconds @ 2023-12-15 14:14:17,992
word_num_forms syll_is_stressed syll_is_heavy word_is_punc syll_is_strong syll_is_weak
line_txt sent_num sentpart_num wordtoken_num wordtoken_txt word_lang wordform_num syll_num syll_txt syll_ipa
A horse, a horse, my kingdom for a horse! 1 1 1 A en 1 1 A 1 0 1
2 horse en 1 1 horse 'hɔːrs 1 1 1
3 , en 0 0 0 1
4 a en 1 1 a 1 0 1
5 horse en 1 1 horse 'hɔːrs 1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ...
8 kingdom en 1 2 dom dəm 1 0 1 0 1
9 for en 1 1 for fɔːr 1 0 1
10 a en 1 1 a 1 0 1
11 horse en 1 1 horse 'hɔːrs 1 1 1
12 ! en 0 0 0 1

13 rows × 6 columns

Metrical parsing

Parsing lines
# parse with default options (returns the plausible parses)
plausible_parses = line_from_richardIII.parse()
plausible_parses
parse_score parse_is_bounded meterpos_num_slots *w_peak *w_stress *s_unstress *unres_across *unres_within
line_txt parse_rank parse_txt parse_meter parse_stress
A horse, a horse, my kingdom for a horse! 1 a HORSE a HORSE my KING dom FOR a HORSE -+-+-+-+-+ -+-+-+---+ 1.0 0.0 10 0 0 1 0 0
# see best parse
line_from_richardIII.best_parse
A horse a horse my kingdom for a horse
⎿ Parse(rank=1, meter='-+-+-+-+-+', stress='-+-+-+---+', score=1, is_bounded=0)
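The meter and stress strings in the best parse can be compared character by character to see where a parse breaks the rules. The sketch below assumes that *w_stress penalizes a stressed syllable in a weak position and *s_unstress an unstressed syllable in a strong position, which is the usual reading of these names. It is a simplification: Prosodic evaluates constraints over metrical positions and uses others (*w_peak, *unres_across, *unres_within) that are not modelled here. For this line, the counts agree with the *w_stress and *s_unstress columns in the default parse table above.

```python
from typing import Dict


def count_violations(meter: str, stress: str) -> Dict[str, int]:
    """Simplified per-syllable count of two metrical constraints.

    meter:  '+' = syllable in a strong position, '-' = weak position
    stress: '+' = stressed syllable,             '-' = unstressed
    Assumes the strings are aligned one character per syllable.
    """
    if len(meter) != len(stress):
        raise ValueError("meter and stress must have the same length")
    pairs = list(zip(meter, stress))
    return {
        # stressed syllable sitting in a weak position
        "w_stress": sum(1 for m, s in pairs if m == "-" and s == "+"),
        # unstressed syllable sitting in a strong position
        "s_unstress": sum(1 for m, s in pairs if m == "+" and s == "-"),
    }


print(count_violations("-+-+-+-+-+", "-+-+-+---+"))
# {'w_stress': 0, 's_unstress': 1}
```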
# parse with different options
diff_parses = line_from_richardIII.parse(constraints=('w_peak','s_unstress'))
diff_parses
parse_score parse_is_bounded meterpos_num_slots *w_peak *s_unstress
line_txt parse_rank parse_txt parse_meter parse_stress
A horse, a horse, my kingdom for a horse! 1 a HORSE a HORSE my KING dom FOR a HORSE -+-+-+-+-+ -+-+-+---+ 1.0 0.0 10 0 1
2 a HORSE a HORSE my KING dom FOR a.horse -+-+-+-+-- -+-+-+---+ 1.0 0.0 12 0 1
3 a HORSE a HORSE my KING dom.for A horse -+-+-+--+- -+-+-+---+ 1.0 0.0 12 0 1
4 a HORSE a HORSE my KING dom.for A.HORSE -+-+-+--++ -+-+-+---+ 1.0 0.0 14 0 1
5 a HORSE a HORSE my KING.DOM for.a HORSE -+-+-++--+ -+-+-+---+ 1.0 0.0 14 0 1
6 a HORSE a HORSE my KING dom FOR.A horse -+-+-+-++- -+-+-+---+ 2.0 0.0 12 0 2
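In the tables above, the `parse_meter` column can be recovered from `parse_txt`. This convention is inferred from the examples rather than documented: metrical positions are separated by spaces, syllables sharing a position are joined with '.', and syllables in strong positions are written in upper case. `meter_from_parse_txt` is a hypothetical helper that applies this reading.

```python
def meter_from_parse_txt(parse_txt: str) -> str:
    """Rebuild a per-syllable meter string ('+' strong, '-' weak).

    Assumed convention: spaces separate positions, '.' joins syllables
    in one position, and upper case marks a strong position.
    """
    meter = []
    for position in parse_txt.split():
        for syllable in position.split("."):
            meter.append("+" if syllable.isupper() else "-")
    return "".join(meter)


print(meter_from_parse_txt("a HORSE a HORSE my KING dom.for A.HORSE"))
# -+-+-+--++
```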
Parsing texts
# small texts
sonnet.parse()
parsing 14 lines [5x] @ 2023-12-15 14:17:43,563
│ stanza 01, line 14: LEESE but.their SHOW their SUBS tance STILL lives SWEET: 100%|██████████| 14/14 [00:00<00:00, 45.78it/s]
⎿ 0.3 seconds @ 2023-12-15 14:17:43,873
parse_score parse_is_bounded meterpos_num_slots *w_peak *w_stress *s_unstress *unres_across *unres_within
stanza_num line_num line_txt parse_rank parse_txt parse_meter parse_stress
1 1 Those hours, that with gentle work did frame 1 those HO urs THAT with GEN tle WORK did FRAME -+-+-+-+-+ -+-+-+-+-+ 0.0 0.0 10 0 0 0 0 0
2 those HOURS that.with GEN tle WORK did FRAME -+--+-+-+ -+--+-+-+ 0.0 0.0 11 0 0 0 0 0
3 those HOURS that.with GEN tle WORK did FRAME -+--+-+-+ -+--+-+-+ 0.0 0.0 11 0 0 0 0 0
2 The lovely gaze where every eye doth dwell, 1 the LO vely GAZE where E very EYE doth DWELL -+-+-+-+-+ -+-+-+-+-+ 0.0 0.0 10 0 0 0 0 0
2 the LO vely GAZE where E ve.ry EYE doth DWELL -+-+-+--+-+ -+-+-+--+-+ 1.0 0.0 13 0 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
13 But flowers distill'd, though they with winter meet, 1 but FLO wers DIS.TILL'D though THEY with WIN ter MEET -+-++-+-+-+ -+--+-+-+-+ 2.0 0.0 13 0 0 1 0 1
2 but FLO wers.dis TILL'D though THEY with WIN ter MEET -+--+-+-+-+ -+--+-+-+-+ 2.0 0.0 13 0 0 0 2 0
3 but FLO.WERS dis TILL'D though THEY with WIN ter MEET -++-+-+-+-+ -+--+-+-+-+ 2.0 0.0 13 0 0 1 0 1
4 but FLO wers DIS till'd THOUGH they.with WIN ter MEET -+-+-+--+-+ -+--+---+-+ 4.0 0.0 13 1 1 2 0 0
14 Leese but their show; their substance still lives sweet. 1 LEESE but.their SHOW their SUBS tance STILL lives SWEET +--+-+-+-+ +--+-+-+++ 1.0 0.0 12 0 1 0 0 0

37 rows × 8 columns

# and big texts
shaksonnets.parse()
parsing 2155 lines [5x] @ 2023-12-15 14:17:52,124
│ stanza 154, line 14: love's FI re HEATS.WA ter WA ter COOLS not LOVE       : 100%|██████████| 2155/2155 [00:56<00:00, 38.03it/s]
⎿ 57.4 seconds @ 2023-12-15 14:18:49,496
parse_score parse_is_bounded meterpos_num_slots *w_peak *w_stress *s_unstress *unres_across *unres_within
stanza_num line_num line_txt parse_rank parse_txt parse_meter parse_stress
1 1 FROM fairest creatures we desire increase, 1 from FAI rest CREA tures WE de SIRE in CREASE -+-+-+-+-+ -+-+-+-+-+ 0.0 0.0 10 0 0 0 0 0
2 from FAI rest CREA tures WE de SI re IN crease -+-+-+-+-+- -+-+-+-+-++ 1.0 0.0 11 0 1 0 0 0
3 from FAI rest CREA tures WE de SI re IN.CREASE -+-+-+-+-++ -+-+-+-+-++ 1.0 0.0 13 0 0 0 0 1
4 from FAI rest CREA tures WE de SI re.in CREASE -+-+-+-+--+ -+-+-+-+--+ 2.0 0.0 13 0 0 0 2 0
2 That thereby beauty's rose might never die, 1 that THE reby BEA uty's ROSE might NE ver DIE -+-+-+-+-+ -+++-+-+-+ 1.0 0.0 10 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
154 14 Love's fire heats water, water cools not love. 2 love's FI re HEATS wa.ter WA ter COOLS not LOVE -+-+--+-+-+ ++-++-+-+-+ 4.0 0.0 13 1 2 0 0 1
3 love's FI.RE heats WA ter WA ter COOLS not LOVE -++-+-+-+-+ ++-++-+-+-+ 4.0 0.0 13 0 2 1 0 1
4 LOVE'S fire HEATS wa.ter WA ter COOLS not LOVE +-+--+-+-+ ++++-+-+-+ 4.0 0.0 12 1 2 0 0 1
5 LOVE'S.FI re HEATS.WA ter WA ter COOLS not LOVE ++-++-+-+-+ ++-++-+-+-+ 4.0 0.0 15 0 0 0 4 0
6 love's FI re HEATS wa TER wa TER cools NOT love -+-+-+-+-+- ++-++-+-+++ 9.0 0.0 11 2 5 2 0 0

7277 rows × 8 columns

prosodic's People

Contributors

bhicks2, evgenykochetkov, falkj, maldil, memduhg, quadrismegistus, ssb22, tomaarsen, zouharvi


prosodic's Issues

Error on entering string: 'str' object has no attribute 'decode'

[0.0s] prosodic:en$ "test"
Traceback (most recent call last):
  File "prosodic.py", line 197, in <module>
    text=input(msg).strip().decode('utf-8',errors='ignore')
AttributeError: 'str' object has no attribute 'decode'

When running py prosodic.py and typing anything, the above error occurs (Windows 10, Python 3.8.0)
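The traceback shows the cause. In Python 3, `input()` already returns `str`, which has no `.decode()` method. A version-agnostic pattern is to decode only when the value really is `bytes`. `to_text` is a hypothetical helper and is not a function in Prosodic.

```python
from typing import Union


def to_text(value: Union[str, bytes]) -> str:
    """Return `value` as str, decoding only if it is bytes."""
    if isinstance(value, bytes):
        # drop undecodable bytes, as the original call intended
        return value.decode("utf-8", errors="ignore")
    return value


# e.g. text = to_text(input(msg)).strip()
print(to_text("test"), to_text("test".encode("utf-8")))
```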

loadConfigPy is not defined.

I am using Python 2.7 to run the program.
I am facing the following issue:
  File "prosodic.py", line 23, in <module>
    config=loadConfigPy(toprint=toprintconfig,dir_prosodic=dir_prosodic)
NameError: name 'loadConfigPy' is not defined

Can't install it

pip install git+git://github.com/quadrismegistus/prosodic.git
Collecting git+git://github.com/quadrismegistus/prosodic.git
  Cloning git://github.com/quadrismegistus/prosodic.git to c:\users\xxx\appdata\local\temp\pip-req-build-s0719iak
  Running command git clone -q git://github.com/quadrismegistus/prosodic.git 'C:\Users\xxx\AppData\Local\Temp\pip-req-build-s0719iak'
  Running command git submodule update --init --recursive -q
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\xxx\pycharmprojects\prosodicstresses\venv\scripts\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\xx\\AppData\\Local\\Temp\\pip-req-build-s0719iak\\setup.py'"'"'; __file__='"'"'C:\\Users\\xx\\AppData\\Local\\Temp\\pip-req-build-s0719iak\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\xx\AppData\Local\Temp\pip-pip-egg-info-66a5w1hj'
         cwd: C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\
    Complete output (9 lines):
    C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\setup.py:20: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import sys,os,imp
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\setup.py", line 39, in <module>
        long_description = fh.read()
      File "C:\Users\xxx\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 11140: character maps to <undefined>
    ----------------------------------------
WARNING: Discarding git+git://github.com/quadrismegistus/prosodic.git. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
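The UnicodeDecodeError comes from `setup.py` reading the long description with Windows' default cp1252 codec, in which byte 0x90 is undefined. That byte occurs in UTF-8 text: for example, the IPA length mark ː (U+02D0) is encoded as 0xCB 0x90. The usual fix is to pass `encoding="utf-8"` to `open()`. The sketch assumes `setup.py` reads the README with a plain `open()` call. `read_long_description` is a hypothetical name.

```python
import os
import tempfile


def read_long_description(path: str) -> str:
    # Without encoding=..., open() uses the locale's preferred encoding
    # (often cp1252 on Windows), which cannot decode every UTF-8 byte.
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Demo: round-trip a file containing IPA characters
sample = "Syllable(ipa='ɛːz')\n"
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as fh:
        fh.write(sample)
    roundtrip = read_long_description(readme)
print(roundtrip == sample)  # True
```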

adding language: Esperanto

I'm trying to adapt Prosodic to Esperanto. Esperanto stress is always paroxytonic (on the penultimate syllable): abelo (English "bee") is [a.'be.lo]. In poetry, however, the final vowel can be elided, and the word then becomes oxytonic: abel'.

Esperanto is as phonemic as Finnish, so I decided to use the orth feature, but I'm puzzled by LANG_stress.py because I don't understand its code :( Could you help me? I want to use Prosodic for my MA research.
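The stress rule described above is simple enough to state directly. This sketch assumes the word is already syllabified and that an elided final vowel is marked with an apostrophe. It does not use the interface that LANG_stress.py expects, which is not shown here, so `esperanto_stress` is a hypothetical helper and not part of Prosodic.

```python
from typing import List


def esperanto_stress(syllables: List[str]) -> str:
    """Return a '+'/'-' stress pattern for one syllabified Esperanto word.

    Default: stress the penultimate syllable (paroxytone).
    Poetic elision ("abel'"): stress the final syllable (oxytone).
    """
    if not syllables:
        return ""
    n = len(syllables)
    if n == 1 or syllables[-1].endswith("'"):
        stressed = n - 1
    else:
        stressed = n - 2
    return "".join("+" if i == stressed else "-" for i in range(n))


print(esperanto_stress(["a", "be", "lo"]))  # -+-
print(esperanto_stress(["a", "bel'"]))      # -+
```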

Anything I can do to help? Analyse and include more words?

I just came across this application the other week and have found it enormously helpful in my own writing.

It's a wonderful piece of kit and should contribute significantly to a lot of really fun poetry.

I noticed that there are a few words that I have tried to process that are not being properly analysed so I was wondering if I might be able to help with that?

For instance, "beauty" is one, and "unfurls" is another.

Have you considered a non-command line interface for use in program libraries?

This is really cool, but I want to scan lines of text in bulk for iambic pentameter, and there doesn't seem to be a convenient way to do this. I can hack it by importing modules and calling functions, but have you considered creating an interface that can be imported into other projects and called programmatically?

expected string or buffer

Trying to parse this line:
t = p.Text("""As in a dark beginning of all things, A mute featureless semblance of the Unknown Repeating for ever the unconscious act, Prolonging for ever the unseeing will, Cradled the cosmic drowse of ignorant Force Whose moved creative slumber kindles the suns And carries our lives in its somnambulist whirl.""")
I get the following error: 'TypeError: expected string or buffer'

Phoneme /g/ is represented by different characters in words from `./dicts/en/english.tsv` and words transcribed using TTS

>>> import prosodic as p
>>> text = p.Text("google good")
000001  google                  P:'ɡʉː.ɡʌl                              S:PU    W:HH
000002  good                    P:'gʊd                                  S:P     W:H
>>> text.ents(cls='Word')[0].children[0]
<Syllable.goo> ['ɡʉː]
>>> text.ents(cls='Word')[0].children[0].children[0].onset
<Onset> [ɡ]
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0]
ɡ
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0].feats
{}
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0]
g
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0].feats
{'approx': False, 'cons': True, 'son': False, 'syll': False, 'constr': False, 'spread': False, 'voice': True, 'long': None, 'cont_acoust': False, 'cont_artic': False, 'delrel': False, 'lat': False, 'nas': False, 'strid': False, 'tap': False, 'trill': False, 'coronal': False, 'dorsal': True, 'labial': False, 'labiodental': False, 'ant': False, 'dist': False, 'back': True, 'front': None, 'high': True, 'low': False, 'tense': None, 'round': False}

g in "good" is represented by a regular 'g' character (U+0067) and correctly loads features from ./lib/ipa.py

The ɡ characters in "google" are LATIN SMALL LETTER SCRIPT G (U+0261), and as a result they have no feats
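One possible fix is to normalize transcriptions before looking up features, so that U+0261 is mapped to the U+0067 key that ./lib/ipa.py already knows. Unicode NFKC normalization does not fold this character, so an explicit mapping is used. `normalize_ipa` is a hypothetical helper, and where Prosodic should call it is an open question.

```python
# U+0261 LATIN SMALL LETTER SCRIPT G -> U+0067 LATIN SMALL LETTER G
_IPA_NORMALIZE = str.maketrans({"\u0261": "g"})


def normalize_ipa(ipa: str) -> str:
    """Replace script g with plain g so feature lookups agree."""
    return ipa.translate(_IPA_NORMALIZE)


google = "'\u0261\u0289\u02d0.\u0261\u028cl"  # 'ɡʉː.ɡʌl from the TTS output
print(normalize_ipa(google))
```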

Example output files

Can you provide some example (non-trivial) output files? Such as the long-form poem in the readme. Even better if you can link/directly load the file in the readme.

Elisions in poetry

In historical English poetry syllables are often elided:

sweet as love, which overflows her bower
--> with|MU|sic|SWEET|as|LOVE|which|OV|er|FLOWS|her|BOW'R

scattering unbeholden
--> SCAT|tring|UN|be|HOLD|en

How can we account for this? By eliding syllables that end, phonetically, in ɛː?

scattering P:'skæ.tɛː.ɪŋ S:PUU W:LHH
tower P:'taʊ.ɛː S:PU W:HH
showers P:'ʃaʊ.ɛːz S:PU W:HH
curious P:'kjʊ.riː.əs S:PUU W:LHH
wondering P:'wʌn.dɛː.ɪŋ S:PUU W:HHH
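One way to explore the question is to generate candidate elided pronunciations and let the parser choose among them. The heuristic below is an assumption and not Prosodic's behaviour. Each unstressed, non-initial syllable containing ɛː is merged into the following syllable, or into the preceding one if it is word-final. This removes one syllable per variant, as in BOW'R and SCAT|tring. `elision_variants` is a hypothetical helper that works on the '.'-separated transcriptions shown above.

```python
from typing import List


def elision_variants(ipa: str, nucleus: str = "ɛː") -> List[str]:
    """Candidate elided forms of a '.'-syllabified IPA string (heuristic)."""
    sylls = ipa.split(".")
    variants = []
    for i, syl in enumerate(sylls):
        # skip the first syllable, stressed syllables, and other nuclei
        if i == 0 or syl.startswith("'") or nucleus not in syl:
            continue
        merged = list(sylls)
        if i + 1 < len(merged):
            nxt = merged.pop(i + 1)
            # keep any stress mark at the front of the merged syllable
            if nxt.startswith("'"):
                merged[i] = "'" + syl + nxt[1:]
            else:
                merged[i] = syl + nxt
        else:
            last = merged.pop(i)
            merged[i - 1] += last
        variants.append(".".join(merged))
    return variants


print(elision_variants("'skæ.tɛː.ɪŋ"))  # ["'skæ.tɛːɪŋ"]
print(elision_variants("'taʊ.ɛː"))      # ["'taʊɛː"]
```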

Syllable `token` doesn't always match syllable phonemes

Here are some examples from Shakespeare's sonnet 1:

>>> import prosodic as p
>>> from_first_sonnet = p.Text("thereby beauty's self-substantial cruel within niggarding glutton")
>>> for w in from_first_sonnet.ents(cls='Word'): print(w.children)
[<Syllable.the> ['ðɛr], <Syllable.reby> ['baɪ]]
[<Syllable.bea> ['bjʉː], <Syllable.uty's> [tɪz]]
[<Syllable.self-self> ['sɛlf], <Syllable.-> [sʌb], <Syllable.subs> ['stæn], <Syllable.tantial> [ʃʌl]]
[<Syllable.cr> ['kruː], <Syllable.uel> [əl]]
[<Syllable.wit> [], <Syllable.hin> ['ðɪn]]
[<Syllable.nig> ['], <Syllable.gar> [ɡʌ], <Syllable.ding> [dɪŋ]]
[<Syllable.glut> ['ɡlʌ], <Syllable.ton> [tʌn]]

AttributeError: module 'regex' has no attribute 'Pattern'

Hi, I'm getting this error on my Macbook Pro, can you help me?

Python 3.7.2 (v3.7.2:9a3ffc0492, Dec 24 2018, 02:44:43)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import prosodic as p
>>> text = p.Text("Shall I compare thee to a summer's day?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ale/Desktop/prosodic/prosodic/lib/Text.py", line 85, in __init__
    self.init_text(lines)
  File "/Users/ale/Desktop/prosodic/prosodic/lib/Text.py", line 319, in init_text
    newwords=self.dict.get(tok,stress_ambiguity=self.stress_ambiguity)
  File "/Users/ale/Desktop/prosodic/prosodic/lib/Dictionary.py", line 551, in get
    words=self.getprep(word,config=self.config)
  File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 100, in get
    sylls_text = syllabify_orth(token,num_sylls=num_sylls)
  File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 354, in syllabify_orth
    return syllabify_orth_with_nltk(token,num_sylls=num_sylls)
  File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 337, in syllabify_orth_with_nltk
    from nltk.tokenize import SyllableTokenizer
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/__init__.py", line 137, in <module>
    from nltk.text import *
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/text.py", line 29, in <module>
    from nltk.tokenize import sent_tokenize
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 65, in <module>
    from nltk.tokenize.casual import TweetTokenizer, casual_tokenize
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/casual.py", line 272, in <module>
    class TweetTokenizer:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/casual.py", line 357, in TweetTokenizer
    def WORD_RE(self) -> regex.Pattern:
AttributeError: module 'regex' has no attribute 'Pattern'
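The failing line is in NLTK, not Prosodic: NLTK annotates a method with `regex.Pattern`, and the installed third-party `regex` module does not define that name. This suggests an older `regex` version, so upgrading it (`pip install --upgrade regex`) is a plausible fix, though this report does not confirm it. A quick diagnostic, using a hypothetical helper name:

```python
import importlib
from typing import Optional


def regex_supports_pattern() -> Optional[bool]:
    """True/False if the third-party `regex` module does/doesn't define
    `regex.Pattern`; None if `regex` is not installed at all."""
    try:
        regex = importlib.import_module("regex")
    except ImportError:
        return None
    return hasattr(regex, "Pattern")


print(regex_supports_pattern())
```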

SyntaxError: invalid syntax

Hi,
I've just installed prosodic on Lubuntu with Python 3 (I also tried Python 2.7), and I renamed config_default.py to config.py.
But every time I run "prosodic", I get this:

File "/usr/local/bin/prosodic", line 5
cmd = f'python {path_to_prosodic_py} {argstr}'
^
SyntaxError: invalid syntax

Is there anything I can do to fix it?
Note: the ^ is under the ' at the end of {argstr}'

Unable to run app

I went through the installation steps, and it says to run this command:

pip install git+https://github.com/quadrismegistus/prosodic@develop

It looks like the develop branch no longer exists, so that doesn't work. I tried:

pip install git+https://github.com/quadrismegistus/prosodic

And that seems to work correctly and I see the prosodic and prosodic-2.0.0.dev1.dist-info folders in my python site-packages. But when I run prosodic I get this:

zsh: command not found: prosodic

Any ideas? Should I be installing it from a different branch? Or running that command from somewhere specific?

Thanks!
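"command not found" after a successful `pip install` often means the directory where pip places console scripts is not on PATH. The sketch below reports that directory for the running interpreter and checks whether a `prosodic` command is visible. It assumes a default (non `--user`) install, because `pip install --user` uses a different scripts directory. The PATH comparison is a rough string match. `diagnose_console_script` is a hypothetical helper.

```python
import os
import shutil
import sysconfig


def diagnose_console_script(cmd: str = "prosodic") -> dict:
    """Report where console scripts go and whether `cmd` is on PATH."""
    scripts_dir = sysconfig.get_path("scripts")
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "scripts_dir": scripts_dir,
        "scripts_dir_on_path": scripts_dir in path_dirs,
        "found": shutil.which(cmd),  # None if the command is not on PATH
    }


print(diagnose_console_script())
```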

Font needed

Hello,
Can you tell me what font I must download to avoid this?
[Screenshot: IMG_20201023_143433]

Cannot Get Prosodic to Run

C:\Users\cplio>py -m prosodic
Traceback (most recent call last):
  File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\site-packages\prosodic\__init__.py", line 17, in <module>
    from tools import *
  File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\site-packages\prosodic\lib\tools.py", line 77
    print ">> loaded settings:"
    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(">> loaded settings:")?

Support for Unicode

Hi @quadrismegistus! Really love this library. I am wondering if there is support for Unicode text (or if I am doing something wrong). I am doing this in Python 2.7:

import prosodic as p

# input_text is some string
text = p.Text(input_text)
text.parse()

I get this error when putting in this poem as text:

[2018-11-12 11:14:39,062: ERROR/ForkPoolWorker-6] raised unexpected: UnicodeDecodeError('ascii', 'O|no|IT|is|AN|ev|ER-|fix\xc3\xa8d|MARK', 24, 25, 'ordinal not in range(128)')
Traceback (most recent call last):

...

  line 16, in parseText
    text.parse()
  File "/usr/local/lib/python2.7/site-packages/prosodic/lib/Text.py", line 484, in parse
    ent.scansion(meter=meter,conscious=True)
  File "/usr/local/lib/python2.7/site-packages/prosodic/lib/Line.py", line 137, in scansion
    self.om("\t".join( [unicode(x) for x in [makeminlength(unicode(self),config['linelen']), makeminlength(unicode(bp) if bp else '', config['linelen']),meterstr,len(self.allParses(meter)),count,lowestScore,str_ot] ] ),conscious=conscious)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

Can't install, prosodic/tagged_samples is missing

I'm on Arch Linux using Python 3.9 and can't install prosodic. I get an error that prosodic/tagged_samples is missing while the wheel is being built.

Maybe it needs to be added to MANIFEST.in?

At first I thought this might be related to #20, but the levenshtein package installs without issue for me.

python-Levenshtein can't build wheel

Just a heads up: I cannot install prosodic because pip fails to build a wheel for python-Levenshtein.

It looks like multiple people have this problem, but nothing is being done about it: https://github.com/ztane/python-Levenshtein/issues

Is there a chance you could switch to a different implementation of Levenshtein distance?

Cheers.
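For illustration, Levenshtein distance needs no compiled extension: the standard dynamic-programming algorithm fits in a few lines of pure Python, though it is much slower on large inputs. Whether Prosodic uses anything from python-Levenshtein beyond plain edit distance is not stated here, so treat `levenshtein` as a hypothetical stand-in rather than a drop-in replacement.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```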

Cannot install prosodic ; incompatible with updated pip

I get the following error when trying to install prosodic via pip install git+git://github.com/quadrismegistus/prosodic.git

ERROR: prosodic==1.5.0 did not indicate that it installed an .egg-info directory. Only setup.py projects generating .egg-info directories are supported.

I believe this is due to an upgraded pip; I see a similar issue on another repo: https://github.com/oracle/Skater/issues/292

I'm using pip 20.2.3 and python 3.8.5

cannot install. problem with imp?

using:
py -m pip install git+https://github.com/quadrismegistus/prosodic.git

receiving this error:
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\tarci\AppData\Local\Temp\pip-req-build-66pyz14l\setup.py", line 20, in <module>
import sys,os,imp
ModuleNotFoundError: No module named 'imp'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
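The `imp` module was deprecated in favour of `importlib` and is absent from newer Python versions, so `import sys,os,imp` in setup.py fails there. The log does not show what setup.py uses `imp` for. If it uses it to load a module from a file path (the job of `imp.load_source`), the `importlib` equivalent looks roughly like the sketch below. Unlike `imp.load_source`, this sketch does not register the module in `sys.modules`. `load_source` here is a hypothetical helper.

```python
import importlib.util
import os
import tempfile
from types import ModuleType


def load_source(name: str, path: str) -> ModuleType:
    """Load a module from a file path with importlib (no `imp` needed)."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file's code in the new module
    return module


# Demo: load a throwaway module file
with tempfile.TemporaryDirectory() as tmp:
    mod_path = os.path.join(tmp, "demo_config.py")
    with open(mod_path, "w", encoding="utf-8") as fh:
        fh.write("VERSION = '1.0'\n")
    demo = load_source("demo_config", mod_path)
print(demo.VERSION)  # 1.0
```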

Please update lexconvert

Hi, I notice you're using a 6-year-old version of lexconvert. You might like to update it to the upstream repo, as I remember something about finding a couple of bugs in the conversion table which I fixed as I refactored it into a better data structure. Unfortunately I seem to have misplaced my notes on what exactly those bugs were, but if you update you should get a better result.

isIambic is incorrect

The isIambic function only looks at the first two metrical positions, whose pattern may not hold across the whole line. It also seems inconsistent to have a check for iambic meter but not for trochaic, anapestic, or dactylic meter. I could rewrite these more generally, since I will need a wrapper for them in my application anyway. However, I am not sure whether PRs like this are welcome in this repo, or whether someone has time to merge them.

def isIambic(self):
    # Only the first two positions are inspected; the rest of the line is ignored.
    if len(self.positions) < 2:
        return None
    else:
        return self.positions[0].meterVal == 'w' and self.positions[1].meterVal == 's'
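A more general check, along the lines the issue suggests, would score the whole sequence of positions against each foot type. The sketch below works on a plain string of 'w'/'s' values (which could, hypothetically, be built as "".join(pos.meterVal for pos in self.positions)); FEET, foot_match_ratio, and classify_meter are illustrative names, not part of Prosodic. It assumes every line starts on a foot boundary, so headless or otherwise irregular lines will score below 1.0 for every foot.

```python
# Repeating templates for each foot type ('w' = weak, 's' = strong)
FEET = {
    "iambic": "ws",
    "trochaic": "sw",
    "anapaestic": "wws",
    "dactylic": "sww",
}


def foot_match_ratio(meter, foot):
    """Fraction of positions that agree with `foot` repeated across the whole line."""
    if not meter:
        return 0.0
    template = (foot * (len(meter) // len(foot) + 1))[: len(meter)]
    hits = sum(a == b for a, b in zip(meter, template))
    return hits / len(meter)


def classify_meter(meter):
    """Return (best foot name, match ratio), or (None, 0.0) for lines under two positions."""
    if len(meter) < 2:
        return None, 0.0
    scores = {name: foot_match_ratio(meter, foot) for name, foot in FEET.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, classify_meter("wswswswsws") returns ("iambic", 1.0) and classify_meter("swwswwsww") returns ("dactylic", 1.0). Returning a ratio rather than a boolean lets a caller pick its own threshold for "mostly iambic".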

Syllable feature('prom.stress') not the same as capitalization in parse?

Hi,

I noticed something odd.

I am parsing the en.alliteration.txt with this code:

import prosodic as p

# t = p.Text('Shall I compare thee')
t = p.Text('../corpora/corppoetry_en/en.alliteration.txt')
t.parse()

# Use `parse` as the loop variable so it does not shadow the `p` module alias.
for parse in t.bestParses():
    print('PARSE: ', parse)
    meter_list = []
    if parse is not None:
        ws = parse.words()
        print('WORDS: ', ws)
        for w in ws:
            ss = w.syllables()
            print('SYLLABLES:', ss)
            for s in ss:
                # Stress prominence value for this syllable
                f = s.feature('prom.stress')
                print(f)
                # '-' for unstressed (< 0.5), '+' for stressed
                meter_list.append('-' if f < .5 else '+')
    print(meter_list)

But in the output, the capitalization in the PARSE line does not seem to match the prominence values assigned in the syllable features. See, for example, the values for 'THE' and 'BLES'.

PARSE:  from|THE|o|RI|gi.nal|COM|mon.ger|MA|nic|LANGUAG|e
WORDS:  [<Word.from> [frʌm], <Word.the> [ðə], <Word.original> [ɛː.'ɪ.ʤə.nəl], <Word.common> ['kɑ.mən], <Word.germanic> [ʤɛː.'mæ.nɪk], <Word.language> ['læŋ.gwəʤ]]
SYLLABLES: [<Syllable.from> [frʌm]]
0.0
SYLLABLES: [<Syllable.the> [ðə]]
0.0
SYLLABLES: [<Syllable.o> [ɛː], <Syllable.ri> ['ɪ], <Syllable.gi> [ʤə], <Syllable.nal> [nəl]]
0.0
1.0
0.0
0.0
SYLLABLES: [<Syllable.com> ['kɑ], <Syllable.mon> [mən]]
1.0
0.0
SYLLABLES: [<Syllable.Ger> [ʤɛː], <Syllable.ma> ['mæ], <Syllable.nic> [nɪk]]
0.0
1.0
0.0
SYLLABLES: [<Syllable.languag> ['læŋ], <Syllable.e> [gwəʤ]]
1.0
0.0
['-', '-', '-', '+', '-', '-', '+', '-', '-', '+', '-', '+', '-']
PARSE:  MA|ny|STRESSED.SYL|la|BLES|were|LOST
WORDS:  [<Word.many> ['mɛ.niː], <Word.stressed> ['strɛst], <Word.syllables> ['sɪ.lə.bəlz], <Word.were> [wɛː], <Word.lost> ['lɔːst]]
SYLLABLES: [<Syllable.ma> ['mɛ], <Syllable.ny> [niː]]
1.0
0.0
SYLLABLES: [<Syllable.stressed> ['strɛst]]
1.0
SYLLABLES: [<Syllable.syl> ['sɪ], <Syllable.la> [lə], <Syllable.bles> [bəlz]]
1.0
0.0
0.0
SYLLABLES: [<Syllable.were> [wɛː]]
0.0
SYLLABLES: [<Syllable.lost> ['lɔːst]]
1.0
['+', '-', '+', '+', '-', '-', '-', '+']

My guess is that the capitalization in the parse is correct, but that the syllable feature comes from the CMU lexicon?
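One way to probe this is to line the two sequences up syllable by syllable. The sketch below works only on the printed strings and rests on assumptions about the output format that this thread does not confirm: '|' separates metrical positions, '.' separates syllables within a position, and upper case marks a metrically strong position rather than lexical stress. parse_strength and stress_mismatches are hypothetical helpers, not part of Prosodic.

```python
def parse_strength(parse_str):
    """Split a parse string into (syllable, is_strong) pairs.

    Assumes '|' separates positions, '.' separates syllables within a
    position, and an all-upper-case syllable sits in a strong position.
    """
    pairs = []
    for position in parse_str.split("|"):
        for syl in position.split("."):
            pairs.append((syl, syl.isupper()))
    return pairs


def stress_mismatches(parse_str, stresses, threshold=0.5):
    """Return syllables whose position strength disagrees with their stress value."""
    pairs = parse_strength(parse_str)
    if len(pairs) != len(stresses):
        raise ValueError("parse and stress list have different syllable counts")
    out = []
    for (syl, strong), stress in zip(pairs, stresses):
        # Same cutoff as the script above: >= 0.5 counts as stressed
        if strong != (stress >= threshold):
            out.append(syl)
    return out
```

Fed the two parses and stress values printed above, it flags only THE and BLES, both sitting in a strong position with a stress value of 0.0. If that pattern held across the corpus, it would suggest the capitals and prom.stress are reporting different things (metrical position versus stress) rather than contradicting each other.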

Python 3 version

This library is great, but it would be even better to have a Python 3 version, plus possible interfaces. As mentioned in another issue, it can be hacked around with input/output functions, but I have difficulty using it under Python 3, and 2to3 doesn't seem to work well on it.

Is there any plan to port it to Python 3?

Method to get phonetic transcription

Thanks for your amazing repo!
Is there a function that returns the syllable-split phonetic transcription that the show() method displays?

text.show() returns nothing; it only prints the phonetic transcription and stress. I'd like a function that returns the split phonetic transcription instead.
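Until such a function exists, a generic workaround is to capture whatever show() prints and process the resulting string. This is a minimal sketch assuming show() writes through print/sys.stdout; capture_printed is a hypothetical helper built on the standard library's contextlib.redirect_stdout, not a Prosodic API.

```python
import contextlib
import io


def capture_printed(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()
```

Usage would be along the lines of lines = capture_printed(text.show).splitlines(), after which each line can be split on whatever syllable separator show() uses. This only helps if show() prints to sys.stdout; output written directly to a file descriptor or a logger would not be captured.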
