quadrismegistus / prosodic

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

License: GNU General Public License v3.0

Python 5.66% Shell 0.01% Jupyter Notebook 0.31% CSS 2.93% JavaScript 90.93% HTML 0.09% SCSS 0.07%

metrical-parser linguistics nlp finnish-language-analysis poetry rhythm

prosodic's Introduction

Prosodic

Prosodic is a metrical-phonological parser written in Python. Currently, it can parse English and Finnish text, but adding languages is easy with a pronunciation dictionary or a custom Python function. Prosodic was built by Ryan Heuser, Josh Falk, and Arto Anttila. Josh also maintains another repository, in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. Sam Bowman has contributed to the codebase as well, adding several new metrical constraints.

This version, "Prosodic 2", is a near-total rewrite of the original Prosodic.

Supports Python>=3.8.

Install

1. Install Python package

For now, pip-install directly from GitHub:

pip install git+https://github.com/quadrismegistus/prosodic

2. Install espeak

Install espeak, free text-to-speech (TTS) software, to ‘sound out’ unknown words.

Mac: brew install espeak. (First install Homebrew if not already installed.)
Linux: apt-get install espeak
Windows: Download and install from http://espeak.sourceforge.net/download.html.

Usage

Web app

Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:

prosodic

Then navigate to http://127.0.0.1:5000/.

Python

Read texts

# import prosodic
import prosodic

# load a text
sonnet = prosodic.Text("""
Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet.
""")

# can also load by filename
shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')

Stanzas, lines, words, syllables, phonemes

Texts in prosodic are organized into a tree structure. The .children of a Text object is a list of Stanza objects, each of whose .parent points back to the Text. In turn, each stanza's .children is a list of Line objects, each of whose .parent points back to the stanza, and so on down the tree.

# Take a peek at this tree structure 
# and the features particular entities have
sonnet.show(maxlines=30, incl_phons=True)

Text()
|   Stanza(num=1)
|       Line(num=1, txt='Those hours, that with gentle work did frame')
|           WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='Those', lang='en', num_forms=1)
|                   WordForm(num=1, txt='Those')
|                       Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)
|                           Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='hours', lang='en', num_forms=2)
|                   WordForm(num=1, txt='hours')
|                       Syllable(ipa="'aʊ", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                       Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)
|                           Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                   WordForm(num=2, txt='hours')
|                       Syllable(ipa="'aʊrz", num=1, txt='hours', is_stressed=True, is_heavy=True)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)
|           WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='that', lang='en', num_forms=3)

# take a peek at it in dataframe form
sonnet.df   # by-syllable dataframe representation
sonnet      # ...which will also be shown when text object displayed (in a notebook)

												word_num_forms	syll_is_stressed	syll_is_heavy	syll_is_strong	syll_is_weak	word_is_punc
stanza_num	line_num	line_txt	sent_num	sentpart_num	wordtoken_num	wordtoken_txt	word_lang	wordform_num	syll_num	syll_txt	syll_ipa
1	1	Those hours, that with gentle work did frame	1	1	1	Those	en	1	1	Those	ðoʊz	1	0	1
					2	hours	en	1	1	ho	'aʊ	2	1	1	1	0
								1	2	urs	ɛːz	2	0	1	0	1
								2	1	hours	'aʊrz	2	1	1
					3	,	en	0	0			0					1
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	14	Leese but their show; their substance still lives sweet.	1	1	7	substance	en	1	2	tance	stəns	1	0	1	0	1
					8	still	en	1	1	still	'stɪl	1	1	1
					9	lives	en	1	1	lives	'lɪvz	1	1	1
					10	sweet	en	1	1	sweet	'swiːt	1	1	1
					11	.	en	0	0			0					1

195 rows × 6 columns

# you can loop over this directly if you want
for stanza in shaksonnets.stanzas:
    for line in stanza:
        for wordtoken in line:
            for wordtype in wordtoken:
                for wordform in wordtype:
                    for syllable in wordform:
                        for phoneme in syllable:
                            # ...
                            pass

# or directly access components
print(f'''
Shakespeare's sonnets have:
  * {len(shaksonnets.stanzas):,} "stanzas"        (in this text, each one a sonnet)
  * {len(shaksonnets.lines):,} lines
  * {len(shaksonnets.wordtokens):,} wordtokens    (including punctuation)
  * {len(shaksonnets.wordtypes):,} wordtypes     (each token has one wordtype object)
  * {len(shaksonnets.wordforms):,} wordforms     (a word + IPA pronunciation; no punctuation)
  * {len(shaksonnets.syllables):,} syllables
  * {len(shaksonnets.phonemes):,} phonemes
''')

Shakespeare's sonnets have:
  * 154 "stanzas"        (in this text, each one a sonnet)
  * 2,155 lines
  * 20,317 wordtokens    (including punctuation)
  * 20,317 wordtypes     (each token has one wordtype object)
  * 17,601 wordforms     (a word + IPA pronunciation; no punctuation)
  * 21,915 syllables
  * 63,614 phonemes

# access lines

# text.line{num} will return text.lines[num-1]
assert sonnet.line1 is sonnet.lines[0]
assert sonnet.line10 is sonnet.lines[9]

# show the line
sonnet.line1

											word_num_forms	syll_is_stressed	syll_is_heavy	syll_is_strong	syll_is_weak	word_is_punc
line_num	line_txt	sent_num	sentpart_num	wordtoken_num	wordtoken_txt	word_lang	wordform_num	syll_num	syll_txt	syll_ipa
1	Those hours, that with gentle work did frame	1	1	1	Those	en	1	1	Those	ðoʊz	1	0	1
				2	hours	en	1	1	ho	'aʊ	2	1	1	1	0
							1	2	urs	ɛːz	2	0	1	0	1
							2	1	hours	'aʊrz	2	1	1
				3	,	en	0	0			0					1
				...	...	...	...	...	...	...	...	...	...	...	...	...
				6	gentle	en	1	2	tle	təl	1	0	1	0	1
				7	work	en	1	1	work	'wɛːk	1	1	1
				8	did	en	1	1	did	dɪd	2	0	1
				8	did	en	2	1	did	'dɪd	2	1	1
				9	frame	en	1	1	frame	'freɪm	1	1	1

15 rows × 6 columns

# build lines directly
line_from_richardIII = prosodic.Line('A horse, a horse, my kingdom for a horse!')
line_from_richardIII

tokenizing @ 2023-12-15 14:14:17,991
⎿ 0 seconds @ 2023-12-15 14:14:17,992

										word_num_forms	syll_is_stressed	syll_is_heavy	word_is_punc	syll_is_strong	syll_is_weak
line_txt	sent_num	sentpart_num	wordtoken_num	wordtoken_txt	word_lang	wordform_num	syll_num	syll_txt	syll_ipa
A horse, a horse, my kingdom for a horse!	1	1	1	A	en	1	1	A	eɪ	1	0	1
			2	horse	en	1	1	horse	'hɔːrs	1	1	1
			3	,	en	0	0			0			1
			4	a	en	1	1	a	eɪ	1	0	1
			5	horse	en	1	1	horse	'hɔːrs	1	1	1
			...	...	...	...	...	...	...	...	...	...	...	...	...
			8	kingdom	en	1	2	dom	dəm	1	0	1		0	1
			9	for	en	1	1	for	fɔːr	1	0	1
			10	a	en	1	1	a	eɪ	1	0	1
			11	horse	en	1	1	horse	'hɔːrs	1	1	1
			12	!	en	0	0			0			1

13 rows × 6 columns

Metrical parsing

Parsing lines

# parse with default options by just reaching for best parse
plausible_parses = line_from_richardIII.parse()
plausible_parses

					parse_score	parse_is_bounded	meterpos_num_slots	*w_peak	*w_stress	*s_unstress	*unres_across	*unres_within
line_txt	parse_rank	parse_txt	parse_meter	parse_stress
A horse, a horse, my kingdom for a horse!	1	a HORSE a HORSE my KING dom FOR a HORSE	-+-+-+-+-+	-+-+-+---+	1.0	0.0	10	0	0	1	0	0

# see best parse
line_from_richardIII.best_parse

A horse a horse my kingdom for a horse

⎿ Parse(rank=1, meter='-+-+-+-+-+', stress='-+-+-+---+', score=1, is_bounded=0)

# parse with different options
diff_parses = line_from_richardIII.parse(constraints=('w_peak','s_unstress'))
diff_parses

					parse_score	parse_is_bounded	meterpos_num_slots	*w_peak	*s_unstress
line_txt	parse_rank	parse_txt	parse_meter	parse_stress
A horse, a horse, my kingdom for a horse!	1	a HORSE a HORSE my KING dom FOR a HORSE	-+-+-+-+-+	-+-+-+---+	1.0	0.0	10	0	1
	2	a HORSE a HORSE my KING dom FOR a.horse	-+-+-+-+--	-+-+-+---+	1.0	0.0	12	0	1
	3	a HORSE a HORSE my KING dom.for A horse	-+-+-+--+-	-+-+-+---+	1.0	0.0	12	0	1
	4	a HORSE a HORSE my KING dom.for A.HORSE	-+-+-+--++	-+-+-+---+	1.0	0.0	14	0	1
	5	a HORSE a HORSE my KING.DOM for.a HORSE	-+-+-++--+	-+-+-+---+	1.0	0.0	14	0	1
	6	a HORSE a HORSE my KING dom FOR.A horse	-+-+-+-++-	-+-+-+---+	2.0	0.0	12	0	2

Parsing texts

# small texts
sonnet.parse()

parsing 14 lines [5x] @ 2023-12-15 14:17:43,563
￨ stanza 01, line 14: LEESE but.their SHOW their SUBS tance STILL lives SWEET: 100%|██████████| 14/14 [00:00<00:00, 45.78it/s]
⎿ 0.3 seconds @ 2023-12-15 14:17:43,873

							parse_score	parse_is_bounded	meterpos_num_slots	*w_peak	*w_stress	*s_unstress	*unres_across	*unres_within
stanza_num	line_num	line_txt	parse_rank	parse_txt	parse_meter	parse_stress
1	1	Those hours, that with gentle work did frame	1	those HO urs THAT with GEN tle WORK did FRAME	-+-+-+-+-+	-+-+-+-+-+	0.0	0.0	10	0	0	0	0	0
			2	those HOURS that.with GEN tle WORK did FRAME	-+--+-+-+	-+--+-+-+	0.0	0.0	11	0	0	0	0	0
			3	those HOURS that.with GEN tle WORK did FRAME	-+--+-+-+	-+--+-+-+	0.0	0.0	11	0	0	0	0	0
	2	The lovely gaze where every eye doth dwell,	1	the LO vely GAZE where E very EYE doth DWELL	-+-+-+-+-+	-+-+-+-+-+	0.0	0.0	10	0	0	0	0	0
	2	The lovely gaze where every eye doth dwell,	2	the LO vely GAZE where E ve.ry EYE doth DWELL	-+-+-+--+-+	-+-+-+--+-+	1.0	0.0	13	0	0	0	0	1
	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	13	But flowers distill'd, though they with winter meet,	1	but FLO wers DIS.TILL'D though THEY with WIN ter MEET	-+-++-+-+-+	-+--+-+-+-+	2.0	0.0	13	0	0	1	0	1
			2	but FLO wers.dis TILL'D though THEY with WIN ter MEET	-+--+-+-+-+	-+--+-+-+-+	2.0	0.0	13	0	0	0	2	0
			3	but FLO.WERS dis TILL'D though THEY with WIN ter MEET	-++-+-+-+-+	-+--+-+-+-+	2.0	0.0	13	0	0	1	0	1
			4	but FLO wers DIS till'd THOUGH they.with WIN ter MEET	-+-+-+--+-+	-+--+---+-+	4.0	0.0	13	1	1	2	0	0
	14	Leese but their show; their substance still lives sweet.	1	LEESE but.their SHOW their SUBS tance STILL lives SWEET	+--+-+-+-+	+--+-+-+++	1.0	0.0	12	0	1	0	0	0

37 rows × 8 columns

# and big texts
shaksonnets.parse()

parsing 2155 lines [5x] @ 2023-12-15 14:17:52,124
￨ stanza 154, line 14: love's FI re HEATS.WA ter WA ter COOLS not LOVE       : 100%|██████████| 2155/2155 [00:56<00:00, 38.03it/s]
⎿ 57.4 seconds @ 2023-12-15 14:18:49,496

							parse_score	parse_is_bounded	meterpos_num_slots	*w_peak	*w_stress	*s_unstress	*unres_across	*unres_within
stanza_num	line_num	line_txt	parse_rank	parse_txt	parse_meter	parse_stress
1	1	FROM fairest creatures we desire increase,	1	from FAI rest CREA tures WE de SIRE in CREASE	-+-+-+-+-+	-+-+-+-+-+	0.0	0.0	10	0	0	0	0	0
			2	from FAI rest CREA tures WE de SI re IN crease	-+-+-+-+-+-	-+-+-+-+-++	1.0	0.0	11	0	1	0	0	0
			3	from FAI rest CREA tures WE de SI re IN.CREASE	-+-+-+-+-++	-+-+-+-+-++	1.0	0.0	13	0	0	0	0	1
			4	from FAI rest CREA tures WE de SI re.in CREASE	-+-+-+-+--+	-+-+-+-+--+	2.0	0.0	13	0	0	0	2	0
	2	That thereby beauty's rose might never die,	1	that THE reby BEA uty's ROSE might NE ver DIE	-+-+-+-+-+	-+++-+-+-+	1.0	0.0	10	0	1	0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
154	14	Love's fire heats water, water cools not love.	2	love's FI re HEATS wa.ter WA ter COOLS not LOVE	-+-+--+-+-+	++-++-+-+-+	4.0	0.0	13	1	2	0	0	1
			3	love's FI.RE heats WA ter WA ter COOLS not LOVE	-++-+-+-+-+	++-++-+-+-+	4.0	0.0	13	0	2	1	0	1
			4	LOVE'S fire HEATS wa.ter WA ter COOLS not LOVE	+-+--+-+-+	++++-+-+-+	4.0	0.0	12	1	2	0	0	1
			5	LOVE'S.FI re HEATS.WA ter WA ter COOLS not LOVE	++-++-+-+-+	++-++-+-+-+	4.0	0.0	15	0	0	0	4	0
			6	love's FI re HEATS wa TER wa TER cools NOT love	-+-+-+-+-+-	++-++-+-+++	9.0	0.0	11	2	5	2	0	0

7277 rows × 8 columns

prosodic's Issues

Error on entering string: 'str' object has no attribute 'decode'

[0.0s] prosodic:en$ "test"
Traceback (most recent call last):
File "prosodic.py", line 197, in
text=input(msg).strip().decode('utf-8',errors='ignore')
AttributeError: 'str' object has no attribute 'decode'

When running py prosodic.py and typing anything, the above error occurs (Windows 10, Python 3.8.0)
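The traceback is consistent with Python 2 era code running under Python 3, where input() already returns str and only bytes objects have .decode(). A minimal sketch of a tolerant helper, assuming the goal is simply to end up with clean text; to_text is a hypothetical name and not part of prosodic:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is a bytes object."""
    if isinstance(value, bytes):
        # Python 2 style byte strings (or raw bytes) still need decoding.
        return value.decode(encoding, errors="ignore")
    # On Python 3, input() already returns str, so nothing to do.
    return value


print(to_text(b"test"))
print(to_text("test"))
```

With a helper like this, the failing line could call to_text(input(msg).strip()) instead of chaining .decode() directly.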

Does anyone know how to just get the poem in its stressed/unstressed form?

I know when you call parse with a Text object you get the entire report on it. I was wondering how I could just get the one section where the poem is written out with certain words stressed and unstressed.

Thanks!
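In the Prosodic 2 output shown in the README above, the parse_txt column already holds the line with strong positions capitalized. As a sketch of how that rendering relates to the meter string, the following rebuilds it from a list of syllables and a '-'/'+' pattern. The helper name render_scansion and the hand-written syllable list are assumptions for illustration, not prosodic API:

```python
def render_scansion(syllables, meter):
    """Upper-case syllables in strong ('+') positions, lower-case the rest."""
    if len(syllables) != len(meter):
        raise ValueError("need exactly one meter mark per syllable")
    return " ".join(
        syl.upper() if mark == "+" else syl.lower()
        for syl, mark in zip(syllables, meter)
    )


# Syllables typed by hand from the README's Richard III example.
syllables = ["a", "horse", "a", "horse", "my", "king", "dom", "for", "a", "horse"]
print(render_scansion(syllables, "-+-+-+-+-+"))
```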

loadConfigPy is not defined.

I am using python27 to run the program.
I am facing the following issue
File "prosodic.py", line 23, in
config=loadConfigPy(toprint=toprintconfig,dir_prosodic=dir_prosodic)
NameError: name 'loadConfigPy' is not defined

Can't install it

pip install git+git://github.com/quadrismegistus/prosodic.git
Collecting git+git://github.com/quadrismegistus/prosodic.git
  Cloning git://github.com/quadrismegistus/prosodic.git to c:\users\xxx\appdata\local\temp\pip-req-build-s0719iak
  Running command git clone -q git://github.com/quadrismegistus/prosodic.git 'C:\Users\xxx\AppData\Local\Temp\pip-req-build-s0719iak'
  Running command git submodule update --init --recursive -q
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\xxx\pycharmprojects\prosodicstresses\venv\scripts\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\xx\\AppData\\Local\\Temp\\pip-req-build-s0719iak\\setup.py'"'"'; __file__='"'"'C:\\Users\\xx\\AppData\\Local\\Temp\\pip-req-build-s0719iak\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\xx\AppData\Local\Temp\pip-pip-egg-info-66a5w1hj'
         cwd: C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\
    Complete output (9 lines):
    C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\setup.py:20: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import sys,os,imp
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\xx\AppData\Local\Temp\pip-req-build-s0719iak\setup.py", line 39, in <module>
        long_description = fh.read()
      File "C:\Users\xxx\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 11140: character maps to <undefined>
    ----------------------------------------
WARNING: Discarding git+git://github.com/quadrismegistus/prosodic.git. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
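The final frames show fh.read() in setup.py decoding the file with cp1252, the default text encoding on many Windows installations, and failing on byte 0x90, which cp1252 leaves undefined. A minimal sketch of the usual remedy, passing the encoding explicitly when opening the file. The temporary README.md here is a stand-in created for the example, and it is an assumption that the real file is UTF-8:

```python
import os
import tempfile

# Create a small UTF-8 file with non-ASCII text (stand-in for the real README).
path = os.path.join(tempfile.mkdtemp(), "README.md")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("ipa: ðoʊz")

# An explicit encoding avoids depending on the platform default codec.
with open(path, encoding="utf-8") as fh:
    long_description = fh.read()

print(long_description)
```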

even when espeak is installed, prosodic is unable to run on windows

Even if you use the source forge link to install espeak, you still get the error on windows that you are missing espeak. FYI espeak on windows does not have the file libespeak.dylib or libespeak.so

Issues with English syllabifier

List of problems:

"eer" sounds:

sincerest: sɪn.'seɪ.ʌ.ɹʌs

adding language: Esperanto

I'm trying to adapt prosodic to Esperanto: its stress is always paroxytonic abelo (en. bee) [a.'be.lo] but in poetry there can be elision and the word would become oxytonic abel'

Esperanto is as phonematic as Finnish, so I decided to use the orth feature, but I'm puzzled in LANG_stress.py because I don't understand its code :( Could you help me? I want to use prosodic for my MA research.

Anything I can do to help? Analyse and include more words?

I just came across this application the other week and have found it enormously helpful in my own writing.

It's a wonderful piece of kit and should contribute significantly to a lot of really fun poetry.

I noticed that there are a few words that I have tried to process that are not being properly analysed so I was wondering if I might be able to help with that?

For instance, beauty is one and another is unfurls.

Have you considered a non-command line interface for use in program libraries?

This is really cool, but I want to scan bulk-lines of text for iambic pentameter and there doesn't seem to be a convenient way to do this. I can hack this by importing and calling functions, but I was wondering if you considered creating an interface suitable for importing into other projects and easily called for commands?

Is there a way to get the straight text syllables instead of the phonetic ones?

I need to input: "Even know the ocean is blue" and get "E-ven know the o-cean is blue".
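The by-syllable dataframe in the README above includes a syll_txt column with the orthographic piece of each syllable (for example 'ho' and 'urs' for "hours"). Assuming those pieces have been collected per word, joining them is a one-liner. The nested list below is typed by hand for illustration and is not produced by prosodic, and hyphenate is a hypothetical helper name:

```python
def hyphenate(words):
    """Join each word's syllable spellings with '-', and words with spaces."""
    return " ".join("-".join(syllables) for syllables in words)


# Hand-written syllable spellings, one inner list per word.
line = [["E", "ven"], ["know"], ["the"], ["o", "cean"], ["is"], ["blue"]]
print(hyphenate(line))
```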

expected string or buffer

Trying to parse this line:
t = p.Text("""As in a dark beginning of all things, A mute featureless semblance of the Unknown Repeating for ever the unconscious act, Prolonging for ever the unseeing will, Cradled the cosmic drowse of ignorant Force Whose moved creative slumber kindles the suns And carries our lives in its somnambulist whirl.""")
I get the following error 'TypeError: expected string or buffer'

Phoneme /g/ is represented by different characters in words from `./dicts/en/english.tsv` and words transcribed using TTS

>>> import prosodic as p
>>> text = p.Text("google good")
000001  google                  P:'ɡʉː.ɡʌl                              S:PU    W:HH
000002  good                    P:'gʊd                                  S:P     W:H
>>> text.ents(cls='Word')[0].children[0]
<Syllable.goo> ['ɡʉː]
>>> text.ents(cls='Word')[0].children[0].children[0].onset
<Onset> [ɡ]
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0]
ɡ
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0].feats
{}
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0]
g
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0].feats
{'approx': False, 'cons': True, 'son': False, 'syll': False, 'constr': False, 'spread': False, 'voice': True, 'long': None, 'cont_acoust': False, 'cont_artic': False, 'delrel': False, 'lat': False, 'nas': False, 'strid': False, 'tap': False, 'trill': False, 'coronal': False, 'dorsal': True, 'labial': False, 'labiodental': False, 'ant': False, 'dist': False, 'back': True, 'front': None, 'high': True, 'low': False, 'tense': None, 'round': False}

g in "good" is represented by a regular 'g' character (U+0067) and correctly loads features from ./lib/ipa.py

The ɡs in "google" are represented by latin small letter script g (U+0261) and, as a result, have no feats
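One possible workaround, sketched under the assumption that the feature table is keyed on the plain ASCII g (U+0067) as described above, is to normalize transcriptions before the lookup. normalize_ipa_g is a hypothetical helper, not an existing prosodic function:

```python
SCRIPT_G = "\u0261"  # LATIN SMALL LETTER SCRIPT G, as emitted by the TTS path


def normalize_ipa_g(ipa):
    """Map script g (U+0261) to ASCII g (U+0067) so both spellings share one key."""
    return ipa.replace(SCRIPT_G, "g")


tts_form = "'\u0261ʉː.\u0261ʌl"  # "google" as transcribed in the issue
print(normalize_ipa_g(tts_form))
```

Normalizing in the other direction (ASCII g to U+0261) would work equally well, as long as the feature table and the transcriptions agree on one code point.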

Example output files

Can you provide some example (non-trivial) output files? Such as the long-form poem in the readme. Even better if you can link/directly load the file in the readme.

Elisions in poetry

In historical English poetry syllables are often elided:

sweet as love, which overflows her bower
--> with|MU|sic|SWEET|as|LOVE|which|OV|er|FLOWS|her|BOW'R

scattering unbeholden
--> SCAT|tring|UN|be|HOLD|en

How can we account for this? Eliding syllables that, phonetically, end ɛː?

scattering P:'skæ.tɛː.ɪŋ S:PUU W:LHH
tower P:'taʊ.ɛː S:PU W:HH
showers P:'ʃaʊ.ɛːz S:PU W:HH
curious P:'kjʊ.riː.əs S:PUU W:LHH
wondering P:'wʌn.dɛː.ɪŋ S:PUU W:HHH

Syllable `token` doesn't always match syllable phonemes

Here are some examples from Shakespeare's sonnet 1:

>>> import prosodic as p
>>> from_first_sonnet = p.Text("thereby beauty's self-substantial cruel within niggarding glutton")
>>> for w in from_first_sonnet.ents(cls='Word'): print(w.children)
[<Syllable.the> ['ðɛr], <Syllable.reby> ['baɪ]]
[<Syllable.bea> ['bjʉː], <Syllable.uty's> [tɪz]]
[<Syllable.self-self> ['sɛlf], <Syllable.-> [sʌb], <Syllable.subs> ['stæn], <Syllable.tantial> [ʃʌl]]
[<Syllable.cr> ['kruː], <Syllable.uel> [əl]]
[<Syllable.wit> [wɪ], <Syllable.hin> ['ðɪn]]
[<Syllable.nig> ['nɪ], <Syllable.gar> [ɡʌ], <Syllable.ding> [dɪŋ]]
[<Syllable.glut> ['ɡlʌ], <Syllable.ton> [tʌn]]

AttributeError: module 'regex' has no attribute 'Pattern'

Hi, I'm getting this error on my Macbook Pro, can you help me?

Python 3.7.2 (v3.7.2:9a3ffc0492, Dec 24 2018, 02:44:43)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import prosodic as p
text = p.Text("Shall I compare thee to a summer's day?")
Traceback (most recent call last):
File "", line 1, in
File "/Users/ale/Desktop/prosodic/prosodic/lib/Text.py", line 85, in __init__
self.init_text(lines)
File "/Users/ale/Desktop/prosodic/prosodic/lib/Text.py", line 319, in init_text
newwords=self.dict.get(tok,stress_ambiguity=self.stress_ambiguity)
File "/Users/ale/Desktop/prosodic/prosodic/lib/Dictionary.py", line 551, in get
words=self.getprep(word,config=self.config)
File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 100, in get
sylls_text = syllabify_orth(token,num_sylls=num_sylls)
File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 354, in syllabify_orth
return syllabify_orth_with_nltk(token,num_sylls=num_sylls)
File "/Users/ale/Desktop/prosodic/prosodic/dicts/en/english.py", line 337, in syllabify_orth_with_nltk
from nltk.tokenize import SyllableTokenizer
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/__init__.py", line 137, in
from nltk.text import *
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/text.py", line 29, in
from nltk.tokenize import sent_tokenize
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 65, in
from nltk.tokenize.casual import TweetTokenizer, casual_tokenize
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/casual.py", line 272, in
class TweetTokenizer:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nltk/tokenize/casual.py", line 357, in TweetTokenizer
def WORD_RE(self) -> regex.Pattern:
AttributeError: module 'regex' has no attribute 'Pattern'

Importing prosodic doesn't prevent stdout

Hello, when I follow the examples to use it as a module, the Python code still prints its output. Is this by design, or am I doing something wrong?
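Until the library offers a quiet mode, one generic workaround is to capture standard output around the noisy call with contextlib.redirect_stdout from the standard library. This is a sketch only: noisy_call is a hypothetical stand-in for whichever library call prints, and the redirect catches Python-level writes to sys.stdout, not output written by subprocesses or C extensions:

```python
import contextlib
import io


def noisy_call():
    # Hypothetical stand-in for a library call that prints a report.
    print("000001  google  ...")
    return 42


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_call()  # nothing reaches the terminal here

captured = buffer.getvalue()  # the report is still available if needed
```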

Are there meters for Trochaic, Anapestic, or Dactylic?

I can only find iambic meters in the meters folder and I saw that the readme had

I was wondering if there was a way to access these meters to parse the poem with the given meter. All I've seen are iambic meters.

SyntaxError: invalid syntax

Hi,
I've just installed prosodic on a lubuntu with python3 (but I tried also with python 2.7) and I renamed config_default.py to config.py
But every time I run "prosodic", I get this:

File "/usr/local/bin/prosodic", line 5
cmd = f'python {path_to_prosodic_py} {argstr}'
^
SyntaxError: invalid syntax

Is there anything I can do to fix it?
Note: the ^ is under the ' at the end of {argstr}'
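A SyntaxError pointing at the closing quote of an f-string typically means the script was executed by an interpreter older than Python 3.6, the release that introduced f-strings, so checking which python the launcher invokes is probably the first step. For comparison, a sketch of the same string built with str.format, which older interpreters accept; the two values below are placeholders, not the script's real contents:

```python
path_to_prosodic_py = "prosodic.py"  # placeholder value for illustration
argstr = "--help"                    # placeholder value for illustration

# Equivalent to f'python {path_to_prosodic_py} {argstr}' without f-string syntax.
cmd = "python {} {}".format(path_to_prosodic_py, argstr)
print(cmd)
```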

Allow multiple config.txt files to facilitate working with different meters

It would be nice to have, e.g., config_Shakespeare.txt and config_Hopkins.txt, either of which could be called at initialization of Prosodic.

Unable to run app

I went through the installation steps, and it says to run this command:

pip install git+https://github.com/quadrismegistus/prosodic@develop

It looks like the develop branch no longer exists, so that doesn't work. I tried:

pip install git+https://github.com/quadrismegistus/prosodic

And that seems to work correctly and I see the prosodic and prosodic-2.0.0.dev1.dist-info folders in my python site-packages. But when I run prosodic I get this:

zsh: command not found: prosodic

Any ideas? Should I be installing it from a different branch? Or running that command from somewhere specific?

Thanks!

Font needed

Hello,
Can you tell me what font I must download to avoid this?

Cannot Get Prosodic to Run

C:\Users\cplio>py -m prosodic
Traceback (most recent call last):
File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 188, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 147, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 111, in _get_module_details
__import__(pkg_name)
File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\site-packages\prosodic\__init__.py", line 17, in
from tools import *
File "C:\Users\cplio\AppData\Local\Programs\Python\Python39\lib\site-packages\prosodic\lib\tools.py", line 77
print ">> loaded settings:"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(">> loaded settings:")?

Support for Unicode

Hi @quadrismegistus! Really love this library. I am wondering if there is support for Unicode text (or if I am doing something wrong). I am doing this in Python 2.7:

import prosodic as p

# input_text is some string
text = p.Text(input_text)
text.parse()

I get this error when putting in this poem as text:

[2018-11-12 11:14:39,062: ERROR/ForkPoolWorker-6] raised unexpected: UnicodeDecodeError('ascii', 'O|no|IT|is|AN|ev|ER-|fix\xc3\xa8d|MARK', 24, 25, 'ordinal not in range(128)')
Traceback (most recent call last):

...

  line 16, in parseText
    text.parse()
  File "/usr/local/lib/python2.7/site-packages/prosodic/lib/Text.py", line 484, in parse
    ent.scansion(meter=meter,conscious=True)
  File "/usr/local/lib/python2.7/site-packages/prosodic/lib/Line.py", line 137, in scansion
    self.om("\t".join( [unicode(x) for x in [makeminlength(unicode(self),config['linelen']), makeminlength(unicode(bp) if bp else '', config['linelen']),meterstr,len(self.allParses(meter)),count,lowestScore,str_ot] ] ),conscious=conscious)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

Can't install, prosodic/tagged_samples is missing

I'm on Arch Linux using Python 3.9 and can't install prosodic. I get an error that prosodic/tagged_samples is missing while the wheel is being built.

Maybe it needs to be added to MANIFEST.in?

At first I thought this might be related to #20, but the levenshtein package installs without issue for me.

python-Levenshtein can't build wheel

Just a heads up: I cannot install prosodic because pip fails to build a wheel for python-Levenshtein.

It looks like multiple people have this problem, but nothing is being done about it: https://github.com/ztane/python-Levenshtein/issues

Is there a chance you could switch to a different implementation of Levenshtein distance?

Cheers.

Cannot install prosodic; incompatible with updated pip

I get the following error when trying to install prosodic via pip install git+git://github.com/quadrismegistus/prosodic.git

ERROR: prosodic==1.5.0 did not indicate that it installed an .egg-info directory. Only setup.py projects generating .egg-info directories are supported.

I believe this is due to the upgraded pip; I see a similar issue on another repo: https://github.com/oracle/Skater/issues/292

I'm using pip 20.2.3 and python 3.8.5

cannot install. problem with imp?

using:
py -m pip install git+https://github.com/quadrismegistus/prosodic.git

receiving this error:
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "C:\Users\tarci\AppData\Local\Temp\pip-req-build-66pyz14l\setup.py", line 20, in <module>
    import sys,os,imp
ModuleNotFoundError: No module named 'imp'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
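This error is consistent with running under Python 3.12 or later, where the `imp` module was removed from the standard library. The log does not show what `setup.py` uses `imp` for. If it loads a module from a file path, a sketch of the `importlib` equivalent might look like the following. The helper name `load_source` and the file `demo_meta.py` are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_source(name, path):
    """Load a module from a file path, similar in spirit to imp.load_source."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration with a throwaway file; "demo_meta" is a hypothetical module name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_meta.py")
    with open(path, "w") as fh:
        fh.write("__version__ = '0.0.0'\n")
    meta = load_source("demo_meta", path)
    print(meta.__version__)
```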

Please update lexconvert

Hi, I notice you're using a 6-year-old version of lexconvert. You might like to update it to the upstream repo, as I remember something about finding a couple of bugs in the conversion table which I fixed as I refactored it into a better data structure. Unfortunately I seem to have misplaced my notes on what exactly those bugs were, but if you update you should get a better result.

detect whether a line is keeping or breaking meter?

What command would give the sequence of weak and strong stresses for a line, so that I could compare it to a given meter and see whether or not the line has that meter or is violating it?

isIambic is incorrect

The isIambic function only checks the first two positions, which may not be representative of the whole line. It also seems inconsistent to support iambic meter but not trochaic, anapaestic, and dactylic. I could rewrite these checks in a more general way, since I will need a wrapper for them in my application anyway. However, I am not sure whether this kind of PR is welcome in this repo, or whether anyone has time to merge it.

def isIambic(self):
    if len(self.positions) < 2:
        return None
    else:
        return self.positions[0].meterVal == 'w' and self.positions[1].meterVal == 's'
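One possible shape for the more general check is sketched below. It works on a plain list of 'w'/'s' values, one per metrical position (an assumption about how `meterVal` would be collected), and tests the whole line against a repeated foot. The names `FEET` and `matches_foot` are hypothetical and not part of prosodic.

```python
# Foot templates written with weak ('w') and strong ('s') positions,
# matching the meterVal values used in isIambic above.
FEET = {
    "iambic": "ws",
    "trochaic": "sw",
    "anapaestic": "wws",
    "dactylic": "sww",
}


def matches_foot(meter_vals, foot):
    """True if the whole sequence is a repetition of the named foot.

    Returns None for sequences shorter than one foot, mirroring isIambic.
    """
    template = FEET[foot]
    if len(meter_vals) < len(template):
        return None
    return all(val == template[i % len(template)]
               for i, val in enumerate(meter_vals))


print(matches_foot(list("wswswswsws"), "iambic"))
print(matches_foot(list("wsswswswsw"), "iambic"))
```

Note that this sketch accepts a truncated final foot (for example, "wsw" counts as iambic). Whether that is desirable depends on how strictly the meter should be enforced.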

Syllable feature('prom.stress') not the same as capitalization in parse?

Hi,

I noticed something odd.

I am parsing the en.alliteration.txt with this code:

import prosodic as p

# t = p.Text('Shall I compare thee')
t = p.Text('../corpora/corppoetry_en/en.alliteration.txt')
t.parse()

# The loop variable is named "parse" so that it does not shadow the module alias "p".
for parse in t.bestParses():
    print('PARSE: ', parse)
    meter_list = []
    if parse is not None:
        ws = parse.words()
        print('WORDS: ', ws)
        for w in ws:
            ss = w.syllables()
            print('SYLLABLES:', ss)
            for s in ss:
                # Stress feature of the syllable; values below 0.5 are marked '-'.
                f = s.feature('prom.stress')
                print(f)
                if f < .5:
                    meter_list.append('-')
                else:
                    meter_list.append('+')
    print(meter_list)

But in the output, the capitalization in the PARSE line does not seem to match the prominence values assigned in the syllable features. See, for example, the values for 'THE' and 'BLES'.

PARSE:  from|THE|o|RI|gi.nal|COM|mon.ger|MA|nic|LANGUAG|e
WORDS:  [<Word.from> [frʌm], <Word.the> [ðə], <Word.original> [ɛː.'ɪ.ʤə.nəl], <Word.common> ['kɑ.mən], <Word.germanic> [ʤɛː.'mæ.nɪk], <Word.language> ['læŋ.gwəʤ]]
SYLLABLES: [<Syllable.from> [frʌm]]
0.0
SYLLABLES: [<Syllable.the> [ðə]]
0.0
SYLLABLES: [<Syllable.o> [ɛː], <Syllable.ri> ['ɪ], <Syllable.gi> [ʤə], <Syllable.nal> [nəl]]
0.0
1.0
0.0
0.0
SYLLABLES: [<Syllable.com> ['kɑ], <Syllable.mon> [mən]]
1.0
0.0
SYLLABLES: [<Syllable.Ger> [ʤɛː], <Syllable.ma> ['mæ], <Syllable.nic> [nɪk]]
0.0
1.0
0.0
SYLLABLES: [<Syllable.languag> ['læŋ], <Syllable.e> [gwəʤ]]
1.0
0.0
['-', '-', '-', '+', '-', '-', '+', '-', '-', '+', '-', '+', '-']
PARSE:  MA|ny|STRESSED.SYL|la|BLES|were|LOST
WORDS:  [<Word.many> ['mɛ.niː], <Word.stressed> ['strɛst], <Word.syllables> ['sɪ.lə.bəlz], <Word.were> [wɛː], <Word.lost> ['lɔːst]]
SYLLABLES: [<Syllable.ma> ['mɛ], <Syllable.ny> [niː]]
1.0
0.0
SYLLABLES: [<Syllable.stressed> ['strɛst]]
1.0
SYLLABLES: [<Syllable.syl> ['sɪ], <Syllable.la> [lə], <Syllable.bles> [bəlz]]
1.0
0.0
0.0
SYLLABLES: [<Syllable.were> [wɛː]]
0.0
SYLLABLES: [<Syllable.lost> ['lɔːst]]
1.0
['+', '-', '+', '+', '-', '-', '-', '+']

My guess is that the capitalization in the parse is correct, while the syllable feature follows the CMU lexicon. Is that right?
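The disagreement can be located mechanically. The sketch below assumes, based only on the output shown above, that '|' separates metrical positions in the parse string, that '.' separates syllables within a position, and that the parse string and the syllable list contain the same number of syllables in the same order. The function names are hypothetical.

```python
def parse_to_marks(parse_str):
    """Expand a parse string such as 'MA|ny|STRESSED.SYL' to one mark per syllable."""
    marks = []
    for position in parse_str.split('|'):
        for syllable in position.split('.'):
            # Uppercase syllables are the ones the parse treats as strong.
            marks.append('+' if syllable.isupper() else '-')
    return marks


def mismatches(parse_str, stress_marks):
    """Indices where the parse marks and the stress marks disagree."""
    parse_marks = parse_to_marks(parse_str)
    return [i for i, (a, b) in enumerate(zip(parse_marks, stress_marks)) if a != b]


# Second line of the output above.
parse = 'MA|ny|STRESSED.SYL|la|BLES|were|LOST'
stress = ['+', '-', '+', '+', '-', '-', '-', '+']
print(mismatches(parse, stress))
```

For this line the only disagreement is the syllable 'BLES', which the parse places in a strong position even though its stress feature is 0.0.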

Python 3 version

This library is so cool, but it would be even cooler to have a Python 3 version of it, plus possible interfaces. As mentioned in another issue, it can be hacked together through the input/output functions, but I have difficulty using it under Python 3, and 2to3 doesn't seem to work well on it.

Is there any plan to port it to Python 3?

Method to get phonetic transcription

Thanks for your amazing repo!
Is there a function that returns the syllable-split phonetic transcription that the show() method prints?

text.show() returns nothing; it only prints the phonetic transcription and stress. I would like a function that returns the split phonetic transcription instead.
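Until such a function exists, one generic workaround is to capture what a print-only method writes to standard output. This is a sketch that only works if the method prints through `sys.stdout`, which is an assumption about `show()`. The helper `capture_output` and the stand-in `fake_show` are hypothetical, and the printed lines are placeholders rather than real prosodic output.

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Run func and return whatever it printed to sys.stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


# Stand-in for a method that only prints; in practice pass text.show instead.
def fake_show():
    print("first line")
    print("second line")


captured = capture_output(fake_show)
print(captured.splitlines())
```

The captured text still has to be parsed, so iterating over words and syllables directly, as in the issue above, is likely more robust when the needed attributes are accessible.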