Coder Social home page Coder Social logo

anceps's Introduction

Anceps - a Latin scansion tool

Anceps

About

Anceps is a tool for computer-assisted scansion of Latin poetry. It can automatically scan the majority of verses in any Latin meter and can greatly speed up the process of manual scansion of difficult lines. Additionally, it can identify scansions that are inconsistent with the historical period during which the given text was written and is otherwise helpful for identifying metrical anomalies. Anceps is being developed by Sasha Fedchin (Bard College, Tufts University, Quantitative Criticism Lab).

Anceps differs from existing scansion tools in several respects:

  1. Using meter as a constraint. Many scansion tools begin by using a POS tagger to macronize all the words in a verse and only then proceed to scanning the verse. By contrast, anceps uses meter as a constraint by considering all the possible ways the words in a verse might be macronized that lead to the verse being consistent with the meter. Hence, anceps's performance does not depend on the quality of the POS tagger but only on the dictionary of natural vowel quantities which all scansion tools must employ. The downside is that anceps cannot be used to macronize prose.

  2. Using period- and author-specific frequency-based dictionaries. The traditional approach to automated scansion is to use a dictionary that lists all theoretically possible ways a word can be macronized. For this purpose, anceps uses the dictionary based on queries to Morpheus which was created by Johan Winge for his scansion tool. However, theory-guided dictionaries like this one are not always reliable because they do not specify how frequent a particular scansion is. Consider patris, the genetive of pater. The a in patris can be either long or short depending on the author and the period. Poets of the golden age (Horace, Ovid, Vergil) were more likely to scan a short, but some of the later poets employed both long and short scansions interchangeably (Statius, Lucan), or even preferred the long scansion (Silius Italicus). To account for differences like these, anceps can be used with a frequency-based dictionary that for each form specifies how many times it was seen in the works of each author. Such dictionaries can be built by parsing the scansions of hexameters and pentameters made with Pedecerto and available on the MusisQue DeoQue database. With a dictionary like this, anceps can pinpoint unusual scansions and assign confidence scores for each scanned verse.

  3. Command-line interface for manual scansion. Anceps was originally designed to scan lines of trimeter, which is inherently more difficult than scanning hexameter or pentameter. A substantial portion of the lines (around 10%) must be scanned manually because the scansion can depend on such things as whether an adjective is in Nom. n. Pl. or Abl. f. Sg. No parser would be able to differentiate between the two forms with full certainty, but a human can! Hence, anceps provides an easy-to-use command-line interface that promts the user to select the correct scansion whenever the program is unsure.

  4. Configuring for use with any meter. All that is needed to configure anceps for use with a new meter is to add a one- or two- line definition of that meter to meter.py. For instance, anceps can be easily configured for scansion of Correr's or Dati's trimeter, which is different from that of Seneca.

Documentation

Below is the description of how anceps can be run and configured. New features may be added in the future.

Scansion

To scan a text containing one verse of poetry per line, run the following command:

python -m src.scan.scan fileToScan outputFile meter -manual_file=fileWithManualScansions -dictionary=MqDqDictionary

For instance, to scan an excerpt from Seneca's Agamemnon included in this repository, run:

python -m src.scan.scan data/texts/Agamemnon.txt data/fullScansions/Agamemnon.json trimeter -manual_file=data/manualScansions/Agamemnon.txt -dictionary=data/MqDqMacrons.json

There are various argument that can be passed to scan.py. For example, you can use the --interactive flag to allow the program to promt the user to select the correct scansion when the program is uncertain. To see the full list of available flags and arguments, run scan.py with the -h flag.

Anceps can be easily extended for use with any meter. Consider the following lines, which is all one needs to add to meter.py to configure anceps for scansion of hexameter, pentamerer, and elegiacs:

SPONDEE = LONG + LONG
DACTYL = LONG + SHORT + SHORT
H_FOOT = (SPONDEE, DACTYL)  # defines a foot of hexameter
H_FINAL_FOOT = (LONG + UNK, )  # final foot (final syllable can be either long or short)

# one can optionally replace the fifth H_FOOT with [DACTYL]:
HEXAMETER = Meter((H_FOOT, H_FOOT, H_FOOT, H_FOOT, H_FOOT, H_FINAL_FOOT), "hexameter")
PENTAMETER = Meter((H_FOOT, H_FOOT, [LONG], H_FOOT, H_FOOT, [UNK]), "pentameter")

Meter.METERS["elegiacs"] = (HEXAMETER, PENTAMETER)

Creating an MqDq-based dictionary

To create an MqDq-based dictionary, you first have to download MusisQue DeoQue on your local machine. This can be done by running scraping.py. You can either download the database in full or specify a particular set of authors you wish to download as shown below:

python -m src.mqdq.scraping mqdq -dir=data/MqDq/ -authors Vergilius Horatius

The full set of authors currently available on MqDq can be obtained by running the following command:

python -m src.mqdq.scraping mqdq --list-all-authors

After downloading the texts on your local machine, run dictionary.py to create a dictionary based on these texts:

python -m src.mqdq.dictionary data/MqDq/ data/MqdqMacrons.json

Dependencies and Versions

Anceps should be run with python3 with the following packages installed: tqdm, requests, selenium

Firefox is also required as a driver that selenium can use to download MqDq data.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.