chrisdrymon / cl-mishmash


Non-Machine Learning Computational Linguistic Projects

Python 100.00%
new-testament computational-linguistics ancient-greek greek greek-new-testament bible bible-study linguistics corpus-linguistics biblical-studies

cl-mishmash's Introduction

Computational-Linguistics

Non-Machine Learning Computational Linguistic Projects

ArtDistance.py This script records the position of heads relative to their articles.

Details: This script reads every Perseus or PROIEL treebank (in XML format) from an input directory (in my case '/home/chris/Desktop/KoineTB') and creates a file "ArtDistance.csv" in the output directory (in my case '/home/chris/Desktop'), which pairs each position of a head relative to its article with the frequency of that position. If, for instance, the output CSV file is...

-2, 20
1, 900
4, 18

...this means that, across all the treebanks, a head occurred two words before its article 20 times, one word after its article 900 times, and four words after its article 18 times.
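A minimal sketch of how such a frequency table could be produced, assuming the offsets have already been collected into a list. The function names `tally_offsets` and `write_counts` and the sample data are hypothetical and are not taken from ArtDistance.py.

```python
import csv
import io
from collections import Counter


def tally_offsets(offsets):
    """Count how often each head-minus-article offset occurs."""
    return Counter(offsets)


def write_counts(counts, handle):
    """Write one 'offset,frequency' row per offset, sorted by offset."""
    writer = csv.writer(handle)
    for offset in sorted(counts):
        writer.writerow([offset, counts[offset]])


# Hypothetical offsets gathered from a handful of article-head pairs.
sample_offsets = [1, 1, -2, 4, 1]
counts = tally_offsets(sample_offsets)

# An in-memory buffer stands in for the real ArtDistance.csv file.
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

To write a real file instead of a buffer, pass a handle opened with `open(path, "w", newline="")`.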

Perseus treebanks give every word in a sentence a unique sequential id. PROIEL treebanks give every word in the entire treebank a unique sequential id. So this script works by simply subtracting the article's id number from the head's id number. Unfortunately, the count is not yet 100% accurate, for three reasons. First, Perseus treebanks count punctuation as a word. Second, PROIEL treebanks sometimes include elided words and give them unusual id numbers, so if an article points to an elided word the script might conclude that the head is 10,000 words away. Third, I think there may be some slight inconsistencies in the tagging of articles that can cause problems. I'm working on solutions to all of these.
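The sketch below shows one possible way to compute the offset for a single Perseus-style sentence while leaving punctuation out of the count and discarding implausibly large distances. It is not the script's actual code. The attribute names (`id`, `head`, `postag`), the postag codes (`l` for article, `u` for punctuation), the sample sentence, and the `MAX_PLAUSIBLE` cutoff are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Perseus-style sentence. Attribute names and postag codes
# ('l' = article, 'u' = punctuation) are assumptions, not verified
# against any particular treebank release.
SAMPLE = """
<sentence id="1">
  <word id="1" form="ὁ" postag="l-s---mn-" head="3"/>
  <word id="2" form="," postag="u--------" head="0"/>
  <word id="3" form="λόγος" postag="n-s---mn-" head="0"/>
</sentence>
"""

# Arbitrary cutoff (an assumption) for dropping distances that are
# almost certainly artifacts, such as pointers to elided words.
MAX_PLAUSIBLE = 50


def article_offsets(sentence, skip_punctuation=True):
    """Return head-minus-article offsets for every article in a sentence."""
    words = sentence.findall("word")

    # Renumber the words, optionally leaving punctuation out of the count.
    position = {}
    next_pos = 0
    for word in words:
        if skip_punctuation and word.get("postag", "").startswith("u"):
            continue
        next_pos += 1
        position[word.get("id")] = next_pos

    offsets = []
    for word in words:
        if not word.get("postag", "").startswith("l"):
            continue
        article_pos = position.get(word.get("id"))
        head_pos = position.get(word.get("head"))
        if article_pos is None or head_pos is None:
            continue  # head is not a counted word in this sentence
        offset = head_pos - article_pos
        if abs(offset) <= MAX_PLAUSIBLE:
            offsets.append(offset)
    return offsets


sentence = ET.fromstring(SAMPLE)
print(article_offsets(sentence))
print(article_offsets(sentence, skip_punctuation=False))
```

With punctuation skipped, the head in the sample is counted as one word after its article. With punctuation counted, the comma pushes the distance to two, which is the kind of inflation described above.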
