peterdekker / prediction-histling

Notebook accompanying paper "Word prediction in computational historical linguistics"

Home Page: https://doi.org/10.15398/jlm.v8i2.268

License: GNU General Public License v3.0

Python 86.32% Jupyter Notebook 12.52% Batchfile 0.63% Shell 0.53%
notebook historical-linguistics prediction computational-linguistics cognate cognates neural-networks deep-learning machine-learning

prediction-histling's Introduction

Word Prediction in Computational Historical Linguistics

This is a Jupyter notebook and Python library that demonstrate word prediction with deep learning as an aid in historical linguistics. The notebook accompanies the following paper: Dekker, P., & Zuidema, W. (2021). Word Prediction in Computational Historical Linguistics. Journal of Language Modelling, 8(2), 295–336. The results produced by this demonstration notebook may differ somewhat from those reported in the article.

Any questions or problems?

Installation

Linux/Mac

  • First install Python 3, pip, Python venv and the development headers for libxml2, libz and libopenblas via your package manager. For GPU support, also install pygpu and the headers for libgpuarray. For example, on Ubuntu:

    sudo apt install python3-pip python3-venv libxml2-dev libopenblas-dev libz-dev python3-pygpu libgpuarray-dev

  • Open a terminal and move to the directory where this README is located.

  • Now, run the install script, as a normal user (without sudo):

    ./install.sh
    

    If permission is denied, issue the following command once:

    sudo chmod +x install.sh
    

    and then run the install script.

  • Every time you want to run the notebook, run the run.sh script as a normal user (without sudo):

    ./run.sh
    

    A browser window will open. Now, click the notebook: Word prediction in computational historical linguistics.ipynb. The first time you use it, pick the kernel ph-env from menu Kernel > Change kernel > env.

Windows (experimental)

Running this notebook on Windows is not fully tested. If you run into any problems, file an issue.

  • Open a command prompt (Windows key + R, then type "cmd").
  • Change to the directory where this README is located:

    cd PREDICTION-HISTLING\DIRECTORY

  • If you don't have Python yet, install it now:

    python_install.bat

  • Close the command prompt afterwards (required!).

Now we're ready to install our notebook:

  • Open a command prompt (again: Windows key + R, then type "cmd").
  • Change to the directory where this README is located, using the same cd command as before.
  • Invoke the install script:

    install.bat

Every time you would like to run the notebook, invoke our run script:

  • Open a command prompt (Windows key + R, then type "cmd").
  • Change to the directory where this README is located.
  • Invoke the run script:

    run.bat

  • A browser window will open. Now, click the notebook: Word prediction in computational historical linguistics.ipynb. The first time you use it, pick the kernel ph-env from menu Kernel > Change kernel > env.

Thanks to Mathieu Fannee for describing the steps of running Python virtual environments on Windows.

prediction-histling's Issues

Possibility to import any cldf dataset

Accept a filename or URL as an argument to the data loading function, instead of a hardcoded corpus name.
This assumes that every dataset follows a fixed (CLDF) format, and may require an intermediate data object that can then be passed as the train and valtest arguments.
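
One possible shape for such a loader is sketched below. It is not the library's current API: the names `read_text` and `load_forms` are hypothetical, and the sketch assumes the input is a CLDF forms table in CSV form with the default Wordlist column names (`Language_ID`, `Parameter_ID`, `Form`, and optionally `Segments`). The returned dictionary stands in for the intermediate data object mentioned above.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse
from urllib.request import urlopen


def read_text(source):
    """Return the text of a local file or of an http(s) URL."""
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            return response.read().decode("utf-8")
    with open(source, encoding="utf-8") as handle:
        return handle.read()


def load_forms(source):
    """Load a CLDF forms table into {concept: {language: [tokens]}}.

    Hypothetical helper: assumes the default CLDF Wordlist column names.
    If a language has several forms for one concept, the last one wins.
    """
    forms = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(read_text(source))):
        segments = row.get("Segments")
        # Prefer the space-separated segmentation; otherwise fall back
        # to splitting the raw form into characters.
        tokens = segments.split() if segments else list(row["Form"])
        forms[row["Parameter_ID"]][row["Language_ID"]] = tokens
    return dict(forms)
```

A call such as `load_forms("forms.csv")` would then take the place of selecting a corpus by name, and the same call would accept an `https://` URL pointing at a forms table.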

Make LexStat cognate detection optional?

LexStat cognate detection takes a long time, so it could be made optional. If it is skipped, the prediction algorithms will not be able to use a val/test set that has been filtered on cognacy, unless the set is filtered with an edit distance cutoff instead.
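
The edit distance fallback could look roughly like the sketch below. It is an illustration, not code from this repository: the function names, the normalization by the longer word, and the default cutoff of 0.5 are all assumptions. Word pairs whose normalized Levenshtein distance exceeds the cutoff are dropped, as a cheap stand-in for cognacy filtering.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, with unit costs."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def filter_by_cutoff(pairs, cutoff=0.5):
    """Keep only word pairs at or below the cutoff (value is an assumption)."""
    return [(a, b) for a, b in pairs if normalized_distance(a, b) <= cutoff]
```

Because the functions work on any sequence, they can be applied to lists of sound segments as well as to plain strings. A suitable cutoff would have to be tuned per dataset.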
