A large-scale automatic analysis of sense change across centuries for selected lemmas, based on the LatinISE corpus and the original BERT-based word sense disambiguation system by Bamman and Burns (2020).

License: MIT License

Latin WSD experiments

This repository contains the code and preliminary results of word sense disambiguation (WSD) experiments on Latin.

The repository is a pseudo-fork of the original LatinBERT repository. It closely follows the word sense disambiguation setting of Bamman and Burns (2020) and uses their BERT model, pre-trained on Latin corpora as described in the original paper.

Compared with Bamman and Burns (2020), we used all macro-senses from the Lewis and Short Latin-English dictionary. We ran the algorithm on the LatinISE corpus (McGillivray and Kilgarriff 2013).

B. McGillivray, A. Kilgarriff, Tools for historical corpus research, and a corpus of Latin, in: P. Bennett, M. Durrell, S. Scheible, R. J. Whitt (Eds.), New Methods in Historical Corpus Linguistics, Narr, Tübingen, 2013.

Install

Tested on Python 3.8.12 and 3.7.12.

1.) Create a conda environment (optional):

conda create --name latinbert python=3
conda activate latinbert

2.) Install PyTorch according to your own system requirements (GPU vs. CPU, CUDA version): https://pytorch.org.

3.) Install the remaining libraries:

pip install -r requirements.txt

4.) Install Latin tokenizer models:

python3 -c "from cltk.data.fetch import FetchCorpus; corpus_downloader = FetchCorpus(language='lat');corpus_downloader.import_corpus('lat_models_cltk')"
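
After completing the steps above, a quick sanity check can confirm that the environment looks as expected. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and the import names `torch` and `cltk` are assumed from steps 2 and 4. It uses only the standard library, so it runs even if an installation step failed.

```python
import importlib.util
import sys

# Python versions the README reports as tested (major, minor).
TESTED_VERSIONS = {(3, 7), (3, 8)}

# Import names assumed for this check: torch (step 2) and cltk (step 4).
REQUIRED_MODULES = ("torch", "cltk")


def check_environment(modules=REQUIRED_MODULES):
    """Report the running Python version and which modules can be located."""
    version = sys.version_info[:2]
    report = {
        "python": "{}.{}".format(*version),
        "python_tested": version in TESTED_VERSIONS,
        "modules": {},
    }
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report["modules"][name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    result = check_environment()
    print("Python {} (tested version: {})".format(
        result["python"], result["python_tested"]))
    for name, found in result["modules"].items():
        print("{:<8} {}".format(name, "found" if found else "MISSING"))
```

A module reported as MISSING usually means the corresponding install step was run outside the activated conda environment. An untested Python version is not necessarily a problem, only a hint about where to look if something fails later.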

Use

All code for the word sense disambiguation analysis is in the wsd folder of this repository; the parent directory contains the pre-trained LatinBERT and other utility functions. See https://github.com/Ighina/Latin-ISE-WSD/tree/master/wsd#readme for details.
