Official code for the paper "Evaluating Neural Word Embeddings for Sanskrit".

EvalSan: Evaluation Toolkit for Sanskrit Embeddings

SanEval is a toolkit for evaluating the quality of Sanskrit word embeddings. We assess their generalization power by using them as features on a broad and diverse set of tasks. We include a suite of 4 intrinsic tasks that evaluate which linguistic properties are encoded in word embeddings. Our goal is to ease the study and development of general-purpose fixed-size word representations for Sanskrit.

Dependencies

This code is written in Python. The dependencies are:

  • Python 3.6
  • The packages listed in requirements.txt, which can be installed with:

pip install -r requirements.txt

Evaluation tasks

Intrinsic tasks

  • SanEval includes a series of intrinsic tasks to evaluate which linguistic properties are encoded in your word embeddings.
  • We use the SLP1 transliteration scheme for our data. You can change it to another scheme using this code.

| Task | Metric | #dev | #test |
|---|---|---|---|
| Relatedness | F-score | 4.5k | 9k |
| Similarity | Accuracy | na | 3k |
| Categorization (syntactic) | Purity | na | 1.1k |
| Categorization (semantic) | Purity | na | 150 |
| Analogy (syntactic) | Accuracy | na | 10k |
| Analogy (semantic) | Accuracy | na | 6.4k |
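
For readers unfamiliar with SLP1, the scheme encodes each Sanskrit sound as a single ASCII character, so converting it to another scheme can be done character by character. The following is a minimal sketch of an SLP1 to IAST converter. It is not the conversion code referred to above; the names `SLP1_TO_IAST` and `slp1_to_iast` are hypothetical and chosen only for illustration.

```python
# Illustrative SLP1 -> IAST mapping (hypothetical helper, not part of this repo).
# SLP1 uses exactly one ASCII character per sound, so a per-character lookup suffices.
SLP1_TO_IAST = {
    # vowels
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    # anusvara and visarga
    "M": "ṃ", "H": "ḥ",
    # stops and nasals
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    # semivowels, sibilants and h
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text: str) -> str:
    """Convert SLP1 text to IAST; unknown characters (spaces, digits) pass through."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


print(slp1_to_iast("Bavati"))    # bhavati
print(slp1_to_iast("gacCati"))   # gacchati
```

The reverse direction needs longest-match handling, because IAST writes some sounds with two letters (for example `kh` or `ai`), so a plain character lookup is only enough when starting from SLP1.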

Pretrained models

  • You can download the pretrained models from this link. A README.md is provided for each model.
  • Place the models folder in the parent directory.
  • Pretrained vectors can be downloaded from this link. Place this folder in the EvalSan/evaluations/Intrinsic/ path. These vectors are used by the evaluation script.
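
The file format of the pretrained vectors is not described here. Assuming they are stored in the common word2vec text layout (an optional `vocab_size dim` header line, then one `word v1 v2 ...` line per word), a minimal loader and cosine similarity function might look like the sketch below. The function names `load_text_vectors` and `cosine` are hypothetical and not taken from the repository; check the per-model README.md for the actual format.

```python
import math
from typing import Dict, Iterable, List


def load_text_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec-style text lines into a {word: vector} dict.

    Assumption: whitespace-separated fields, with the word first. A first line
    that has exactly two fields is treated as the 'vocab_size dim' header.
    """
    vectors: Dict[str, List[float]] = {}
    for index, line in enumerate(lines):
        parts = line.split()
        if index == 0 and len(parts) == 2:
            continue  # header line, not a vector
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data in SLP1, made up for illustration only.
sample = ["3 2", "rAma 1.0 0.0", "sItA 0.8 0.6", "vana 0.0 1.0"]
vectors = load_text_vectors(sample)
print(sorted(vectors))
print(cosine(vectors["rAma"], vectors["vana"]))  # 0.0
```

Because `load_text_vectors` accepts any iterable of lines, an open file handle can be passed to it directly.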

How to train the models

To train the embeddings, run the following command. Please refer to the models folder for more details.

bash train_embeddings.sh

How to run evaluation

To evaluate your word embeddings, run the following command:

bash run_SanEval.sh
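
To give a sense of what the purity and accuracy metrics in the intrinsic task table measure, here is a minimal sketch using their standard textbook definitions. It is not the implementation used by run_SanEval.sh: the function names, the vector-offset method for analogies, and the choice to count out-of-vocabulary questions as wrong are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    if not gold_labels or len(cluster_ids) != len(gold_labels):
        raise ValueError("need two non-empty sequences of equal length")
    clusters = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        clusters[cluster][label] += 1
    majority = sum(max(counts.values()) for counts in clusters.values())
    return majority / len(gold_labels)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method (assumed)."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target), default=None)


def analogy_accuracy(vectors, questions):
    """Share of (a, b, c, d) questions answered with d; OOV questions count as wrong."""
    correct = 0
    for a, b, c, d in questions:
        in_vocab = all(w in vectors for w in (a, b, c, d))
        if in_vocab and solve_analogy(vectors, a, b, c) == d:
            correct += 1
    return correct / len(questions) if questions else 0.0


# Toy 2-d vectors in SLP1, made up for illustration only.
toy = {"nara": [1, 0], "nArI": [1, 1], "rAjA": [3, 0], "rAjYI": [3, 1], "vana": [0, -1]}
print(solve_analogy(toy, "nara", "nArI", "rAjA"))  # rAjYI
print(purity([0, 0, 0, 1, 1, 1], ["n", "n", "v", "v", "v", "n"]))  # 4 of 6 items match
```

Purity rewards clusters dominated by a single gold category, so it rises trivially as the number of clusters grows; comparisons between embeddings are only meaningful at a fixed number of clusters.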

Citation

If you use our tool, we'd appreciate it if you cite the following paper:

@inproceedings{sandhan-etal-2023-evaluating,
    title = "Evaluating Neural Word Embeddings for {S}anskrit",
    author = "Sandhan, Jivnesh  and
      Paranjay, Om Adideva  and
      Digumarthi, Komal  and
      Behra, Laxmidhar  and
      Goyal, Pawan",
    booktitle = "Proceedings of the Computational {S}anskrit {\&} Digital Humanities: Selected papers presented at the 18th World {S}anskrit Conference",
    month = jan,
    year = "2023",
    address = "Canberra, Australia (Online mode)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wsc-csdh.2",
    pages = "21--37",
}

License

This project is licensed under the terms of the Apache License 2.0.
