
koleslena / vidyut


This project forked from ambuda-org/vidyut


Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.



विद्युत्

Vidyut provides reliable infrastructure for Sanskrit software. Our main focus is on building libraries for natural language processing.

Vidyut compiles to fast, safe, and memory-efficient native code, and it can be bound to other programming languages with minimal work. We commit to providing first-class support for Python bindings through vidyut-py, and we are eager to help you create bindings for your language of choice.

Vidyut is an ambitious and transformative project, and you can help us make it a success. If you simply want to join our community of Sanskrit enthusiasts, see the Community section -- we are very friendly and welcome members of all backgrounds. For specific details on how you can contribute, see the Contributing section instead.

Vidyut is under active development as part of the Ambuda project and is published under the MIT license.



Installation

Vidyut is meant for programmers who are building Sanskrit software. If you are not comfortable writing software or using tools like a command line interface, we recommend that you use the tools on Ambuda instead.

We currently offer two ways to use Vidyut:

Through Python

We provide first-class support for Python through the vidyut Python package, which we define in the vidyut-py repo. If you have Python installed on your machine, you can install Vidyut as follows.

$ pip install vidyut

Through Rust

Vidyut is implemented in Rust, which provides low-level control with high-level ergonomics. You can install Rust on your computer by following the official installation instructions on the Rust website (rust-lang.org).

Once you've installed Rust, you can try cloning the Vidyut repo and running our tests:

$ git clone https://github.com/ambuda-org/vidyut.git
$ cd vidyut
$ make test

Your first build will likely take a few minutes, but future builds will be much faster.

Next, we recommend creating and collecting our rich linguistic data:

$ make create_all_data

This command will take several minutes, but most users will not need to re-run this command after the first run completes.

To learn how to navigate this repo, see the Components section. For details on how to get involved, see the Contributing section.

Components

Vidyut contains several standard components for common Sanskrit processing tasks. These components work together well, but you can also use them independently depending on your use case.

In Rust, components of this kind are called crates.

vidyut-cheda segments Sanskrit expressions into words and then annotates those words with their morphological data. Our segmenter is optimized for real-time and interactive usage: it is fast, low-memory, and capably handles pathological input.

For details, see the vidyut-cheda README.
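To make the segment-then-annotate idea concrete, here is a minimal Rust sketch of dictionary-based segmentation. It is not vidyut-cheda's API or algorithm: the `segment` function and the tiny lexicon are hypothetical, the input is assumed to be in SLP1 transliteration, and the sketch assumes there are no sandhi changes at word boundaries (real text usually has them; "tatra vane vasati", "he lives there in the forest", happens to have none). It uses dynamic programming to find the split with the fewest lexicon words, then attaches each word's annotation.

```rust
use std::collections::HashMap;

/// Splits `input` into the fewest lexicon words and pairs each word with its
/// annotation. Returns None if no complete split exists. (Hypothetical sketch.)
fn segment<'a>(input: &str, lexicon: &HashMap<&'a str, &'a str>) -> Option<Vec<(String, &'a str)>> {
    let n = input.len();
    // best[i] = (word count, start of last word) for the best split of input[..i].
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for end in 1..=n {
        for start in 0..end {
            // Skip unreachable starts and byte ranges that are not valid str slices.
            let (Some((count, _)), Some(piece)) = (best[start], input.get(start..end)) else {
                continue;
            };
            let better = best[end].map_or(true, |(c, _)| count + 1 < c);
            if lexicon.contains_key(piece) && better {
                best[end] = Some((count + 1, start));
            }
        }
    }
    // Walk backwards from the end to recover the chosen words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end]?;
        let word = &input[start..end];
        words.push((word.to_string(), lexicon[word]));
        end = start;
    }
    words.reverse();
    Some(words)
}

fn main() {
    // Hypothetical lexicon: surface form -> morphological annotation.
    let lexicon: HashMap<&str, &str> = HashMap::from([
        ("tatra", "indeclinable, 'there'"),
        ("vane", "vana, neuter, locative singular"),
        ("vasati", "vas, present, 3rd person singular"),
        ("sati", "sat, locative singular"),
    ]);
    match segment("tatravanevasati", &lexicon) {
        Some(words) => {
            for (word, info) in words {
                println!("{word}: {info}");
            }
        }
        None => println!("no segmentation found"),
    }
}
```

A production segmenter also has to undo sandhi at each candidate boundary and rank competing splits with a statistical model rather than by word count alone.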

vidyut-kosha defines a key-value store that can compactly map tens of millions of Sanskrit words to their inflectional data. Depending on the application, storage costs can be as low as 1 byte per word. This storage efficiency comes at the cost of increased lookup time, but in practice, we have found that this increase is negligible and well worth the efficiency gains elsewhere.

For details, see the vidyut-kosha README.
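One general way to trade lookup time for storage is front coding: in a sorted word list, each entry stores only the suffix it does not share with the previous word. The Rust sketch below illustrates that trade-off. It is not vidyut-kosha's actual data structure, and the `FrontCoded` type and the integer values (standing in for packed inflectional data) are hypothetical.

```rust
/// A toy front-coded map from words to hypothetical integer tags.
struct FrontCoded {
    // (bytes shared with the previous key, remaining suffix, value)
    entries: Vec<(usize, String, u32)>,
}

impl FrontCoded {
    fn build(mut pairs: Vec<(String, u32)>) -> Self {
        pairs.sort();
        let mut entries = Vec::with_capacity(pairs.len());
        let mut prev = String::new();
        for (key, value) in pairs {
            // Byte length of the common prefix, measured on char boundaries.
            let shared = prev
                .char_indices()
                .zip(key.chars())
                .take_while(|((_, a), b)| a == b)
                .last()
                .map_or(0, |((i, a), _)| i + a.len_utf8());
            entries.push((shared, key[shared..].to_string(), value));
            prev = key;
        }
        Self { entries }
    }

    /// Looks up `key` by rebuilding each stored word in order (a linear scan).
    fn get(&self, key: &str) -> Option<u32> {
        let mut current = String::new();
        for (shared, suffix, value) in &self.entries {
            current.truncate(*shared);
            current.push_str(suffix);
            if current == key {
                return Some(*value);
            }
        }
        None
    }

    /// Total bytes of key text actually stored.
    fn stored_key_bytes(&self) -> usize {
        self.entries.iter().map(|(_, suffix, _)| suffix.len()).sum()
    }
}

fn main() {
    let words = vec![
        ("gacCati".to_string(), 1),
        ("gacCanti".to_string(), 2),
        ("gacCAmi".to_string(), 3),
    ];
    let raw: usize = words.iter().map(|(w, _)| w.len()).sum();
    let store = FrontCoded::build(words);
    println!("raw key bytes: {raw}, stored: {}", store.stored_key_bytes());
    println!("gacCanti -> {:?}", store.get("gacCanti"));
}
```

For these three forms, the sketch stores 13 bytes of key text instead of 22, but each lookup must decode entries one by one. Practical front-coded stores typically split the list into small blocks so that a lookup decodes only one block.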

vidyut-prakriya generates Sanskrit words with their prakriyās (derivations) according to the rules of Paninian grammar. Our long-term goal is to provide a complete implementation of the Ashtadhyayi.

For details, see the vidyut-prakriya README.
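A prakriyā can be pictured as a sequence of rule applications, each recorded in a log. The Rust sketch below walks through the standard derivation of rAmaH (SLP1 for rāmaḥ) from the stem rAma and the nominative singular suffix su. It is a hypothetical toy: the `Prakriya` struct, its `step` method, and the hard-coded string edits are illustrative and are not vidyut-prakriya's API. A real implementation must check each rule's conditions instead of hard-coding the edits.

```rust
/// A toy derivation: the current terms plus a log of the rules applied so far.
struct Prakriya {
    terms: Vec<String>,
    history: Vec<&'static str>,
}

impl Prakriya {
    fn new(terms: &[&str]) -> Self {
        Self {
            terms: terms.iter().map(|t| t.to_string()).collect(),
            history: Vec::new(),
        }
    }

    /// Applies one rule (given here as a hard-coded edit) and records it.
    fn step(&mut self, rule: &'static str, edit: impl FnOnce(&mut Vec<String>)) {
        edit(&mut self.terms);
        self.history.push(rule);
    }

    fn text(&self) -> String {
        self.terms.concat()
    }
}

/// Derives rAmaH (nominative singular of rAma) from rAma + su.
fn derive_ramah() -> Prakriya {
    let mut p = Prakriya::new(&["rAma", "su"]);
    // The u of su is an it marker (1.3.2) and is deleted (1.3.9).
    p.step("1.3.2, 1.3.9", |t| {
        t[1].pop();
    });
    // Word-final s is replaced by ru (8.2.66).
    p.step("8.2.66", |t| {
        t[1].pop();
        t[1].push_str("ru");
    });
    // The u of ru is likewise an it marker and is deleted.
    p.step("1.3.2, 1.3.9", |t| {
        t[1].pop();
    });
    // r before a pause becomes visarga, written H in SLP1 (8.3.15).
    p.step("8.3.15", |t| {
        t[1].pop();
        t[1].push('H');
    });
    p
}

fn main() {
    let p = derive_ramah();
    for rule in &p.history {
        println!("applied {rule}");
    }
    println!("result: {}", p.text());
}
```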

vidyut-sandhi contains various utilities for working with sandhi changes between words. It is fast, simple, and appropriate for most use cases.

For details, see the vidyut-sandhi README.
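As a rough illustration of what joining words with sandhi involves, here is a Rust sketch that applies a handful of vowel-sandhi rules to SLP1 strings: two similar vowels merge into a long vowel, a/A plus i, u, or ṛ gives guṇa, a/A plus e or o gives vṛddhi, and i or u become y or v before a dissimilar vowel. The `join` function is hypothetical and is not vidyut-sandhi's API. It ignores consonant and visarga sandhi and simply concatenates any combination it does not handle.

```rust
/// Returns true for SLP1 vowel letters.
fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'A' | 'i' | 'I' | 'u' | 'U' | 'f' | 'F' | 'x' | 'X' | 'e' | 'E' | 'o' | 'O')
}

/// Joins two SLP1 words, applying a few vowel-sandhi rules at the boundary.
/// Any other combination is returned as plain concatenation.
fn join(first: &str, second: &str) -> String {
    let (Some(x), Some(y)) = (first.chars().last(), second.chars().next()) else {
        return format!("{first}{second}");
    };
    let merged = match (x, y) {
        // Two similar simple vowels merge into the long vowel.
        ('a' | 'A', 'a' | 'A') => "A".to_string(),
        ('i' | 'I', 'i' | 'I') => "I".to_string(),
        ('u' | 'U', 'u' | 'U') => "U".to_string(),
        // a/A + i, u, or ṛ: guṇa.
        ('a' | 'A', 'i' | 'I') => "e".to_string(),
        ('a' | 'A', 'u' | 'U') => "o".to_string(),
        ('a' | 'A', 'f' | 'F') => "ar".to_string(),
        // a/A + e/ai or o/au: vṛddhi (ai = E, au = O in SLP1).
        ('a' | 'A', 'e' | 'E') => "E".to_string(),
        ('a' | 'A', 'o' | 'O') => "O".to_string(),
        // i/u before a dissimilar vowel become the semivowels y/v.
        ('i' | 'I', v) if is_vowel(v) => format!("y{v}"),
        ('u' | 'U', v) if is_vowel(v) => format!("v{v}"),
        _ => return format!("{first}{second}"),
    };
    // Drop the two boundary vowels and splice in the merged sound.
    let head = &first[..first.len() - x.len_utf8()];
    let tail = &second[y.len_utf8()..];
    format!("{head}{merged}{tail}")
}

fn main() {
    for (a, b) in [("deva", "indra"), ("iti", "api"), ("mahA", "fzi"), ("tatra", "gacCati")] {
        println!("{a} + {b} = {}", join(a, b));
    }
}
```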

Documentation

To view documentation for all crates (including private modules and structs), run make docs. This command will generate Rust's standard documentation and open it in your default web browser.

Contributing

Vidyut is an ambitious and transformative project, and you can help us build it. Depending on your background and skills, there are different ways you can contribute.

First, we recommend joining our community so that you can follow along with progress on Ambuda and Vidyut and participate in discussions around them.

If you use a tool that depends on Vidyut, please file GitHub issues when you see errors or surprising behavior. Please also feel free to file issues for feature requests. We'll do our best to accommodate them.

If you know Sanskrit, please give us detailed feedback on any mistakes you see and what you think the correction should be. This kind of work is especially valuable for vidyut-prakriya.

If you can program, we encourage you to learn some Rust and get involved with Vidyut directly. Be bold and make pull requests for work that you think will improve the project. If you would like some pointers on where to start, explore our issue tracker, which lists all of our open work items; we welcome a PR for any open issue. Issues tagged with sanskrit require some basic familiarity with Sanskrit, and issues tagged with vyakarana require a much deeper knowledge of Sanskrit grammar.

If you are also familiar with machine learning, we are always eager for improvements to vidyut-cheda. Our current model uses simple bigram statistics, so there is plenty of room to improve!
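To show what bigram statistics contribute, here is a Rust sketch that scores candidate segmentations by summing bigram log-probabilities and keeps the highest-scoring one. It uses the classic ambiguous string SvetoDAvati (SLP1), which can be read as SvetaH DAvati ("the white one runs") or SvA itaH DAvati ("a dog runs from here"). The `score` and `best` functions, the probabilities, and the fixed penalty for unseen bigrams are all hypothetical; this is not vidyut-cheda's model.

```rust
use std::collections::HashMap;

/// Sums bigram log-probabilities over a candidate segmentation.
/// Unseen bigrams get the fixed `unseen` log-probability (a crude back-off).
fn score<'a>(words: &[&'a str], bigrams: &HashMap<(&'a str, &'a str), f64>, unseen: f64) -> f64 {
    // "<s>" marks the start of the sentence so the first word is scored too.
    let mut prev: &'a str = "<s>";
    let mut total = 0.0;
    for &word in words {
        total += bigrams.get(&(prev, word)).copied().unwrap_or(unseen);
        prev = word;
    }
    total
}

/// Returns the candidate with the highest bigram score.
fn best<'c, 'a>(
    candidates: &'c [Vec<&'a str>],
    bigrams: &HashMap<(&'a str, &'a str), f64>,
    unseen: f64,
) -> Option<&'c Vec<&'a str>> {
    candidates.iter().max_by(|x, y| {
        score(x.as_slice(), bigrams, unseen).total_cmp(&score(y.as_slice(), bigrams, unseen))
    })
}

fn main() {
    // Hypothetical log-probabilities for a handful of word pairs.
    let bigrams: HashMap<(&str, &str), f64> = HashMap::from([
        (("<s>", "SvetaH"), -2.0),
        (("SvetaH", "DAvati"), -1.5),
        (("<s>", "SvA"), -2.5),
        (("SvA", "itaH"), -3.0),
        (("itaH", "DAvati"), -2.0),
    ]);
    // Two readings of SvetoDAvati.
    let candidates = vec![vec!["SvetaH", "DAvati"], vec!["SvA", "itaH", "DAvati"]];
    if let Some(words) = best(&candidates, &bigrams, -10.0) {
        println!("{}", words.join(" "));
    }
}
```

With these made-up numbers, the first reading scores -3.5 and the second -7.5, so the sketch prints `SvetaH DAvati`. Because every extra word adds another negative term, summed log-probabilities also tend to favor segmentations with fewer words, which a better model may need to correct for.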

If you want to pursue an open-ended research project, here are the components we are most excited about:

  • dependency parsing and anvaya generation
  • search indexing that accounts for sandhi and Sanskrit's complex morphology
  • transliteration, perhaps through a port of Aksharamukha
  • meter recognition
  • support for Vedic Sanskrit
  • implementations of non-Paninian grammars

And if there's something else you're excited about, please let us know about it -- we'll probably be excited about it too!

Community

If you're excited about our work on Vidyut, we would love to have you join our community.

  • Most of our conversation happens in the #vidyut channel on Ambuda's Discord server, where you can chat directly with our team and get fast answers to your questions. We also schedule time to spend together virtually, usually once a week.

  • Occasional discussion related to Vidyut might also appear on ambuda-discuss or on standard mailing lists like sanskrit-programmers.

  • You can also follow along with project announcements on ambuda-announce.

बलमिति विद्युति ("as strength, in lightning")
