
vipranarayan14 / aksharas


A utility for analysing akṣaras and varṇas in a Devanagari text.

License: MIT License

JavaScript 6.93% TypeScript 93.07%
character-counter characters devanagari indic-languages indic-scripts sanskrit sanskrit-language syllabification syllables

aksharas's Introduction

Aksharas


Aksharas is a utility for analysing akṣaras and varṇas in a Devanagari text.

Installation

npm i @vipran/aksharas

Usage

import Aksharas from "@vipran/aksharas";

// OR for CommonJS:
// const Aksharas = require("@vipran/aksharas").default;

const input = "सर्वे भवन्तु सुखिनः।";

const results = Aksharas.analyse(input);

const aksharas = results.aksharas.map(akshara => akshara.value);

console.log(aksharas); // "स", "र्वे", "भ", "व", "न्तु", "सु", "खि", "नः"

API

Aksharas.analyse()

Accepts a string input and returns a Results object.

const input: string = 'नमः';
const results: Results = Aksharas.analyse(input);

Aksharas.TokenType

It is an enum with the following values:

  • TokenType.Akshara
  • TokenType.Symbol
  • TokenType.Whitespace
  • TokenType.Invalid
  • TokenType.Unrecognised

These can be used to filter the tokens in the Results object. Example:

import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { TokenType } ...

const input = "हे! हरेऽत्र नागच्छ।";
const results = Aksharas.analyse(input);
const symbols = results.all
  .filter((token) => token.type === Aksharas.TokenType.Symbol)
  .map((token) => token.value);

console.log(symbols); // "ऽ", "।"
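Filtering on one type at a time generalises to a per-type tally. The sketch below is not part of the package: `TokenLike` and `countByType` are hypothetical names, and the string values used for the types are stand-ins, since this README does not state the enum's underlying values. It runs on hand-built data so that it stays self-contained.

```typescript
// Hypothetical minimal shape: only the `type` and `value` fields documented for tokens.
interface TokenLike<K extends string | number> {
  type: K;
  value: string;
}

// Count how many tokens of each type appear in a token list such as `results.all`.
function countByType<K extends string | number>(tokens: TokenLike<K>[]): Map<K, number> {
  const counts = new Map<K, number>();
  for (const token of tokens) {
    counts.set(token.type, (counts.get(token.type) ?? 0) + 1);
  }
  return counts;
}

// Hand-built sample standing in for part of an analysed token list (assumed type values).
const sampleTokens: TokenLike<string>[] = [
  { type: "akshara", value: "हे" },
  { type: "unrecognised", value: "!" },
  { type: "whitespace", value: " " },
  { type: "akshara", value: "ह" },
  { type: "symbol", value: "।" },
];

const tally = countByType(sampleTokens);
console.log(tally.get("akshara")); // 2
```

With the package installed, the same helper should accept `results.all` directly, and it would likely work for `results.varnas` too, since both kinds of object carry a `type` and a `value`.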

Aksharas.VarnaType

It is an enum with the following values:

  • VarnaType.Svara
  • VarnaType.Vyanjana

These can be used to filter the varnas in Results.varnas. Example:

import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { VarnaType } ...

const input = "गुरुः";
const results = Aksharas.analyse(input);

const svaras = results.varnas
  .filter((varna) => varna.type === Aksharas.VarnaType.Svara)
  .map((varna) => varna.value);

console.log(svaras); // "उ", "उः"

Results

The Results object contains the following properties:

  • all
    • type: Token[]
    • An array of Token objects containing all the tokens analysed from the input string. It includes Devanagari akṣaras, Devanagari symbols (१, २, ।, ॥, etc.) and non-Devanagari characters (i.e. characters in other scripts, special characters, whitespace characters, etc.).
  • aksharas
    • type: Token[]
    • Devanagari syllables like रा, सी, etc. Here, halanta consonants such as क्, च्, य्, etc. are also considered as aksharas when they are at the end of a word.
  • varnas
    • type: Varna[]
    • Devanagari consonants and vowels in the input. (Only in v0.4.0 or above.)
  • symbols
    • type: Token[]
    • Devanagari symbols such as १, २, ।, ॥, etc.
  • whitespaces
    • type: Token[]
    • All whitespace characters: spaces, tabs (\t), newlines (\n), etc.
  • invalid
    • type: Token[]
    • All Devanagari characters whose occurrence in the input string does not conform to the definition of an akṣara. For example, a virāma or a vowel mark that is not preceded by a consonant is invalid ("अ्", "गोु", etc.).
  • unrecognised
    • type: Token[]
    • Non-Devanagari characters (i.e. characters in other scripts and special characters such as @, #, etc.)
  • chars
    • type: string[]
    • All Unicode characters in the input string. Same as String.prototype.split("").
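The rule given for the invalid list (a virāma or a vowel mark that is not preceded by a consonant) can be illustrated with a small standalone check. This is a sketch of that one rule only, not the library's tokenizer: `findStrayMarks` and its helpers are hypothetical names, and the code point ranges are taken from the Unicode Devanagari block and cover only the common consonants and vowel signs.

```typescript
const VIRAMA = 0x094d; // ्
const NUKTA = 0x093c; // ़ (may sit between a consonant and its vowel sign)

// Consonants क..ह plus the precomposed nukta consonants क़..य़.
function isConsonant(cp: number): boolean {
  return (cp >= 0x0915 && cp <= 0x0939) || (cp >= 0x0958 && cp <= 0x095f);
}

// Dependent vowel signs ा..ौ plus ॢ and ॣ.
function isVowelSign(cp: number): boolean {
  return (cp >= 0x093e && cp <= 0x094c) || cp === 0x0962 || cp === 0x0963;
}

// Return the code point indices of virāmas and vowel signs that do not
// directly follow a consonant (optionally with a nukta in between).
function findStrayMarks(input: string): number[] {
  const cps = Array.from(input, (ch) => ch.codePointAt(0) ?? 0);
  const stray: number[] = [];
  cps.forEach((cp, i) => {
    if (cp !== VIRAMA && !isVowelSign(cp)) return;
    let prev = i - 1;
    if (prev >= 0 && cps[prev] === NUKTA) prev -= 1;
    if (prev < 0 || !isConsonant(cps[prev])) stray.push(i);
  });
  return stray;
}

console.log(findStrayMarks("अ्")); // [ 1 ]
console.log(findStrayMarks("गोु")); // [ 2 ]
console.log(findStrayMarks("नमः")); // []
```

In "गोु" the second vowel mark follows another vowel mark rather than a consonant, which is why it is flagged.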

Token

Many of the properties in the Results object consist of an array of Token objects. A Token object has the following properties:

  • type
    • type: TokenType
    • One of the Aksharas.TokenType values.
  • value
    • type: string
    • Contains an analysed part of the input string.
  • from
    • type: number
    • The index at which the token starts in the input string.
  • to
    • type: number
    • The index at which the token ends in the input string.
  • attributes
    • type: Record<string, any>
    • An optional key-value object which may contain other attributes of the token. It is currently used only in the Akshara tokens for storing the varnas in that akshara.
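As one use of these token fields, the akṣaras can be grouped into words by walking a token list and treating whitespace tokens as word boundaries. The sketch below makes several assumptions: that `results.all` lists tokens in input order, that the string literals in `Kind` stand in for the real `TokenType` values, and that `SimpleToken` and `aksharasPerWord` are hypothetical names. It uses hand-built tokens so that it runs without the package.

```typescript
// Hypothetical stand-ins for the documented token kinds; the package's actual enum values may differ.
type Kind = "akshara" | "symbol" | "whitespace" | "invalid" | "unrecognised";

interface SimpleToken {
  type: Kind;
  value: string;
}

// Walk tokens in input order; a whitespace token closes the current word.
function aksharasPerWord(tokens: SimpleToken[]): string[][] {
  const words: string[][] = [];
  let current: string[] = [];
  for (const token of tokens) {
    if (token.type === "akshara") {
      current.push(token.value);
    } else if (token.type === "whitespace" && current.length > 0) {
      words.push(current);
      current = [];
    }
  }
  if (current.length > 0) words.push(current);
  return words;
}

// Hand-built tokens for "सर्वे भवन्तु", mirroring the akṣaras shown in the Usage example.
const tokens: SimpleToken[] = [
  { type: "akshara", value: "स" },
  { type: "akshara", value: "र्वे" },
  { type: "whitespace", value: " " },
  { type: "akshara", value: "भ" },
  { type: "akshara", value: "व" },
  { type: "akshara", value: "न्तु" },
];

console.log(aksharasPerWord(tokens).map((word) => word.length)); // [ 2, 3 ]
```

Symbols and unrecognised characters are skipped rather than treated as boundaries here; whether punctuation should end a word is a design choice left open.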

Varna

Results.varnas consists of an array of Varna objects. A Varna object has the following properties:

  • type
    • type: VarnaType
    • One of the Aksharas.VarnaType values.
  • value
    • type: string
    • Contains an analysed part of the input string.

License

MIT © Prasanna Venkatesh T S

aksharas's Issues

Using aksharas to build a utility for comparing text

Hello. We are reaching out about building a utility that can compare texts in Indic languages, in particular Kannada. The functionality should approximate that of this online tool: https://countwordsfree.com/comparetexts. Our current understanding is that character counting as implemented in aksharas would be a prerequisite. In addition, we have been using aksharas-web for plain character counting in Kannada text by transliterating it to Devanagari. We would like to build an interface for Kannada as well. We hope to hear your thoughts on this, and on the GitHub workflow for taking it forward.

Consider "ॐ" as an akshara

Example

Input text: "ॐ"

Current behaviour

"ॐ" is considered a symbol (TokenType.Symbol).

Expected behaviour

It should be considered an akshara (TokenType.Akshara). It should be treated as equivalent to "ओम्". It should have varnasLength as 2 (and varnas as ओ and म्).
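Until such behaviour exists, one possible workaround is to expand the sign before analysis. The helper below is a hypothetical pre-processing step, not something the package provides, and it only rewrites the input string. How the expanded text is then tokenised is up to the library, and any `from`/`to` indices would refer to the expanded string rather than the original.

```typescript
// Replace every "ॐ" (U+0950) with the spelled-out form "ओम्" before analysis.
function expandOm(input: string): string {
  return input.split("\u0950").join("ओम्");
}

console.log(expandOm("ॐ नमः")); // ओम् नमः
```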

Add support for varnas in `analyse` results

Currently, the results returned by the analyse function contain only the aksharas from the input text. Add support for varnas also. Later, remove varnasLength from results since varnas itself can be used to find the length.
