This project forked from varnamproject/govarnam

Easily Type Indian Languages on computer and mobile. GoVarnam is a cross-platform transliteration library. Manglish -> Malayalam, Thanglish -> Tamil, Hinglish -> Hindi plus another 10 languages. GoVarnam is a near-Go port of libvarnam

Home Page: https://varnamproject.github.io

License: Other

Makefile 1.25% C 6.33% Go 88.91% Shell 3.51%

govarnam's Introduction

Varnam

Varnam is an Indian language transliteration library. GoVarnam is a Go port of libvarnam with some core architectural changes. Not every part of libvarnam is ported.

It is stable enough for daily use as an input method. See it in action here: https://varnamproject.github.io/editor/

An Input Method Engine for Linux operating systems via IBus is available here: https://github.com/varnamproject/govarnam-ibus

Installation

See the instructions on the website: https://varnamproject.github.io/download/


Usage

Test it out:

varnamcli -s ml namaskaaram

Learn a word:

varnamcli -s ml -learn കുന്നംകുളം

Train a word with a particular pattern:

varnamcli -s ml -train college കോളേജ്

Learning Words From A File

You can import all the words of a language from any text file. Varnam will separate English words from non-English words and learn them accordingly.

varnamcli -s ml -learn-from-file file.html

You can download news articles or Wikipedia pages in HTML format to learn words from them.
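To make the idea of separating English and non-English words concrete, here is a minimal sketch in Go. It is not GoVarnam's actual implementation: the function name SplitByScript is hypothetical, it only recognises the Malayalam script, and it assumes plain text input (HTML tags are not stripped).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// SplitByScript is a hypothetical helper. It separates the words of text
// into those written entirely in the Malayalam script and those written
// entirely in Latin letters. Words that mix scripts are ignored.
// Limitation: zero-width joiners are treated as separators here.
func SplitByScript(text string) (native, latin []string) {
	// Every rune that is not a letter or a combining mark ends a word.
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsMark(r)
	})
	for _, w := range words {
		isNative, isLatin := true, true
		for _, r := range w {
			if !unicode.Is(unicode.Malayalam, r) {
				isNative = false
			}
			if !unicode.Is(unicode.Latin, r) {
				isLatin = false
			}
		}
		switch {
		case isNative:
			native = append(native, w)
		case isLatin:
			latin = append(latin, w)
		}
	}
	return native, latin
}

func main() {
	native, latin := SplitByScript("കുന്നംകുളം is a town; കോളേജ് means college.")
	fmt.Println(native)
	fmt.Println(latin)
}
```

In this sketch the Malayalam words would be candidates for learning, while the Latin words would need a trained pattern before they could be suggested.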

Export Learnings

You can export your local learnings with:

varnamcli -s ml -export my-words

The exported file will have the extension .vlf (Varnam Learnings File).

Import Learnings

You can import learnings from a .vlf file:

varnamcli -s ml -import my-words-1.vlf

Development

Build

This repository contains three things:

  1. GoVarnam library
  2. GoVarnam Command Line Utility (CLI)
  3. Go bindings for GoVarnam

GoVarnam is written in Go, but so that it can be used as a standard library from any other programming language, we compile it to a C shared library. This is done with:

go build -buildmode "c-shared" -o libgovarnam.so

(The shortcut for the above is make library.)

The output, libgovarnam.so, is a shared library that can be dynamically linked from any other programming language. Some examples:

  • Go bindings for GoVarnam: See govarnamgo folder in this repo
  • Java bindings for GoVarnam: IN PROGRESS

Note that this means even Go programs need a separate Go binding (govarnamgo) to use GoVarnam. This is because they interface with the shared library, not with the Go package directly.

Files & Folders

  • govarnam - The library files
  • main.go, c-shared* - Files that help build govarnam as a C shared library
  • govarnamgo - Go bindings for the library. For use with other Go projects
  • cli - A CLI tool for varnam. Uses govarnamgo to interface with the library.
  • symbol-frequency-calculator - For populating the weight column in VST files

CLI (Command Line Utility)

The command line utility (CLI) is written in Go and uses govarnamgo to interface with the library.

You need to separately build the CLI:

cd cli

# Add the directory containing libgovarnam.so to the library search path
export LD_LIBRARY_PATH=$(realpath ../):$LD_LIBRARY_PATH

go build -o varnamcli .

Hacking

This section gets you hands-on right away. An explanation of how GoVarnam works is at the bottom.

  • Clone the repository
  • Run go get
  • You will need a .vst file. Get it from the schemes folder of a release and place it in the local schemes folder
  • Run make library to compile

Whenever you change the govarnam source code, run make library again to rebuild the library, then test with the CLI.

You can run the tests (to make sure nothing broke) with:

make test

GoVarnam BTS

Read GoVarnam Spec: https://docs.google.com/document/d/1l5cZAkly_-kl7UkfeGmObSam-niWCJo4wq-OvAEaDvQ/edit?usp=sharing

Changes from libvarnam

  • ml.vst has been changed to add a new weight column in the symbols table. Get the new ml.vst here. A lower weight means a symbol is more significant; the weight is calculated from the symbol's popularity in a corpus. You can populate an ml.vst with weight values using a Python script; see the script in the subfolder. The VST itself is still made with the same Ruby script as before. The ml.vst from libvarnam is incompatible with govarnam.

  • patterns_content is renamed to patterns in GoVarnam

  • The patterns table in the learnings DB no longer stores Malayalam patterns. Instead, for each input, all possible Malayalam words are calculated (from symbols, using VARNAM_MATCH_ALL) and searched for in words. The matches are returned as suggestions. Previously, patterns stored every pattern that led to a word (English => Malayalam).

  • patterns in govarnam is used solely for English words, for example Computer => കമ്പ്യൂട്ടർ. These English words do not work with our VST tokenizer because their spelling is not directly transliterable in our language. The transliterable spelling would be kambyoottar rather than Computer.
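The lookup described in the list above can be sketched roughly as follows. This is a toy model in Go, not GoVarnam's real code: the in-memory maps stand in for the VST symbols table and the learnings DB, the weights are made up, and the names Symbol, expand and Suggest are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is one possible output for a Latin pattern. A lower Weight means
// the symbol is more significant.
type Symbol struct {
	Value  string
	Weight int
}

// Toy stand-ins for the symbols, words and patterns tables.
// All contents and weights are invented for illustration.
var symbols = map[string][]Symbol{
	"ma": {{"മ", 0}},
	"la": {{"ല", 0}, {"ള", 1}},
}
var words = map[string]bool{"മല": true}
var patterns = map[string]string{"college": "കോളേജ്"}

type candidate struct {
	text   string
	weight int
}

// expand tokenizes input greedily (longest pattern first) and returns every
// combination of matching symbols. It returns nil when some part of the
// input has no matching pattern.
func expand(input string) []candidate {
	if input == "" {
		return []candidate{{}}
	}
	for n := len(input); n > 0; n-- {
		syms, ok := symbols[input[:n]]
		if !ok {
			continue
		}
		var out []candidate
		for _, rest := range expand(input[n:]) {
			for _, s := range syms {
				out = append(out, candidate{s.Value + rest.text, s.Weight + rest.weight})
			}
		}
		return out
	}
	return nil
}

// Suggest returns a trained pattern if one exists. Otherwise it returns the
// generated candidates that are known words, lowest total weight first.
func Suggest(input string) []string {
	if w, ok := patterns[input]; ok {
		return []string{w}
	}
	cands := expand(input)
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].weight < cands[j].weight })
	var out []string
	for _, c := range cands {
		if words[c.text] {
			out = append(out, c.text)
		}
	}
	return out
}

func main() {
	fmt.Println(Suggest("mala"))
	fmt.Println(Suggest("college"))
}
```

Here "mala" expands to two candidates, of which only the learned word is kept, while "college" bypasses the tokenizer and is answered from the trained pattern.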

Miscellaneous

To build without bundling SQLite (the libsqlite3 build tag links against the system's SQLite library instead):

go build -tags libsqlite3 -buildmode=c-shared -o libgovarnam.so

Release Process

  • git tag
  • make build release

Pack ibus engine:

  • make build-ubuntu18 release

govarnam's People

Contributors

subins2000
