Coder Social home page Coder Social logo

awdavis / rtt-linguiflow Goto Github PK

View Code? Open in Web Editor NEW

This project forked from wycliffe-usa/rtt-linguiflow

0.0 1.0 0.0 9.58 MB

License: GNU General Public License v2.0

Shell 31.70% HTML 5.02% Ruby 6.89% Python 15.27% Perl 40.96% Emacs Lisp 0.16%

rtt-linguiflow's Introduction

Build Status:

Master

Build Status

Development

Build Status

Wycliffe@Urbana15: #Hacktranslation

We aim to bring the power of natural language processing (NLP) to bear on the work of Bible translation.

The problem

Bible translation is a task that takes many years of dedicated, hard work from a large number of people. One of the major tasks, accounting for a large portion of the time required in translation, is gathering properly defined vocabulary and identifying parts of speech. An existing process called Rapid Word Collection does an excellent job of collecting a large number of words in a very short period of time. However, the process is almost entirely manual and dependent on human intervention. We would like to develop processes leveraging NLP tools (or even other things. We aren't picky!) that would allow us to 'discover' more words in the target language in a shorter amount of time, using fewer resources.

Many of these languages are poor in written literature, so large sets of words are hard to come by. Some are still largely unwritten. Almost all are in areas where internet connectivity is scarce, slow, or unreliable. The translation activity is happening in remote areas where power is also unreliable and shipping equipment to the site is a gamble.

Tools at your disposal

I have provided Vagrant machines that will get you up and running with NLP tool sets and some sample data (but not in any of the languages where translation is taking place). Additionally I will provide one or more "unknown" languages in the sample data to serve as material for proof-of-concept when working to develop NLP tools against poorly documented languages.

Each of the tools can be found in one or more Vagrant machines. Get instructions on setup here.

Tools I plan on providing:

  • MGIZA: A multithreaded implementation of GIZA, a word alignment tool. This is rolled into the mgiza box.
  • Python's NLTK
  • TensorFlow, for you tryhards really smart people who understand neural networks.
  • Word2Vec using Gensim instead of the original.

Any additional tools requested I will try to provide, but the tight turnaround time means I might not be able to do it during Urbana '15. As work on this project continues, I will maintain and support these boxes to the best of my ability. Issues and pull requests are welcome and encouraged.

Setup instructions

The only thing you will have to set up on your own will be Vagrant and the Vagrant provider (a hypervisor such as VirtualBox or VMware, etc). Platform specific instructions are on Vagrant's site. Generally speaking, once Vagrant is installed, just pop open a terminal, navigate to the directory containing the Vagrantfile you need to use, and do a vagrant up. A couple minutes later, you have a working VM with a development environment built for you. Once it's up, you can vagrant ssh into it and begin work.

rtt-linguiflow's People

Contributors

bbriggs avatar ercshieh avatar stealthybox avatar waffle-iron avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.