
microsoft / inmt


Interactive Neural Machine Translation tool

Home Page: https://microsoft.github.io/inmt/

License: MIT License

Jupyter Notebook 61.51% Dockerfile 0.01% Python 17.26% CSS 2.76% JavaScript 8.03% HTML 4.03% Makefile 0.01% TeX 0.58% Shell 0.58% Perl 0.64% Smalltalk 0.04% Emacs Lisp 0.34% NewLisp 0.03% Ruby 0.03% Slash 0.01% SystemVerilog 0.01% SCSS 4.13%
machine-translation interactive-machine-learning


inmt's Issues

OpenNMT as a git submodule to this repository

Currently, we use a non-versioned copy of OpenNMT, which is difficult to update whenever upstream changes. Figure out a way to update OpenNMT while keeping our wrappers intact.

Data folder?

Where can I get the parallel data to train this on?

Restructuring the results

Currently, the results from the API are in this form:

{
  "result": "Today 's weather is beautiful\nToday is \nToday 's \nThis day \nThe weather \nToday , ",
  "attn": [
    1,
    1,
    1,
    1
  ],
  "partial": "",
  "ppl": 3.230648083472593,
  "avg": -1.172682762145996
}

While this works for our case, it would be great to restructure it:

  • "result" can be two lists - "full_sentence": [f1] and "part_sentence": [p1, p2, p3, ...]. The length of each list is controlled by #22
  • "attn" can be restructured so that it makes sense independently. Maybe add the source sentence as well in the output?
  • "ppl" and "avg" can be moved into a "metrics" section as both represent a way to measure the output.
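
A minimal sketch of the proposed restructuring, assuming that the first line of "result" is the full sentence and the remaining lines are partial suggestions (the issue does not state this explicitly). The function name restructure_result and the "source" key are hypothetical; "full_sentence", "part_sentence" and "metrics" follow the names proposed above.

```python
from typing import Any, Dict


def restructure_result(raw: Dict[str, Any], source: str = "") -> Dict[str, Any]:
    """Convert the flat API response into the proposed nested layout."""
    # Assumption: suggestions are newline-separated, full sentence first.
    lines = [line.strip() for line in raw["result"].split("\n")]
    lines = [line for line in lines if line]
    return {
        "source": source,  # hypothetical key, echoes the input sentence
        "full_sentence": lines[:1],
        "part_sentence": lines[1:],
        "attn": raw.get("attn", []),
        "partial": raw.get("partial", ""),
        "metrics": {"ppl": raw.get("ppl"), "avg": raw.get("avg")},
    }
```

Keeping both sentence fields as lists means the client does not need to special-case the number of suggestions returned.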

Fix text pointer interaction

There are some issues with the text pointer interactions. The common ones observed are:

  • If a translation box is left blank, the pointer behavior on preview tab is buggy.
  • After clicking the preview and coming back, the pointer behavior is buggy.
  • Reverse pointer does not work in the preview tab.

I can't open the model download link

The page shows:
This link has been disabled.
Sorry, access through this link has been removed by admin policy. Please contact the person who shared it with you.

Record keystrokes for different instances

Currently, the recorded keystrokes are replaced every time the translator opens the translation interface.

TODOs:

  • Add mechanism to append keystrokes rather than replacing them.
  • Record interaction keystrokes - probably use XPath to identify the elements involved.
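
One possible shape for the append mechanism, sketched as an in-memory store. The class KeystrokeLog and its methods are hypothetical and are not taken from the repository; a real implementation would persist the sessions rather than hold them in a dict.

```python
from collections import defaultdict
from typing import Dict, List


class KeystrokeLog:
    """Keeps one keystroke list per session instead of overwriting."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[List[str]]] = defaultdict(list)

    def start_session(self, translation_id: str) -> None:
        # Called each time the translation interface is opened.
        self._sessions[translation_id].append([])

    def record(self, translation_id: str, key: str) -> None:
        if not self._sessions[translation_id]:
            self.start_session(translation_id)
        self._sessions[translation_id][-1].append(key)

    def sessions(self, translation_id: str) -> List[List[str]]:
        # Return copies so callers cannot mutate the stored history.
        return [list(s) for s in self._sessions.get(translation_id, [])]
```

Opening the interface a second time then adds a new list, and the keystrokes from the first visit remain available.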

Fix BPE pre/post processing

This includes:

  • Source Side BPE tokens - This includes applying BPE code while translating.
  • Target Side BPE tokens - This includes joining of suggestions.
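
For the target side, a sketch of joining suggestions, assuming the model emits subword pieces with a trailing "@@" continuation marker (the convention used by subword-nmt; the issue does not say which BPE format is in use). The function names join_bpe and join_suggestions are hypothetical.

```python
import re
from typing import List


def join_bpe(text: str, marker: str = "@@") -> str:
    """Merge subword pieces such as 'wea@@ ther' back into 'weather'."""
    # Drop the marker together with the following space, or a dangling
    # marker at the end of an unfinished suggestion.
    pattern = re.escape(marker) + r"( |$)"
    return re.sub(pattern, "", text)


def join_suggestions(result: str) -> List[str]:
    """Apply join_bpe to each newline-separated suggestion."""
    return [join_bpe(line) for line in result.split("\n")]
```

Source-side segmentation needs the learned BPE codes and is left out of this sketch.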

Implement Translation Memory (TM)

Currently, every time a translation is needed, the translation API is called and the request is processed for that source and partial input. With a Translation Memory, we could reuse translations that translators have already produced instead of recomputing them.

Should the TM be user specific or global? It can probably be a combination of both - for example, periodically compare the individual TMs and add an entry to the global TM if it occurs commonly.

Should the TM suggestion be shown with a different color explicitly to specify it comes from the TM whereas the others come from NMT?
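
A rough sketch of the combined user/global design discussed above. Everything here is an assumption for illustration: the class TranslationMemory, the promote_after threshold, and the exact-match lookup keyed on the source sentence are not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class TranslationMemory:
    """Per-user entries, promoted to a global store once enough users agree."""

    def __init__(self, promote_after: int = 2) -> None:
        self.promote_after = promote_after
        self._user: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._votes: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._global: Dict[str, str] = {}

    def add(self, user: str, source: str, target: str) -> None:
        self._user[user][source] = target
        voters = self._votes[(source, target)]
        voters.add(user)
        if len(voters) >= self.promote_after:
            self._global[source] = target

    def lookup(self, user: str, source: str,
               partial: str = "") -> Optional[Tuple[str, str]]:
        """Return (suggestion, origin), or None to fall back to NMT."""
        stores = (("user", self._user.get(user, {})), ("global", self._global))
        for origin, store in stores:
            target = store.get(source)
            if target is not None and target.startswith(partial):
                return target, origin
        return None
```

Returning the origin alongside the suggestion would let the interface color TM suggestions differently from NMT ones.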

Allow editing sentences in between

Currently, we are performing beam search based on the inputs at the end. However, there should be a method to change inputs in between, so that the suggestion can be made based on the prefix and suffix.

Relevant Literature:

  1. Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation
  2. Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
  3. Guided Open Vocabulary Image Captioning with Constrained Beam Search

Fix Transliteration Helper Interaction

Currently, the transliteration helper takes the English input, retrieves the appropriate transliteration, and shows it to the user. However, there are interaction flaws with this:

  • English characters show up in the translation box and overlap with the Hindi suggestion.

  • Users of native-language keyboards are affected because the transliterator calls the API even for their input. This can be fixed by calling the transliteration API only if the input Unicode lies in the range of the English (Latin) script.
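
The script check from the last point could look roughly like this. It is a sketch that treats only the basic ASCII letters as "English (Latin)"; the function name should_transliterate is hypothetical.

```python
import string


def should_transliterate(text: str) -> bool:
    """True only if every letter in the input is a basic Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    # No letters at all (empty input, digits, punctuation): nothing to do.
    if not letters:
        return False
    return all(ch in string.ascii_letters for ch in letters)
```

Input typed on a Devanagari keyboard fails the check, so the transliteration API would not be called for it.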
