microsoft / inmt

Interactive Neural Machine Translation tool

Home Page: https://microsoft.github.io/inmt/

License: MIT License

Jupyter Notebook 61.51% Dockerfile 0.01% Python 17.26% CSS 2.76% JavaScript 8.03% HTML 4.03% Makefile 0.01% TeX 0.58% Shell 0.58% Perl 0.64% Smalltalk 0.04% Emacs Lisp 0.34% NewLisp 0.03% Ruby 0.03% Slash 0.01% SystemVerilog 0.01% SCSS 4.13%

machine-translation interactive-machine-learning

inmt's Issues

Decouple translation engine configurations

Currently, the translation engine configurations are hard-coded. Decouple them into a separate JSON file and add that file to .gitignore.
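One possible shape for this, as a minimal sketch: defaults live in code and a local, git-ignored JSON file overrides them. The file name `engine_config.json` and the keys `model_path`, `beam_size`, and `n_best` are assumptions for illustration, not the project's actual settings.

```python
import json
from pathlib import Path

# Hypothetical defaults; these key names are illustrative only.
DEFAULT_ENGINE_CONFIG = {"model_path": "", "beam_size": 5, "n_best": 5}


def load_engine_config(path="engine_config.json"):
    """Return the defaults, overridden by values from a local JSON file if it exists."""
    config = dict(DEFAULT_ENGINE_CONFIG)
    config_file = Path(path)
    if config_file.exists():
        with config_file.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config
```

With this layout, a fresh checkout would still run on the defaults, and each deployment keeps its own untracked JSON file.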

OpenNMT as a git submodule to this repository

Currently, we use a non-versioned copy of OpenNMT, which is difficult to update when upstream changes. Figure out a way to update OpenNMT while keeping our wrappers intact.

Restructuring the results

Currently, the results from the API are in this form:

{
  "result": "Today 's weather is beautiful\nToday is \nToday 's \nThis day \nThe weather \nToday , ",
  "attn": [
    1,
    1,
    1,
    1
  ],
  "partial": "",
  "ppl": 3.230648083472593,
  "avg": -1.172682762145996
}

While this works for our case, it would be great to restructure it:

"result" can be split into two lists: "full_sentence": [f1] and "part_sentence": [p1, p2, p3, ...]. The length of each list is controlled by #22.
"attn" can be restructured so that it makes sense independently. Maybe add the source sentence to the output as well?
"ppl" and "avg" can be moved into a "metrics" section, as both are measures of the output.
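The restructuring proposed above could look roughly like the following sketch. It assumes the first line of "result" is the full-sentence suggestion and the remaining lines are partial suggestions, which matches the sample response but is not confirmed here. The function name `restructure_response` and the optional "source" field are hypothetical.

```python
def restructure_response(old, source=None):
    """Convert the current flat response into the proposed nested layout."""
    lines = old.get("result", "").split("\n")
    # Assumption: line 0 is the full sentence, the rest are partial suggestions.
    full = [lines[0].strip()] if lines and lines[0].strip() else []
    partial = [line.strip() for line in lines[1:] if line.strip()]
    new = {
        "full_sentence": full,
        "part_sentence": partial,
        "attn": old.get("attn", []),
        "partial": old.get("partial", ""),
        "metrics": {"ppl": old.get("ppl"), "avg": old.get("avg")},
    }
    if source is not None:
        # Including the source lets "attn" be interpreted on its own.
        new["source"] = source
    return new
```

Note that this sketch strips the trailing spaces from each suggestion. If the client relies on them, drop the `strip()` calls.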

Record keystrokes for different instances

Currently, the recorded keystrokes are replaced every time the translator opens the translation interface.

TODOs:

Add mechanism to append keystrokes rather than replacing them.
Record interaction keystrokes - probably use XPath to record these.
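For the first TODO, a minimal sketch of append-style recording is shown below. It assumes a JSON Lines file as storage purely for illustration. The project's actual storage may be a database, and the names `append_keystrokes`, `read_keystrokes`, and the "session" field are hypothetical.

```python
import json


def append_keystrokes(path, session_id, keystrokes):
    """Append one session's keystrokes as a JSON line instead of overwriting the log."""
    record = {"session": session_id, "keystrokes": keystrokes}
    # Mode "a" keeps earlier sessions intact; mode "w" would replace them.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_keystrokes(path):
    """Return every recorded session, oldest first."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Each time the interface is opened, a new record is added, so earlier instances remain available for analysis.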

Fix BPE pre/post processing

This includes:

Source Side BPE tokens - This includes applying BPE code while translating.
Target Side BPE tokens - This includes joining of suggestions.
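For the target side, joining could be sketched as below. This assumes the "@@" continuation marker used by subword-nmt by default. Whether the models here use that marker is an assumption, and `join_bpe` and `join_suggestions` are hypothetical helper names.

```python
import re


def join_bpe(text, separator="@@"):
    """Merge subword pieces marked with a trailing separator back into whole words."""
    # Remove the separator together with the following space, or at end of string.
    pattern = re.escape(separator) + r"( |$)"
    return re.sub(pattern, "", text)


def join_suggestions(result):
    """Apply BPE joining to each newline-separated suggestion independently."""
    return "\n".join(join_bpe(line) for line in result.split("\n"))
```

Joining each suggestion separately keeps a dangling marker at the end of one suggestion from swallowing the line break before the next one.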

Please update a new download model address, thanks.

See the previous person's question. I do not know how to find the new model download address.

Add option for number of suggestions

Currently, the maximum number of suggestions in the drop-down box is 5. Can we add that as a parameter to the API?
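If the count is exposed as a request parameter, it would probably need validation. A minimal sketch follows, where the default of 5 mirrors the current drop-down limit and the upper cap of 20 is an arbitrary assumption. The helper name `parse_suggestion_count` is hypothetical.

```python
DEFAULT_SUGGESTIONS = 5   # matches the current drop-down limit
MAX_SUGGESTIONS = 20      # hypothetical upper bound to protect the server


def parse_suggestion_count(raw, default=DEFAULT_SUGGESTIONS, maximum=MAX_SUGGESTIONS):
    """Parse a user-supplied count, falling back to the default on bad input."""
    try:
        count = int(raw)
    except (TypeError, ValueError):
        return default
    # Clamp to the range [1, maximum].
    return max(1, min(count, maximum))
```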

secret in file

This secret needs to be removed and rotated ASAP.

inmt/InteractiveTranslation/settings.py

Lines 23 to 24 in c8074bd

    
    # SECURITY WARNING: keep the secret key used in production secret!
    SECRET_KEY = '^@f24c3b-j&1z8l9^&8ut&bx7%2oz+hj%vt28g1k5dpre#t$5t'
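One common way to keep the key out of the repository is to read it from the environment. The sketch below assumes an environment variable named `DJANGO_SECRET_KEY`. That name and the `load_secret_key` helper are illustrative, not something the project defines.

```python
import os


def load_secret_key(env_var="DJANGO_SECRET_KEY"):
    """Read the secret key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting the server.")
    return key


# In settings.py this could replace the hard-coded value:
# SECRET_KEY = load_secret_key()
```

Moving the key does not make the committed value safe again. It would still need to be rotated, since it remains in the git history.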

Fix text pointer interaction

There are some issues with the text pointer interactions. The common ones observed are:

If a translation box is left blank, the pointer behavior on preview tab is buggy.
After clicking the preview and coming back, the pointer behavior is buggy.
Reverse pointer does not work in the preview tab.

Implement Translation Memory (TM)

Currently, every time a translation is needed, the translation API is called and the request is processed for that source and partial input. Implementing Translation Memory would let us reuse translations that translators have previously produced.

Should the TM be user specific or global? It can probably be a combination of both - maybe batch and compare the individual TM and add that to global if it is commonly occurring.

Should the TM suggestion be shown with a different color explicitly to specify it comes from the TM whereas the others come from NMT?
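To make the discussion concrete, here is a minimal in-memory sketch of a TM. It assumes exact matching on the source sentence and prefix matching on the partial input, and it tags each suggestion with an "origin" field so the interface could color TM suggestions differently. The class, its methods, and the field names are all hypothetical.

```python
from collections import defaultdict


class TranslationMemory:
    """Stores translator-approved translations keyed by exact source sentence."""

    def __init__(self):
        # source sentence -> {translation: number of times it was used}
        self._entries = defaultdict(dict)

    def add(self, source, translation):
        counts = self._entries[source]
        counts[translation] = counts.get(translation, 0) + 1

    def suggest(self, source, partial="", limit=5):
        """Return stored translations that extend the partial input, most used first."""
        counts = self._entries.get(source, {})
        matches = [text for text in counts if text.startswith(partial)]
        matches.sort(key=lambda text: -counts[text])
        return [{"text": text, "origin": "tm"} for text in matches[:limit]]
```

A user-specific TM and a global TM could each be an instance of such a class, with frequently used entries promoted from the former to the latter.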

FileNotFoundError when running "python manage.py"

Fix Transliteration Helper Interaction

Currently, the transliteration helper takes the English input, retrieves the appropriate transliteration, and shows it to the user. However, this interaction has flaws:

English characters show up in the translation box and overlap with the Hindi suggestion.

Users of language keyboards are affected because the transliterator calls the API even for their input. This can be fixed by calling the transliteration API only if the input Unicode lies in the range of the English (Latin) script.
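The script check suggested above could be sketched like this. It treats "English (Latin)" as ASCII letters only, which is a simplifying assumption, and `is_latin_input` is a hypothetical helper name. In the actual interface the same check would likely live in the client-side JavaScript.

```python
def is_latin_input(text):
    """True if the text contains letters and every letter is an ASCII Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    # Digits, spaces, and punctuation are ignored; at least one letter is required.
    return bool(letters) and all(ch.isascii() for ch in letters)
```

The transliteration API would then be called only when `is_latin_input(word)` is true, so input typed on a Hindi keyboard passes through untouched.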
