Comments (20)
If the above is the case, it demonstrates a weakness of the engine itself: it shouldn't be the presentation layer's responsibility to determine the type of the content being input. The user should just be able to input whatever they want, and the engine itself should infer the type. There will be use cases where this won't be possible, like SOA applications for example. Unfortunately this is a bad design decision that can lead to even more issues.
from firefox-translations.
Thanks @abhi-agg!
from firefox-translations.
@jelmervdl Maybe this is related to #35, where I'm seeing multiple exceptions in the engine but can't reproduce on different languages?
from firefox-translations.
@andrenatal it could very well be related. There are paragraphs with literal < and > in them. Those paragraphs would cause the HTML parsing bit in bergamot-translator to throw (but because of how emscripten is configured… abort()?) since it tries to parse that as an open tag and fails.
from firefox-translations.
The extension will be called upon to translate text/plain and text/HTML documents, so the extension will need to toggle HTML on/off anyway, using this:
https://github.com/browsermt/bergamot-translator/blob/c0f311a8c067372057a6f301c42b40bbe30a9c1a/src/translator/response_options.h#L22
(The TODO comment is outdated and will be fixed.)
from firefox-translations.
@abhi-agg could you please take a look at this?
from firefox-translations.
@andrenatal I am on it.
from firefox-translations.
The extension will be called upon to translate text/plain and text/HTML documents, so the extension will need to toggle HTML on/off anyway
@kpu By documents, do you literally mean the user will upload documents to translate or do you only mean web pages?
from firefox-translations.
The user could browse to a web page with a text/plain content type. The browser displays the text, which may include stray < and > that should not be interpreted as HTML.
from firefox-translations.
As in, a user with a German Firefox UI visits https://neural.mt/test.txt . The browser offers to translate it to German (or at least it should, but that's not the point here). The text should be sent to the engine with HTML off.
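The page-level case described here can be sketched as follows. This is only an illustration, not the extension's code: `htmlFlagForContentType` is a hypothetical helper, and the option names are copied from the `responseOptions` object used elsewhere in this thread. In a content script the argument could come from the standard `document.contentType` property.

```javascript
// Hypothetical helper (not part of the extension): decide the engine's
// html flag from a document's MIME type.
function htmlFlagForContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mime = String(contentType).split(";")[0].trim().toLowerCase();
  return mime === "text/html" || mime === "application/xhtml+xml";
}

// For a text/plain page such as https://neural.mt/test.txt the flag is off,
// so stray < and > are treated as ordinary characters.
const responseOptions = {
  qualityScores: true,
  alignment: true,
  html: htmlFlagForContentType("text/plain; charset=utf-8"),
};
```

Note that this only covers the content type of the whole page. What matters to the engine is the kind of string actually sent to it, so text extracted from an HTML page would still need the flag off.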
from firefox-translations.
Did some quick tests via the wasm test page:
- Assuming the right model config was set and the response options are set to do HTML translation, "Hello &lt; world." returned the following for various models:
  - De: Hallo Welt.
  - Es: Hola, mundo.
  - Cz: Ahoj, svět.
  - Et: Tere &lt; maailm.
  (Seems like &lt; is gone in the De, Es and Cz output.)
- "Hello < world" causes the engine to assert (because the given string is ill-formed HTML)
- Non-HTML translation for "Hello < world. Hello < world." returned the following:
  - De: Hello < Welt. Hallo Welt.
  - Es: Hola < mundo. Hola, mundo.
  - Cz: Hello < world. Ahoj, svět.
  - Et: Tere < maailm. Tere < maailm.
from firefox-translations.
That looks consistent with expectations. In HTML mode the processing looks like:
- Input: Hello &lt; world.
- HTML maps to internal text: Hello < world.
- Internal Estonian target: Tere < maailm. (Which I note is what happened in text mode.)
- HTML escapes to: Tere &lt; maailm.
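A minimal sketch of those four steps, covering only entity handling. It is not bergamot-translator's implementation, which also deals with tags and word alignment. All function names are hypothetical, and `fakeTranslateToEstonian` is a stand-in for the model.

```javascript
// Map the three basic entities to their characters. &amp; is handled last
// so that "&amp;lt;" becomes "&lt;" rather than "<".
function unescapeHtml(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// Inverse mapping. & is handled first so that the entities we emit
// are not escaped a second time.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Stand-in for the translation model (assumption, for illustration only).
function fakeTranslateToEstonian(text) {
  return text.replace("Hello", "Tere").replace("world", "maailm");
}

// HTML input -> internal text -> translated text -> HTML output.
function translateHtmlSnippet(html, translateText) {
  const internalText = unescapeHtml(html);
  const internalTarget = translateText(internalText);
  return escapeHtml(internalTarget);
}

console.log(translateHtmlSnippet("Hello &lt; world.", fakeTranslateToEstonian));
// prints "Tere &lt; maailm."
```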
from firefox-translations.
Wasn't this fixed by: #53?
Can we close it?
from firefox-translations.
There's a few things going on:
1. alignment: soft was not passed at load time so HTML didn't know what to do with word alignments, resulting in poor HTML placement quality. This was mentioned as part of the initial issue and is now resolved as #53.
2. The extension is currently extracting text and sending it to the engine with HTML on. This means a stray &lt; in the HTML is converted by the extension to < which then confuses HTML mode in the engine. This was also mentioned as part of the issue and remains open to my knowledge. We believe this is the cause of the crash in #35.
3. The extension has no means to toggle the HTML flag. This will be required at some point since users may browse to text/plain and text/html pages. I have suggested doing this as part of the fix of point 2 above. My understanding is @abhi-agg has a draft version of a fix; he posted test output above.
4. The extension should use HTML mode instead of sending text, which is #52.
So I think this issue could be retitled as "text is being sent to HTML mode"
from firefox-translations.
What happens in the case of a legit instance of < existing in the page's text being sent to the engine?
from firefox-translations.
If you have HTML mode on and provide ill-formed HTML, it will throw an informative exception to the request and the engine will be fine (to take more requests). However, the -fno-exceptions limitation of WASM maps this exception to process death. In Thursday's plenary, I raised the possibility of an API change to allow returning an error code, as the current API does not have the means to express errors by anything other than exceptions. See browsermt/bergamot-translator#316 .
HTML mode was originally scoped to be used with Firefox's innerHTML where it's not possible to pass erroneous HTML.
We could of course improvise something to just skip the character or treat it as if it were &lt; but this seemed like it would only cause trouble of being inconsistent with Firefox's interpretation of broken HTML.
from firefox-translations.
Note that you can specify whether to use HTML mode every time you call translate(). For translating innerHTML it should be on, but for translating textContent or the outbound form inputs it should not be on.
Right now the extension does some batching, calling translate() with a VectorString with multiple input strings, but only one ResponseOptions object which specifies whether they are all HTML or not. I.e.
const responseOptions = {qualityScores: true, alignment: true, html: true};
let input = new this.WasmEngineModule.VectorString();
messages.forEach(message => {
  input.push_back(message.sourceParagraph);
});
let result = this.translationService.translate(translationModel, input, responseOptions);
We can alter the emscripten bindings of bergamot-translator a bit and accept a ResponseOptions object per input string, if that makes the extension's code easier. Something like:
const responseOptions = {qualityScores: true, alignment: true};
let input = new this.WasmEngineModule.VectorSomethingSomething();
messages.forEach(message => {
  input.push_back(message.sourceParagraph, {...responseOptions, html: message.isHTML});
});
let result = this.translationService.translate(translationModel, input);
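Until a per-input option exists, one possible workaround with the current one-ResponseOptions-per-call API is to split a mixed batch in two. The sketch below assumes each message carries an `isHTML` field, as in the snippet above. `groupByHtmlFlag`, `translateMixed` and the `translateBatch` callback are hypothetical names; `translateBatch` stands for whatever wraps the real translate() call.

```javascript
// Split messages into an HTML group and a plain-text group.
function groupByHtmlFlag(messages) {
  const groups = { html: [], text: [] };
  for (const message of messages) {
    (message.isHTML ? groups.html : groups.text).push(message);
  }
  return groups;
}

// Call the (hypothetical) translateBatch(messages, responseOptions) once per
// non-empty group, so each call gets a single consistent html flag.
function translateMixed(messages, translateBatch) {
  const baseOptions = { qualityScores: true, alignment: true };
  const { html, text } = groupByHtmlFlag(messages);
  const results = [];
  if (html.length > 0) {
    results.push(...translateBatch(html, { ...baseOptions, html: true }));
  }
  if (text.length > 0) {
    results.push(...translateBatch(text, { ...baseOptions, html: false }));
  }
  return results;
}
```

The results come back grouped (HTML first, then text) rather than in input order, so a caller would need to carry an id on each message to match translations back to their sources.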
We could of course improvise something to just skip the character or treat it as if it were &lt; but this seemed like it would only cause trouble of being inconsistent with Firefox's interpretation of broken HTML.
Trying to parse HTML with a fallback sounds risky. For example, if someone were to type When output<input and checked>unchecked into their textarea, that will be interpreted as valid HTML (and input and checked would go untranslated)
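To illustrate why sniffing is fragile, here is a deliberately naive tag detector. `looksLikeHtml` is a hypothetical function written for this example, not something the engine or the extension uses.

```javascript
// Naive check: a "<" followed by a letter and a later ">" counts as a tag.
function looksLikeHtml(text) {
  return /<[a-zA-Z][^<>]*>/.test(text);
}

const typed = "When output<input and checked>unchecked";

console.log(looksLikeHtml(typed));            // true: "<input and checked>" looks like a tag
console.log(looksLikeHtml("Hello < world.")); // false: "<" is followed by a space
```

The plain-text sentence is classified as HTML, so any content-based guess would hand it to HTML mode even though the user meant the angle brackets literally.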
from firefox-translations.
Based on my conversation with Abhi, we'll need the InPageTranslation.js parser to always determine if the string being passed to the engine contains plain or html text and set the proper flag on ResponseOptions. Is that right @abhi-agg ?
from firefox-translations.
All you need to do to fix this issue for the time being is turn the HTML flag to off. This is a one line change. When you have HTML translation again, turn it on.
The HTML feature is designed to operate on snippets of HTML. It does not make sense to expect automatic content identification from text vs snippets of HTML because that would introduce bugs in text translation where words are mysteriously not translated inside angle brackets.
Does Firefox render text/plain content containing some tags as HTML?
from firefox-translations.
All you need to do to fix this issue for the time being is turn the HTML flag to off. This is a one line change. When you have HTML translation again, turn it on.
Submitted the PR that should close this issue.
If the above is the case, it demonstrates a weakness of the engine itself: it shouldn't be the presentation layer's responsibility to determine the type of the content being input. The user should just be able to input whatever they want, and the engine itself should infer the type. There will be use cases where this won't be possible, like SOA applications for example. Unfortunately this is a bad design decision that can lead to even more issues.
This leaves ⬆️ up for discussion.
from firefox-translations.