Calculates BLEU scores for Firefox Translations models using bergamot-translator and compares them to other translation systems.
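As background, BLEU combines modified n-gram precision with a brevity penalty. A toy single-pair sketch in Python (illustration only; the actual evaluation relies on sacrebleu's tokenization and smoothing, not this simplified function):

```python
import math
from collections import Counter

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Toy BLEU for one hypothesis/reference pair (whitespace tokenization,
    no smoothing) -- for intuition only, not for real evaluation."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```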
git clone https://github.com/mozilla/firefox-translations-evaluation.git
cd firefox-translations-evaluation
Use install/download-models.sh
to get Firefox Translations models, or use your own.
Recommended memory size for Docker is 8 GB.
export MODELS=<absolute path to a local directory with models>
# Specify Azure key and location if you want to add Azure Translator API for comparison
export AZURE_TRANSLATOR_KEY=<Azure translator resource API key>
# Optional: specify if it differs from the default 'global'
export AZURE_LOCATION=<location>
# Specify the GCP credentials JSON path if you want to add Google Translator API for comparison
export GCP_CREDS_PATH=<absolute path to .json>
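For example (all values below are hypothetical placeholders; substitute your own paths and keys):

```shell
# Hypothetical example values -- replace with your own.
export MODELS=/home/user/firefox-translations-models
# Optional: only needed for the Azure comparison.
export AZURE_TRANSLATOR_KEY=0123456789abcdef
export AZURE_LOCATION=westeurope
# Optional: only needed for the Google comparison.
export GCP_CREDS_PATH=/home/user/gcp-service-account.json
```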
# Build and run docker container
bash start_docker.sh
On completion, your terminal should be attached to the launched container.
From inside the Docker container, run:
python3 eval/evaluate.py --translators=bergamot,microsoft,google --pairs=all --skip-existing --models-dir=/models/models/prod --results-dir=/models/evaluation/prod
More options:
python3 eval/evaluate.py --help
install/install-bergamot-translator.sh
- clones and compiles bergamot-translator and marian (runs inside the Docker image).
install/download-models.sh
- downloads current Mozilla production models.
- bergamot - uses compiled bergamot-translator in wasm mode
- marian - uses compiled marian
- google - uses Google Translation API
- microsoft - uses Azure Cognitive Services Translator API
Use the --skip-existing
option to reuse already calculated scores saved as results/xx-xx/*.bleu
files.
It is useful for continuing an interrupted evaluation,
or for rebuilding a full report while re-evaluating only selected translators.
SacreBLEU - all available datasets for a language pair are used for evaluation.
Flores - a parallel evaluation dataset that covers 101 languages.
With the option --pairs=all
, language pairs will be discovered in the specified models folder (option --models-dir
) and evaluation will run for all of them.
Results will be written to the specified directory (option --results-dir
).
Evaluation results for the models used in Firefox Translations can be found in firefox-translations-models/evaluation.