argosopentech / argos-translate

Open-source offline translation library written in Python

Home Page: https://www.argosopentech.com

License: MIT License

Python 93.22% Shell 6.78%
python machine-translation transformers translation language-models linux nlp open-source

argos-translate's Introduction

Argos Translate

Demo | Website | Docs | Forum | GitHub | PyPI

Open-source offline translation library written in Python

Argos Translate uses OpenNMT for translations and can be used as a Python library, a command-line tool, or a GUI application. Argos Translate supports installing language model packages, which are zip archives with a ".argosmodel" extension containing the data needed for translation. LibreTranslate is an API and web app built on top of Argos Translate.

Argos Translate also automatically pivots through intermediate languages to translate between languages that don't have a direct translation installed. For example, if you have an es → en and an en → fr translation installed, you can translate from es → fr as if you had that translation installed. This allows translating between a wide variety of languages at the cost of some translation quality.
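
The pivoting idea can be illustrated with a breadth-first search over the installed language pairs. This is only a sketch of the concept, not Argos Translate's actual implementation; `find_pivot_path` and the `installed` list are hypothetical names introduced for the example.

```python
from collections import deque


def find_pivot_path(installed, from_code, to_code):
    """Return the shortest chain of language codes linking from_code to to_code, or None."""
    # Build an adjacency map from the installed (source, target) pairs.
    graph = {}
    for src, dst in installed:
        graph.setdefault(src, []).append(dst)

    queue = deque([[from_code]])
    seen = {from_code}
    while queue:
        path = queue.popleft()
        if path[-1] == to_code:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


installed = [("es", "en"), ("en", "fr")]
print(find_pivot_path(installed, "es", "fr"))  # ['es', 'en', 'fr']
```

Each hop in the returned path corresponds to one installed translation, which is why longer chains tend to lose more quality.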

Supported languages

Arabic, Azerbaijani, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and more

Request a language

Installation

Install with Python

Argos Translate is available from PyPI and can be easily installed or updated with pip.

pip install argostranslate

Install GUI:

pip install argostranslategui

Installation for macOS

  1. Download the latest macOS release.
  2. Extract the archive.
  3. Copy the .app file to the Applications directory.

Python source installation into virtualenv

Download a copy of this repo and install with pip.

git clone https://github.com/argosopentech/argos-translate.git
cd argos-translate
virtualenv env
source env/bin/activate
pip install -e .

Examples

import argostranslate.package
import argostranslate.translate

from_code = "en"
to_code = "es"

# Download and install Argos Translate package
argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()
package_to_install = next(
    filter(
        lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
    )
)
argostranslate.package.install_from_path(package_to_install.download())

# Translate
translatedText = argostranslate.translate.translate("Hello World", from_code, to_code)
print(translatedText)
# '¡Hola Mundo!'

Command Line Interface

argospm update
argospm install translate-en_de
argos-translate --from en --to de "Hello World!"
# Hallo Welt!

Install all translation packages:

argospm install translate

Web App Screenshot

const res = await fetch("https://translate.argosopentech.com/translate", {
	method: "POST",
	body: JSON.stringify({
		q: "Hello!",
		source: "en",
		target: "es"
	}),
	headers: {
		"Content-Type": "application/json"
	}
});

console.log(await res.json());

{
    "translatedText": "¡Hola!"
}

Graphical user interface

The GUI code is in a separate repository.


GPU Acceleration

To enable GPU support, you need to set the ARGOS_DEVICE_TYPE env variable to cuda or auto.

$ ARGOS_DEVICE_TYPE=cuda argos-translate --from-lang en --to-lang es "Hello World"
Hola Mundo

The above env variable passes the device type to CTranslate2.
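
From Python, the same variable can be set in the environment of a child process that runs the command-line tool. This is a minimal sketch that assumes the `argos-translate` command is on the PATH; `gpu_env` and `translate_on_gpu` are hypothetical helper names, not part of the library.

```python
import os
import subprocess


def gpu_env(device_type="cuda", base_env=None):
    """Return a copy of the environment with ARGOS_DEVICE_TYPE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ARGOS_DEVICE_TYPE"] = device_type
    return env


def translate_on_gpu(text, from_code="en", to_code="es"):
    """Run the argos-translate CLI with GPU inference requested (hypothetical helper)."""
    result = subprocess.run(
        ["argos-translate", "--from-lang", from_code, "--to-lang", to_code, text],
        env=gpu_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Copying the environment rather than mutating `os.environ` keeps the setting local to the child process.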

HTML Translation

The translate-html library is built on top of Argos Translate and Beautiful Soup and parses and translates HTML. The LibreTranslate API also has support for translating HTML.

Files Translation

The argos-translate-files library is built on top of Argos Translate and parses and translates files. The LibreTranslate API also has support for translating files.

Uninstall

pip uninstall argostranslate

You may choose to also delete temporary and cached files:

rm -r ~/.local/cache/argos-translate
rm -r ~/.local/share/argos-translate

Contributing

Contributions are welcome! Available issues are on the GitHub issues page. Contributions of code, data, and pre-trained models can all be accepted.

Support

For support please use the LibreTranslate Forum or GitHub Issues.

For questions about CTranslate2 or general machine translation research the OpenNMT Forum is a good resource.

Services

Custom models trained on your own data are available for $1000/language (negotiable).

I am also available for hire to do support, consulting, or custom software development.

Donate

If you find this software useful donations are appreciated.

  • GitHub Sponsor
  • PayPal
  • Bitcoin: 16UJrmSEGojFPaqjTGpuSMNhNRSsnspFJT
  • Ethereum: argosopentech.eth
  • Litecoin: MCwu7RRWeCRJdsv2bXGj2nnL1xYxDBvwW5
  • BCH: bitcoincash:qzvpxe8y5kq45kahqkyv3p88sjrhlymj2v6xdrj3cv

Paid supporters receive priority support.

Argos Translate 2 beta

A beta version of Argos Translate 2 is available to install from source from the v2 branch on GitHub. Argos Translate 2 has a multilingual model architecture, more extensive unit testing, and a more experimental orientation.

Contributing

Contributions are welcome! Bug reports, pull requests, documentation writing, and feature ideas are all appreciated.

License

Argos Translate is dual licensed under either the MIT License or Creative Commons CC0.

argos-translate's People

Contributors

aabur, aleufroy, andrewkdinh, andriyor, argosopentech, dingedi, ederin, guillaumekln, hollorol, jakeroggenbuck, jonmagon, jorgesumle, kolserdav, mikemoritz, milahu, mmacu, mmokhi, mwip, pierotofy, pirhoo, pj-finlay, rushter, technologyclassroom, tpcgold, vuizur, yogeshwaran01

argos-translate's Issues

Weird english -> japanese translations (bad training data?)

I'm using argos-translate via libretranslate, so if this is the wrong place for this, I'll move it.

I'm testing out the English -> Japanese translations and I think some bad data might have gotten into the training data.

"Hello" is being translated as "お問い合わせ", which translates to "Contact Us" (something you'd expect to see at the bottom of a webpage used for training?).

"Goodbye" is being translated as "フィードバック" (feedback). Again, something you'd expect to see at the bottom of a webpage.

"Help me!" is also being translated as "お問い合わせ".

Not exactly sure how I can help, but I figured I'd point out the issue.

OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized

Hi,
Thanks for the amazing project. This was a fresh install on a new python 3.8.3 virtual environment using pip install and launching straight away. The web app launches, but after a few key strokes, the process crashes with the following error logged to the terminal:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6

Any advice appreciated here, thanks again.

Auto Code Formatting

Ideally there would be some sort of automatic code formatting and linting. Related to this, there is currently an issue with how some of the documentation is formatted (screenshot omitted).

The goal is to comply with PEP 8 and PEP 257 to the extent possible.

ERROR: Can't pickle ctranslate2.translator.Translator objects

Hi,

Thank you for the great project!

I am using argos translate in my project. I have created a customized sklearn transformer where I call the argos models for translation. My customized transformer is part of sklearn pipeline. However, when I set the pipeline hyperparameter n_jobs to a value higher than 1, I receive the error:

TypeError: can't pickle ctranslate2.translator.Translator objects

Any ideas/advice how I can solve this issue? Are you planning to make ctranslate2 objects picklable?
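
One common workaround for unpicklable members, offered here as a sketch rather than a confirmed fix, is to create the translator lazily and drop it from the pickled state so that each worker process rebuilds its own copy. `LazyTranslator`, `load_fn`, and `load_stand_in` are hypothetical names; a `threading.Lock` stands in for the unpicklable `ctranslate2` object.

```python
import pickle
import threading


class LazyTranslator:
    """Wrap an unpicklable resource so the wrapper itself can be pickled."""

    def __init__(self, load_fn):
        self._load_fn = load_fn  # must be picklable, e.g. a module-level function
        self._resource = None

    @property
    def resource(self):
        # Create the resource on first use in the current process.
        if self._resource is None:
            self._resource = self._load_fn()
        return self._resource

    def __getstate__(self):
        # Exclude the live resource from the pickled state.
        state = self.__dict__.copy()
        state["_resource"] = None
        return state


def load_stand_in():
    # Stand-in for loading a translator; lock objects cannot be pickled either.
    return threading.Lock()


wrapper = LazyTranslator(load_stand_in)
wrapper.resource  # force creation in this process
clone = pickle.loads(pickle.dumps(wrapper))  # succeeds despite the live lock
```

In a custom scikit-learn transformer, the same pattern would mean loading the model inside `transform` (through the lazy property) rather than storing it in `__init__`.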

Thanks again!

PyQt signals logic in GUI

Fix packages_changed = pyqtSignal() in gui.py to correctly update all views when the state of packages has changed.

Port to more platforms

The easiest are probably macOS and Windows using py2app and py2exe, but other platforms to consider could be mobile, BSD, Debian, Red Hat, or Flatpak. I'd like to be able to run builds on Linux as much as possible, but this may not be possible for some platforms.

There's also a decision to be made if we want to use tools like py2app/py2exe or go all in on pyqtdeploy.

There are probably some challenges for doing local translation on mobile so a better strategy may be to build/port simple mobile apps that connect to the LibreTranslate API.

Cursor not themed in snap

Using the snap, the cursor does not follow the system theme. It's more obvious if you change the cursor theme to something that looks different, like "redglass", but even the Ubuntu default Yaru theme and size are not followed. One Qt app snap in which this does work is KeePassXC. It looks like their snapcraft.yaml has some additional plugs for theming.

Support Language Detection

The plan for this was to train a model using the existing infrastructure that maps from input text to a language code. This would require adding a way to generate this data in the training scripts and what is hopefully a pretty small code change to support this. I'd be pretty optimistic about this just working pretty well out of the box but it may take some tweaking.

PIP shows conflicting dependencies

$ pip install argostranslate
Collecting argostranslate
  Using cached argostranslate-1.0.5-py3-none-any.whl (13 kB)
  Using cached argostranslate-1.0.3-py3-none-any.whl (12 kB)
  Using cached argostranslate-1.0-py3-none-any.whl (12 kB)
ERROR: Cannot install argostranslate==1.0, argostranslate==1.0.3 and argostranslate==1.0.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    argostranslate 1.0.5 depends on ctranslate2==1.14.0
    argostranslate 1.0.3 depends on ctranslate2==1.14.0
    argostranslate 1.0 depends on ctranslate2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Configuration/environment:

--------------------------------------------------------------------------------
  Date: Wed Dec 23 15:59:00 2020 CET

                OS : Darwin
            CPU(s) : 12
           Machine : x86_64
      Architecture : 64bit
       Environment : Python

  Python 3.7.7 (default, Mar 23 2020, 17:31:31)  [Clang 4.0.1
  (tags/RELEASE_401/final)]

             numpy : 1.19.4
           IPython : 7.19.0
            scooby : 0.5.6
--------------------------------------------------------------------------------
pip 20.3.3

Failed to install with pip

Trying to install argostranslate on Debian unstable with pip I get the error message:

Package sentencepiece was not found in the pkg-config search path.
Perhaps you should add the directory containing `sentencepiece.pc'
to the PKG_CONFIG_PATH environment variable
No package 'sentencepiece' found
Failed to find sentencepiece pkgconfig

Is there anything I can do?

Improve Training scripts

The training scripts have lots of room for improvement. The long term plan is to rewrite them in OpenNMT for PyTorch in a fully automatable way but there are other potential improvements:

  • Auto download data from the Opus Parallel Corpus
  • Auto stop training after a set number of epochs
  • Cleaner implementation/better docs

List license for language models.

I see that this repository is licensed under the MIT license, but the language models are hosted outside of this repository and can be downloaded over HTTPS, IPFS, or torrent.

Does the same MIT license apply to the models as well, or are they distributed under a different license?

This should probably be listed somewhere.

Localization

Currently there is no custom localization, but this would be nice to have. Qt provides some nice tools for doing this, and the app's strings could be translated using the app itself.

Command-line Usage

Would it be possible to support command-line usage? I searched the documentation but found nothing. I would like to automate translating texts and also text files into multiple languages.

As an example I suggest the following:
argos-translate -text "Hello World!" -from en -to de
argos-translate -file Novel.txt -from en -to de

Add Tests

We currently don't have any tests, but it would be nice to have some. Not being able to easily include a .argosmodel file in the tests will make this more difficult, but at least having some tests would be good.

Question about non-deterministic results

While investigating using argos-translate as a library, I have noticed non-deterministic results when translating a short test string "Hello world!" using your pre-trained models. For English -> Russian, it returns "Здравствуй мир!" on some hosts, and "Здравствуй!" on others. The results on a given host are deterministic on repeated runs and environments (at least in my testing so far).

I first tried to follow the advice here thinking it could be a random seed issue to no avail:
OpenNMT/OpenNMT-py#392
pytorch/pytorch#7068 (comment)

I was not able to determine any significant differences between hosts (both running on cpu), and the output of ct2_verbose is identical:

[ct2_verbose] CPU: GenuineIntel (SSE4.1=true, AVX=true, AVX2=true)
[ct2_verbose] Selected CPU ISA: AVX2
[ct2_verbose] Use Intel MKL: true
[ct2_verbose] SGEMM CPU backend: MKL
[ct2_verbose] GEMM_S16 CPU backend: MKL
[ct2_verbose] GEMM_S8 CPU backend: MKL (u8s8 preferred: true)
[ct2_verbose] Use packed GEMM: false

Manually setting num_hypotheses=2 in the ctranslate2 Translator shows that it appears to be a score difference:

Host #1:
	{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '!'], 'score': -2.7840166091918945}
	{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '▁мир', '!'], 'score': -2.841048240661621}

Host #2:
	{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '▁мир', '!'], 'score': -2.7670412063598633}
	{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '!'], 'score': -2.7944717407226562}

Setting beam_size=1 so it uses greedy search did produce the same result on both hosts, but I don't think that is a valid solution.
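
The scores reported above make the sensitivity concrete: on each host the two hypotheses are separated by only a few hundredths (roughly 0.057 on host #1 and 0.027 on host #2), so small numeric differences between machines could plausibly swap their order under beam search. The sketch below only redoes that arithmetic on the figures from this issue; it does not call ctranslate2.

```python
# Scores copied from the num_hypotheses=2 output above.
host1 = {"Здравствуй!": -2.7840166091918945, "Здравствуй мир!": -2.841048240661621}
host2 = {"Здравствуй мир!": -2.7670412063598633, "Здравствуй!": -2.7944717407226562}

for name, scores in (("Host #1", host1), ("Host #2", host2)):
    # Rank hypotheses from best (highest score) to worst.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked
    margin = best_score - runner_up_score
    print(f"{name}: best={best!r}, margin={margin:.3f}")
```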

I created a gist to provide some debugging output, and didn't notice any difference in the actual argos-translate parsing logic, so it seems to be much deeper: https://gist.github.com/mikemoritz/a5bf76193ccb16d018a1af9ec584fb41

My questions are:

  • Are there other options you would recommend setting to increase the likelihood of deterministic results? If so, could these be surfaced as options within argos-translate?
  • Is it possible that "Hello world!" is a bad test string? If so, do you have any recommendations?
  • Do you think it could still be a random seed issue that may need to be implemented within argos-translate?
  • Is there additional debugging within ctranslate2 and/or torch that you would recommend to highlight differences between the hosts?

Thanks!

Filter HTML entities in training scripts data

Hello, I've noticed a bug when translating something to French.
Sometimes the HTML entity for an apostrophe (for example &apos; or &#39;) appears instead of the apostrophe itself.
Some examples were attached as screenshots: argos-translate1, argos-translate2, argos-translate5.

It doesn't happen in these cases, though: argos-translate4, argos-translate3.
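
One way the training data could be cleaned, sketched here as an assumption rather than a description of what the training scripts do, is to decode HTML entities with the standard library's `html.unescape` before training. `clean_line` is a hypothetical helper name.

```python
import html


def clean_line(line: str) -> str:
    """Decode HTML entities such as &apos; or &#39; into plain characters."""
    return html.unescape(line)


print(clean_line("l&apos;article"))  # l'article
print(clean_line("l&#39;article"))   # l'article
```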

Importing argostranslate can fail on snap package dirs

This is probably mostly a test environment issue, but this can happen:

../../../venv/lib/python3.8/site-packages/argostranslate/package.py:6: in <module>
    from argostranslate import settings
../../../venv/lib/python3.8/site-packages/argostranslate/settings.py:17: in <module>
    for package_dir in content_snap_packages.iterdir():
/home/mike/.pyenv/versions/3.8.6/lib/python3.8/pathlib.py:1121: in iterdir
    for name in self._accessor.listdir(self):
E   FileNotFoundError: [Errno 2] No such file or directory: '/snap/pycharm-community/223/snap_custom/content_snap_packages'

PR: #19
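
A defensive pattern for this kind of failure is to check that the directory exists before iterating over it. The sketch below illustrates the idea with a hypothetical `list_package_dirs` helper; it is not the code from the PR.

```python
from pathlib import Path


def list_package_dirs(parent: Path) -> list:
    """Return the subdirectories of parent, or an empty list if parent is missing."""
    if not parent.is_dir():
        # Path.iterdir() raises FileNotFoundError on a missing directory.
        return []
    return [p for p in parent.iterdir() if p.is_dir()]


print(list_package_dirs(Path("/nonexistent/content_snap_packages")))  # []
```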

Support for emojis in text translation

Given is the English text: "Well done 👍"

The text itself gets translated perfectly into any language. However, depending on the target language, the emoji is translated to "" or "?" or "Benachrichtigung" (in German).

Would it be possible to detect the emoji and leave that character as it is?
Hint: in Unicode 13.0 there are 4 character ranges allocated for emojis: U+1F300 (127744) to U+1FAD6 (129750), 126980 to 127569, 169 to 174 and 8205 to 12953
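
One possible approach, sketched under assumptions, is to split the text into emoji and non-emoji runs and translate only the latter. `translate_keeping_emoji` and `translate_fn` are hypothetical names, with `translate_fn` standing in for a call such as `argostranslate.translate.translate`. Only the two higher ranges from the hint are used, because 169 to 174 and 8205 to 12953 also cover ordinary characters such as « and Japanese kana.

```python
import re
from typing import Callable

# Two of the emoji ranges quoted in the hint above (decimal code points).
EMOJI_RANGES = [(126980, 127569), (127744, 129750)]

# One capturing group, so re.split() keeps the emoji runs in its output.
_EMOJI_RE = re.compile(
    "([" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in EMOJI_RANGES) + "]+)"
)


def translate_keeping_emoji(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate the non-emoji runs of text and pass emoji runs through unchanged."""
    out = []
    for i, part in enumerate(_EMOJI_RE.split(text)):
        stripped = part.strip()
        if i % 2 == 1 or not stripped:
            # Odd indices are captured emoji runs; blank runs need no translation.
            out.append(part)
            continue
        # Preserve surrounding whitespace, which a translator may drop.
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        out.append(lead + translate_fn(stripped) + trail)
    return "".join(out)


# str.upper stands in for a real translation function.
print(translate_keeping_emoji("Well done 👍", str.upper))  # WELL DONE 👍
```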

arm64 support (Librem 5 phone etc.)

This won't install on a Librem phone...

$ snap install argos-translate
error: snap "argos-translate" is not available on stable for this architecture (arm64) but exists on other architectures (amd64)

This seems like a very handy library. Why does it not run on ARM computers?

Emoji translations

Emojis within texts at best get dropped, and in some cases change the translation to something else.

I know this is a training matter...
But it occurred to me (after some testing and trial and error) that maybe by using something like .encode("unicode_escape")* we could keep them unchanged (as often happens, in my testing so far) and then decode them back afterwards...

Basically, since we never have to "translate" those characters, I'm thinking maybe we could filter/keep them...

*P.S. not exactly this encode statement, but to be figured out 😅

Better model distribution

Currently models are distributed by Google Drive (not ideal) and a slow BitTorrent, so there's lots of room for improvement:

  • More Torrent seeders
  • Create individual torrent files for each model
  • HTTP or FTP mirrors
  • I avoided git distribution because I was worried about running into GitHub limits but we may want to link to the LibreTranslate Git Mirror
  • Open to other ideas too

The plan was to make a separate repo for storing model distribution information, so let me know if you're interested.

Close in system tray

Please allow for Argos Translate to close into system tray rather than take up room in the panel.

Decrease Snapcraft distribution size

Currently the Snapcraft image is ~1GB, and ~700MB of this is a torch CUDA shared object file. If this could be removed automatically in the Snapcraft build process somehow (or maybe as an option for all Python installs?), then the download and startup times for Snapcraft would greatly improve (these are currently both issues).

Do Screenshots on a Qt based Distro

The screenshots promoted by Argos are taken in a GNOME desktop environment. GNOME does not have the best integration with Qt, since it uses GTK.

Maybe take these screenshots on a display environment with a better Qt integration like KDE Plasma.

GPU Support

When I first wrote this, CTranslate2, which does inference, didn't support GPU translation from PyPI. This has since changed, and GPU support would be a nice feature to have. All this may take is updating the CTranslate2 version in requirements.txt and adding documentation, but if someone with more CUDA knowledge could look into this I would appreciate it. It would also be nice to support open-source alternatives to CUDA.

Argos Translate also prints an error message about torch not being able to connect to CUDA:

/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

Torch is only used by Stanza, which does sentence boundary detection, so GPU support there matters less for performance than GPU support in CTranslate2, but this error message should be suppressed.
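
In the meantime, callers can filter the message with the standard `warnings` module before the import that triggers it. This is a sketch, assuming the warning text still begins with "CUDA initialization" as in the log above; `suppress_cuda_init_warning` is a hypothetical helper name.

```python
import warnings


def suppress_cuda_init_warning():
    """Ignore UserWarnings whose message starts with 'CUDA initialization'."""
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore", message=r"CUDA initialization", category=UserWarning
    )


# Demonstration with stand-in warnings; real code would import the library here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    suppress_cuda_init_warning()
    warnings.warn("CUDA initialization: Found no NVIDIA driver on your system.", UserWarning)
    warnings.warn("an unrelated warning", UserWarning)

print([str(w.message) for w in caught])  # ['an unrelated warning']
```

Filtering by message keeps other `UserWarning`s visible instead of silencing the whole category.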
