notetiene / docker-languagetool

This project forked from erikvl87/docker-languagetool

Dockerfile for LanguageTool server - configurable

Home Page: https://hub.docker.com/r/erikvl87/languagetool

License: GNU Lesser General Public License v2.1

Dockerfile for LanguageTool

This repository contains a Dockerfile to create a Docker image for LanguageTool.

LanguageTool is open-source proofreading software for English, French, German, Polish, Russian, and more than 20 other languages. It finds many errors that a simple spell checker cannot detect.

Setup

Setup using Docker Hub

docker pull erikvl87/languagetool
docker run --rm -p 8010:8010 erikvl87/languagetool

This pulls the latest tag from Docker Hub. Optionally, specify a tag to pin the image to a fixed version. The version tags are derived from the official LanguageTool releases. Updates to the Dockerfile for an already published version are released with a -dockerupdate-{X} suffix in the tag (where {X} is an incrementing number).
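The tag scheme described above can be sketched as a small shell helper. This is an illustration only: the function name lt_image_ref is made up for this example, and the version 1.2.3 in the comment is a placeholder rather than a real published tag. Check Docker Hub for the tags that actually exist.

```shell
# Build an image reference for erikvl87/languagetool.
# $1 = LanguageTool version (optional; the moving "latest" tag is used if empty)
# $2 = Dockerfile update number, the {X} in -dockerupdate-{X} (optional)
lt_image_ref() {
  if [ -z "${1:-}" ]; then
    printf 'erikvl87/languagetool:latest\n'
  elif [ -n "${2:-}" ]; then
    printf 'erikvl87/languagetool:%s-dockerupdate-%s\n' "$1" "$2"
  else
    printf 'erikvl87/languagetool:%s\n' "$1"
  fi
}

# Example with a placeholder version (not executed here):
#   docker pull "$(lt_image_ref 1.2.3 1)"
```

Pinning to an explicit tag keeps the LanguageTool version unchanged across repeated pulls, whereas latest moves with each new release.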

Setup using the Dockerfile

Use this approach when you plan to make changes to the Dockerfile.

git clone https://github.com/Erikvl87/docker-languagetool.git --config core.autocrlf=input
cd docker-languagetool
docker build -t languagetool .
docker run --rm -it -p 8010:8010 languagetool

Configuration

Java heap size

LanguageTool is started with a minimum heap size (-Xms) of 256m and a maximum heap size (-Xmx) of 512m. You can override these defaults by setting the environment variables Java_Xms and Java_Xmx.

An example startup configuration:

docker run --rm -it -p 8010:8010 -e Java_Xms=512m -e Java_Xmx=2g erikvl87/languagetool
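The default-and-override behaviour described above can be illustrated with plain shell parameter expansion. This is a sketch of the idea, not the image's actual startup script. The function name lt_heap_flags is hypothetical.

```shell
# Print the JVM heap flags: Java_Xms and Java_Xmx win when they are set,
# otherwise the documented defaults of 256m and 512m apply.
lt_heap_flags() {
  printf '%s%s %s%s\n' -Xms "${Java_Xms:-256m}" -Xmx "${Java_Xmx:-512m}"
}
```

With neither variable set the function prints the defaults. With Java_Xms=512m and Java_Xmx=2g, as in the docker run example, it prints the overridden values.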

LanguageTool HTTPServerConfig

You can set any HTTPServerConfig option by prefixing its field name with langtool_ and passing it as an environment variable.

An example startup configuration:

docker run --rm -it -p 8010:8010 -e langtool_pipelinePrewarming=true -e Java_Xms=1g -e Java_Xmx=2g erikvl87/languagetool
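To make the prefix convention concrete, the following sketch shows one way prefixed environment variables could be turned into key=value configuration lines. It is an assumption about how such a mapping might work, not a copy of what the image does internally. The function name lt_config_lines is hypothetical.

```shell
# Print one "key=value" line for every environment variable whose name
# starts with langtool_, with the prefix removed. Sorted for stable output.
lt_config_lines() {
  env | sed -n 's/^langtool_//p' | sort
}
```

With langtool_pipelinePrewarming=true exported, the function prints pipelinePrewarming=true, which is the form a Java properties file expects.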

Using n-gram datasets

LanguageTool can make use of large n-gram data sets to detect errors with words that are often confused, like their and there.

Source: https://dev.languagetool.org/finding-errors-using-n-gram-data

Download the n-gram dataset(s) onto your local machine and unzip them into a local ngrams directory:

home/
├─ john/
│  ├─ ngrams/
│  │  ├─ en/
│  │  │  ├─ 1grams/
│  │  │  ├─ 2grams/
│  │  │  ├─ 3grams/
│  │  ├─ nl/
│  │  │  ├─ 1grams/
│  │  │  ├─ 2grams/
│  │  │  ├─ 3grams/

Mount the local ngrams directory to the /ngrams directory in the Docker container using the -v option, and set the languageModel configuration option to /ngrams.

An example startup configuration:

docker run --rm -it -p 8010:8010 -e langtool_languageModel=/ngrams -v /home/john/ngrams:/ngrams:ro erikvl87/languagetool
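Before mounting the directory, it can help to confirm that it matches the layout shown in the tree above. The check below is a minimal sketch. The function name lt_check_ngrams is hypothetical, and it only tests that the three subdirectories exist, not that their contents are valid.

```shell
# Return 0 when DIR/LANG contains the 1grams, 2grams and 3grams directories.
# Otherwise report the first missing one on stderr and return 1.
lt_check_ngrams() {
  # $1 = local ngrams directory, $2 = language code such as en or nl
  for n in 1grams 2grams 3grams; do
    if [ ! -d "$1/$2/$n" ]; then
      printf 'missing: %s/%s/%s\n' "$1" "$2" "$n" >&2
      return 1
    fi
  done
  return 0
}

# Example (not executed here):
#   lt_check_ngrams /home/john/ngrams en && echo "en layout looks complete"
```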

Improving the spell checker

You can improve the spell checker without touching the dictionary. For single words (no spaces), you can add your words to one of these files:

  • spelling.txt: words that the spell checker will ignore and use to generate corrections if someone types a similar word
  • ignore.txt: words that the spell checker will ignore but not use to generate corrections
  • prohibited.txt: words that should be considered incorrect even though the spell checker would accept them

Source: https://dev.languagetool.org/hunspell-support

The following Dockerfile shows an example of how to add words to spelling.txt. It assumes you have your own list of words in en_spelling_additions.txt next to the Dockerfile.

FROM erikvl87/languagetool

# Improving the spell checker
# https://dev.languagetool.org/hunspell-support
USER root
COPY en_spelling_additions.txt en_spelling_additions.txt
RUN  (echo; cat en_spelling_additions.txt) >> org/languagetool/resource/en/hunspell/spelling.txt
USER languagetool

You can build & run the custom Dockerfile with the following two commands:

docker build -t languagetool-custom .
docker run --rm -it -p 8010:8010 languagetool-custom

You can add words for other languages by changing the en language tag in the target path. Note that for some languages, such as nl, the spelling.txt file is not in the hunspell folder but in org/languagetool/resource/nl/spelling/spelling.txt.
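The (echo; cat ...) construct in the Dockerfile's RUN line deserves a short explanation. The sketch below reproduces it as a standalone shell function so it can be tried outside the image. The function name lt_append_words is hypothetical.

```shell
# Append the words in $1 to the spelling file $2, starting on a fresh line.
lt_append_words() {
  # The leading echo guards against a target file whose last line has no
  # trailing newline. Without it, the first added word would be glued onto
  # the last existing word and neither would be recognised.
  (echo; cat "$1") >> "$2"
}

# Example (not executed here):
#   lt_append_words en_spelling_additions.txt spelling.txt
```

If the target file already ends with a newline, the extra echo only adds an empty line.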

Docker Compose

This image can also be used with Docker Compose. An example docker-compose.yml is located at the root of this project.

Usage

By default, this image listens on port 8010, which differs from LanguageTool's default port, 8081.

An example cURL request:

curl --data "language=en-US&text=a simple test" http://localhost:8010/v2/check

Please refer to the official LanguageTool documentation for further usage instructions.
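For text containing characters such as &, +, or non-ASCII letters, the request values should be URL-encoded. The following sketch wraps the cURL request in a shell function and uses curl's --data-urlencode option for that purpose. The names lt_check_url, lt_check, LT_HOST, and LT_PORT are hypothetical and exist only for this example.

```shell
# Endpoint of the local server. Host and port default to the values used in
# this README and can be overridden through LT_HOST and LT_PORT.
lt_check_url() {
  printf 'http://%s:%s/v2/check\n' "${LT_HOST:-localhost}" "${LT_PORT:-8010}"
}

# POST a text for checking. --data-urlencode makes curl percent-encode
# spaces, ampersands and non-ASCII characters in the values.
lt_check() {
  # $1 = language code (for example en-US), $2 = text to check
  curl --silent \
    --data-urlencode "language=$1" \
    --data-urlencode "text=$2" \
    "$(lt_check_url)"
}

# Example (needs a running container, not executed here):
#   lt_check en-US "a simple test"
```

The server replies with a JSON document. Its matches array lists the issues found in the text.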

Known issues & workarounds

If you experience problems connecting the official Firefox extension to a local server, see cors-workaround.
