Languages.jl

Introduction

Languages.jl is a Julia package for working with human languages. It provides:

  • Lists of words from each language for basic categories:

    • Articles
      • Indefinite Articles
      • Definite Articles
    • Prepositions
    • Pronouns
    • Stopwords

    These word lists are currently supported only for English and German.

  • Detection of the script and language of written text, for a wide variety of languages.

Usage

using Languages

articles(Languages.English())
stopwords(Languages.English())

All word lists are returned as vectors of UTF-8 strings.
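
Because the word lists are plain vectors of strings, they can be passed to ordinary Julia code. The following is a minimal sketch of stopword removal. The function `remove_stopwords` and the short `sample_stopwords` list are hypothetical and exist only so the example runs without the package; with Languages.jl loaded, `stopwords(Languages.English())` could be passed instead.

```julia
# Remove every whitespace-separated token that appears in `words_to_drop`
# (compared case-insensitively). `remove_stopwords` is a hypothetical helper,
# not part of Languages.jl.
function remove_stopwords(text::AbstractString,
                          words_to_drop::AbstractVector{<:AbstractString})
    drop = Set(lowercase.(words_to_drop))   # Set gives fast membership tests
    tokens = split(text)                    # split on whitespace
    kept = filter(t -> !(lowercase(t) in drop), tokens)
    return join(kept, " ")
end

# Stand-in list so the sketch is self-contained.
# With the package installed, use stopwords(Languages.English()) here.
sample_stopwords = ["the", "a", "an", "is", "on", "of"]

cleaned = remove_stopwords("The cat is on the mat", sample_stopwords)
println(cleaned)
```

Converting the list to a `Set` once keeps the per-token lookup cheap, which matters when the real stopword list is long.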

Script detection

The script detection model works by checking which Unicode character ranges are present in the input text.

Languages.detect_script("To be or not to be") # => Languages.LatinScript()
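
To illustrate the idea of range-based detection, here is a rough sketch in plain Julia. It is not the package's implementation: the table `SCRIPT_RANGES`, the function `guess_script`, and the choice of a simple majority count are all assumptions made for this example, and the ranges are deliberately incomplete.

```julia
# Hypothetical, deliberately incomplete table: script name => code point ranges.
const SCRIPT_RANGES = [
    "Latin"    => [0x0041:0x005A, 0x0061:0x007A, 0x00C0:0x024F],
    "Greek"    => [0x0370:0x03FF],
    "Cyrillic" => [0x0400:0x04FF],
    "Arabic"   => [0x0600:0x06FF],
]

# Return the name of the script whose ranges contain the most characters of
# `text`, or `nothing` if no character falls in any listed range.
function guess_script(text::AbstractString)
    counts = zeros(Int, length(SCRIPT_RANGES))
    for ch in text
        cp = codepoint(ch)                       # Unicode code point as UInt32
        for (i, (_, ranges)) in enumerate(SCRIPT_RANGES)
            if any(r -> cp in r, ranges)
                counts[i] += 1
                break                            # a character counts for one script
            end
        end
    end
    best_count, best_index = findmax(counts)
    return best_count == 0 ? nothing : first(SCRIPT_RANGES[best_index])
end

println(guess_script("To be or not to be"))
println(guess_script("Привет, мир"))
```

Spaces, digits, and punctuation fall outside every listed range, so they do not influence the count.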

Language detection

A trigram-based model is used to detect the language of the text. The candidate languages are first filtered according to the detected script.

The detector recognizes 84 of the most common languages spoken around the world, which covers most languages with more than 10 million native speakers.
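
The general shape of a trigram-based detector can be sketched in a few lines of Julia. This is a toy illustration under stated assumptions, not the model shipped with the package: the two one-sentence training texts, the names `trigram_counts`, `cosine_similarity`, `PROFILES`, and `guess_language`, and the use of cosine similarity as the score are all chosen for this example only.

```julia
# Count overlapping character trigrams in lower-cased, space-padded text.
function trigram_counts(text::AbstractString)
    chars = collect(" " * lowercase(text) * " ")
    counts = Dict{String,Int}()
    for i in 1:length(chars)-2
        tri = String(chars[i:i+2])
        counts[tri] = get(counts, tri, 0) + 1
    end
    return counts
end

# Cosine similarity between two trigram count tables (0.0 if either is empty).
function cosine_similarity(a::AbstractDict, b::AbstractDict)
    (isempty(a) || isempty(b)) && return 0.0
    dot = 0
    for (tri, n) in a
        dot += n * get(b, tri, 0)
    end
    norm_a = sqrt(sum(n^2 for n in values(a)))
    norm_b = sqrt(sum(n^2 for n in values(b)))
    return dot / (norm_a * norm_b)
end

# Hypothetical profiles built from one sentence each; a real model would be
# trained on far more text per language.
const PROFILES = Dict(
    "English" => trigram_counts("the quick brown fox jumps over the lazy dog and the cat"),
    "German"  => trigram_counts("der schnelle braune fuchs springt über den faulen hund und die katze"),
)

# Return the name of the profile most similar to `text`.
function guess_language(text::AbstractString, profiles::AbstractDict)
    observed = trigram_counts(text)
    scored = [(cosine_similarity(observed, p), name) for (name, p) in profiles]
    return last(maximum(scored))
end

println(guess_language("the dog and the fox", PROFILES))
println(guess_language("der hund und die katze", PROFILES))
```

Filtering by script would correspond to passing `guess_language` only the profiles of languages written in the detected script.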

detector = LanguageDetector()
detector("To be or not to be") # => (Languages.English(), Languages.LatinScript(), 1.0)

The LanguageDetector model returns the language, the script, and the confidence when applied to a string.

The language and script detection code in this package is heavily inspired by the Rust package whatlang-rs, which is in turn derived from franc. See LICENSE.whatlang-rs for details.

Deprecations

The API of this package has been refurbished recently. If you used an earlier version of this package, please be aware of these changes.

  • The language names have been shortened: use English instead of EnglishLanguage. The language names are also no longer exported, so they must be qualified with the package name: Languages.English
  • Every language is a type. However, all functions now accept and return instances of these types, rather than the types themselves.
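
The difference between passing a type and passing an instance can be shown with a small stand-in written in plain Julia. Everything here (`ToyLanguage`, `ToyEnglish`, `ToyGerman`, `toy_definite_articles`, and the short word lists) is hypothetical and only mimics the pattern described above; it is not the package's source.

```julia
# Toy stand-ins for language types; NOT the Languages.jl definitions.
abstract type ToyLanguage end
struct ToyEnglish <: ToyLanguage end
struct ToyGerman  <: ToyLanguage end

# Methods dispatch on an *instance* of the language type.
toy_definite_articles(::ToyEnglish) = ["the"]
toy_definite_articles(::ToyGerman)  = ["der", "die", "das"]

# Passing an instance (note the parentheses) selects a method.
println(toy_definite_articles(ToyEnglish()))

# Passing the type itself matches no method, so a direct call would throw
# a MethodError; `applicable` checks this without raising.
println(applicable(toy_definite_articles, ToyEnglish()))
println(applicable(toy_definite_articles, ToyEnglish))
```

With the package itself, the same pattern reads `articles(Languages.English())`, as in the Usage section.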

Contributors

aviks, johnmyleswhite, zgornel, femtocleaner[bot], asafmanela, dpshelio, nalimilan, abieler, tanmaykm
